Aid optimists
“I have identified the specific investments that are needed [to end poverty]; found ways to plan and implement them; [and] shown that they can be affordable.” Jeffrey Sachs, End of Poverty
Image by Angela Radulescu on Flickr.
Aid optimists - Faculty of Arts · faculty.arts.ubc.ca/fpatrick/documents/RCT-Lecture-2018.pdf · 2018-10-26
• Can do more with a given budget with better evidence
• If people knew money was going to programs that worked, it could help increase the pot for anti-poverty programs
• Instead of asking “do aid/development programs work?” we should be asking:
– Which work best, why and when?
– How can we scale up what works?
Impact: What is it?
[Figure: Primary Outcome (vertical axis) plotted against Time (horizontal axis), with the Intervention marked and the Impact labeled.]
Counterfactual
The counterfactual represents the state of the world that program participants would have experienced in the absence of the program
Problem: Counterfactual cannot be observed
Solution: We need to “mimic” or construct the counterfactual
J-PAL | WHY RANDOMIZE 19
Presenter
Presentation Notes
The counterfactual represents the state of the world that program participants would have experienced in the absence of the program (i.e. had they not participated in the program). Problem: Counterfactual cannot be observed. Solution: We need to “mimic” or construct the counterfactual.
Constructing the counterfactual
• Usually done by selecting a group of individuals that did not participate in the program
• This group is usually referred to as the control group or comparison group
• How this group is selected is a key decision in the design of any impact evaluation
Presenter
Presentation Notes
Estimating the impact (a.k.a. the causal effect) of a program involves a comparison between the outcome had the intervention been introduced and the outcome had the intervention not been introduced. The latter is usually referred to as the counterfactual. The counterfactual represents the state of the world that program participants would have experienced in the absence of the program (i.e. had they not participated in the program). The counterfactual does not represent the state of the world in which participants receive absolutely no services, but rather the state of the world in which participants receive whatever services they would have received had they not participated in the program being evaluated. Example: Training program. The counterfactual can never be directly observed. Hence, the main goal of an impact evaluation can be viewed as an effort to construct or mimic the counterfactual. This is usually done by selecting a group of individuals that did not participate in the program. This group is usually referred to as the control group (in the case of a social experiment) or comparison group (in case we are using non-experimental methods to estimate the impact). How this group is selected is a key decision in the design of any impact evaluation. The idea is to select a group that is exactly like the group of participants in all ways except one: their exposure to the program being evaluated. The goal in the end is to be able to attribute differences in outcomes between the group of participants and the control/comparison group to the program (and not to other factors).
Selecting the comparison group
• Idea: Comparability
• Goal: Attribution
Presenter
Presentation Notes
Idea: Select a group that is exactly like the group of participants in all ways except one: their exposure to the program being evaluated. Goal: To be able to attribute differences in outcomes between the group of participants and the comparison group to the program (and not to other factors). The critical objective of impact evaluation is to establish a credible comparison group – a group of individuals who in the absence of the program would have had outcomes similar to those who were exposed to the program. However, in reality it is generally the case that individuals who participate in a program and those who do not are different: programs are placed in specific areas (for example, poorer or richer areas), individuals are screened for participation in the program (for example, on the basis of poverty or on the basis of their motivation) and, in addition, the decision to participate is often voluntary.
II – WHAT IS A RANDOMIZED EXPERIMENT?
The basics
Start with simple case:
• Take a sample of program applicants
• Randomly assign them to either:
– Treatment Group – is offered treatment
– Control Group – not allowed to receive treatment (during the evaluation period)
Key advantage of experiments
Because members of the groups (treatment and control) do not differ systematically at the outset of the experiment,
any difference that subsequently arises between them can be attributed to the program rather than to other factors.
Presenter
Presentation Notes
In all other impact evaluation methods, we need to assume that the two groups do not differ systematically at the outset or that any differences between them have been statistically accounted for. But there is no way to test this assumption.
Evaluation of “Women as Policymakers”: Treatment vs. Control villages at baseline
Variables                            Treatment Group   Control Group   Difference
Female Literacy Rate                 0.35              0.34            0.01 (0.01)
Number of Public Health Facilities   0.06              0.08            -0.02 (0.02)
Tap Water                            0.05              0.03            0.02 (0.02)
Number of Primary Schools            0.95              0.91            0.04 (0.08)
Number of High Schools               0.09              0.10            -0.01 (0.02)
Standard errors in parentheses. Statistics displayed for West Bengal.
*/**/***: Statistically significant at the 10% / 5% / 1% level.
Source: Chattopadhyay and Duflo (2004)
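One way to read the baseline table above is to divide each difference by its standard error and check whether the ratio is large. The sketch below does that with the numbers from the table; the 1.96 threshold (two-sided 5% level under a normal approximation) and all variable names are assumptions added for illustration, not part of the original slides.

```python
# Differences and standard errors copied from the baseline table above.
baseline = {
    "Female Literacy Rate": (0.01, 0.01),
    "Number of Public Health Facilities": (-0.02, 0.02),
    "Tap Water": (0.02, 0.02),
    "Number of Primary Schools": (0.04, 0.08),
    "Number of High Schools": (-0.01, 0.02),
}

CRITICAL_VALUE = 1.96  # assumed two-sided 5% threshold (normal approximation)


def t_statistic(difference, standard_error):
    """Ratio of an estimated difference to its standard error."""
    return difference / standard_error


t_stats = {name: t_statistic(d, se) for name, (d, se) in baseline.items()}
imbalanced = [name for name, t in t_stats.items() if abs(t) > CRITICAL_VALUE]

for name, t in t_stats.items():
    print(f"{name}: t = {t:+.2f}")
print("Variables that look imbalanced:", imbalanced)
```

None of the ratios comes close to the threshold, which is what one would expect if villages were assigned at random.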
Some variations on the basics
• Assigning to multiple treatment groups
• Assigning units other than individuals or households
– Health Centers
– Schools
– Local Governments
– Villages
Presenter
Presentation Notes
Assigning to units other than people/households: Health Centers (in tracking nurse attendance); Schools (measuring infrastructure); Local Governments (assessing corruption).
Key Steps in conducting an experiment
1. Design the study carefully
2. Randomly assign people to treatment or control
3. Collect baseline data
4. Verify that assignment looks random
5. Monitor process so that integrity of experiments is not compromised
Presenter
Presentation Notes
These 8 steps present a very simplified description of the process. Idea is to give a complete picture on how this works in a typical experiment.
Key Steps in conducting an experiment (contd.)
6. Collect follow-up data for both the treatment and control groups
7. Estimate program impacts by comparing mean outcomes of treatment group vs mean outcomes of the control group
8. Assess whether program impacts are statistically significant and practically significant
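Steps 7 and 8 can be sketched in a few lines: take the difference in mean outcomes and compare it with its standard error. This is a minimal illustration, not the analysis used in any study cited here; the outcome values are invented, and the unequal-variance standard error is one common choice among several.

```python
import math
import statistics


def estimate_impact(treatment_outcomes, control_outcomes):
    """Return (difference in means, standard error, t-statistic)."""
    diff = statistics.mean(treatment_outcomes) - statistics.mean(control_outcomes)
    # Standard error of a difference in means, allowing unequal variances.
    se = math.sqrt(
        statistics.variance(treatment_outcomes) / len(treatment_outcomes)
        + statistics.variance(control_outcomes) / len(control_outcomes)
    )
    return diff, se, diff / se


# Hypothetical follow-up outcomes, invented purely for illustration.
treatment = [52, 61, 48, 57, 66, 59, 63, 55]
control = [50, 47, 53, 45, 56, 49, 51, 44]

impact, std_error, t_stat = estimate_impact(treatment, control)
print(f"Estimated impact: {impact:.2f} (SE {std_error:.2f}, t = {t_stat:.2f})")
```

Statistical significance is read from the t-statistic; practical significance is a separate judgment about whether an impact of that size matters for the policy question.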
Presenter
Presentation Notes
These 8 steps present a very simplified description of the process. Idea is to give a complete picture on how this works in a typical experiment.
III – WHY RANDOMIZE?
Why Randomize? - Conceptual Argument
If properly designed and conducted, randomized experiments provide the most credible method to estimate the impact of a program
Presenter
Presentation Notes
In all other impact evaluation methods, we need to assume that the two groups do not differ systematically at the outset or that any differences between them have been statistically accounted for. But there is no way to test this assumption.
Why “most credible”?
Because members of the groups (treatment and control) do not differ systematically at the outset of the experiment,
any difference that subsequently arises between them can be attributed to the program rather than to other factors.
Presenter
Presentation Notes
In all other impact evaluation methods, we need to assume that the two groups do not differ systematically at the outset or that any differences between them have been statistically accounted for. But there is no way to test this assumption.
Constructing the counterfactual
• Counterfactual is often constructed by selecting a group not affected by the program
• Randomized:
– Use random assignment of the program to create a control group which mimics the counterfactual.
• Non-randomized:
– Argue that a certain excluded group mimics the counterfactual.
Example #3 Balsakhi Program
Presenter
Presentation Notes
Revisiting balsakhi classes
Balsakhi Program: Background
• Implemented by Pratham, an NGO from India
• Program provided tutors (balsakhi) to help at-risk children with school work
• In Vadodara, the balsakhi program was run in government primary schools in 2002-2003
• Teachers decided which children would get the balsakhi
Presenter
Presentation Notes
In 1994 Pratham launched the Balsakhi Program to help at-risk children acquire the basic skills they need to participate fully in the classroom. The program provided tutors for at-risk children in government schools. The tutor, called a balsakhi, or “child’s friend,” was typically a young woman hired from the local community. Balsakhis were paid between 500 and 750 rupees (US$10-15) a month. All the balsakhis had completed at least secondary school, and they were given two weeks’ training at the beginning of the school year. The program targeted children who had reached grades 3 and 4 without mastering grades 1 and 2 reading and math competencies, including spelling simple words, reading simple paragraphs, recognizing numbers, counting up to 20, and subtracting or adding single-digit numbers. Children who were lagging behind—identified as such by the teacher—were pulled out of the regular class in groups of 20 and sent for remedial tutoring, spending half the school day with the tutor.
Balsakhi: Outcomes
• Children were tested at the beginning of the school year (Pretest) and at the end of the year (Post-test)
• QUESTION: How can we estimate the impact of the balsakhi program on test scores?
Methods to estimate impacts
• Let’s look at different ways of estimating the impacts using the data from the schools that got a balsakhi
1. Pre – Post (Before vs. After)
2. Simple difference
3. Difference-in-difference
4. Other non-experimental methods
5. Randomized Experiment
1 - Pre-post (Before vs. After)
• Look at average change in test scores over the school year for the balsakhi children
1 - Pre-post (Before vs. After)
Average post-test score for children with a balsakhi: 51.22
Average pretest score for children with a balsakhi: 24.80
Difference: 26.42
QUESTION: Under what conditions can this difference (26.42) be interpreted as the impact of the balsakhi program?
2 - Simple difference
Compare test scores of…
Children who got balsakhi
With test scores of…
Children who did not get balsakhi
2 - Simple difference
Average score for children with a balsakhi: 51.22
Average score for children without a balsakhi: 56.27
Difference: -5.05
QUESTION: Under what conditions can this difference (-5.05) be interpreted as the impact of the balsakhi program?
3 – Difference-in-Differences
Compare gains in test scores of…
Children who got balsakhi
With gains in test scores of…
Children who did not get balsakhi
3 – Difference-in-difference
• QUESTION: Under what conditions can this difference (6.82) be interpreted as the impact of the balsakhi program?
                                                Pretest   Post-test   Difference
Average score for children with a balsakhi      24.80     51.22       26.42
Average score for children without a balsakhi   36.67     56.27       19.60
Difference                                                            6.82
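The three non-experimental estimates above come from the same four averages, so the arithmetic can be checked side by side. The sketch below uses only the scores shown on the slides; the variable names are illustrative.

```python
# Average test scores taken from the slides above.
pre_balsakhi, post_balsakhi = 24.80, 51.22
pre_comparison, post_comparison = 36.67, 56.27

# 1. Pre-post: change over the year for children with a balsakhi.
pre_post = post_balsakhi - pre_balsakhi

# 2. Simple difference: post-test scores, with vs. without a balsakhi.
simple_difference = post_balsakhi - post_comparison

# 3. Difference-in-difference: gain with a balsakhi minus gain without.
gain_comparison = post_comparison - pre_comparison
diff_in_diff = pre_post - gain_comparison

print(f"Pre-post:                 {pre_post:.2f}")           # 26.42
print(f"Simple difference:        {simple_difference:.2f}")  # -5.05
print(f"Difference-in-difference: {diff_in_diff:.2f}")       # 6.82
```

The same data give estimates of 26.42, -5.05, and 6.82 depending on which comparison is treated as the counterfactual, which is the point of the exercise.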
5 – Randomized Experiment
• Suppose we evaluated the balsakhi program using a randomized experiment
• QUESTION #1: What would this entail? How would we do it?
• QUESTION #2: What would be the advantage of using this method to evaluate the impact of the balsakhi program?
How to Randomize
Presenter
Presentation Notes
Marc will present this
Random Selection
Presenter
Presentation Notes
15 seconds. First, it’s important to distinguish between Random Selection and Random Assignment. Both are motivated by the same principle: to get a representative sample of the population.
Random Selection
7J-PAL | WHAT IS EVALUATION
Presenter
Presentation Notes
15 seconds. So say this is a map of a city in India (actually it’s the location of one of our first Microcredit studies). Say we divide it up into about 400 geographic units.
Random Selection
[Chart: Monthly income, per capita. Population: 1250]
Presenter
Presentation Notes
15 seconds. This is our sampling frame. Let’s also say the average monthly income per capita in this city is 1250 rupees.
Random Selection
Randomly sample from area of interest
Presenter
Presentation Notes
15 seconds. If we were to take a random sample of about 60, and if it were truly a random sample, we should expect that the income of that random sample would be around 1250.
Random Selection
[Chart: Monthly income, per capita. Population: 1250; Sample: 1252]
Presenter
Presentation Notes
30 seconds. Indeed, it’s close: 1252. With random samples, it’s unreasonable to expect the average to be exactly the average of the population. But we can get pretty close. And what’s important here is that it’s not “statistically distinguishable”. In other words, the difference is not “statistically significant”.
Random Assignment
Randomly assign to treatment
Presenter
Presentation Notes
30 seconds. With random assignment, we also start with a sampling frame. Here it’s with our random sample. But we could have also started with the entire population: all 400 communities. We randomly assign half to the treatment group.
Random Assignment
[Chart: Monthly income, per capita. Population: 1250; Treatment: 1257]
Presenter
Presentation Notes
30 seconds And once again, the income of that sample, of the treatment group, is close to the population mean.
Random Assignment
Randomly assign to treatment and control
Presenter
Presentation Notes
15 seconds. And then the rest are assigned to the control.
Random Assignment
[Chart: Monthly income, per capita. Population: 1250; Treatment: 1257; Control: 1244]
Presenter
Presentation Notes
15 seconds. And here, we also find a number very close to the population mean. Again, just as the original random sample of 60 wasn’t exactly 1250, we wouldn’t necessarily expect the random sample of 30 in the treatment group and 30 in the control group to be exactly 1250. But again, the difference between the average income in the treatment and control groups is statistically insignificant, and neither is statistically distinguishable from the population mean. The two groups are statistically equivalent. Or the two groups are balanced.
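The two steps described in these notes, random selection of 60 units from a frame of 400 and then random assignment of 30 to each group, can be sketched as follows. The incomes are simulated (a normal distribution centred on 1250 is an assumption made only so there is something to average), and the seed is arbitrary.

```python
import random
import statistics

rng = random.Random(2018)  # fixed seed so the draw can be reproduced

# Hypothetical sampling frame: 400 units with made-up monthly incomes.
population = {unit: rng.gauss(1250, 300) for unit in range(400)}

# Random selection: draw 60 units from the frame.
sample = rng.sample(sorted(population), 60)

# Random assignment: shuffle the sample, first half treatment, rest control.
rng.shuffle(sample)
treatment, control = sample[:30], sample[30:]


def mean_income(units):
    """Average simulated income over the given units."""
    return statistics.mean(population[u] for u in units)


print(f"Population: {mean_income(population):.0f}")
print(f"Treatment:  {mean_income(treatment):.0f}")
print(f"Control:    {mean_income(control):.0f}")
```

The group means will differ from the population mean by chance, but not systematically, which is what "balanced" means in the notes above.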
Alternate methods of Randomization?
Presenter
Presentation Notes
30 seconds So say this is a map of a city in India (actually it’s the location of one of our first Microcredit studies) Say we divide it up into about 400 geographic units.
NOT Random Assignment
Presenter
Presentation Notes
1 minutes You’d be surprised how many evaluations claim to have a randomized controlled design, and create designs exactly like this. NOTE TO PRESENTER: IF YOU HAVE EXAMPLES YOU’RE AWARE OF, PLEASE USE HERE.
NOT Random Assignment
[Chart: Monthly income, per capita. Population: 1250; the Treatment and Control bars show 1453 and 942]
Presenter
Presentation Notes
30 seconds. While there’s some remote possibility that this city is geographically homogeneous, or that the poor and rich are uniformly distributed all over the city, that is extremely unlikely. In expectation, we would see different means. Therefore, the treatment group and the control group are NOT balanced (as you can see from the difference in income).
Simple randomization: Fixed probability
• For each member, set probability (e.g. 50%).
– Spot randomization
– Point-of-service randomization
• May end up with slightly more in one group and fewer in the other
J-PAL | HOW TO RANDOMIZE 20
ID Coin Treatment/Control
1 Heads T
2 Heads T
3 Tails C
4 Heads T
5 Tails C
6 Heads T
7 Tails C
8 Tails C
9 Heads T
10 Heads T
Count: T: 6, C: 4
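The coin-flip table above can be reproduced with a pseudo-random number generator. This is a minimal sketch of simple (fixed-probability) randomization; the function name, the seed, and the 50% probability are illustrative choices.

```python
import random


def simple_randomization(ids, p_treatment=0.5, seed=None):
    """Assign each unit independently to T with probability p_treatment."""
    rng = random.Random(seed)
    return {i: ("T" if rng.random() < p_treatment else "C") for i in ids}


assignment = simple_randomization(range(1, 11), p_treatment=0.5, seed=1)

counts = {"T": 0, "C": 0}
for group in assignment.values():
    counts[group] += 1

print(assignment)
print("Count:", counts)
```

Because each unit is assigned independently, the group sizes are themselves random, so an uneven split such as 6 and 4 is an ordinary outcome rather than a sign that something went wrong.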
Presenter
Presentation Notes
2 min Use Physician Teams or Hotspotting as an example
Complete randomization: Fixed proportion
• Need sample frame
• Determine number in treatment (and in control)
• Pull out of a hat/bucket
-or-
• Use random number generator to order observations randomly
Source: Chris Blattman
Presenter
Presentation Notes
2 min. USE DECISION SUPPORT AS AN EXAMPLE. Need sample frame. Talk about public lottery vs. random number generator. Pull out of a hat/bucket: transparent; time consuming, complex if large group; hard to stratify on many dimensions. Use random number generator to order observations randomly: typically we use Stata, but most statistical programs, even Excel, can do this (in fact, you will be doing this soon). Stata program code. Circulate some examples. What if no existing list? Walk-ins: randomize on the spot.
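The notes mention Stata; the same idea, ordering the list at random and treating a fixed number from the top, can be sketched in Python. This is an illustration under that assumption, not the code circulated in the session, and the function name and seed are hypothetical.

```python
import random


def complete_randomization(ids, n_treatment, seed=None):
    """Shuffle the sample frame and treat exactly n_treatment units."""
    ids = list(ids)
    rng = random.Random(seed)
    order = ids[:]          # copy so the original frame order is kept
    rng.shuffle(order)      # random ordering of the observations
    treated = set(order[:n_treatment])
    return {i: ("T" if i in treated else "C") for i in ids}


assignment = complete_randomization(range(1, 11), n_treatment=5, seed=1)
n_treated = sum(group == "T" for group in assignment.values())
print(assignment)
print("Treated:", n_treated, "Control:", len(assignment) - n_treated)
```

Unlike the coin-flip approach, the group sizes are fixed in advance, which is why a complete sample frame is needed before assignment.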
Unit of Randomization: Individual?
Presenter
Presentation Notes
15 seconds. So first let’s start with the basics. We have about 400 students in the city.
Unit of Randomization: Individual?
Presenter
Presentation Notes
15 seconds. We could randomly assign each individual to treatment and control. If we started with a list of individuals, and wanted a complete randomization design, we could ensure we have 200 in the treatment and 200 in the control.
Unit of Randomization: Clusters?
Presenter
Presentation Notes
15 seconds. But say the intervention we want to test is teacher training. We couldn’t have some kids in the classroom be taught by Professor Doyle with training, and some by Professor Doyle without training. Here, we’d want to do a cluster randomized trial.
Unit of Randomization: Class?
Presenter
Presentation Notes
15 seconds The Unit of randomization here may be the Cluster.
Unit of Randomization: Class?
Presenter
Presentation Notes
15 seconds. Then entire classes would be randomized to treatment and control.
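Randomizing whole classes rather than individual students can be sketched as below: the draw is made over clusters, and every student inherits the status of their class. The roster of 400 students in 16 equal classes is hypothetical, as are the function name and seed.

```python
import random


def cluster_randomization(unit_to_cluster, n_treated_clusters, seed=None):
    """Randomize clusters, then give every unit its cluster's status."""
    rng = random.Random(seed)
    clusters = sorted(set(unit_to_cluster.values()))
    rng.shuffle(clusters)
    treated = set(clusters[:n_treated_clusters])
    return {u: ("T" if c in treated else "C") for u, c in unit_to_cluster.items()}


# Hypothetical roster: 400 students spread evenly over 16 classes.
roster = {student: f"class_{student % 16:02d}" for student in range(400)}
assignment = cluster_randomization(roster, n_treated_clusters=8, seed=7)

# Every class should contain students of one status only.
status_by_class = {}
for student, class_name in roster.items():
    status_by_class.setdefault(class_name, set()).add(assignment[student])

n_treated_students = sum(group == "T" for group in assignment.values())
print("Treated students:", n_treated_students)
```

The same function applies unchanged if the cluster is a school rather than a class; only the mapping from students to clusters changes.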
Unit of Randomization: School?
Presenter
Presentation Notes
15 seconds And if our intervention was training for principals on better school administration
Unit of Randomization: School?
Presenter
Presentation Notes
15 seconds. We’d want to randomize at the school level.
An education department wants to see if increasing the duration of recess can help reduce rates of obesity. What is the appropriate unit of randomization?
A. Child level
B. Household level
C. Classroom level
D. School level
E. Village level
F. Don’t know
[Audience poll chart, options A to F. Extracted values, in extraction order: 22%, 0%, 0%, 0%, 56%, 22%]
Presenter
Presentation Notes
2 min
The department of agriculture believes that if farmers used more fertilizer yields would improve. One advisor believes organic fertilizer will be more effective; a second believes inorganic fertilizer is better; a third believes neither will be effective. Can we test all three beliefs within one single experiment?
A. Yes, and we should
B. No, they can only be answered with two separate experiments
C. No, they can only be answered with three separate experiments
D. Yes, but best practice is to run separate experiments
E. Don’t know
[Audience poll chart, options A to E. Extracted values, in extraction order: 71%, 0%, 14%, 0%, 14%]
Presenter
Presentation Notes
3 min
Multiple treatments
[Figure legend: Treatment 1, Treatment 2, Control]
Presenter
Presentation Notes
15 seconds. We’ve been talking as though we have ONE treatment and a control. But it’s entirely possible to have multiple treatments.
Cross-cutting treatments: Factorial Design
                     Performance-based pay: Y       Performance-based pay: N
Cash Grants: Y       Group 1: Cash + Performance    Group 2: Cash
Cash Grants: N       Group 3: Performance           Group 4: Control
Presenter
Presentation Notes
2 min. Now let’s use a more realistic example. (We can customize this for HOTSPOTTING: cell phones, home visits.) Test whether components serve as substitutes or complements. Is the whole (the interaction) greater than, less than, or equal to the sum of its parts? What is the most cost-effective combination? Advantage: win-win for operations; can help answer questions for them, beyond simple “impact”!
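A two-by-two factorial design amounts to assigning each unit to one of four cells at random. The sketch below deals units into the four groups from the slide in equal numbers; the sample of 40 units, the function name, and the seed are assumptions for illustration.

```python
import random

# The four cells of the 2x2 design: (cash grant, performance-based pay).
CELLS = {
    (True, True): "Group 1: Cash + Performance",
    (True, False): "Group 2: Cash",
    (False, True): "Group 3: Performance",
    (False, False): "Group 4: Control",
}


def factorial_assignment(ids, seed=None):
    """Shuffle units, then deal them into the four cells in turn."""
    rng = random.Random(seed)
    order = list(ids)
    rng.shuffle(order)
    cells = list(CELLS)
    return {unit: cells[k % len(cells)] for k, unit in enumerate(order)}


assignment = factorial_assignment(range(40), seed=3)

cell_counts = {label: 0 for label in CELLS.values()}
for cell in assignment.values():
    cell_counts[CELLS[cell]] += 1
print(cell_counts)
```

Comparing Group 1 with Groups 2 and 3 is what lets the evaluation ask whether the two components are substitutes or complements.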
Varying intensity of treatment
• To Measure:
– Dosage
– Sensitivity
– Elasticity
– Spillovers
Presenter
Presentation Notes
1 min. In medical trials the question is rarely as simple as: “Is penicillin effective at treating pneumonia?” Often the question is much more detailed: What dosage of penicillin is needed, how many times a day? How long is the course? In these cases, we have many treatment arms; each is given a different dosage, different course length, etc. And through this, we figure out the optimal dosage and optimal course duration, maximizing the benefits and minimizing the negative side effects (for example drug resistance, or disrupting the natural microbiome in your gut). How might this apply to a social program question?
Varying intensity of treatment (individual)
• Dosage
• Sensitivity
• Elasticity
Presenter
Presentation Notes
2 min. How much should we charge for preventative health products? We know that when we charge the market rate, people underutilize. In other words, people do not buy bednets and stop the spread of malaria, they do not get immunized, and many farmers do not buy fertilizer. But if we hand these products out for free, people may take them for granted and not use them. We may destroy a partially functioning market for these products. And it costs a lot of money to subsidize: we may be overspending on this one problem, only to be short of resources for other problems. So what’s the optimal price?
Challenge 1: Difficult (logistically or politically) for Service Providers
• Service providers have trouble distinguishing between treatment and comparison (or customizing service)
• Crossovers: Control receives intervention (no longer represents pure counterfactual)
[Figure: treatment and comparison groups, with services provided to both]
Presenter
Presentation Notes
1 min. In medical trials, clinical researchers are so concerned about doctors’ ability to provide one randomly assigned treatment to one patient, and a different randomly assigned treatment (or status quo) to another, that it’s common practice to take discretion away from the doctor. They design “double-blind” trials, where the patient doesn’t know which treatment they’re getting, and the doctor doesn’t know either. Both the treatment and control “pills” appear identical, and the doctor is not informed which pill is being given to which patients. This can be difficult or impossible once we start experimenting with different procedures or processes. If we wanted to test the effectiveness of a new process, and trained nurses on it, we couldn’t ask them to “apply” that training to some patients, and to “forget it” or “unlearn it” for others. I have an example of a project….(use “physician teams”)
Solution 1a: Assign to Different Service Providers
• Service providers have trouble distinguishing between treatment and comparison (or customizing service)
• Have different teams provide the different treatments
• Randomly assign to those teams
[Figure: treatment and comparison groups]
Presenter
Presentation Notes
30 seconds I have an example of a project…. (Physician teams?)
Solution 1b: Randomize at a different unit
• Service providers have trouble distinguishing between treatment and comparison (or customizing service)
• Change the unit of random assignment
• Have providers treat entire clusters the same
[Figure: treatment and comparison groups]
Presenter
Presentation Notes
30 seconds
Challenge 2a: Control group finds out about treatment
• If treatment and control individuals know each other, the control may get upset.
• Service providers may lose support of community
• Attrition: Control withdraws participation from research
[Figure: treatment and comparison groups. Talks with friends (treatment and control); friends in control group get upset with researchers or service providers]
Presenter
Presentation Notes
15 seconds
Challenge 2b: Control group benefits from treatment
• If treatment and control individuals know each other, the treatment may share benefits with control.
Presenter
Presentation Notes
1 min
Challenge 2e: Control group harmed by treatment
• If treatment and control individuals compete with each other, the control may be harmed.
[Figure: treatment group and control group, without experiment and with experiment]
Presenter
Presentation Notes
1 min
Solution 2a: Varying the unit to contain spillovers
[Figure: treatment, comparison, friends]
Presenter
Presentation Notes
30 seconds
Solution 2b: Creating a Buffer
[Figure: buffer of units that are not sampled]
Presenter
Presentation Notes
30 seconds
Challenge 3: Have resources to treat everyone. (Where’s the control group?)
But perhaps not all at once
Presenter
Presentation Notes
30 seconds. Say you have the research constraint of no resource constraint. It is still possible your partner is constrained by time or by logistics: they cannot provide the benefit to everyone all at once. In such a case, perhaps you can phase in the program.
Solution 3: Phase In
Presenter
Presentation Notes
15 seconds. It is still possible your partner is constrained by time or by logistics: they cannot provide the benefit to everyone all at once. In such a case, perhaps you can phase in the program.
Phase 0: No one treated yet, all control
Presenter
Presentation Notes
15 seconds
Phase 1: 1/4th treated, 3/4ths control
Presenter
Presentation Notes
15 seconds
Phase 2: 2/4ths treated, 2/4ths control
Presenter
Presentation Notes
15 seconds
Phase 3: 3/4ths treated, 1/4th control
Presenter
Presentation Notes
15 seconds
Phase 4: All treated, no control (experiment over)
Presenter
Presentation Notes
15 seconds By Phase 4, your experiment is over. So if you plan to use this approach, you better hope the phasing in, and actually, the duration of each phase takes long enough for the outcomes of the treatment group to change.
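A phase-in design can be sketched by randomizing the order in which units start receiving the program. In the illustration below, 400 hypothetical units are split into four equal waves; units that have not yet started serve as the control group in each phase. The function names and seed are assumptions.

```python
import random


def phase_in_schedule(ids, n_phases=4, seed=None):
    """Randomly order units and return the phase (1..n_phases) in which each starts."""
    rng = random.Random(seed)
    order = list(ids)
    rng.shuffle(order)
    return {unit: 1 + (k * n_phases) // len(order) for k, unit in enumerate(order)}


def treated_share(schedule, phase):
    """Share of units already receiving the program in the given phase."""
    return sum(start <= phase for start in schedule.values()) / len(schedule)


schedule = phase_in_schedule(range(400), n_phases=4, seed=11)
for phase in range(5):
    print(f"Phase {phase}: {treated_share(schedule, phase):.2f} treated")
```

By the last phase no untreated units remain, so any comparison has to be made while some waves are still waiting.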
Challenge 4: There’s an eligibility criterion
[Figure: distribution of People by Income]
Presenter
Presentation Notes
2 min Use example from VA Suicide prevention (feel free to change the X axis)
Challenge 4: There’s an eligibility criterion
[Figure: distribution of People by Income, with a Cut-off separating Eligible from Ineligible]
Presenter
Presentation Notes
1 min [Feel free to discuss RDD here]
Solution 4: Relax the eligibility criteria
[Figure: distribution of People by Income, with the Cut-off, a New Cut-off, and the Eligible and Ineligible regions]
Presenter
Presentation Notes
30 seconds
Solution 4: Randomize “on the bubble”
[Figure: distribution of People by Income, with the Cut-off and the New Cut-off. Regions: Remain Eligible (Not in Study), Study Sample, Remain Ineligible (Not in Study)]
Presenter
Presentation Notes
1 min. [Take the time to read each box, since there may be some confusion between treatment and control group (within the study sample) vs. receiving the program (eligible and not in study) and not receiving the program (ineligible and not in study).]
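Randomizing “on the bubble” can be sketched as a three-way split followed by a lottery in the middle band. The sketch assumes that eligibility means income below the cut-off and that the new cut-off is higher, which is one reading of the figure; the incomes, cut-offs, function name, and seed are all hypothetical.

```python
import random


def bubble_randomization(incomes, cutoff, new_cutoff, seed=None):
    """Randomize only units whose income lies between the two cut-offs.

    Assumes eligibility means income below `cutoff` and new_cutoff > cutoff.
    """
    rng = random.Random(seed)
    status, bubble = {}, []
    for unit, income in incomes.items():
        if income < cutoff:
            status[unit] = "remain eligible (not in study)"
        elif income >= new_cutoff:
            status[unit] = "remain ineligible (not in study)"
        else:
            bubble.append(unit)  # study sample
    rng.shuffle(bubble)
    half = len(bubble) // 2
    for unit in bubble[:half]:
        status[unit] = "treatment"
    for unit in bubble[half:]:
        status[unit] = "control"
    return status


# Hypothetical incomes: 20 units earning 100, 200, ..., 2000.
incomes = {i: 100 * i for i in range(1, 21)}
status = bubble_randomization(incomes, cutoff=1000, new_cutoff=1500, seed=5)

n_treatment = sum(s == "treatment" for s in status.values())
n_control = sum(s == "control" for s in status.values())
print(n_treatment, "treatment,", n_control, "control")
```

Only the units in the middle band enter the study, so the estimated impact applies to people near the threshold rather than to everyone who is eligible.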
Challenge 5: Program is an entitlement
Cannot force nor deny intervention
Presenter
Presentation Notes
2 min The Supplemental Nutrition Assistance Program, or “SNAP,” or what most people know it as, “Food stamps” is a program available to any individual or household below the poverty line. That can’t be taken away. So if we wanted to know the impact of food stamps on nutrition, how might we go about evaluating that?
Challenge 5: Program is an entitlement
Treatment Group Control Group
Presenter
Presentation Notes
30 seconds. So how do we have a treatment and a control group? We cannot deny food stamps to individuals in the control group.
Solution 5: Encouragement
Treatment Group Control Group
Presenter
Presentation Notes
1 min In New Jersey as part of the Hotspotting program, the nurses help individuals enroll….
Solution 5: Encouragement
Treatment Group Control Group
3/4ths take-up 1/4th take-up
Presenter
Presentation Notes
30 seconds. Here you see that 3/4ths enrolled in food stamps in the treatment group and 1/4th enrolled in the control group. Now how do you measure impact?
To evaluate the effect of this program, you would first:
A. Compare those who enrolled to those who didn’t
B. Drop those who didn’t enroll from the treatment group
C. Drop those who did enroll from the control group
D. Both B&C
E. Compare treatment group to entire control group
[Audience poll chart, options A to E. Extracted values, in extraction order: 0%, 0%, 67%, 33%, 0%]
Presenter
Presentation Notes
3 min
Solution 5: Encouragement
Treatment Group Control Group
3/4ths take-up 1/4th take-up
Compare Entire Treatment Group to Entire Control Group
Presenter
Presentation Notes
1 min. In this case, you would compare the entire treatment group to the entire control group. And in a sense, you’d be evaluating the impact of “encouraging” people to take up food stamps. If you detect an impact, and if it’s very unlikely that the encouragement alone caused it, then what is driving this impact would be the impact of food stamps directly.
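The comparison of entire groups described above is often called an intention-to-treat estimate. A common further step, which goes beyond what the slide states, is to divide that estimate by the difference in take-up rates; this is only meaningful under the extra assumption that encouragement affects outcomes solely through enrollment. The take-up rates below come from the slide, while the outcome means are invented.

```python
def intention_to_treat(mean_outcome_treatment, mean_outcome_control):
    """Difference in mean outcomes between the entire groups."""
    return mean_outcome_treatment - mean_outcome_control


def effect_per_enrollee(itt, take_up_treatment, take_up_control):
    """Scale the ITT by the difference in take-up between the groups."""
    return itt / (take_up_treatment - take_up_control)


# Take-up rates from the slide; the outcome means are hypothetical.
take_up_treatment, take_up_control = 0.75, 0.25
itt = intention_to_treat(mean_outcome_treatment=62.0, mean_outcome_control=60.0)
scaled = effect_per_enrollee(itt, take_up_treatment, take_up_control)

print(f"ITT: {itt:.1f}")
print(f"Scaled by take-up difference: {scaled:.1f}")
```

Dropping non-enrollees or enrollees from either group, as in options B to D of the poll, would break the comparability that random assignment created.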
Problem 6: Sample size is small
Presenter
Presentation Notes
30 seconds Say for example, we have this randomization design. We have 400 people, but we’re randomly assigning to only 12 schools. This could affect the power of your experiment. (You’ll hear from Rachel later on about why that is) If that’s the case, and if it’s feasible, you may want to consider changing the unit of randomization.
Solution 6a: Change the unit of randomization
Presenter
Presentation Notes
15 seconds Perhaps a sample of 24 classrooms?
How do we increase school participation (enrollment and attendance)?
A government wants to improve school attendance at primary schools. What interventions would you recommend?
Presenter
Presentation Notes
If you were a policy maker how would you go about improving school participation?
What is the most effective intervention to increase school participation (enrollment and attendance)?
A. Text Books
B. Lunch for free
C. Free school uniforms
D. Treat intestinal worms
E. Merit scholarships
F. Improve curriculum & teaching
G. Provide better materials
H. Increase awareness of returns to education
[Audience poll chart, options A to H. Extracted values, in extraction order: 0%, 100%, 0%, 0%, 0%, 0%, 0%, 0%]
Presenter
Presentation Notes
Now let’s ask a much more specific question.
Impact evaluations can help answer these questions
J-PAL | WHAT IS EVALUATION 16
Presenter
Presentation Notes
Impact evaluations can help answer whether programs contribute to social change, but they can also give you tools for deciding what programs to invest in in the first place if you are trying to address a certain problem with a limited amount of resources. One of the tools that J-PAL creates for these types of decision-makers is cost-effectiveness analysis, which tells you, for a given amount of money, how much you can increase student attendance, or investment in preventive health products, or micro-business profits using different kinds of programs. The above graph shows you how much additional student attendance is possible to achieve with a given program and budget constraint, be it a campaign that gives parents information on the wages their children could earn for every additional year they attend school, deworming, school meals, scholarships, subsidized uniforms or conditional cash transfers. All of these programs increased student attendance, but some were relatively more cost-effective than others. We see here that deworming has been shown to be one of the most cost-effective ways to increase children’s attendance in school, resulting in 28.6 additional years of school across the whole sample of kids who were offered deworming pills per $100 spent. Context is paramount: you would never recommend deworming if worms aren’t a problem in your context. Cost-effectiveness analysis is just one more data point or resource that can help organizations with social missions make decisions about what to invest in.
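The "additional years of school per $100" metric in these notes is a simple ratio, sketched below. The two programs and all of their numbers are hypothetical and are not the figures behind the graph; only the unit (years per $100 spent) comes from the notes.

```python
def additional_years_per_100_dollars(total_additional_years, total_cost_dollars):
    """Cost-effectiveness: extra years of schooling bought by each $100 spent."""
    return total_additional_years * 100 / total_cost_dollars


# Hypothetical programs (illustration only, not J-PAL estimates).
programs = {
    "Program A": {"additional_years": 500, "cost": 2_000},
    "Program B": {"additional_years": 300, "cost": 6_000},
}

ranking = sorted(
    (
        (name, additional_years_per_100_dollars(p["additional_years"], p["cost"]))
        for name, p in programs.items()
    ),
    key=lambda item: item[1],
    reverse=True,
)

for name, years in ranking:
    print(f"{name}: {years:.1f} additional years per $100")
```

A ranking like this only compares programs on one outcome; as the notes stress, context and the choice of outcome measure still have to be judged separately.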
Which one of these would make a good question for an impact evaluation?
A. What share of kids in Tanzania drop out of school before completing primary?
B. Will providing kids with deworming pills or school uniforms do a better job of keeping kids in school?
C. What role does ethnicity play in student results?
J-PAL | WHAT IS EVALUATION 18
[Poll results chart for options A to C; extracted values: 0%, 6%, 94%]
Which one of these would make a good question for an impact evaluation?
A. Are agricultural extension agents giving farmers the same information they were trained on?
B. What share of farmers in Kenya currently live on less than $2 a day?
C. Which kind of fertilizer works best for a plot of maize?
J-PAL | WHAT IS EVALUATION 19
[Poll results chart for options A to C; extracted values: 0%, 0%, 0%]
Which one of these would make a good question for an impact evaluation?
A. Does a sexual education program or free school uniforms have a bigger effect on teenage pregnancy rates?
B. Do teenage girls have a right to have full information regarding sexual education?
C. Are teachers spreading misinformation when delivering sexual education?
J-PAL | WHAT IS EVALUATION 20
[Poll results chart for options A to C; extracted values: 0%, 0%, 0%]
5 components of program evaluation
• Needs Assessment
• Theory of Change
• Process Evaluation
• Impact Evaluation
• Cost-Effectiveness Analysis
J-PAL | WHAT IS EVALUATION 22
Presenter
Presentation Notes
Needs assessment: What is the problem?
Theory of change: How, in theory, does the program fix the problem?
Process evaluation: Does the program work as planned?
Impact evaluation: Were its goals achieved? What was the magnitude?
Cost-effectiveness: Given magnitude and cost, how does it compare to alternatives?
Different components help you answer different questions.
WATER, SANITATION & HEALTH
An Example
Presenter
Presentation Notes
The rest of this presentation walks through all five components of program evaluation using the concrete encased spring example from Kenya. The evaluation summary for this can be found here: https://www.povertyactionlab.org/evaluation/cleaning-springs-Kenya.
What do you think is the most cost-effective way to reduce diarrhea?
A. Develop piped water infrastructure
B. Improve existing water sources
C. Increase supply of and demand for chlorine
D. Education on sanitation and health
E. Improved cooking stoves for boiling water
F. Improve sanitation infrastructure
J-PAL | WHAT IS EVALUATION 24
[Poll results chart for options A to F; extracted values: 0%, 6%, 6%, 6%, 35%, 47%]
Presenter
Presentation Notes
Now let’s ask a much more specific question.
NEEDS ASSESSMENT
Identifying the problem
Presenter
Presentation Notes
Needs assessments allow us to confirm whether or not the problem exists. This is our first step in program evaluation.
Needs Assessment
Questions answered by a needs assessment
• Does the problem we are proposing to solve actually exist?
– What is the likely source of the problem?
– Of the solutions proposed and tried, why are they failing?
– Who is in most need?
J-PAL | WHAT IS EVALUATION 26
Presenter
Presentation Notes
Each section begins with questions that can be answered by a particular component of program evaluation. This is done to emphasize that different questions are answered through different assessments and that not all questions require impact evaluations to be answered.
Needs Assessment
• Does the problem exist?
– Diarrheal disease killed approximately 2.6 million people a year between 1990 and 2000.
– 20% of all child deaths (under 5 years old) are from diarrhea
…..what is the likely source?
J-PAL | WHAT IS EVALUATION 27
Presenter
Presentation Notes
This section examines a particular problem, that of diarrheal disease, and then runs through potential causes and solutions. The figures given above are relevant for when the evaluation was done.
The source of the problem?
J-PAL | WHAT IS EVALUATION 28
Presenter
Presentation Notes
One potential cause of this may be bad water. ( 13% of the population lack access to clean water) ** This picture shows a young boy collecting water at a naturally occurring spring. -- As you can see, some wood has been placed around the eye of this spring, but the water pools at the collection point where it can easily be contaminated with surface water run-off. In an agricultural area with incomplete sanitation coverage, this makes it easy for fecal matter (from either humans or livestock) to contaminate the collected water. -- You can also imagine in this picture how contamination in transport and storage might occur. Children sometimes collect water and can easily touch it in open containers. If this kid here has fecal matter on his hands and makes contact with the spring water (which is likely), he could easily contaminate it. Similar things can happen within the home. When water is scooped out of the top of storage containers with a dipper, it is hard to avoid touching the water.
Theory of Change
Blueprint for Change
Presenter
Presentation Notes
What is the theory behind your solution? How does that map to your theory of the problem? Many terms are used for the theory of change. Explain that we will briefly touch on one model, but that there will be a whole other lecture dedicated to the theory of change tomorrow.
Theory of Change Questions answered by a theory of change
• How will the program address the needs put forth in your needs assessment?
– What are the prerequisites to meet the needs?
– How and why are those requirements currently lacking or failing?
– How does the program intend to target or circumvent shortcomings?
– What services will be offered?
J-PAL | WHAT IS EVALUATION 30
What is a potential solution to this problem?
J-PAL | WHAT IS EVALUATION 31
Presenter
Presentation Notes
One potential solution could be encased springs. Encasing the spring prevents contamination from surface water run-off.
Alternative Solution(s)?
J-PAL | WHAT IS EVALUATION 32
Presenter
Presentation Notes
Latrines; information campaigns; piped water
Log Frame

Objectives Hierarchy | Indicators | Sources of Verification | Assumptions / Threats
Impact (Goal / Overall objective): Lower rates of diarrhea | Rates of diarrhea | Household survey | Waterborne disease is primary cause of diarrhea
Outcome (Project Objective): Households drink cleaner water | (Δ in) drinking water source; E. coli CFU/100ml | Household survey, water quality test at home storage | Shift away from dirty sources. No recontamination
Outputs: Source water is cleaner; families collect cleaner water | E. coli CFU/100ml | Water quality test at source | Continued maintenance, knowledge of maintenance practices
Inputs (Activities): Source protection is built | Protection is present, functional | Source visits / surveys | Sufficient materials, funding, manpower

Source: Roduner, Schlappi (2008) Logical Framework Approach and Outcome Mapping, A Constructive Attempt of Synthesis
[Diagram labels alongside the log frame: Needs assessment, Process evaluation, Impact evaluation]
J-PAL | WHAT IS EVALUATION 35
Presenter
Presentation Notes
This can also be represented in a log frame. * Do not go into the detail, but use this to describe the difference between impact and process evaluations.
PROCESS EVALUATION
Making the program work
Process Evaluation
Questions answered by a process evaluation
• Was the program carried out as planned?
– Are basic tasks being completed?
– Is the intervention reaching the target population?
– Is the intervention being completed well or efficiently and to the beneficiaries’ satisfaction?
J-PAL | WHAT IS EVALUATION 37
Presenter
Presentation Notes
Are basic tasks being completed? Was the encased water spring constructed? Was it maintained? Is the intervention reaching the target population? Is the intervention being completed well or efficiently and to the beneficiaries’ satisfaction? Do households collect water from improved source? Does storage become re-contaminated? Do people drink from “clean” water?
IMPACT EVALUATION
Measuring how well it worked
Impact Evaluation
Questions answered by impact evaluations
• Process evaluations determine if a program is running in the way it is supposed to run
• Impact evaluations determine if a program creates a change in one or more outcomes
– Did concrete encased springs decrease diarrhea rates?
J-PAL | WHAT IS EVALUATION 40
What was the impact?
• 66% reduction in source water E. coli concentration
• 24% reduction in household E. coli concentration
• 25% reduction in incidence of diarrhea
J-PAL | WHAT IS EVALUATION 41
Presenter
Presentation Notes
Could we get images to show these?
Making Policy from Evidence
Intervention | Impact on Diarrhea
Spring protection (Kenya) | 25% reduction in diarrhea incidence for ages 0-3
J-PAL | WHAT IS EVALUATION 42
Making Policy from Evidence
Intervention | Impact on Diarrhea
Spring protection (Kenya) | 25% reduction in diarrhea incidence for ages 0-3
Source chlorine dispensers (Kenya) | 20-40% reduction in diarrhea
Home chlorine distribution (Kenya) | 20-40% reduction in diarrhea
Hand-washing (Pakistan) | 53% drop in diarrhea incidence for children under 15 years old
Piped water (urban Morocco) | 0.27 fewer days of diarrhea per child per week
J-PAL | WHAT IS EVALUATION 43
Presenter
Presentation Notes
So what intervention should we invest in? Three big issues here: inconsistent outcome measures, different contexts, and cost!
COST-EFFECTIVENESS ANALYSIS
Evidence-Based Policymaking
Cost-Effectiveness Diagram
J-PAL | WHAT IS EVALUATION 45
Presenter
Presentation Notes
Now we’ve gone through the lifecycle of a program evaluation (or several)
Start time: 1:44 1 min (WHOLE SECTION SHOULD TAKE ABOUT 7 MIN) So far we used the female policymaker example to demonstrate a few key points. (1) Before considering what to measure, always start with the theory of change. Ideally, the theory of change will dictate what intervention is being considered, who it will impact, and on what key outcomes. But beyond that, it allows us to think of how to measure all the intermediate steps, the processes, the mechanisms and even the assumptions. (2) There are many potential sources of measurement. (3) I wanted you to see the results of the study. So now let’s generalize.
First-order questions in measurement
• What data do you collect?
• Where do you get it?
• When do you get it?
J-PAL | MEASUREMENT & INDICATORS 14
Presenter
Presentation Notes
30 seconds The first question you should ask is What do you want to measure? That we discussed above. For the most part, you want that to be informed by your theory of change The second question; where do you get it? There are many possible sources of data: survey data, administrative data, which we’ll go into a bit more detail about. Last: when do you want these measures? Is it okay to do all of the data collection at the end? Or should you always have a baseline? Or should you be collecting data all throughout the process? We covered the first question in detail. Now let’s focus on the second
Where can we get data?
• Obtained from other sources
– Publicly available
– Administrative data
– Other secondary data
• Collected by researchers
– Primary data
J-PAL | MEASUREMENT & INDICATORS 15
Image sources:
https://commons.wikimedia.org/wiki/File:Cuyahoga_County_US_Census_Form-Herbert_Birch_Kingston_1920.jpg
https://commons.wikimedia.org/wiki/File:US_Navy_090123-N-9760Z-004_Hospital_Corpsman_2nd_Class_Jennifer_Ross_files_medical_records_aboard_the_aircraft_carrier_USS_Nimitz_(CVN_68).jpg
Presenter
Presentation Notes
1 min Where can we get the data? Click 1: Historically, most empirical work, at least in economics, has been conducted using publicly available datasets. Researchers downloaded datasets from statistics bureaus, census bureaus, and sample surveys run by large governments or international agencies, and then ran regressions, often at the country level. Click 2: Administrative data can also be used: data that are collected by departments or companies for internal use, not necessarily for the purpose of research. For example, tax records. Or in the case of our last example, village meeting notes. Economists who study health get data directly from hospitals or insurance companies. Click 3: In the past few decades social scientists have scaled up their efforts to collect their own data. In economics, Angus Deaton recently won a Nobel prize for his pioneering work in this area.
Types and Sources of Data
[2x2 grid. Columns: information about a person / household / possessions; information NOT about a person / household / possessions. Rows: information provided by a person; automatically generated]
J-PAL | MEASUREMENT & INDICATORS 16
Presenter
Presentation Notes
1 min Click 1: Usually in social sciences, or social programs, when we collect data, it’s about people. And the people often know they’re providing information to someone, somehow. Whether they’re answering a survey question, taking an exam, submitting tax returns, they know the information is being collected. Click 2: Sometimes it’s not so obvious, like buying something at the store, filing a police report, data collected online, in public spaces, etc. They often know there is “personal” data being collected. They may have some conceptions or misconceptions about their level of privacy… Click 3: Sometimes we collect data not about a person, but about, for example, rainfall, pollution emissions, etc. But a person may still be involved in collecting the data. Click 4: In other cases, we use sensors and never have to interact with a person
1 min Most of the work we do in the developing countries involves collecting primary data on people – in a wide variety of different forms This is considerably less true in the US and high-income countries where secondary sources of data tend to be much more extensive and high quality - We’ll talk a lot about surveys, but these are certainly not the only type of data you could consider collecting – list others
1 min Surveys themselves come in many different forms Increasingly researchers are taking advantage of the technological possibilities of computer assisted or digital surveying Both for surveys that are administered by an interviewer and those that are self administered Digital surveying opens up the possibility of questionnaires that are tailored to the respondent and the responses they have given so far in the survey
When to collect data
• Baseline
• During the intervention
– Process, monitoring of intervention
• Endline
• Follow-up
• Scale-up
• Intervention: M&E
J-PAL | MEASUREMENT & INDICATORS 20
Presenter
Presentation Notes
1 min Define each. Point out that endline / follow-up is most important, but a baseline can help with (a) heterogeneous treatment effects (het TE) and (b) directionality (i.e. everyone getting worse vs. everyone getting better).
Ethics
• “Experimenting on people”
• Belmont Principles
– Respect for persons
– Beneficence
– Justice
• Institutional Review Boards (IRBs)
J-PAL | MEASUREMENT & INDICATORS 21
Presenter
Presentation Notes
3 min Click 1: A common critique we hear is that by doing randomized evaluations of social programs, we’re “experimenting on people”, often without their consent. One important distinction is that people have an image of a scientist in a lab with a cage full of guinea pigs, injecting things into them to see what happens. The problem with that image is that the injection is what upsets people. But the injection is a representation of the intervention: the social program. And most of the time in our kind of research, the evaluators are not delivering that injection. It’s the policymakers who are. What we’re trying to do is figure out a way to learn from that injection. How do we do that? We try to collect data on those who are receiving (or not) the intervention. Click 2: After some really questionable practices in research (namely the Tuskegee Study), the National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research produced the Belmont Report in 1978, which lays out the principles that govern the kind of research we do. Click 3: The first principle is “Respect for persons”. The government doesn’t ask each individual affected by a new policy or program for their consent on whether to implement that policy. So if the government wants to test the effectiveness of computers in schools, it doesn’t ask parents whether it’s okay to have computers in the school. And even if allocation of the program is randomized for the purpose of research, the government won’t ask for people’s consent. BUT, it is typically the researcher’s duty to ask them whether they are willing to provide information to researchers. So we always ask for INFORMED CONSENT before surveying them.
They also should be aware of their right to not participate, or to withdraw participation at any time, for any question Click 4: The principle of Beneficence is that the value of the research must be worth the cost and risk to participants Click 5: the principle of Justice is that the people involved in the study must represent those likely to benefit (No medical testing on prisoners) Click 6: The Belmont report also formalizes a role for IRBs to review and approve any research
How to Measure
Concept
Presenter
Presentation Notes
Start time: 1:51 15 seconds Now that we have a sense of the types of measurement on the table, let’s zoom way out, and think about what measurement is
2 min Let’s say we’re running an early childhood development experiment Click 1: Intelligence is a key outcome indicator How would we go about measuring that? Click 2: First we need to acknowledge that intelligence is what we’d call “a construct” It doesn’t necessarily have a precise scientific definition. Click 3: we often use IQ tests to measure intelligence. We often distill a construct into a question, or series of questions. Many people take issue with IQ tests. Some believe they’re culturally biased. Some believe they arbitrarily weight certain cognitive abilities over others. I think at best, IQ is a “proxy” for intelligence. Click 4: Here the total exam score may be our indicator. Again, our indicator is a “proxy” for our construct. Sometimes our indicator is an exam, or just a single exam question (to measure a competency). It can be a survey question, or collection of questions to produce an index, it can be a medical test. It can be household profit or consumption. But here let’s stick to our IQ test Click 5: And then we take our indicator to our target population, Here we administer the IQ test to kids Click 6: And collect data Note, even though it’s the construct we’re interested in, at the end of the day, we’re really only able to analyze data. So the extent to which our data reflect the construct depends on how good a proxy the indicator is, and how successfully we collect data. If for some kids we administered the test first thing in the morning in quiet, calming rooms. And for others, if we administered the test at the end of the day, where there’s a lot of street noise and other distraction, the data we collect may not accurately reflect the TRUE IQ score of the kids.
1 min. Take another example: Stress Again, stress, like intelligence is something we understand vaguely, but if someone asked me what the precise definition was, I might be lost. Click 1: Neurophysiologists suggest cortisol levels are a direct manifestation (or potentially cause) of what we consider stress levels. Click 2: However, the saliva tests we use to collect it are highly sensitive to outside factors, like when in the day the test is done. So while the indicator may be really close to what we care about. The data may be all over the place Alternatively, researchers could just ask me a question like: on a scale of 1-10, how stressed are you? The result may not be as all over the place, but I may tailor my response depending on who’s asking. If the senior faculty in my department were asking, I might say VERY. If my child was asking, I might want them to believe I’m not stressed at all. And with surveyors and respondents, it often depends on the power dynamic, and what the respondents think potential outcomes of the research may be.
The goals of measurement
J-PAL | MEASUREMENT & INDICATORS 30
• Accuracy (unbiasedness, validity)
• Precision (reliability)
Presenter
Presentation Notes
2 min So what are the major challenges? Click 1: Our first concern is accuracy: We may know that the IQ test is a somewhat biased proxy for intelligence. For example, you can study to do better. Or many of the questions may be identical to puzzles more wealthy kids have access to. And we may know there are many factors that influence a kid’s score on the IQ test beyond intelligence: ability to study for it, time of day, hunger, stress, etc. But it may be the best we’ve got. Click 2: Cortisol may be an example of an unbiased measure of stress. But the test is still very noisy. When we think about the mapping of an indicator on to a construct, we will refer to this as the validity of the measure: how centered is it over the bullseye? Click 3: So what about height and weight? Those are generally pretty easy to measure, and tend to be pretty accurate and precise. Click 4: But sometimes we put them together to produce a body-mass index, which itself can be pretty precise, but when we use the BMI to measure nourishment, we may question its accuracy. In other words, BMI is a questionable indicator for the construct of “level of nutrition”. It doesn’t take into account muscle mass versus body fat. A high BMI could reflect stunting (being shorter than expected), obesity, or just being buff. Because this concerns the relationship between an indicator and a construct, it is again a question of validity; reliability is about how consistent the collected data are.
Validity
• In theory:
– How well does the indicator map to the outcome? (e.g. IQ tests → intelligence)
J-PAL | MEASUREMENT & INDICATORS 31
Construct
Indicators
Validity
Presenter
Presentation Notes
15 seconds So summing up: Validity is how well our indicator maps to the construct
Reliability
• In theory:
– The measure is consistent and precise vs. “noisy”
J-PAL | MEASUREMENT & INDICATORS 32
Construct
Indicators
Data Collection(“Response”)
Reliability
Presenter
Presentation Notes
15 seconds And reliability is the extent to which the data we collect are similar/consistent each time we measure it (regardless of who is doing the measurement)
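The bullseye picture in these notes (how centered a measure is versus how scattered it is) can be sketched with two small summary statistics. All numbers below are hypothetical, and treating "distance of the average from the true value" as a stand-in for accuracy is a simplification for illustration.

```python
from statistics import mean, pstdev


def bias(measurements, true_value):
    """How far the average measurement sits from the true value (centering)."""
    return mean(measurements) - true_value


def spread(measurements):
    """How scattered repeated measurements are (noise)."""
    return pstdev(measurements)


TRUE_VALUE = 100  # hypothetical true score

# Centered on the truth but noisy (like the cortisol example in the notes).
noisy_but_centered = [80, 120, 90, 110, 100]

# Very consistent but systematically off target.
consistent_but_off = [109, 110, 111, 110, 110]

for name, data in [
    ("noisy but centered", noisy_but_centered),
    ("consistent but off", consistent_but_off),
]:
    print(f"{name}: bias = {bias(data, TRUE_VALUE)}, spread = {spread(data):.2f}")
```

The first measure is unbiased but unreliable; the second is reliable but biased, which repeated measurement alone would not reveal.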
The Response Process
J-PAL | MEASUREMENT & INDICATORS 37
Indicators
Data Collection(“Response”)
Measurement Error
Data
Presenter
Presentation Notes
Start time: 2:08 15 seconds. The response process takes us from the indicator to the data. Click 1: Also we’ll take a peek at how measurement error can creep in
4-step Response Process
1. Comprehension of the question
2. Retrieval of information
3. Judgement and estimation
4. Reporting an answer
J-PAL | MEASUREMENT & INDICATORS 38
Presenter
Presentation Notes
1 min: Whenever we require the respondent to think, it’s useful putting ourselves into the head of our respondent. We’ll go into more detail in the next few slides Click 1: When a surveyor asks a question, or a respondent reads a question in a form, they have to understand what the question is asking Click 2: Once the respondent has understood the question, they now have to retrieve the necessary information from their brain. Click 3: At this point they may have retrieved many of the necessary facts. Perhaps the answers given are: today, yesterday, before then. Perhaps it’s January, February, March, April… Perhaps it’s a date field. And now they have to place the memory to a particular date Click 4: Finally they need to map their memory and calculations to the answers given. Having gone through all of this, they may be hesitant to tell me.
Measurement Error: Vagueness
Vague concepts where respondents may interpret the question in different ways.
Example:
Q. Do you live with a teenager?
• Yes
• No

What age range counts as a teenager?

Make sure to define vague concepts
J-PAL | MEASUREMENT & INDICATORS 51
Presenter
Presentation Notes
30 seconds
Measurement Error: Completeness
The response categories do not include all categories that can be expected as a response
Example:
Q. What is the highest level of education completed?
• Basic Education (1st-5th)
• Middle School (6th-8th)
• High School (9th-12th)
• College Degree
• Post Graduate
• Other Professional Degree (e.g. Medical, Law, Teacher)

“No education” or “vocational degree” is not a response

Pilot question to make sure that categories are exhaustive
J-PAL | MEASUREMENT & INDICATORS 53
Presenter
Presentation Notes
Here, we alter the question slightly to get around the issue of comprehension. And we’ve clarified the definition of But we’re still missing some important categories.
Measurement Error: Negatives
Questions that include negatives can be confusing to the respondent and lead to misinterpretations.
Example:
Q. Do you think that you should not let your children play contact sports?
• Yes
• No

Having a negative might throw some people off

Avoid unnecessary negatives
J-PAL | MEASUREMENT & INDICATORS 55
Presenter
Presentation Notes
30 sec
Measurement Error: Overlapping Categories
The categories overlap each other.

Example:

Q. How many hours a day do you work?
• Less than an hour
• Between one and four hours
• Between three and eight hours
• Between eight and ten hours
• More than ten hours

What would a person who works eight hours a day reply?

Make sure that all categories are mutually exclusive
J-PAL | MEASUREMENT & INDICATORS 57
Presenter
Presentation Notes
30 sec
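The two checks from the last slides (categories should be exhaustive and mutually exclusive) can be sketched as a small test to run while piloting a questionnaire. The function name is hypothetical, and the sketch assumes answers are whole hours from 0 to 24 with inclusive category bounds, which is one possible reading of the slide's wording.

```python
def coverage_problems(categories, possible_answers):
    """Return answers that match no category (gap) or several categories (overlap).

    categories: list of (label, lowest, highest) with inclusive bounds.
    """
    problems = {}
    for answer in possible_answers:
        matches = [label for label, low, high in categories if low <= answer <= high]
        if len(matches) != 1:
            problems[answer] = matches
    return problems


# The categories from the slide, read as inclusive whole-hour ranges (assumption).
slide_categories = [
    ("Less than an hour", 0, 0),
    ("Between one and four hours", 1, 4),
    ("Between three and eight hours", 3, 8),
    ("Between eight and ten hours", 8, 10),
    ("More than ten hours", 11, 24),
]

# One possible repair with non-overlapping bounds (hypothetical).
fixed_categories = [
    ("Less than an hour", 0, 0),
    ("One to four hours", 1, 4),
    ("Five to eight hours", 5, 8),
    ("Nine to ten hours", 9, 10),
    ("More than ten hours", 11, 24),
]

hours = range(0, 25)
slide_issues = coverage_problems(slide_categories, hours)
fixed_issues = coverage_problems(fixed_categories, hours)

print("Ambiguous answers on the slide:", sorted(slide_issues))
print("Ambiguous answers after the fix:", sorted(fixed_issues))
```

An empty list of matches for some answer would signal a completeness problem instead, such as the missing “No education” option in the earlier example.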
Measurement Error: Presumptions
The question assumes certain things about the respondent
Example:
Q. How would you rate the quality of coffee this morning?
• Very good
• Somewhat good
• Not good

We are assuming that the respondent drank the coffee

Use filters and skip patterns
J-PAL | MEASUREMENT & INDICATORS 59
Presenter
Presentation Notes
30 sec Again, this would be a problem at the response step. The best practice is to use
Measurement Error: Framing effect
People react to a particular choice in different ways depending on how it is presented, e.g. they prefer options framed as gains over options framed as losses

Example:

Q. Two new treatments have been developed to treat 600 terminally ill patients. Treatment A will save 200 people, while Treatment B will allow 400 people to die. Which treatment would you prefer?
• Treatment A
• Treatment B

Respondents tend to prefer Treatment A because it has been framed as a gain, even though both treatments lead to the same outcome (200 of 600 patients survive)

Try to be neutral when framing questions
J-PAL | MEASUREMENT & INDICATORS 61
Presenter
Presentation Notes
30 seconds
Measurement Error: Recall Bias
People may retrieve recollections regarding events or experiences differently
Example:
Q. How long did you have to wait last time you voted?
• No time (there was no line, or I voted by mail)
• Less than 10 minutes
• Between 10 and 30 minutes
• More than 30 minutes but less than an hour
• An hour or more

This experience may be more vivid for some respondents than others.

You can ask respondents to keep a diary or save their receipts
J-PAL | MEASUREMENT & INDICATORS 63
Presenter
Presentation Notes
1 min Recall bias really comes in at the estimation step. It could bias responses if, for example, those in states that voted more recently have a clearer memory. Or those whose candidate lost the election may be more likely to remember having to wait longer than those whose candidate won, simply because they’re angry that they had to wait and may feel that their candidate’s vote was suppressed.
Measurement Error: Anchoring Bias
People tend to rely too heavily on the first piece of information seen
Example:
Q. In Arizona, some voters reported having to wait more than 5 hours to vote. How long did you have to wait last time you voted?
• No time (there was no line, or I voted by mail)
• Less than 10 minutes
• Between 10 and 30 minutes
• More than 30 minutes but less than an hour
• An hour or more

Respondents will be more likely to give a number on the higher end of the spectrum

Avoid adding anchors to your questions
J-PAL | MEASUREMENT & INDICATORS 65
Presenter
Presentation Notes
1 min “Avoiding anchors” may seem obvious, but sometimes these anchors creep in through the ordering of questions. If you ask a respondent a question that elicits a response with a large number, it has been shown that in many cases the response to the next question will be biased upwards.
Measurement Error: Telescoping Bias
People perceive recent events as being more remote than they are (backward telescoping) and distant events as being more recent than they are (forward telescoping)
Example:
Q. Did you purchase a TV or other electronic item (worth over $500) in the past 12 months? ____________

This will lead to over-reporting due to forward telescoping of events that happened more than 12 months ago

Visit once at the beginning of the reference period. Then ask, “since the last time I visited you, have you…?”

J-PAL | MEASUREMENT & INDICATORS 67
Presenter
Presentation Notes
This is usually a problem with what we call “lumpy purchases” or “investments”. If you purchased something large a little over a year ago, you may feel like you’re doing a disservice to the survey by excluding it. You assume the surveyor cares more about whether you purchased something large than about the specific timeframe. But that causes problems for the survey. If, say, 100% of respondents make a $500 purchase every other year, but all feel compelled to include it when responding about purchases in the last year, then as evaluators we’ll overestimate the number of large purchases by 100%. One way of dealing with this is by visiting once at the beginning of the reference period, and once at the end.
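The 100% figure in these notes follows from simple arithmetic, sketched here. The function name is hypothetical; the shares come from the example in the notes: everyone buys every other year, so the true share buying in any given year is 50%, while every respondent reports a purchase.

```python
def overestimate_percent(true_share, reported_share):
    """Percentage by which the reported share exceeds the true share."""
    return (reported_share - true_share) / true_share * 100


# Everyone makes a large purchase every other year, so in any 12-month
# window only half of respondents truly made one.
true_annual_share = 0.5

# With forward telescoping, all respondents report a purchase.
reported_annual_share = 1.0

error = overestimate_percent(true_annual_share, reported_annual_share)
print(f"Large purchases are overestimated by {error:.0f}%")
```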
Measurement Error: Social Desirability Bias
Tendency of respondents to answer questions in a manner that is favorable to others, i.e. emphasize strengths, hide flaws, or avoid stigma

Example:

Q. Do you beat your wife?
• Yes
• No

Respondents would be reluctant to admit to such behavior

Ask indirectly, ensure privacy
J-PAL | MEASUREMENT & INDICATORS 69
Key Steps in conducting an experiment
1. Design the study carefully
2. Randomly assign people to treatment or control
3. Collect baseline data
4. Verify that assignment looks random
5. Monitor process so that integrity of experiments is not compromised
J-PAL | WHY RANDOMIZE 30
Presenter
Presentation Notes
These 8 steps present a very simplified description of the process. Idea is to give a complete picture on how this works in a typical experiment.
Key Steps in conducting an experiment (contd.)
6. Collect follow-up data for both the treatment and control groups
7. Estimate program impacts by comparing mean outcomes of treatment group vs mean outcomes of the control group
8. Assess whether program impacts are statistically significant and practically significant
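Steps 2, 7 and 8 can be sketched in a few lines. This is a minimal illustration, not the procedure of any particular study: the function names, the seed, and the outcome values are all hypothetical, and the standard error uses the usual unequal-variance formula for a difference in means.

```python
import math
import random
import statistics

def assign(units, seed=0):
    """Step 2: randomly split units into treatment and control halves."""
    shuffled = list(units)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def difference_in_means(treated, control):
    """Steps 7-8: impact estimate, its standard error, and a t-statistic."""
    diff = statistics.mean(treated) - statistics.mean(control)
    se = math.sqrt(statistics.variance(treated) / len(treated)
                   + statistics.variance(control) / len(control))
    return diff, se, diff / se

# Hypothetical follow-up outcomes (for example, days of school attended).
treated_outcomes = [12, 14, 11, 15, 13]
control_outcomes = [10, 9, 11, 10, 10]

treatment_ids, control_ids = assign(range(10))
diff, se, t_stat = difference_in_means(treated_outcomes, control_outcomes)
print(f"impact = {diff:.2f}, se = {se:.2f}, t = {t_stat:.2f}")
```

A t-statistic well above about 2 speaks to statistical significance; whether an impact of that size matters in practice is a separate judgment.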
J-PAL | THREATS AND ANALYSIS
Core assumptions
• Random assignment of subjects to treatments
– Receiving treatment is statistically independent of subjects’ potential outcomes
• Non-interference: subject’s potential outcomes reflect only whether they receive the treatment themselves
– Subject’s potential outcomes unaffected by how treatments happened to be allocated
• Excludability: subject’s potential outcomes respond only to defined treatment, not other extraneous factors that may be correlated with treatment
– Importance of defining treatment precisely and maintaining symmetry between treatment and control groups (e.g., through blinding)
Noncompliance
• Sometimes there is a disjunction between the treatment that is assigned and the treatment that is received
– Miscommunication and administrative mishaps
– Subjects may be unreachable
– Encouragements sometimes don’t work
• Addressing noncompliance requires careful attention to “excludability” assumptions
– Are outcomes affected only by the treatment? Or by both the assignment and the treatment?
Handling noncompliance
[Diagram: random assignment splits the sample into a treatment group (participants and no-shows) and a control group (non-participants and crossovers).]
What can you do? Can you switch them?
Bad idea: biased
What can you do? Can you drop them?
Bad idea: biased
Inferences should be based solely on comparisons of randomly assigned groups
Noncompliance: avoiding common errors
• Subjects you fail to treat are NOT part of the control group!
• Do not throw out subjects who fail to comply with their assigned treatment
• Base your estimation strategy on the ORIGINAL treatment and control groups, which were randomly assigned and therefore have comparable potential outcomes
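The difference between comparing the original assigned groups and regrouping subjects by what they actually received can be illustrated with a small sketch. All records below are hypothetical, and the "as-treated" contrast is shown only as the error the slide warns against.

```python
# Each record: (assigned_to_treatment, actually_treated, outcome). Hypothetical data.
records = [
    (True, True, 16), (True, True, 14),      # participants
    (True, False, 8), (True, False, 10),     # no-shows
    (False, False, 10), (False, False, 12),  # non-participants
    (False, False, 8), (False, True, 14),    # the last one is a crossover
]

def mean(values):
    values = list(values)
    return sum(values) / len(values)

# Intention-to-treat: compare the ORIGINAL randomly assigned groups.
itt = (mean(y for a, _, y in records if a)
       - mean(y for a, _, y in records if not a))

# Naive "as-treated" comparison: regroups subjects by what they received,
# which discards the comparability created by random assignment.
as_treated = (mean(y for _, d, y in records if d)
              - mean(y for _, d, y in records if not d))

# Difference in take-up between the two assigned groups.
take_up_gap = (mean(d for a, d, _ in records if a)
               - mean(d for a, d, _ in records if not a))

print(f"ITT = {itt:.2f}, as-treated = {as_treated:.2f}, take-up gap = {take_up_gap:.2f}")
```

In this made-up example the no-shows have low outcomes, so moving them into the comparison group inflates the apparent effect well beyond the intention-to-treat estimate.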
Promise of experiments:
Surprisingly positive results
o Miguel and Kremer (2004) showed that deworming treatment (costing 49 cents per child per year) can reduce absenteeism from school by one-quarter
o In terms of increasing attendance, deworming is 20 times as effective as hiring an extra teacher, even though both work in the sense of generating statistically significant improvements
o Economic intuition would not have helped us come to this conclusion
o NGOs were equally uninformed about this comparison
Ask children around the world why they are not in school and you will get many answers: cost, distance, lack of facilities. Very few of them will mention worms: soil-transmitted helminths (STHs) and schistosomes. Until recently few experts would have mentioned worms as a key barrier to schooling either.
Four hundred million children of school-age are chronically infected with intestinal worms. Infected children suffer listlessness, diarrhea, abdominal pain and anemia. These parasites are so widespread that some societies do not recognize infection as a medical problem. Symptoms of worms, such as blood in the stool, are considered a natural part of growing up. So even though safe, cheap, and effective oral medication that can kill 99 percent of worms in the body is available and the World Health Organization (WHO) recommends mass deworming of school-aged children, only 10 percent of at-risk children get treated.
OCTOBER 2007
Policy Briefcase No. 4
Abdul Latif Jameel Poverty Action Lab
MIT Department of Economics
Mass Deworming: A Best-Buy for Education and Health
For more details on this study see Miguel and Kremer (2004) and Kremer and Miguel (2007), available at www.povertyactionlab.org
This Briefcase (based on Miguel and Kremer, 2004; and Kremer and Miguel, 2007) reports the results of a randomized impact evaluation of a deworming program in western Kenya. The results show that school-based mass deworming, where every child in a school is treated, is the most cost-effective way to increase school participation (of all the alternatives that have been rigorously evaluated). It is also one of the most cost-effective ways to improve health that we know of.
Similar educational benefits were found when intestinal worms were eradicated from the southern states of the U.S. in 1915 (Bleakley, 2007). Follow-up work found that attempts to make the program self-sustaining (through health education and user fees) led to its collapse. Only long-term funding of a school-based program sustained the benefits.
Summary
What was done: About 30,000 children in 75 primary schools in rural Kenya were treated en masse in schools with drugs for hookworm, whipworm, roundworm, and schistosomiasis (bilharzia).
Key Impacts:
• Reduced the incidence of moderate-to-heavy infections by 25 percentage points.
• Reduced school absenteeism by 25 percent, with the largest gains among the youngest pupils.
• School participation in the area increased by at least 0.14 years of schooling per treated child.
• There was no evidence that deworming increased test scores.
Cost Effectiveness:
• Cost: 50 cents per child per year
• Health: US$5 for every Disability Adjusted Life Year (DALY) saved
• Education: US$3.50 for each additional year of school participation
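As a rough consistency check on the education figure, the cost per additional year of schooling can be approximated by dividing the cost per child by the participation gain per treated child. This is only a back-of-the-envelope sketch under that assumption; the brief's own US$3.50 may rest on more detailed accounting, and the variable names are illustrative.

```python
# Figures quoted in the summary.
cost_per_child = 0.50           # US$ per child per year
years_gained_per_child = 0.14   # "at least" 0.14 years of schooling per treated child

# Assumed simple ratio: dollars spent per additional year of school participation.
cost_per_extra_year = cost_per_child / years_gained_per_child
print(f"US${cost_per_extra_year:.2f} per additional year of schooling")
```

Because 0.14 is a lower bound on the gain, the ratio is an upper bound on the cost, which is in line with the rounded US$3.50 reported.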
Take Action Now: www.dewormtheworld.org
Source: dewormtheworld, “Our Story” (http://www.dewormtheworld.org/?q=node/68)
Our Story
Over 400 million children are infected with parasitic worms. Although the harm they cause to children’s health and education has been recognized since the 1980s, deworming was not widespread due to more urgent health sector priorities. However, over two decades later, new groundbreaking research changed how the education sector viewed school-based deworming.
There were three key findings. First, researchers showed that the health impacts of deworming were significantly greater than previously estimated, due to the spillover effects of treatment. Second, they illustrated that mass deworming drastically improved school participation. In fact, it is one of the best returns on investment of any intervention evaluated to increase school attendance. Finally, they conclusively demonstrated that deworming through schools is an efficient and effective way to treat large numbers of children.
Investigators have also since followed up on this research to show the long run impacts of deworming, which result in increased earnings and workforce participation of adults who received two to three additional years of treatment during school.
This evidence was a breakthrough. School-based deworming was globally recognized as a ‘best buy’ for development, and the benefits and cost-effectiveness of school-based deworming were now clear to both the health and education sectors. However, additional barriers remained, and millions of children continued to go without treatment. Some countries needed access to drugs, while others needed technical assistance and capacity building. In addition, policies needed to be developed or strengthened in order to support school-based deworming programs.
Recognizing the huge opportunity to impact the lives of millions of children, economists Michael Kremer and Esther Duflo shared the evidence with fellow members of the Young Global Leaders Education Task Force, who promptly launched the Deworm the World Initiative in January 2007 at the World Economic Forum Annual Meeting in Davos, Switzerland.
The Deworm the World Initiative is operated as a partnership between Innovations for Poverty Action and Partnership for Child Development. Working together, the Initiative has reached 20 million children in 27 countries by supporting the launch of new country programs and enabling the continued activity of existing ones.
o Duflo, Kremer, and Robinson (2010) reflects an iterative process
o A succession of experiments on fertilizer use was run over a period of several years
o Each set of results prompted a series of new variations in order to better understand the results of the previous one
Theoretical Motivation
o Experiments designed to assess whether there is a demand for commitment products (Ashraf, Karlan, and Yin 2006) came from theoretical motivation
o Karlan and others: experiments are emerging as a powerful tool for testing theories
Biggest Advantage:
The biggest advantage of experiments may be that they take us into terrain where observational approaches are not available
Objections raised by critics are best viewed as warnings against over-interpreting experimental results
There are also concerns about what experiments are doing to development economics as a field
Generalizability
Environmental Dependence - Core element of generalizability – would the same result occur in a different setting?
Effect is not constant across individuals – likely to vary systematically with covariates
Concern of implementer effects and compliance – smaller organization (NGO) – estimated treatment effect reflects unique characteristics of implementer
e.g. some NGOs refuse to randomize
Randomization Issues
Fact that there is an experiment going on might generate selection effects that would not arise in non-experimental setting (being part of an experiment and being monitored influences participants)
Villagers not used to private organization going around offering them things
Necessary that individuals are not aware that they are excluded from program (difficult when randomization is at individual level, easier if randomization is at village level)
Equilibrium Effects
Program effects from small study may not generalize when program is scaled up
e.g. :
Vouchers to go to private school
Students end up with better education and higher incomes
Scale up program to national level
Crowding in private schools (collapse of public schools)
Returns to education fall because of increased supply
Experimental evidence overstates returns to vouchers program
Notes from: “Instruments, Randomization, and Learning about Development” (Deaton 2010)
Effectiveness of development assistance is topic of great public interest
Much public debate among non-economists takes it for granted that, if the funds were made available, poverty would be eliminated -- Amongst economists, it is mixed.
Macro perspective: can foreign assistance raise growth and eliminate poverty?
Micro perspective: what sorts of projects are likely to be effective? Should aid focus on roads, electricity, schools, health clinics?
Answer – we don’t know – how should we go about finding out?
Frustration with Aid organizations
Particularly the World Bank
Allegedly failing to learn from its projects and to build up a systematic catalogue of what works and what does not
Movement toward randomized controlled experiments:
Esther Duflo:
“ randomized trials can revolutionize social policy during 21st century just as they revolutionized medicine during the 20th”
---- Lancet editorial headed “ The World Bank is finally embracing science”
Deaton argues:
under ideal circumstances randomized evaluations of projects are useful for obtaining convincing estimates of the average treatment effect of a program or project
This focus is too narrow and too local to tell us “what works” in development and to design policy or to advance scientific knowledge about development processes
Argues that work needs to be refocused – not answer which projects work but why
Bigger question:
RCTs allow investigator to induce variation that might not arise nonexperimentally – but are these the relevant ones?
RCTs of “what works”
even when done without error or contamination
unlikely to be helpful for policy or move beyond the local
unless they tell us something about why
RCTs are not targeted or suited to these questions
Actual policy will always be different than experiments:
General equilibrium effects that operate on large scale
Outcomes are different when everyone is covered by treatment rather than a few
Experimental subjects are not representative of population
Small development projects at village level do not attract attention of corrupt politicians
Scientists or experimentalists more careful than government implementers
Transporting successful experiments?
Mexico’s PROGRESA program
Conditional cash transfer program paid to parents if children attend schools and clinics
Now in 30 other countries
Is this a good thing?
Cannot simply be exported if countries have
Pre-existing anti-poverty programs with conditional transfers
No capacity to meet increased demands of education and health care
No political support
Combination of mechanism and context that makes for scientific progress
Much interest in RCTs, and instrumental variables, and other econometric techniques that mimic random allocation
comes from skepticism of economic theory
impatience with its ability to deliver structures that seem helpful in interpreting reality
Internal versus external validity:
Contrast between the rigor applied to establish internal validity and the looser analysis to render it policy relevant
To do this typically use some theory or some other information from observables – both go against simplicity of RCTs
Applied and theoretical economists have never been so far apart
Failure to reintegrate is not an option
Otherwise no chance of long term scientific progress extending from the RCTs.
RCTs that are not theoretically guided are unlikely to have more than local validity
Pre-analysis plans at Berkeley's BITSS conference (Running Randomized Evaluations: A Practical Guide)
http://runningres.com/blog/2013/12/16/pre-analysis-plans-at-berkeleys-bitss-conference
On December 12th I attended the annual meeting of the Berkeley Initiative for Transparency in the Social Sciences (BITSS). BITSS brings together economists, political scientists, biostatisticians, and psychologists to think through how to improve the norms and incentives to promote transparency in the social sciences. I was on a panel talking about preanalysis plans in which researchers specify in advance how they will analyze their data.
I have now been involved in writing four of these plans and my thinking about them has evolved, as has the sophistication of the plans. Kate Casey, Ted Miguel and I first wrote one of these plans for our evaluation of a Community Driven Development program in Sierra Leone (see the previous blog). It was exactly the type of evaluation where pre-analysis plans are most useful. We had a large number of outcome variables with no obvious hierarchy of which ones were most important so we specified how all the outcomes would be grouped into families and tested as a group. While the outcomes were complex the randomization design was simple (one treatment, one comparison group).
The next case also included multidimensional outcomes: empowerment of adolescent girls in Bangladesh. However, now we had five treatments and a comparison group with different treatments targeted at different ages. The task of prespecifying was overwhelming and we made mistakes. It was extremely difficult to think through in advance what subsequent analysis would make sense for every combination of results we might get from the different arms. We also failed to take into account that some of our outcomes in a given group were clearly more important than others: we ended up with strong effects on years of schooling and math and literacy scores but the overall “education” effect was weakened by no or negative effects on indicators like how often a girl read a magazine. We hope, when we write the paper people will agree it makes sense to deviate
The Millennium Development Goal calls for universal primary education by 2015
• Little consensus on how to achieve this goal or how much it would cost
One view
• Attracting additional children to school will be difficult since most children not in school in developing countries are earning income their families need
Another view
• Potential contribution of children of primary school age to family income is very small
• Hence modest incentives could significantly increase enrollment
Reducing the Cost of Education
• Some argue school fees prevent many students from attending school
• They cite dramatic estimates from sub-Saharan Africa: free schooling introduced, primary school enrollment reportedly doubled
• Often the data used for these estimates are unclear:
– Free schooling is sometimes announced simultaneously with other policy initiatives
– It is often accompanied by programs that replace school fees with per-pupil grants from the central government, which create incentives for schools to over-report enrollment
• Randomized experiments can isolate the impact of reducing costs on the quantity of schooling
• Several programs have gone beyond simply reducing school fees by actually paying students to attend school in the form of either cash grants or school meals
• School health programs can also increase quantity of schooling, but this raises the question of how best to implement such programs
• One view is that the reliance on external financing of medicine is not sustainable and instead advocates health education, water and sanitation improvements and so forth
Quality of Education
Notes from “Teacher Absence in India” (Kremer et al.)
• Study entails a nationally representative survey of 3700 schools in India
• Three unannounced visits were made to each school
• Absence data come from direct physical verification of the teacher’s presence, not relying on logbooks, interviews, etc.
• Teacher is recorded as absent if the investigator could not find the teacher in the school during regular working hours
Journal of the European Economic Association (Resubmitted version, 11/27/04)

which absence calculations based on a similar methodology are available (Table 1).[3] Only 45 percent of teachers were actively engaged in teaching at the time of the visit.[4]

Within India, the absence rate ranged from 15 percent in Maharashtra to 42 percent in Jharkhand (Table 2).[5] Absence rates are generally higher in low-income states: doubling per capita income is associated with a 4.7 percentage point lower predicted absence. The rates of teaching activity among the teachers who are present are lower in higher-absence states and schools. In some states, only 20 to 25 percent of teachers were engaged in teaching at the time of the visit.

Absence rates are considerably higher than could be accounted for by official non-teaching duties, such as staffing polling stations during elections or conducting immunization campaigns, which are sometimes cited as important causes of absence. Based on the responses of each school’s head teacher or primary respondent, official non-teaching duties account for only about 4 percent of total absences. In other words, on any given day, only about 1 percent of primary teachers are absent because they are carrying out official non-teaching-related duties.[6] Preliminary calculations by the authors suggest

TABLE 1: Teacher absence rates by country

Country              Teacher absence (%)
Peru                 11
Ecuador              14
Papua New Guinea     15
Bangladesh           16
Zambia               17
Indonesia            19
India                25
Uganda               27

Source: Chaudhury, Hammer, Kremer, Muralidharan, and Rogers (2004) for most countries; Habyarimana and others (2004) for Zambia; World Bank (2004) for Papua New Guinea.

[3] Most of these estimates come from other countries covered by the same research project on provider absence in education and health, carried out by the authors of this study and using standardized methodology (Chaudhury and others 2004).
[4] Even with a generous allowance for the possibility that enumerators’ visits diverted some teachers from teaching, it is unlikely that more than half of the teachers would have been teaching at the time of the visit. See Kremer and others (2004).
[5] Table 2 includes 19 of the 20 states surveyed. Fieldwork in the twentieth state, Delhi, was delayed for bureaucratic reasons, and the data were received too late to be analyzed here.
[6] While stated reasons for absence should be taken with a grain of salt, there does not appear to be any reason for head teachers to understate this cause of absence.
TABLE 2: Teacher absence in public schools by state
• One in four teachers is absent in a typical primary school in India
• Absence rates are generally higher in low-income states
• Higher teachers’ salaries do not seem to be associated with lower teacher absence
• Since nominal teachers’ salaries are very similar across states, relative teachers’ salaries are higher in poorer states, yet poorer states have higher absence rates
Notes from “Addressing Absence” (Banerjee and Duflo)
• Obvious method to fight teacher absence is to monitor more intensively
• External control need not always be about monetary incentives
• Most common type of control: someone in the institutional hierarchy (e.g. the headmaster of a school) is given the task of keeping an eye on the teacher and penalizing absences
• Alternative method: use some impersonal method, such as a camera, for recording absence
• An NGO in rural India experimented with a camera
• In this area the absence rate was 44%
• Most schools are one-teacher schools: when the teacher is absent, children just go back home and lose an entire day of schooling
• 120 schools were selected to participate in this study
• In 60 randomly selected schools (treatment schools), the NGO gave the teacher a camera with instructions to take a picture of himself/herself every day at opening time and at closing time
[Figure 1]
[Figure 2: Impact of the Cameras. Number of times schools were found open in treatment and comparison schools (out of 13 visits). Horizontal axis: attendance frequency (x), 0 to 13; vertical axis: number of teachers present exactly x times; series: Treatment, Control.]
Experimental Design
• Teachers received a bonus as a function of the number of days they actually attended
• Teachers received a salary of 1,000 Rs. monthly if they were present at least 21 days in a month
• Each additional day carried a bonus of 50 Rs., up to a maximum of 1,300 Rs. per month
• Each day missed carried a penalty of 50 Rs.
• The bonus was therefore set up so that the average teacher’s salary remained 1,000 Rs. per month, which was what teachers were paid in the remaining 60 schools (the comparison schools)
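The pay schedule described on this slide can be written as a small function. This is a sketch of one reading of the slide: it assumes that bonuses and penalties are both counted relative to the 21-day benchmark, and it applies no minimum pay because the slide does not state one. The constant and function names are illustrative.

```python
BASE_PAY = 1000       # Rs. per month at the attendance benchmark
BENCHMARK_DAYS = 21   # days present needed for the base pay
PER_DAY = 50          # Rs. bonus or penalty per day
MAX_PAY = 1300        # Rs. cap stated on the slide

def monthly_pay(days_present):
    """Pay in Rs. under the schedule as read from the slide (no floor is stated there)."""
    pay = BASE_PAY + PER_DAY * (days_present - BENCHMARK_DAYS)
    return min(pay, MAX_PAY)

for days in (19, 21, 25, 30):
    print(days, monthly_pay(days))
```

Under this reading the cap binds once a teacher is present 27 days or more in a month.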
• The program resulted in an immediate improvement in teacher attendance
• The absence rate of teachers was cut by one half
• Given the structure of the payment, the average salary in the treatment schools ended up matching almost exactly the average salary in the comparison schools
• The incentives were therefore effective without an increase in teachers’ net pay
Table 1: Is School Quality Similar in Treatment and Control Groups Prior to Program?
[Table values not recoverable. Panels: A. Teacher Attendance; B. Student Participation (Random Check); C. Teacher Qualifications; D. Teacher Performance Measures (Random Check); E. School Infrastructure. Surviving row labels: Percent of Teachers Interacting with Students; Percentage of Children Sitting Within Classroom.]
Notes: (1) Teacher Performance Measures from Random Checks only includes schools that were open during the random check. (2) Infrastructure Index: 1-5 points, with one point given if the following school attribute is sufficient: Space for Children to Play, Physical Space for Children in Room, Lighting, Library, Floor Mats
Table 2: Are Students Similar Prior To Program?

                                                Levels                    Normalized by Control
                                    Treatment    Control Difference  Treatment    Control Difference
                                          (1)        (2)        (3)        (4)        (5)        (6)
A. Can the Child Write?
Took Written Exam                        0.17       0.19      -0.02
  (standard error)                                           (0.04)
  N                                      1136       1094       2230
B. Oral Exam
Math Score on Oral Exam                  7.82       8.12      -0.30      -0.10       0.00      -0.10
  (standard error)                                           (0.27)                           (0.09)
  N                                       940        888       1828        940        888       1828
Language Score on Oral Exam              3.63       3.74      -0.10      -0.03       0.00      -0.03
  (standard error)                                           (0.30)                           (0.08)
  N                                       940        888       1828        940        888       1828
Total Score on Oral Exam                11.44      11.95      -0.51      -0.08       0.00      -0.08
  (standard error)                                           (0.48)                           (0.07)
  N                                       940        888       1828        940        888       1828
C. Written Exam
Math Score on Written Exam               8.62       7.98       0.64       0.23       0.00       0.23
  (standard error)                                           (0.51)                           (0.18)
  N                                       196        206        402        196        206        402
Language Score on Written Exam           3.62       3.44       0.18       0.08       0.00       0.08
  (standard error)                                           (0.46)                           (0.20)
  N                                       196        206        402        196        206        402
Total Score on Written Exam             12.17      11.41       0.76       0.16       0.00       0.16
  (standard error)                                           (0.90)                           (0.19)
  N                                       196        206        402        196        206        402

Notes: (1) Children who could write were given a written exam. Children who could not write were given an oral exam. (2) Standard errors are clustered by school.
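One way to read Table 2 as a check that assignment "looks random" is to divide each pre-program difference by its standard error. The sketch below does this for the levels columns; the 1.96 cutoff is the conventional large-sample threshold and is an assumption here, not something stated in the table.

```python
# (row label, treatment-control difference, clustered standard error), from Table 2 levels.
rows = [
    ("Took Written Exam", -0.02, 0.04),
    ("Math Score on Oral Exam", -0.30, 0.27),
    ("Language Score on Oral Exam", -0.10, 0.30),
    ("Total Score on Oral Exam", -0.51, 0.48),
    ("Math Score on Written Exam", 0.64, 0.51),
    ("Language Score on Written Exam", 0.18, 0.46),
    ("Total Score on Written Exam", 0.76, 0.90),
]

CRITICAL_VALUE = 1.96  # assumed large-sample 5% two-sided threshold

t_stats = {label: diff / se for label, diff, se in rows}
for label, t in t_stats.items():
    flag = "imbalanced?" if abs(t) > CRITICAL_VALUE else "ok"
    print(f"{label:32s} t = {t:5.2f}  {flag}")
```

None of the ratios exceeds the cutoff, which is consistent with treatment and control students being similar before the program.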
Table 3: Teacher Attendance, Sept 2003-Feb 2006. Difference Between Treatment and Control Schools
[Table values not recoverable. Columns: (1) Treatment; (2) Control; (3) Diff; (4) Until Mid-Test; (5) Mid to Post Test; (6) After Post Test. Panels: A. All Teachers; B. Teachers with Above Median Test Scores; C. Teachers with Below Median Test Scores.]
Notes: (1) Child learning levels were assessed in a mid-test (April 2004) and a post-test (November 2004). After the post-test, the "official" evaluation period was ended. Random checks continued in both the treatment and control schools. (2) Standard errors are clustered by school. (3) Panels B and C only include the 109 schools where teacher tests were available.
[Figure 3: Impact of the Cameras (out of at least 25 visits). Horizontal axis: attendance frequency, 1 to 25; vertical axis: number of teachers present exactly x times; series: Treatment, Control.]
In another experiment, headmasters in treatment schools marked the preschool teachers’ attendance; a teacher marked present a sufficient number of times received a prize (a bicycle).
• This experiment had no effect: absence rates were not reduced
• This outcome suggests that when human judgment is involved in a system where rules are often bent, incentives may easily be perverted
How to stop Malaria?
881,000 die each year
91% in Africa
85% under 5
The Case for Bednets
• Malaria is transmitted by mosquitoes, mainly at dusk.
• Long Lasting Insecticide Treated Bednets prevent mosquitoes from biting
Heated policy debate
• Jeff Sachs, WHO: Give bed nets for free.
– We know the science, no need to do experiment
• Easterly, Dambisa Moyo, Population Service International: don’t give them for free.
– We know the economics, no need to do experiment!
• The true question of course is the extent to which they should be subsidized…
What we need to know
• We need to know:
– The price elasticity of the demand for bednets: if people are willing to purchase one at the full cost, then subsidies are not needed; if they are not willing to purchase one at ANY price, then price subsidies may be needed
– The immediate effect on use: are people who pay for a bednet more likely to use one? How much do they need to pay?
– The longer term effects: will it wreck markets?
– On people who get it for free: will they buy nets in the future?
– On their friends and neighbors: will they hold out for a free bednet?
How can we find out?
• Anecdotes…
Photo: Minakawa et al. 2008, “Unforeseen misuses of bed nets in fishing villages along Lake Victoria,” Malaria Journal
– There are certainly plenty. But usually they cut both ways.
• Compare purchase/use at various prices
– Some clinics may give them for free; other villages may not have that system, so any bednets are more likely to be obtained in the market
– Do we see fewer in those villages?
– Do we see that the few we see are used differently?
But the problem is…
• What is the right counterfactual: what would have happened in the other situation?
• For example
– Bednets may be distributed for free in areas where malaria is a huge problem.
– So even if people had to pay for them, they would have been more likely to get them
[Figures: a series of bar charts of bednet purchases in high-malaria and low-malaria regions, titled “Purchase when bednets are expensive”, “Purchase when bednets are free”, “True effect of price on purchase”, “Our estimate of effect if we compare low and high malaria regions” (with the estimated effect marked), and “The bias” (with the effect and the bias marked).]
[Figures: demand curves of purchases against price (0 to 30), titled “Observed demand at various prices”, “Demand we would observe in region with free bed net, if bednets were not free”, and “Bias in elasticity”.]
Problem and solution
• Problem: what we observe in the world reflects:
– Selection bias: the behavior of people would be different in different places, EVEN IF THE PRICES WERE THE SAME
– The actual treatment effect.
– And we don’t know how to separate those two effects: we do not observe how people would have behaved with a low price in the high price region (and vice-versa)
• Solution: randomly assign different prices in the same region
– Now, there is no systematic difference between people who face a high price and people who face a low price.
– Of course there is still the usual random noise: the sample must be large enough, and there will be some uncertainty around our estimates of the mean effects.
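The decomposition into a treatment effect and selection bias can be made concrete with a toy calculation. Every number below is hypothetical and chosen only for illustration; none comes from the bednet studies.

```python
# Hypothetical purchase rates (%) by region and price; none of these come from the study.
purchase_rate = {
    ("high_malaria", "free"): 90, ("high_malaria", "expensive"): 50,
    ("low_malaria", "free"): 60, ("low_malaria", "expensive"): 20,
}

# True effect of a free net versus an expensive one, holding the region fixed.
true_effect = (purchase_rate[("high_malaria", "free")]
               - purchase_rate[("high_malaria", "expensive")])

# Observational comparison: nets happen to be free only where malaria is high.
naive_estimate = (purchase_rate[("high_malaria", "free")]
                  - purchase_rate[("low_malaria", "expensive")])

# Selection bias: the regions differ even when they face the same price.
selection_bias = (purchase_rate[("high_malaria", "expensive")]
                  - purchase_rate[("low_malaria", "expensive")])

assert naive_estimate == true_effect + selection_bias
print(true_effect, selection_bias, naive_estimate)
```

Randomly assigning prices within a region amounts to making the first comparison rather than the second, so the selection term drops out, up to sampling noise.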
Dupas’ experiments
• First experiment (with Jessica Cohen)
– Randomly chose clinics, and offered bednets at different prices.
– Tracked purchase, and usage, in those clinics
• Findings: compare purchase and usage at each price
Policy Implications
• What is the best price at which to charge for bednets?
• One possible way to ask the question: what price will minimize the cost per malaria death averted?
• Trade off:
– Free bednets: more coverage
– But it costs you money…
• It turns out that in this case, the CHEAPEST way to avert malaria from the policy perspective is a free bednet. Why?
The controversy
• When Dani Rodrik posted these findings on his website, some people objected. Their main objections were:
– Pregnant women: all of them really need the bednets
– The product was well known in Kenya
– Long-term effects may differ from short-term effects
• These questions are all about external validity: is the experiment valid outside of its specific context?
Next step
• What is the next step needed to check these objections?
– A different country: Uganda, Madagascar
– Kenya, but not pregnant women
– A new kind of bednet
– An experiment for the long-term effects: entitlement effect, social effects
A New Experiment
• New experimental design by Pascaline Dupas to try to address most of these questions
• Randomization done in the general population (men and women)
• Phase 1: Different discount vouchers are randomly distributed to individuals, for buying a new kind of bednet available in shops, at various prices
– Check purchase, use, and purchase by neighbors
• Phase 2: After a few months, the new bednet is available at the same price for everyone
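The Phase 1 step (random distribution of discount vouchers) could be sketched as follows. This is a minimal illustration, not the procedure actually used in the study: the function name, the sample of 600 people, and the balanced round-robin assignment are assumptions, and the price list is simply read off the axis labels of the charts in these slides.

```python
import random
from collections import Counter

# Price levels as they appear on the chart axes in these slides (Free = 0.0).
PRICES_USD = [0.0, 0.65, 1.0, 1.60, 2.0, 3.0]


def assign_vouchers(person_ids, prices, seed):
    """Randomly assign each person one voucher price, with arms of (nearly) equal size."""
    rng = random.Random(seed)  # fixed seed so the assignment can be reproduced and audited
    shuffled = list(person_ids)
    rng.shuffle(shuffled)
    # Deal prices round-robin over the shuffled list: who gets which price is random,
    # but every price arm ends up with about the same number of people.
    return {pid: prices[i % len(prices)] for i, pid in enumerate(shuffled)}


# Hypothetical sample of 600 individuals, identified by integers.
assignment = assign_vouchers(range(600), PRICES_USD, seed=42)
arm_sizes = Counter(assignment.values())
print(arm_sizes)
```

Because the price each person faces depends only on the shuffle, people in the high-price and low-price arms should not differ systematically, which is what makes the later comparison of purchase and use across arms credible.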
[Figure: Google Earth map; legend: full price, partial subsidy, full subsidy]
If people must pay for bednets, will they purchase them?
[Figure: purchase rate (0 to 100%) by cost: Free, $0.65, $1, $1.60, $2, $3]
When people get bednets for free, will they use them?
[Figure: purchase and use rates (0 to 100%) by cost: Free, $0.65, $1, $1.60, $2, $3]
Do free nets discourage future purchases?
[Figure: future purchase of a net at $2 (0 to 30%) by prior cost: Free, $0.65, $1, $1.60, $2, $3]
Do neighbors buy nets if others got them for free?
[Figure: purchase of net by price: $0.65, $1, $1.60, $2, $3; values shown: 66% and 50%; labels: “Average (33% receive free)” and “If all receive free”]
Conclusion
• When we have a policy question, e.g. “what is the optimal price to charge for a bednet”, we need to start by unpacking the question:
– What do we need to know to answer the question properly? Let’s not assume any answer, or replace real answers with anecdotes or observations that may be very misleading.
• We can then design an experiment that will get us the answers to these questions.
– This is what J-PAL (the Poverty Action Lab) does…
• Examine critically whether this first experiment is enough: perhaps we need more data to conclude…
• Other than the answer to the policy question, what are the lessons from the experiments? In particular, what is the key puzzle here that we will need to answer in our section on health?