Faculty of Arts, faculty.arts.ubc.ca/fpatrick/documents/RCT-Lecture-2018.pdf (2018-10-26)

Aid optimists

“I have identified the specific investments that are needed [to end poverty]; found ways to plan and implement them; [and] shown that they can be affordable.”

Jeffrey Sachs, The End of Poverty


Image by Angela Radulescu on Flickr.

http://www.flickr.com/photos/walkingthedeepfield/2275970925/

Aid pessimists

“After $2.3 trillion over 5 decades, why are the desperate needs of the world's poor still so tragically unmet? Isn't it finally time for an end to the impunity of foreign aid?”

Bill Easterly, The White Man’s Burden

© Unknown. All rights reserved. This content is excluded from our Creative Commons license. For more information, see http://ocw.mit.edu/fairuse.

Books

Poor Economics: A Radical Rethinking of the Way to Fight Global Poverty, by Abhijit V. Banerjee and Esther Duflo

Publication date: April 2011

Website: http://www.pooreconomics.com/

Le Développement humain (Lutter contre la pauvreté, volume 1), by Esther Duflo

2010, Paris: Le Seuil

2011, Italian translation: Feltrinelli

La Politique de l'autonomie (Lutter contre la pauvreté, volume 2), by Esther Duflo

2010, Paris: Le Seuil

2011, Italian translation: Feltrinelli

Expérience, science et lutte contre la pauvreté, by Esther Duflo

2009, Paris: Fayard

© 2011 MIT. All rights reserved.

MIT Department of Economics : Esther Duflo : Books http://econ-www.mit.edu/faculty/eduflo/publications


More Than 1 Billion People Are Hungry in the World, by Abhijit Banerjee and Esther Duflo | Foreign Policy

http://www.foreignpolicy.com/articles/2011/04/25/more_than_1_billion_people_are_hungry_in_the_world?print=yes&hidecomments=yes&page=full

books, The Elusive Quest for Growth and The White Man's Burden.

Dambisa Moyo, an economist who worked at Goldman Sachs and the World Bank, has joined her voice to Easterly's with her recent book, Dead Aid. Both argue that aid does more bad than good. It prevents people from searching for their own solutions, while corrupting and undermining local institutions and creating a self-perpetuating lobby of aid agencies. The best bet for poor countries, they argue, is to rely on one simple idea: When markets are free and the incentives are right, people can find ways to solve their problems. They do not need handouts from foreigners or their own governments. In this sense, the aid pessimists are actually quite optimistic about the way the world works. According to Easterly, there is no such thing as a poverty trap.

This debate cannot be solved in the abstract. To find out whether there are in fact poverty traps, and, if so, where they are and how to help the poor get out of them, we need to better understand the concrete problems they face. Some aid programs help more than others, but which ones? Finding out required us to step out of the office and look more carefully at the world. In 2003, we founded what became the Abdul Latif Jameel Poverty Action Lab, or J-PAL. A key part of our mission is to research by using randomized control trials (similar to experiments used in medicine to test the effectiveness of a drug) to understand what works and what doesn't in the real-world fight against poverty. In practical terms, that meant we'd have to start understanding how the poor really live their lives.

Take, for example, Pak Solhin, who lives in a small village in West Java, Indonesia. He once explained to us exactly how a poverty trap worked. His parents used to have a bit of land, but they also had 13 children and had to build so many houses for each of them and their families that there was no land left for cultivation. Pak Solhin had been working as a casual agricultural worker, which paid up to 10,000 rupiah per day (about $2) for work in the fields. A recent hike in fertilizer and fuel prices, however, had forced farmers to economize. The local farmers decided not to cut wages, Pak Solhin told us, but to


[Screenshot: J-PAL evaluation search, map view (The Abdul Latif Jameel Poverty Action Lab, “Translating Research into Action”). Themes in the legend: Education; Finance & Microfinance; Environment & Energy; Health; Political Economy & Governance; Labor Markets; Agriculture. Filters: keyword, themes, policy goals, region, country, researchers, status, data.]

J-PAL

Evaluation: What, Why, When

Why focus on impact evaluation?

• Surprisingly little hard evidence on what works

• Can do more with a given budget if we have better evidence

• If people knew money was going to programs that worked, it could help increase the pot for anti-poverty programs

• Instead of asking “Do aid/development programs work?” we should be asking:
  – Which work best, why, and when?
  – How can we scale up what works?

Impact: What is it?

[Figure: primary outcome plotted over time, marking the intervention and the resulting impact.]

Counterfactual

The counterfactual represents the state of the world that program participants would have experienced in the absence of the program.

Problem: Counterfactual cannot be observed

Solution: We need to “mimic” or construct the counterfactual
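To make the counterfactual concrete, here is a minimal Python sketch on simulated data (every number is hypothetical). Each unit carries two potential outcomes, with and without the program, but the code "observes" only one of them; under random assignment, the control group's mean stands in for the unobservable counterfactual.

```python
import random
import statistics

random.seed(0)  # fixed seed so the simulation is reproducible

# Hypothetical population of 1000 units, each with two potential outcomes:
# y0 = outcome without the program (the counterfactual for participants)
# y1 = outcome with the program; the true effect is +5 for every unit
units = []
for _ in range(1000):
    y0 = random.gauss(50, 10)
    units.append({"y0": y0, "y1": y0 + 5})

# The true average impact needs BOTH potential outcomes, which real data never has
true_impact = statistics.mean(u["y1"] - u["y0"] for u in units)

# In practice each unit is either treated or not, so only one outcome is observed
for u in units:
    u["treated"] = random.random() < 0.5
    u["observed"] = u["y1"] if u["treated"] else u["y0"]

# Under random assignment, the control mean stands in for the counterfactual
treated = [u["observed"] for u in units if u["treated"]]
control = [u["observed"] for u in units if not u["treated"]]
estimate = statistics.mean(treated) - statistics.mean(control)

print(f"true impact: {true_impact:.2f}, estimated impact: {estimate:.2f}")
```

The estimate will be close to, but not exactly, 5: the control group mimics the counterfactual only on average.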

J-PAL | WHY RANDOMIZE 19

Presenter

Presentation Notes

The counterfactual represents the state of the world that program participants would have experienced in the absence of the program (i.e. had they not participated in the program). Problem: the counterfactual cannot be observed. Solution: we need to “mimic” or construct the counterfactual.

Constructing the counterfactual

• Usually done by selecting a group of individuals that did not participate in the program

• This group is usually referred to as the control group or comparison group

• How this group is selected is a key decision in the design of any impact evaluation


Presenter

Presentation Notes

Estimating the impact (a.k.a. the causal effect) of a program involves comparing the outcome had the intervention been introduced with the outcome had the intervention not been introduced. The latter is usually referred to as the counterfactual. The counterfactual represents the state of the world that program participants would have experienced in the absence of the program (i.e. had they not participated in the program). It does not represent a state of the world in which participants receive absolutely no services, but rather one in which they receive whatever services they would have received had they not participated in the program being evaluated. Example: a training program. The counterfactual can never be directly observed. Hence, the main goal of an impact evaluation can be viewed as an effort to construct or mimic the counterfactual. This is usually done by selecting a group of individuals that did not participate in the program. This group is usually referred to as the control group (in the case of a social experiment) or the comparison group (when non-experimental methods are used to estimate the impact). How this group is selected is a key decision in the design of any impact evaluation. The idea is to select a group that is exactly like the group of participants in all ways except one: their exposure to the program being evaluated. The goal in the end is to be able to attribute differences in outcomes between the group of participants and the control/comparison group to the program (and not to other factors).

Selecting the comparison group

• Idea: Comparability

• Goal: Attribution


Presenter

Presentation Notes

Idea: select a group that is exactly like the group of participants in all ways except one: their exposure to the program being evaluated. Goal: to be able to attribute differences in outcomes between the group of participants and the comparison group to the program (and not to other factors). The critical objective of impact evaluation is to establish a credible comparison group: a group of individuals who, in the absence of the program, would have had outcomes similar to those who were exposed to the program. In reality, however, individuals who participate in a program and those who do not are generally different: programs are placed in specific areas (for example, poorer or richer areas), individuals are screened for participation (for example, on the basis of poverty or of their motivation), and, in addition, the decision to participate is often voluntary.

II – WHAT IS A RANDOMIZED EXPERIMENT?

The basics

Start with a simple case:
• Take a sample of program applicants
• Randomly assign them to either:
  – Treatment group: is offered treatment
  – Control group: not allowed to receive treatment (during the evaluation period)


Key advantage of experiments

Because members of the groups (treatment and control) do not differ systematically at the outset of the experiment,

any difference that subsequently arises between them can be attributed to the program rather than to other factors.


Presenter

Presentation Notes

In all other impact evaluation methods, we need to assume that the two groups do not differ systematically at the outset, or that any differences between them have been statistically accounted for. But there is no way to test this assumption.

Evaluation of “Women as Policymakers”: Treatment vs. Control villages at baseline

Variable: treatment group / control group / difference (standard error)

Female literacy rate: 0.35 / 0.34 / 0.01 (0.01)

Number of public health facilities: 0.06 / 0.08 / -0.02 (0.02)

Tap water: 0.05 / 0.03 / 0.02 (0.02)

Number of primary schools: 0.95 / 0.91 / 0.04 (0.08)

Number of high schools: 0.09 / 0.10 / -0.01 (0.02)

Standard errors in parentheses. Statistics displayed for West Bengal. */**/***: statistically significant at the 10% / 5% / 1% level. Source: Chattopadhyay and Duflo (2004).
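Reading the balance table mechanically: dividing each difference by its standard error gives a t-statistic, compared here with two-sided normal critical values (1.645, 1.96 and 2.576 for the 10%, 5% and 1% levels; the normal approximation is an assumption). Because the table reports rounded figures, these t-statistics are only approximate.

```python
# Baseline differences and standard errors from the table above
# (Chattopadhyay and Duflo 2004, West Bengal)
rows = {
    "Female literacy rate": (0.01, 0.01),
    "Number of public health facilities": (-0.02, 0.02),
    "Tap water": (0.02, 0.02),
    "Number of primary schools": (0.04, 0.08),
    "Number of high schools": (-0.01, 0.02),
}

# Two-sided critical values under a normal approximation (an assumption)
CRITICAL = [(2.576, "***"), (1.960, "**"), (1.645, "*")]

def stars(t):
    """Return the significance marker for a t-statistic, or '' if not significant."""
    for crit, mark in CRITICAL:
        if abs(t) >= crit:
            return mark
    return ""

t_stats = {}
for name, (diff, se) in rows.items():
    t = diff / se
    t_stats[name] = t
    print(f"{name:36s} t = {t:5.2f}  {stars(t) or 'not significant'}")
```

No difference reaches even the 10% level, which is what "balanced at baseline" means in practice.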


Some variations on the basics

• Assigning to multiple treatment groups

• Assigning units other than individuals or households:
  – Health centers, schools, local governments, villages


Presenter

Presentation Notes

Assigning units other than people/households: health centers (in tracking nurse attendance), schools (measuring infrastructure), local governments (assessing corruption).

Key Steps in conducting an experiment

1. Design the study carefully

2. Randomly assign people to treatment or control

3. Collect baseline data

4. Verify that assignment looks random

5. Monitor the process so that the integrity of the experiment is not compromised


Presenter

Presentation Notes

These 8 steps present a very simplified description of the process. The idea is to give a complete picture of how this works in a typical experiment.

Key Steps in conducting an experiment (contd.)

6. Collect follow-up data for both the treatment and control groups

7. Estimate program impacts by comparing mean outcomes of the treatment group vs. mean outcomes of the control group

8. Assess whether program impacts are statistically significant and practically significant
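Steps 7 and 8 can be sketched as a difference in means plus a Welch t-statistic. The test scores are hypothetical, and the p-value uses a normal approximation, which is reasonable for large samples but optimistic for only ten observations per group as here.

```python
import math
import statistics

def estimate_impact(treatment, control):
    """Difference in mean outcomes (step 7) with a Welch t-statistic and an
    approximate two-sided p-value from the normal distribution (step 8)."""
    diff = statistics.mean(treatment) - statistics.mean(control)
    se = math.sqrt(statistics.variance(treatment) / len(treatment)
                   + statistics.variance(control) / len(control))
    t = diff / se
    # Two-sided p-value under a normal approximation: 2 * (1 - Phi(|t|))
    p = math.erfc(abs(t) / math.sqrt(2))
    return diff, se, t, p

# Hypothetical follow-up test scores
treatment = [62, 58, 71, 65, 60, 68, 59, 66, 70, 64]
control = [55, 61, 57, 52, 60, 58, 54, 59, 56, 53]

diff, se, t, p = estimate_impact(treatment, control)
print(f"impact = {diff:.2f} (SE {se:.2f}), t = {t:.2f}, p = {p:.4f}")
```

Statistical significance (the p-value) and practical significance (whether a 7.8-point gain matters for policy) are separate judgments, as step 8 says.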




III – WHY RANDOMIZE?

If properly designed and conducted, randomized experiments provide the most credible method to estimate the impact of a program

Why Randomize? Conceptual Argument




Why “most credible”?

Because members of the groups (treatment and control) do not differ systematically at the outset of the experiment,

any difference that subsequently arises between them can be attributed to the program rather than to other factors.



Constructing the counterfactual

• Counterfactual is often constructed by selecting a group not affected by the program

• Randomized:
  – Use random assignment of the program to create a control group which mimics the counterfactual.

• Non-randomized:
  – Argue that a certain excluded group mimics the counterfactual.

Example #3 Balsakhi Program


Presenter

Presentation Notes

Revisiting balsakhi classes

Balsakhi Program: Background

• Implemented by Pratham, an NGO from India
• Program provided tutors (balsakhis) to help at-risk children with school work
• In Vadodara, the balsakhi program was run in government primary schools in 2002-2003
• Teachers decided which children would get the balsakhi


Presenter

Presentation Notes

In 1994 Pratham launched the Balsakhi Program to help at-risk children acquire the basic skills they need to participate fully in the classroom. The program provided tutors for at-risk children in government schools. The tutor, called a balsakhi, or “child’s friend,” was typically a young woman hired from the local community. Balsakhis were paid between 500 and 750 rupees (US$10-15) a month. All the balsakhis had completed at least secondary school, and they were given two weeks’ training at the beginning of the school year. The program targeted children who had reached grades 3 and 4 without mastering grades 1 and 2 reading and math competencies, including spelling simple words, reading simple paragraphs, recognizing numbers, counting up to 20, and subtracting or adding single-digit numbers. Children who were lagging behind—identified as such by the teacher—were pulled out of the regular class in groups of 20 and sent for remedial tutoring, spending half the school day with the tutor.


Balsakhi: Outcomes

• Children were tested at the beginning of the school year (Pretest) and at the end of the year (Post-test)

• QUESTION: How can we estimate the impact of the balsakhi program on test scores?


Methods to estimate impacts

• Let’s look at different ways of estimating the impacts using the data from the schools that got a balsakhi

1. Pre-post (Before vs. After)
2. Simple difference
3. Difference-in-differences
4. Other non-experimental methods
5. Randomized experiment


1 - Pre-post (Before vs. After)

• Look at average change in test scores over the school year for the balsakhi children

Average post-test score for children with a balsakhi: 51.22

Average pretest score for children with a balsakhi: 24.80

Difference: 26.42

QUESTION: Under what conditions can this difference (26.42) be interpreted as the impact of the balsakhi program?


2 - Simple difference

Compare test scores of children who got a balsakhi with test scores of children who did not get a balsakhi.

Average score for children with a balsakhi: 51.22

Average score for children without a balsakhi: 56.27

Difference: -5.05

QUESTION: Under what conditions can this difference (-5.05) be interpreted as the impact of the balsakhi program?


3 – Difference-in-Differences

Compare gains in test scores of children who got a balsakhi with gains in test scores of children who did not get a balsakhi.

Children with a balsakhi: pretest 24.80, post-test 51.22, difference 26.42

Children without a balsakhi: pretest 36.67, post-test 56.27, difference 19.60

Difference-in-differences: 26.42 - 19.60 = 6.82

• QUESTION: Under what conditions can this difference (6.82) be interpreted as the impact of the balsakhi program?
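The three non-experimental estimates above are different subtractions of the same four averages. This sketch only reproduces the arithmetic from the reported numbers; it does not answer the slides' questions, which are about the assumptions under which each number equals the impact.

```python
# Average test scores reported in the balsakhi slides
with_balsakhi = {"pre": 24.80, "post": 51.22}
without_balsakhi = {"pre": 36.67, "post": 56.27}

# 1. Pre-post: change over the year for children who got a balsakhi
pre_post = with_balsakhi["post"] - with_balsakhi["pre"]

# 2. Simple difference: post-test gap between the two groups
simple_diff = with_balsakhi["post"] - without_balsakhi["post"]

# 3. Difference-in-differences: gain of balsakhi children minus gain of the others
gain_with = with_balsakhi["post"] - with_balsakhi["pre"]
gain_without = without_balsakhi["post"] - without_balsakhi["pre"]
did = gain_with - gain_without

print(f"pre-post:  {pre_post:.2f}")     # 26.42
print(f"simple:    {simple_diff:.2f}")  # -5.05
print(f"diff-diff: {did:.2f}")          # 6.82
```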


5 – Randomized Experiment

• Suppose we evaluated the balsakhi program using a randomized experiment

• QUESTION #1: What would this entail? How would we do it?

• QUESTION #2: What would be the advantage of using this method to evaluate the impact of the balsakhi program?
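One hedged way to see the advantage is a simulation in which each child's end-of-year score depends on their pretest. It compares teacher selection (assumed here to pick the lowest-scoring quarter, in the spirit of teachers identifying the children who were lagging behind) with a lottery. The score distribution, the noise, and the true effect of 5 points are all hypothetical.

```python
import random
import statistics

random.seed(1)
TRUE_EFFECT = 5.0  # hypothetical true gain from a balsakhi

def post_test(pretest, tutored):
    """Hypothetical end-of-year score: grows with pretest, plus noise and effect."""
    return 20 + pretest + random.gauss(0, 5) + (TRUE_EFFECT if tutored else 0.0)

pretests = [random.gauss(35, 10) for _ in range(2000)]

def simple_difference(assignment):
    """Mean post-test of tutored children minus mean post-test of the others."""
    scores = [post_test(p, a) for p, a in zip(pretests, assignment)]
    t = [s for s, a in zip(scores, assignment) if a]
    c = [s for s, a in zip(scores, assignment) if not a]
    return statistics.mean(t) - statistics.mean(c)

# Teacher selection: the lowest-scoring quarter of children get the balsakhi
cutoff = sorted(pretests)[len(pretests) // 4]
teacher_pick = [p < cutoff for p in pretests]

# Random assignment: a lottery sends a quarter of children to the balsakhi
lottery = [False] * len(pretests)
for i in random.sample(range(len(pretests)), len(pretests) // 4):
    lottery[i] = True

print(f"teacher selection: {simple_difference(teacher_pick):.2f}")
print(f"random assignment: {simple_difference(lottery):.2f}")
```

Under teacher selection the comparison is negative even though the program helps, because the tutored children started far behind; under the lottery it lands near the true effect.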


How to Randomize

Presenter

Presentation Notes

Marc will present this

Random Selection

Presenter

Presentation Notes

15 seconds. First, it’s important to distinguish between random selection and random assignment. Both are motivated by the same principle: to get a representative sample of the population.

Random Selection

7J-PAL | WHAT IS EVALUATION

Presenter

Presentation Notes

15 seconds. So say this is a map of a city in India (actually, it’s the location of one of our first microcredit studies). Say we divide it up into about 400 geographic units.

Random Selection


[Bar chart: monthly income per capita; population: 1250]

Presenter

Presentation Notes

15 seconds. This is our sampling frame. Let’s also say the average monthly income per capita in this city is 1250 rupees.

Random Selection

Randomly sample from area of interest

Presenter

Presentation Notes

15 seconds. If we were to take a random sample of about 60, and it were truly a random sample, we should expect the average income of that sample to be around 1250.

Random Selection


[Bar chart: monthly income per capita; population: 1250, sample: 1252]

Presenter

Presentation Notes

30 seconds. Indeed, it’s close: 1252. With random samples, it’s unreasonable to expect the sample average to equal the population average exactly, but we can get pretty close. What’s important here is that the difference is not “statistically distinguishable”; in other words, it is not “statistically significant”.

Random Assignment

Randomly assign to treatment

Presenter

Presentation Notes

30 seconds. With random assignment, we also start with a sampling frame. Here it’s our random sample, but we could also have started with the entire population: all 400 communities. We randomly assign half to the treatment group.

Random Assignment


[Bar chart: monthly income per capita; population: 1250, treatment: 1257]

Presenter

Presentation Notes

30 seconds. And once again, the average income of the treatment group is close to the population mean.

Random Assignment

Randomly assign to treatment and control

Presenter

Presentation Notes

15 seconds. And then the rest are assigned to the control group.

Random Assignment


[Bar chart: monthly income per capita; population: 1250, treatment: 1257, control: 1244]

Presenter

Presentation Notes

15 seconds. And here, we also find a number very close to the population mean. Just as the original random sample of 60 wasn’t exactly 1250, we wouldn’t necessarily expect the random samples of 30 in the treatment group and 30 in the control group to be exactly 1250. But again, the difference between the average income in the treatment and control groups is statistically insignificant, and neither group differs significantly from the population mean. The two groups are statistically equivalent; in other words, the two groups are balanced.
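The random selection and random assignment steps can be sketched in a few lines. The 400-unit frame, the sample of 60, and the 30/30 split follow the notes; the incomes are simulated around a 1250-rupee mean with a hypothetical spread of 300, so the printed means will not match the slides.

```python
import random
import statistics

random.seed(2)

# Hypothetical sampling frame: 400 geographic units with monthly income per capita
frame = [random.gauss(1250, 300) for _ in range(400)]

# Random selection: draw 60 units without replacement
sample = random.sample(frame, 60)

# Random assignment: shuffle the sample and split it into two groups of 30
shuffled = sample[:]
random.shuffle(shuffled)
treatment, control = shuffled[:30], shuffled[30:]

print(f"population mean: {statistics.mean(frame):.0f}")
print(f"sample mean:     {statistics.mean(sample):.0f}")
print(f"treatment mean:  {statistics.mean(treatment):.0f}")
print(f"control mean:    {statistics.mean(control):.0f}")
```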

Alternative methods of randomization?


Presenter

Presentation Notes

30 seconds. So say this is a map of a city in India (actually, it’s the location of one of our first microcredit studies). Say we divide it up into about 400 geographic units.

NOT Random Assignment


Presenter

Presentation Notes

1 minute. You’d be surprised how many evaluations claim to have a randomized controlled design yet create designs exactly like this. NOTE TO PRESENTER: IF YOU HAVE EXAMPLES YOU’RE AWARE OF, PLEASE USE THEM HERE.

NOT Random Assignment


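[Bar chart: monthly income per capita for population, treatment and control; population: 1250, with the treatment and control bars showing 1453 and 942]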

Presenter

Presentation Notes

30 seconds. There is some remote possibility that this city is geographically homogeneous, or that the poor and rich are uniformly distributed all over the city, but that is extremely unlikely. In expectation, we would see different means. Therefore, the treatment group and the control group are NOT balanced (as you can see from the difference in income).
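A sketch of why the geographic split fails, assuming (hypothetically) that income rises steadily from the west of the city to the east. Treating the eastern half of the map puts richer units in treatment; treating a random half does not. All incomes are simulated.

```python
import random
import statistics

random.seed(3)

# Hypothetical city: 400 units on a line from west (position 0) to east (position 1);
# income rises toward the east, plus unit-level noise
units = [(x / 399, 900 + 700 * (x / 399) + random.gauss(0, 150)) for x in range(400)]

def group_means(treated_flags):
    """Mean income of flagged (treatment) units and of the rest (control)."""
    t = [inc for (_, inc), f in zip(units, treated_flags) if f]
    c = [inc for (_, inc), f in zip(units, treated_flags) if not f]
    return statistics.mean(t), statistics.mean(c)

# NOT random: treat the eastern half of the map
geographic = [pos >= 0.5 for pos, _ in units]

# Random: treat a randomly chosen half of the units
chosen = set(random.sample(range(len(units)), len(units) // 2))
randomized = [i in chosen for i in range(len(units))]

for label, flags in [("geographic split", geographic), ("random split", randomized)]:
    t, c = group_means(flags)
    print(f"{label:16s} treatment {t:6.0f}  control {c:6.0f}  gap {t - c:5.0f}")
```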

Simple randomization: Fixed probability

• For each member, set a probability (e.g. 50%).
  – Spot randomization
  – Point-of-service randomization

• May end up with slightly more in one group and fewer in the other

J-PAL | HOW TO RANDOMIZE 20

ID Coin Treatment/Control

1 Heads T

2 Heads T

3 Tails C

4 Heads T

5 Tails C

6 Heads T

7 Tails C

8 Tails C

9 Heads T

10 Heads T

Count: T: 6, C: 4
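Simple randomization amounts to a digital coin flip for each unit, as in the table above. A minimal sketch follows; the function name and the optional seed (for a reproducible draw) are illustrative choices, not a standard API. Spot or point-of-service randomization would apply the same flip to each person as they arrive.

```python
import random

def simple_randomization(ids, p=0.5, seed=None):
    """Assign each ID to treatment independently with probability p
    (a digital coin flip), so group sizes can differ from run to run."""
    rng = random.Random(seed)
    return {i: ("T" if rng.random() < p else "C") for i in ids}

assignment = simple_randomization(range(1, 11), seed=42)
counts = {g: sum(1 for v in assignment.values() if v == g) for g in "TC"}
print(assignment)
print(counts)  # the split need not be exactly 5/5
```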

Presenter

Presentation Notes

2 min Use Physician Teams or Hotspotting as an example

Complete randomization: Fixed proportion

• Need sample frame
• Determine number in treatment (and in control)
• Pull out of a hat/bucket, or
• Use random number generator to order observations randomly

Source: Chris Blattman


Presenter

Presentation Notes

2 min. USE DECISION SUPPORT AS AN EXAMPLE. Need a sample frame. Talk about a public lottery vs. a random number generator. Pulling out of a hat/bucket is transparent, but time-consuming, complex if the group is large, and hard to stratify on many dimensions. Using a random number generator to order observations randomly: typically we use Stata, but most statistical programs, even Excel, can do this (in fact, you will be doing this soon). Stata program code: circulate some examples. What if there is no existing list? Walk-ins: randomize on the spot.
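The random-number-generator method on the slide (give each observation a random number, sort on it, and take the first k) can be sketched as follows. The function name and the seed value are arbitrary illustrations; fixing the seed makes the draw reproducible and auditable.

```python
import random

def complete_randomization(ids, n_treatment, seed=None):
    """Give every ID a uniform random number, sort on it, and assign the first
    n_treatment IDs to treatment: the group sizes are fixed in advance."""
    rng = random.Random(seed)
    keyed = sorted(ids, key=lambda _id: rng.random())
    treated = set(keyed[:n_treatment])
    return {i: ("T" if i in treated else "C") for i in ids}

ids = list(range(1, 401))  # e.g. a sample frame of 400 students
assignment = complete_randomization(ids, n_treatment=200, seed=2018)
print(sum(v == "T" for v in assignment.values()))  # 200
```

Unlike the coin flip, this always yields exactly 200 treated units; rerunning with the same seed reproduces the same assignment.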

Unit of Randomization: Individual?


Presenter

Presentation Notes

15 seconds. So first, let’s start with the basics. We have about 400 students in the city.

Unit of Randomization: Individual?


Presenter

Presentation Notes

15 seconds. We could randomly assign each individual to treatment or control. If we started with a list of individuals and wanted a complete randomization design, we could ensure we have 200 in the treatment group and 200 in the control group.

Unit of Randomization: Clusters?


Presenter

Presentation Notes

15 seconds. But say the intervention we want to test is teacher training. We couldn’t have some kids in the classroom be taught by Professor Doyle with training and others by Professor Doyle without training. Here, we’d want to do a cluster randomized trial.

Unit of Randomization: Class?


Presenter

Presentation Notes

15 seconds The Unit of randomization here may be the Cluster.

Unit of Randomization: Class?


Presenter

Presentation Notes

15 seconds. Then entire classes would be randomized to treatment and control.

Unit of Randomization: School?


Presenter

Presentation Notes

15 seconds. And if our intervention was training for principals on better school administration…

Unit of Randomization: School?


Presenter

Presentation Notes

15 seconds. We’d want to randomize at the school level.
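Cluster randomization draws at the level of the class (or school) and then gives every student the status of their cluster. A minimal sketch with a hypothetical roster of 400 students in 16 classes of 25; the function and ID formats are illustrative.

```python
import random

def cluster_randomize(student_to_class, seed=None):
    """Randomize whole classes, then give every student the status of their class."""
    rng = random.Random(seed)
    classes = sorted(set(student_to_class.values()))
    treated_classes = set(rng.sample(classes, len(classes) // 2))
    return {s: ("T" if c in treated_classes else "C")
            for s, c in student_to_class.items()}

# Hypothetical roster: 400 students spread across 16 classes of 25
roster = {f"student{i:03d}": f"class{i // 25:02d}" for i in range(400)}
assignment = cluster_randomize(roster, seed=7)
print(sum(v == "T" for v in assignment.values()), "students in treated classes")
```

Every student in a class shares one status, so a trained teacher never has to "switch off" the training for some pupils.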

An education department wants to see if increasing the duration of recess can help reduce rates of obesity. What is the appropriate unit of randomization?

A. Child level

B. Household level

C. Classroom level

D. School level

E. Village level

F. Don’t know

[Audience poll results, options A to F; shares shown: 22%, 0%, 0%, 0%, 56%, 22%]


Presenter

Presentation Notes

2 min

The department of agriculture believes that if farmers used more fertilizer yields would improve. One advisor believes organic fertilizer will be more effective; a second believes inorganic fertilizer is better; a third believes neither will be effective. Can we test all three beliefs within one single experiment?

A. Yes, and we should

B. No, they can only be answered with two separate experiments

C. No, they can only be answered with three separate experiments

D. Yes, but best practice is to run separate experiments

E. Don’t know

[Audience poll results, options A to E; shares shown: 71%, 0%, 14%, 0%, 14%]


Presenter

Presentation Notes

3 min

Multiple treatments

[Diagram: Treatment 1, Treatment 2, Control]


Presenter

Presentation Notes

15 seconds. We’ve been talking as though we have ONE treatment and a control, but it’s entirely possible to have multiple treatments.
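With more than one treatment, complete randomization extends naturally to several arms. This sketch uses the three arms from the fertilizer question (organic, inorganic, control) and a hypothetical sample of 300 farmers, dealing shuffled IDs out in turn so that arm sizes differ by at most one.

```python
import random

def assign_arms(ids, arms, seed=None):
    """Complete randomization into several arms of (near-)equal size:
    shuffle the IDs, then deal them out to the arms in turn."""
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    return {i: arms[k % len(arms)] for k, i in enumerate(shuffled)}

farmers = range(300)  # hypothetical sample of 300 farmers
arms = ["organic", "inorganic", "control"]
assignment = assign_arms(farmers, arms, seed=11)
sizes = {a: sum(v == a for v in assignment.values()) for a in arms}
print(sizes)  # {'organic': 100, 'inorganic': 100, 'control': 100}
```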

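Cross-cutting treatments: Factorial Design

Cash grants (Y/N) crossed with performance-based pay (Y/N):

• Cash grants Y, performance-based pay Y: Group 1 (Cash + Performance)
• Cash grants Y, performance-based pay N: Group 2 (Cash)
• Cash grants N, performance-based pay Y: Group 3 (Performance)
• Cash grants N, performance-based pay N: Group 4 (Control)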

Presenter

Presentation Notes

2 min. Now let’s use a more realistic example (we can customize this for HOTSPOTTING: cell phones, home visits). Test whether components serve as substitutes or complements: is the whole (the interaction) greater than, less than, or equal to the sum of its parts? What is the most cost-effective combination? Advantage: a win-win for operations, since it can help answer their questions beyond simple “impact”!
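With a factorial design, the interaction question in the notes becomes arithmetic on the four cell means: compare the effect of the combined package with the sum of the two single-component effects. The cell means below are hypothetical.

```python
# Hypothetical cell means for the 2x2 factorial design (illustrative only);
# keys are (cash grants?, performance-based pay?)
mean = {
    (True, True): 0.30,    # Group 1: cash grants + performance pay
    (True, False): 0.15,   # Group 2: cash grants only
    (False, True): 0.12,   # Group 3: performance pay only
    (False, False): 0.00,  # Group 4: control
}

cash_effect = mean[(True, False)] - mean[(False, False)]
performance_effect = mean[(False, True)] - mean[(False, False)]
combined_effect = mean[(True, True)] - mean[(False, False)]

# Interaction: how much the combination differs from the sum of its parts.
# > 0 suggests complements, < 0 suggests substitutes.
interaction = combined_effect - (cash_effect + performance_effect)
print(f"interaction = {interaction:+.2f}")
```

In a real evaluation the interaction would also need a standard error; with four arms, each cell is smaller, so interactions are typically estimated less precisely than main effects.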




Varying intensity of treatment

• To measure:
  – Dosage
  – Sensitivity
  – Elasticity
  – Spillovers


Presenter

Presentation Notes

1 min. In medical trials, the question is rarely as simple as “Is penicillin effective at treating pneumonia?” Often the question is much more detailed: What dosage of penicillin is needed, and how many times a day? How long is the course? In these cases, we have many treatment arms, each given a different dosage, course length, etc. Through this, we figure out the optimal dosage and course duration, maximizing the benefits and minimizing the negative side effects (for example, drug resistance, or disrupting the natural microbiome in your gut). How might this apply to a social program question?

Varying intensity of treatment (individual)

• Dosage

• Sensitivity

• Elasticity


Presenter

Presentation Notes

2 min. How much should we charge for preventative health products? We know that when we charge the market rate, people underutilize them. In other words, people do not buy bednets that would help stop the spread of malaria, they do not get immunized, and many farmers do not buy fertilizer. But if we hand these products out for free, people may take them for granted and not use them. We may destroy a partially functioning market for these products. And subsidies cost a lot of money: we may be overspending on this one problem, only to be short of resources for other problems. So what’s the optimal price?
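If households are randomly assigned to different prices, take-up in each price arm traces out a demand curve, and adjacent arms give an elasticity. This sketch uses the arc (midpoint) elasticity formula, which stays defined when one arm is free; the prices and take-up rates are hypothetical.

```python
# Hypothetical price arms (e.g. rupees per bednet) and the share of
# households in each randomly assigned arm that bought the product
arms = [(0, 0.95), (10, 0.60), (20, 0.40), (40, 0.20)]

def arc_elasticity(p1, q1, p2, q2):
    """Midpoint (arc) elasticity of take-up with respect to price."""
    dq = (q2 - q1) / ((q1 + q2) / 2)
    dp = (p2 - p1) / ((p1 + p2) / 2)
    return dq / dp

for (p1, q1), (p2, q2) in zip(arms, arms[1:]):
    e = arc_elasticity(p1, q1, p2, q2)
    print(f"price {p1:>2} -> {p2:>2}: take-up {q1:.2f} -> {q2:.2f}, elasticity {e:.2f}")
```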

Challenge 1: Difficult (logistically or politically) for service providers

• Service providers have trouble distinguishing between treatment and comparison (or customizing service)

[Diagram: treatment and comparison groups, with services provided to both. Crossovers: control receives intervention (no longer represents pure counterfactual).]

Presenter

Presentation Notes

1 min. In medical trials, clinical researchers are so concerned about doctors’ ability to provide one randomly assigned treatment to one patient and a different randomly assigned treatment (or the status quo) to another that it’s common practice to take discretion away from the doctor. They design “double-blind” trials, in which neither the patient nor the doctor knows which treatment the patient is getting. The treatment and control “pills” appear identical, and the doctor is not informed which pill is being given to which patients. This can be difficult or impossible once we start experimenting with different procedures or processes. If we wanted to test the effectiveness of a new process and trained nurses on it, we couldn’t ask them to “apply” that training to some patients and to “forget it” or “unlearn it” for others. I have an example of a project… (use “physician teams”)

Solution 1a: Assign to Different Service Providers

• Service providers have trouble distinguishing between treatment and comparison
• Have different teams provide the different treatments
• Randomly assign to those teams

[Diagram: treatment, comparison]

Presenter

Presentation Notes

30 seconds I have an example of a project…. (Physician teams?)

Solution 1b: Randomize at a different unit

• Service providers have trouble distinguishing between treatment and comparison
• Change the unit of random assignment
• Have providers treat entire clusters the same

[Diagram: treatment, comparison]

Presenter

Presentation Notes

30 seconds

Challenge 2a: Control group finds out about treatment

• If treatment and control individuals know each other, the control may get upset.
• Service providers may lose support of community
• Attrition: control withdraws participation from research

[Diagram: treatment and comparison groups; friends talk with each other (treatment and control); friends in the control group get upset with researchers or service providers]

Presenter

Presentation Notes

15 seconds

Challenge 2b: Control group benefits from treatment

• If treatment and control individuals know each other, the treatment may share benefits with control.


Presenter

Presentation Notes

1 min

Challenge 2e: Control group harmed by treatment

• If treatment and control individuals compete with each other, the control may be harmed.

[Diagram: without experiment vs. with experiment; treatment group, control group]

Presenter

Presentation Notes

1 min

Solution 2a: Varying the unit to contain spillovers


treatment

comparison

friends

Presenter

Presentation Notes

30 seconds

Solution 2b: Creating a Buffer


Not sampled

Presenter

Presentation Notes

30 seconds

Challenge 3: Have resources to treat everyone. (Where’s the control group?)

But perhaps not all at once


Presenter

Presentation Notes

30 seconds. Say you have the research constraint of no resource constraint. It is still possible your partner is constrained by time or by logistics and cannot provide the benefit to everyone all at once. In such a case, perhaps you can phase in the program.

Solution 3: Phase In


Presenter

Presentation Notes

15 seconds. It is still possible your partner is constrained by time or by logistics and cannot provide the benefit to everyone all at once. In such a case, perhaps you can phase in the program.

Phase 0: No one treated yet, all control


Presenter

Presentation Notes

15 seconds

Phase 1: 1/4th treated 3/4ths control


Presenter

Presentation Notes

15 seconds

Phase 2: 2/4ths treated 2/4ths control


Presenter

Presentation Notes

15 seconds

Phase 3: 3/4ths treated 1/4th control


Presenter

Presentation Notes

15 seconds

Phase 4: All treated No control (experiment over)


Presenter

Presentation Notes

15 seconds. By Phase 4, your experiment is over. So if you plan to use this approach, you had better hope that each phase lasts long enough for the outcomes of the treatment group to change.
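The phase-in design can be sketched as a random ordering of units into waves, with wave k treated from phase k onward; in each phase, the waves not yet treated serve as the control group. The 20 villages and 4 waves are hypothetical, chosen to match the quarters on the slides.

```python
import random

def phase_in_schedule(units, n_phases, seed=None):
    """Randomly order units and split them into n_phases waves;
    wave k starts receiving the program in phase k (1-indexed)."""
    rng = random.Random(seed)
    order = list(units)
    rng.shuffle(order)
    size = len(order) / n_phases
    return {u: int(pos // size) + 1 for pos, u in enumerate(order)}

def status(wave, phase):
    """A unit is treated once the current phase has reached its wave."""
    return "treated" if wave <= phase else "control"

villages = [f"village{i:02d}" for i in range(20)]  # hypothetical
wave = phase_in_schedule(villages, n_phases=4, seed=5)
for phase in range(5):
    treated = sum(status(w, phase) == "treated" for w in wave.values())
    print(f"phase {phase}: {treated} treated, {len(villages) - treated} control")
```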

Challenge 4: There are eligibility criteria

[Histogram: number of people by income]

Presenter

Presentation Notes

2 min. Use example from VA suicide prevention (feel free to change the x-axis).

Challenge 4: There are eligibility criteria

[Histogram: number of people by income, with a cut-off separating eligible from ineligible]

Presenter

Presentation Notes

1 min [Feel free to discuss RDD here]

Solution 4: Relax the eligibility criteria

[Histogram: number of people by income, showing the original cut-off (eligible / ineligible) and a new cut-off]

Presenter

Presentation Notes

30 seconds

Solution 4: Randomize “on the bubble”

[Histogram: number of people by income, showing the cut-off and a new cut-off. Labels: remain eligible (not in study), remain ineligible (not in study), study sample.]

Presenter

Presentation Notes

1 min. [Take the time to read each box, since there may be some confusion between the treatment and control groups (within the study sample) vs. receiving the program (eligible and not in study) and not receiving the program (ineligible and not in study).]
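A sketch of randomizing on the bubble, under an assumption about the diagram: units below the original cut-off remain eligible, units above the new, relaxed cut-off remain ineligible, and only the band between the two cut-offs forms the study sample, half of which is randomly assigned to treatment. The incomes and the cut-offs of 1000 and 1200 are hypothetical.

```python
import random

def bubble_design(incomes, lower, upper, seed=None):
    """Units with income below `lower` keep the program, units above `upper`
    stay ineligible, and only units in [lower, upper] form the study sample,
    half of which is randomly assigned to treatment."""
    rng = random.Random(seed)
    bubble = sorted(u for u, inc in incomes.items() if lower <= inc <= upper)
    rng.shuffle(bubble)
    treated = set(bubble[: len(bubble) // 2])
    status = {}
    for unit, inc in incomes.items():
        if inc < lower:
            status[unit] = "remain eligible (not in study)"
        elif inc > upper:
            status[unit] = "remain ineligible (not in study)"
        else:
            status[unit] = "study: treatment" if unit in treated else "study: control"
    return status

# Hypothetical applicants; original cut-off 1000, relaxed cut-off 1200
gen = random.Random(0)
incomes = {f"hh{i:03d}": gen.uniform(500, 1500) for i in range(200)}
status = bubble_design(incomes, lower=1000, upper=1200, seed=3)
for label in sorted(set(status.values())):
    print(label, sum(v == label for v in status.values()))
```

Impacts estimated this way apply to people near the cut-off, not necessarily to the whole eligible population.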

Challenge 5: Program is an entitlement

Can neither force nor deny the intervention

Presenter

Presentation Notes

2 min. The Supplemental Nutrition Assistance Program, or “SNAP” (what most people know as “food stamps”), is a program available to any individual or household below the poverty line. That can’t be taken away. So if we wanted to know the impact of food stamps on nutrition, how might we go about evaluating that?

Challenge 5: Program is an entitlement

Treatment Group Control Group

Presenter

Presentation Notes

30 seconds. So how do we have a treatment and a control group? We cannot deny food stamps to individuals in the control group.

Solution 5: Encouragement



Presenter

Presentation Notes

1 min In New Jersey as part of the Hotspotting program, the nurses help individuals enroll….



3/4ths take-up 1/4th take-up


Presenter

Presentation Notes

30 seconds Here you see that 3/4ths of the treatment group enrolled in food stamps, while 1/4th of the control group enrolled. Now how do you measure impact?

To evaluate the effect of this program, you would first:
A. Compare those who enrolled to those who didn’t
B. Drop those who didn’t enroll from the treatment group
C. Drop those who did enroll from the control group
D. Both B & C
E. Compare treatment group to entire control group

J-PAL | HOW TO RANDOMIZE 88

Presenter

Presentation Notes

3 min



3/4ths take-up 1/4th take-up

Compare: entire treatment group to entire control group

Presenter

Presentation Notes

1 min In this case, you would compare the entire treatment group to the entire control group. In a sense, you’d be evaluating the impact of "encouraging" people to take up food stamps. Suppose you detect an impact, and it is very unlikely that the encouragement alone could have caused it. Then the impact is driven by the food stamps themselves.
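The comparison on the slide (entire treatment group against entire control group) is the intent-to-treat (ITT) effect. A standard way to go one step further is the Wald estimator: divide the ITT by the difference in take-up. This relies on the condition stated above, that the encouragement affects outcomes only through enrollment. The result is usually interpreted as the effect for people who enrolled because they were encouraged. The sketch below uses made-up numbers with the slide’s 3/4 vs. 1/4 take-up. The function name is hypothetical.

```python
from statistics import fmean


def encouragement_estimates(y_treat, d_treat, y_control, d_control):
    """Intent-to-treat effect and Wald (instrumental-variable) estimate.

    y_*: outcome for everyone randomly assigned to each group
    d_*: 1 if that person enrolled in the program, else 0
    """
    itt = fmean(y_treat) - fmean(y_control)          # effect of the encouragement itself
    take_up_gap = fmean(d_treat) - fmean(d_control)  # e.g. 3/4 - 1/4 = 1/2
    return itt, itt / take_up_gap                     # effect per person induced to enroll


# Made-up data: 3 of 4 enroll in the treatment group, 1 of 4 in the control group.
itt, wald = encouragement_estimates(
    y_treat=[3, 3, 3, 1], d_treat=[1, 1, 1, 0],
    y_control=[3, 1, 1, 1], d_control=[1, 0, 0, 0],
)
print(itt, wald)
```

Both estimates keep everyone in their assigned group. That is exactly why options A to D in the quiz above, which drop or regroup people by enrollment, would break the randomization.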

Challenge 6: Sample size is small


Presenter

Presentation Notes

30 seconds Say, for example, we have this randomization design: 400 people, but only 12 schools to randomly assign. This could reduce the power of your experiment. (You’ll hear from Rachel later on about why that is.) If that’s the case, and if it’s feasible, you may want to consider changing the unit of randomization.

Solution 6a: Change the unit of randomization


Presenter

Presentation Notes

15 seconds Perhaps a sample of 24 classrooms?
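A common way to see why 12 schools give less power than 400 independent people is the standard design effect for cluster randomization, 1 + (m - 1) * icc. Here m is the average cluster size and icc is the intra-cluster correlation. The sketch assumes icc = 0.1 at both the school and the classroom level, which is purely illustrative. In practice, classrooms may have a different correlation than schools, and spillovers between classrooms in the same school can be a concern.

```python
def effective_sample_size(n_people, n_clusters, icc):
    """Approximate effective sample size when randomizing whole clusters.

    Uses the standard design effect 1 + (m - 1) * icc, where m is the
    average cluster size and icc is the intra-cluster correlation.
    """
    m = n_people / n_clusters
    design_effect = 1 + (m - 1) * icc
    return n_people / design_effect


# icc = 0.1 is an illustrative assumption, not an estimate from any study.
for label, clusters in [("12 schools", 12), ("24 classrooms", 24)]:
    n_eff = effective_sample_size(400, clusters, icc=0.1)
    print(f"{label}: about {n_eff:.0f} effective observations out of 400")
```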

How do we increase school participation (enrollment and attendance)?

A government wants to improve attendance at primary schools. What interventions would you recommend?

J-PAL | WHAT IS EVALUATION 14

Presenter

Presentation Notes

If you were a policymaker, how would you go about improving school participation?

What is the most effective intervention to increase school participation (enrollment and attendance)?
A. Textbooks
B. Lunch for free
C. Free school uniforms
D. Treat intestinal worms
E. Merit scholarships
F. Improve curriculum & teaching
G. Provide better materials
H. Increase awareness of returns to education


Presenter

Presentation Notes

Now let’s ask a much more specific question.

Impact evaluations can help answer these questions


Presenter

Presentation Notes

Impact evaluations can help answer whether programs contribute to social change. They can also give you tools for deciding which programs to invest in in the first place, if you are trying to address a certain problem with limited resources. One of the tools that J-PAL creates for these decision-makers is cost-effectiveness analysis. It tells you how much, for a given amount of money, you can increase student attendance, investment in preventive health products, or micro-business profits using different kinds of programs. The above graph shows how much additional student attendance can be achieved with a given program and budget. The programs include a campaign that gives parents information on the wages their children could earn for every additional year of school, deworming, school meals, scholarships, subsidized uniforms, and conditional cash transfers. All of these programs increased student attendance, but some were more cost-effective than others. Deworming has been shown to be one of the most cost-effective ways to increase children’s attendance in school: 28.6 additional years of school per $100 spent, across the whole sample of kids who were offered deworming pills. Context is paramount: you would never recommend deworming if worms aren’t a problem in your context. Cost-effectiveness analysis is just one more data point that can help organizations with social missions decide what to invest in.
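The core arithmetic behind a cost-effectiveness comparison like the one in the graph can be sketched simply. Take the total additional outcome (impact per child times children reached), divide by total cost, and scale to $100. This is a simplified sketch: published analyses typically also adjust for things like inflation and discounting, which it omits. The programs and all figures below are invented for illustration.

```python
def years_per_100_dollars(extra_years_per_child, n_children, total_cost_usd):
    """Additional years of schooling generated per $100 spent."""
    total_extra_years = extra_years_per_child * n_children
    return total_extra_years / total_cost_usd * 100


# Hypothetical programs with invented numbers, not results from any evaluation.
programs = {
    "program_a": years_per_100_dollars(0.10, 1_000, 500),
    "program_b": years_per_100_dollars(0.30, 1_000, 6_000),
}
for name, value in sorted(programs.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {value:.1f} extra years of school per $100")
```

Here program_b has three times the per-child effect of program_a but costs twelve times as much. As a result, it buys fewer years of school per dollar (5.0 vs. 20.0).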

Which one of these would make a good question for an impact evaluation?
A. What share of kids in Tanzania drop out of school before completing primary?
B. Will providing kids with deworming pills or school uniforms do a better job of keeping kids in school?
C. What role does ethnicity play in student results?


Which one of these would make a good question for an impact evaluation?
A. Are agricultural extension agents giving farmers the same information they were trained on?
B. What share of farmers in Kenya currently live on less than $2 a day?
C. Which kind of fertilizer works best for a plot of maize?



Which one of these would make a good question for an impact evaluation?
A. Does a sexual education program or free school uniforms have a bigger effect on teenage pregnancy rates?
B. Do teenage girls have a right to have full information regarding sexual education?
C. Are teachers spreading misinformation when delivering sexual education?



5 components of program evaluation

Needs Assessment

Theory of Change

Process Evaluation

Impact Evaluation

Cost-Effectiveness Analysis

Presenter

Presentation Notes

Needs assessment: What is the problem?
Theory of change: How, in theory, does the program fix the problem?
Process evaluation: Does the program work as planned?
Impact evaluation: Were its goals achieved? What was the magnitude?
Cost-effectiveness: Given magnitude and cost, how does it compare to alternatives?
Different components help you answer different questions.

WATER, SANITATION & HEALTH

An Example

Presenter

Presentation Notes

The rest of this presentation walks through all five components of program evaluation using the concrete encased spring example from Kenya. The evaluation summary for this can be found here: https://www.povertyactionlab.org/evaluation/cleaning-springs-Kenya.

What do you think is the most cost-effective way to reduce diarrhea?
A. Develop piped water infrastructure
B. Improve existing water sources
C. Increase supply of and demand for chlorine
D. Education on sanitation and health
E. Improved cooking stoves for boiling water
F. Improve sanitation infrastructure


Presenter

Presentation Notes

Now let’s ask a much more specific question.

NEEDS ASSESSMENT

Identifying the problem

Presenter

Presentation Notes

Needs assessments allow us to confirm whether or not the problem exists. This is our first step in program evaluation.

Needs Assessment: Questions answered by a needs assessment

• Does the problem we are proposing to solve actually exist?
– What is the likely source of the problem?
– Of the solutions proposed and tried, why are they failing?
– Who is in most need?


Presenter

Presentation Notes

Each section begins with questions that can be answered by a particular component of program evaluation. This is done to emphasize that different questions are answered through different assessments and that not all questions require impact evaluations to be answered.

Needs Assessment

• Does the problem exist?
– Diarrheal disease killed approximately 2.6 million people a year between 1990 and 2000.
– 20% of all child deaths (under 5 years old) are from diarrhea.

…what is the likely source?


Presenter

Presentation Notes

This section examines a particular problem, that of diarrheal disease, and then runs through potential causes and solutions. The figures given above are relevant for when the evaluation was done.

The source of the problem?


Presenter

Presentation Notes

One potential cause may be bad water (13% of the population lacks access to clean water). This picture shows a young boy collecting water at a naturally occurring spring. Some wood has been placed around the eye of the spring, but the water pools at the collection point, where surface water runoff can easily contaminate it. In an agricultural area with incomplete sanitation coverage, this makes it easy for fecal matter (from either humans or livestock) to contaminate the collected water. You can also imagine from this picture how contamination in transport and storage might occur. Children sometimes collect water and can easily touch it in open containers. If this boy has fecal matter on his hands and touches the spring water (which is likely), he could easily contaminate it. Similar things can happen within the home: when water is scooped out of the top of storage containers with a dipper, it is hard to avoid touching the water.

Theory of Change

Blueprint for Change

Presenter

Presentation Notes

What is the theory behind your solution? How does it map to your theory of the problem? Many terms are used for the theory of change. Explain that we will briefly touch on one model, and that a whole lecture tomorrow is dedicated to the theory of change.

Theory of Change: Questions answered by a theory of change

• How will the program address the needs put forth in your needs assessment?
– What are the prerequisites to meet the needs?
– How and why are those requirements currently lacking or failing?
– How does the program intend to target or circumvent shortcomings?
– What services will be offered?


What is a potential solution to this problem?


Presenter

Presentation Notes

One potential solution could be encased springs, which prevent contamination from surface water runoff.

Alternative Solution(s)?


Presenter

Presentation Notes

Latrines, information campaigns, piped water

Log Frame (Objectives Hierarchy, Indicators, Sources of Verification, Assumptions / Threats)

Impact (Goal / Overall objective): Lower rates of diarrhea
– Indicators: Rates of diarrhea
– Sources of verification: Household survey
– Assumptions / threats: Waterborne disease is the primary cause of diarrhea

Outcome (Project objective): Households drink cleaner water
– Indicators: (Δ in) drinking water source; E. coli CFU/100ml
– Sources of verification: Household survey, water quality test at home storage
– Assumptions / threats: Shift away from dirty sources; no recontamination

Outputs: Source water is cleaner; families collect cleaner water
– Indicators: E. coli CFU/100ml
– Sources of verification: Water quality test at source
– Assumptions / threats: Continued maintenance, knowledge of maintenance practices

Inputs (Activities): Source protection is built
– Indicators: Protection is present, functional
– Sources of verification: Source visits / surveys
– Assumptions / threats: Sufficient materials, funding, manpower

Source: Roduner, Schlappi (2008) Logical Framework Approach and Outcome Mapping, A Constructive Attempt of Synthesis

Needs assessment

Process evaluation

Impact evaluation


Presenter

Presentation Notes

This can also be represented in a log frame. Do not go into detail, but use it to describe the difference between impact and process evaluations.

PROCESS EVALUATION

Making the program work

Process Evaluation: Questions answered by a process evaluation

• Was the program carried out as planned?
– Are basic tasks being completed?
– Is the intervention reaching the target population?
– Is the intervention being completed well or efficiently and to the beneficiaries’ satisfaction?


Presenter

Presentation Notes

Are basic tasks being completed? Was the encased water spring constructed? Was it maintained? Is the intervention reaching the target population? Is the intervention being completed well or efficiently and to the beneficiaries’ satisfaction? Do households collect water from the improved source? Does stored water become re-contaminated? Do people drink the "clean" water?

IMPACT EVALUATION

Measuring how well it worked

Impact Evaluation: Questions answered by impact evaluations

• Process evaluations determine whether a program is running the way it is supposed to run
• Impact evaluations determine whether a program creates a change in one or more outcomes
– Did concrete encased springs decrease diarrhea rates?


What was the impact?

• 66% reduction in source water E. coli concentration
• 24% reduction in household E. coli concentration
• 25% reduction in incidence of diarrhea


Presenter

Presentation Notes

Could we get images to show these?


Making Policy from Evidence

Intervention: Impact on diarrhea
• Spring protection (Kenya): 25% reduction in diarrhea incidence for ages 0-3
• Source chlorine dispensers (Kenya): 20-40% reduction in diarrhea
• Home chlorine distribution (Kenya): 20-40% reduction in diarrhea
• Hand-washing (Pakistan): 53% drop in diarrhea incidence for children under 15 years old
• Piped water (urban Morocco): 0.27 fewer days of diarrhea per child per week


Presenter

Presentation Notes

So what intervention should we invest in? Three big issues here: inconsistent outcome measures, different contexts, and cost!

COST-EFFECTIVENESS ANALYSIS

Evidence-Based Policymaking

Cost-Effectiveness Diagram


Presenter

Presentation Notes

Now we’ve gone through the lifecycle of a program evaluation (or several).

[Diagram: Randomized Evaluation Process, from Evaluation Design to Evaluation Implementation]

Stages: Evaluation Question (Causal Hypothesis); Theory of Change (Target Group, Intervention, Outcomes); Random Assignment; Sample Selection; Survey Design; Data Collection; Monitoring; Data Analysis; Results

Lectures: Why Evaluate; Why Randomize; How to Randomize; Measurement; Power & Sample Size; Post-Design Challenges

Presenter

Presentation Notes

1 min Go through the lectures

How to Measure: Sources of Measurement

Presenter

Presentation Notes

Start time: 1:44 1 min (WHOLE SECTION SHOULD TAKE ABOUT 7 MIN) So far we used the female policymaker example to demonstrate a few key points. (1) Before considering what to measure, always start with the theory of change. Ideally, the theory of change will dictate what intervention is being considered, who it will impact, and on what key outcomes. Beyond that, it lets us think about how to measure all the intermediate steps, the processes, the mechanisms, and even the assumptions. (2) There are many potential sources of measurement. (3) I wanted you to see the results of the study. So now let’s generalize.

First-order questions in measurement

• What data do you collect?
• Where do you get it?
• When do you get it?

J-PAL | MEASUREMENT & INDICATORS 14

Presenter

Presentation Notes

30 seconds The first question you should ask is: what do you want to measure? We discussed that above; for the most part, you want it to be informed by your theory of change. The second question: where do you get it? There are many possible sources of data, such as survey data and administrative data, which we’ll go into in a bit more detail. Last: when do you want these measures? Is it okay to do all of the data collection at the end? Should you always have a baseline? Should you be collecting data throughout the process? We covered the first question in detail. Now let’s focus on the second.

Where can we get data?

• Obtained from other sources
– Publicly available
– Administrative data
– Other secondary data
• Collected by researchers
– Primary data


Presenter

Presentation Notes

1 min Where can we get the data? Click 1: Historically, most empirical work, at least in economics, has been conducted using publicly available datasets. Researchers downloaded datasets from statistics bureaus, census bureaus, and sample surveys run by large governments or international agencies. They then ran regressions, often at the country level. Click 2: Administrative data can also be used. These are data collected by departments or companies for internal use, not necessarily for research: tax records, for example, or, in our last example, village meeting notes. Economists who study health get data directly from hospitals or insurance companies. Click 3: In the past few decades, social scientists have scaled up their efforts to collect their own data. In economics, Angus Deaton recently won a Nobel Prize for his pioneering work in this area.

https://commons.wikimedia.org/wiki/File:Cuyahoga_County_US_Census_Form-Herbert_Birch_Kingston_1920.jpg

https://commons.wikimedia.org/wiki/File:US_Navy_090123-N-9760Z-004_Hospital_Corpsman_2nd_Class_Jennifer_Ross_files_medical_records_aboard_the_aircraft_carrier_USS_Nimitz_(CVN_68).jpg

Types and Sources of Data

[Diagram (2×2): information about a person / household / possessions vs. NOT about a person / household / possessions; information provided by a person vs. automatically generated]


Presenter

Presentation Notes

1 min Click 1: Usually in the social sciences, or in social programs, the data we collect are about people. And the people often know they’re providing information to someone, somehow. Whether they’re answering a survey question, taking an exam, or submitting tax returns, they know the information is being collected. Click 2: Sometimes it’s not so obvious: buying something at the store, filing a police report, data collected online or in public spaces, etc. People often know "personal" data is being collected, and they may have some conceptions or misconceptions about their level of privacy… Click 3: Sometimes we collect data not about a person but about, for example, rainfall or pollution emissions. A person may still be involved in collecting the data. Click 4: In other cases, we use sensors and never have to interact with a person.

Data collection on people

• Surveys
• Exams, tests, etc.
• Games
• Vignettes
• Direct observation
• Diaries/logs
• Focus groups
• Interviews


Presenter

Presentation Notes

1 min Most of the work we do in developing countries involves collecting primary data on people, in a wide variety of forms. This is much less true in the US and other high-income countries, where secondary sources of data tend to be far more extensive and of higher quality. We’ll talk a lot about surveys, but they are certainly not the only type of data you could consider collecting (list others).

Survey: Modes of Data Collection

• Interviewer administered
– Paper-based
– Computer-assisted / digital
– Telephone-based

• Self-administered
– Paper
– Computer / digital


Presenter

Presentation Notes

1 min Surveys themselves come in many different forms. Researchers are increasingly taking advantage of computer-assisted or digital surveying, both for interviewer-administered and for self-administered surveys. Digital surveying makes it possible to tailor a questionnaire to the respondent and to the responses they have given so far.

When to collect data

• Baseline
• During the intervention
– Process, monitoring of intervention
• Endline
• Follow-up
• Scale-up
• Intervention: M&E


Presenter

Presentation Notes

1 min Define each. Point out that the endline / follow-up is most important. A baseline can still help with (a) heterogeneous treatment effects and (b) directionality (i.e., everyone getting worse vs. everyone getting better).
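Here is one way a baseline helps with heterogeneous treatment effects. You split the sample by a characteristic measured before the intervention and estimate the effect within each subgroup. This is a minimal sketch with made-up data and a hypothetical function name. A real analysis would also report the uncertainty around each subgroup estimate.

```python
from statistics import fmean


def effects_by_baseline(rows, threshold):
    """Difference in endline means (treatment - control) within baseline subgroups.

    rows: (baseline_score, treated, endline_score) tuples, treated being 1 or 0.
    """
    subgroups = {
        "low baseline": lambda b: b < threshold,
        "high baseline": lambda b: b >= threshold,
    }
    effects = {}
    for label, in_group in subgroups.items():
        treat = [y for b, t, y in rows if in_group(b) and t == 1]
        control = [y for b, t, y in rows if in_group(b) and t == 0]
        effects[label] = fmean(treat) - fmean(control)
    return effects


# Made-up data for eight people.
rows = [(2, 1, 6), (3, 1, 7), (2, 0, 3), (3, 0, 4),
        (8, 1, 9), (9, 1, 10), (8, 0, 8), (9, 0, 9)]
print(effects_by_baseline(rows, threshold=5))
```

Without a baseline, there would be no pre-intervention score to split on. You could only estimate the average effect for everyone.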

Ethics

• "Experimenting on people"
• Belmont Principles
– Respect for persons
– Beneficence
– Justice
• Institutional Review Boards (IRBs)


Presenter

Presentation Notes

3 min Click 1: A common critique we hear is that by doing randomized evaluations of social programs, we’re "experimenting on people," often without their consent. People picture a scientist in a lab with a cage full of guinea pigs, injecting things into them to see what happens. The problem with that image is that the injection is what upsets people, and the injection represents the intervention: the social program. Most of the time in our kind of research, the evaluators are not delivering that injection; the policymakers are. What we’re trying to do is learn from that injection. How do we do that? We collect data on those who are (or are not) receiving the intervention. Click 2: After some deeply questionable research practices (notably the Tuskegee Study), the National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research produced the Belmont Report in 1978. It lays out the principles that govern the kind of research we do. Click 3: The first principle is "respect for persons." The government doesn’t ask each individual affected by a new policy or program for consent before implementing it. If the government wants to test the effectiveness of computers in schools, it doesn’t ask parents whether it’s okay to have computers in the school. Even if allocation of the program is randomized for research purposes, the government won’t ask for people’s consent. BUT it is typically the researcher’s duty to ask people whether they are willing to provide information to researchers. So we always ask for INFORMED CONSENT before surveying them. They should also be aware of their right not to participate, or to withdraw participation at any time, for any question. Click 4: The principle of beneficence is that the value of the research must be worth the cost and risk to participants. Click 5: The principle of justice is that the people involved in the study must represent those likely to benefit (no medical testing on prisoners). Click 6: The Belmont Report also formalizes a role for IRBs to review and approve any research.

How to Measure: Concept

Presenter

Presentation Notes

Start time: 1:51 15 seconds Now that we have a sense of the types of measurement on the table, let’s zoom way out and think about what measurement is.

Concept of measurement


Construct (Intelligence) → Indicator (IQ Test) → Data (Test Result)

https://commons.wikimedia.org/wiki/File:Red_Silhouette_-_Brain.svg

Presenter

Presentation Notes

2 min Let’s say we’re running an early childhood development experiment. Click 1: Intelligence is a key outcome. How would we go about measuring it? Click 2: First we need to acknowledge that intelligence is what we’d call "a construct": it doesn’t necessarily have a precise scientific definition. Click 3: We often use IQ tests to measure intelligence; we often distill a construct into a question, or a series of questions. Many people take issue with IQ tests. Some believe they’re culturally biased; some believe they arbitrarily weight certain cognitive abilities over others. I think at best, IQ is a "proxy" for intelligence. Click 4: Here the total exam score may be our indicator. Again, our indicator is a "proxy" for our construct. Sometimes our indicator is an exam, or just a single exam question (to measure a competency). It can be a survey question, or a collection of questions combined into an index. It can be a medical test. It can be household profit or consumption. But here let’s stick to our IQ test. Click 5: Then we take our indicator to our target population; here we administer the IQ test to kids. Click 6: And we collect data. Note that even though it’s the construct we’re interested in, at the end of the day we can only analyze data. So how well our data reflect the construct depends on how good a proxy the indicator is, and on how successfully we collect the data. Suppose some kids take the test first thing in the morning in quiet, calming rooms, and others take it at the end of the day amid street noise and other distractions. Then the data we collect may not accurately reflect the kids’ TRUE IQ scores.

Concept of measurement


Construct (Stress) → Indicator (Cortisol level) → Data (Test Result)

https://pixabay.com/en/despair-stress-alone-being-alone-862349/

Presenter

Presentation Notes

1 min Take another example: stress. Stress, like intelligence, is something we understand vaguely, but if someone asked me for its precise definition, I might be lost. Click 1: Neurophysiologists suggest cortisol levels are a direct manifestation (or potentially a cause) of what we consider stress. Click 2: However, the saliva tests we use to collect it are highly sensitive to outside factors, like the time of day the test is done. So while the indicator may be really close to what we care about, the data may be all over the place. Alternatively, researchers could just ask me a question like: on a scale of 1-10, how stressed are you? The result may not be as scattered, but I may tailor my response depending on who’s asking. If the senior faculty in my department were asking, I might say VERY. If my child were asking, I might want them to believe I’m not stressed at all. With surveyors and respondents, it often depends on the power dynamic, and on what the respondents think the potential outcomes of the research may be.

The goals of measurement


• Accuracy: unbiasedness, validity

• Precision: reliability

Presenter

Presentation Notes

2 min So what are the major challenges? Click 1: Our first concern is accuracy. We may know that the IQ test is a somewhat biased proxy for intelligence: for example, you can study to do better, or many of the questions may be identical to puzzles that wealthier kids have access to. We may also know that many factors beyond intelligence influence a kid’s score: ability to study for it, time of day, hunger, stress, etc. But it may be the best we’ve got. Click 2: Cortisol may be an example of an unbiased measure of stress, but the test is still very noisy. When we think about the mapping of an indicator onto a construct, we will refer to this as the validity of the measure: how centered is it over the bullseye? Click 3: So what about height and weight? Those are generally easy to measure, and tend to be quite accurate and precise. Click 4: Sometimes we combine them into a body-mass index (BMI), which itself can be quite precise. But when we use BMI to measure nourishment, we may question its accuracy. In other words, BMI is a questionable indicator for the construct of "level of nutrition." It doesn’t take into account muscle mass versus body fat. A high BMI could reflect stunting (being shorter than expected), obesity, or just being buff. That is again a question of validity, the relationship between an indicator and a construct. Reliability, by contrast, is about how consistent and precise our measurements are.
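The bullseye picture can be made concrete with a small simulation. Unlike with real data, here we know the true value. Indicator A is biased but precise. Indicator B is unbiased but noisy, roughly like the cortisol case in the note. All numbers are invented, and `error_summary` is a hypothetical helper.

```python
import random
from statistics import fmean, stdev


def error_summary(true_values, measured):
    """Return (bias, noise): the mean and standard deviation of measurement error."""
    errors = [m - t for t, m in zip(true_values, measured)]
    return fmean(errors), stdev(errors)


rng = random.Random(1)
truth = [rng.gauss(50, 10) for _ in range(10_000)]  # invented "true" levels
# Indicator A: consistently 5 points too high, but with little noise.
biased_precise = [t + 5 + rng.gauss(0, 1) for t in truth]
# Indicator B: centred on the truth, but very noisy.
unbiased_noisy = [t + rng.gauss(0, 15) for t in truth]

for name, measured in [("A (biased, precise)", biased_precise),
                       ("B (unbiased, noisy)", unbiased_noisy)]:
    bias, noise = error_summary(truth, measured)
    print(f"{name}: bias = {bias:.1f}, noise = {noise:.1f}")
```

Averaging over more people shrinks the effect of noise on a group mean, but it does nothing to remove bias.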

Validity

• In theory: How well does the indicator map to the outcome? (e.g., how well does an IQ test capture intelligence?)

[Diagram: validity is the link between the construct and its indicators]

Presenter

Presentation Notes

15 seconds So summing up: Validity is how well our indicator maps to the construct

Reliability

• In theory: The measure is consistent and precise vs. "noisy"

[Diagram: Construct → Indicators → Data Collection ("Response"); reliability concerns the step from indicators to data collection]

Presenter

Presentation Notes

15 seconds And reliability is the extent to which the data we collect are similar/consistent each time we measure it (regardless of who is doing the measurement)
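One common way to quantify reliability is a test-retest correlation: measure the same people twice and check how closely the two rounds agree. The sketch simulates this with invented true scores and independent noise in each round. `retest_correlation` and `pearson` are hypothetical helper names.

```python
import random
from statistics import fmean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def retest_correlation(true_values, noise_sd, rng):
    """Measure the same people twice with independent noise and correlate the rounds."""
    round1 = [t + rng.gauss(0, noise_sd) for t in true_values]
    round2 = [t + rng.gauss(0, noise_sd) for t in true_values]
    return pearson(round1, round2)


rng = random.Random(7)
truth = [rng.gauss(100, 15) for _ in range(5_000)]  # invented true scores
print("low noise: ", round(retest_correlation(truth, noise_sd=3, rng=rng), 2))
print("high noise:", round(retest_correlation(truth, noise_sd=20, rng=rng), 2))
```

In this setup the expected correlation is roughly the share of variance that is "true": about 15² / (15² + 3²) ≈ 0.96 for the low-noise measure and 15² / (15² + 20²) = 0.36 for the high-noise one.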

The Response Process


[Diagram: Indicators → Data Collection ("Response") → Data, with measurement error entering at the response step]

Presenter

Presentation Notes

Start time: 2:08 15 seconds The response process takes us from the indicator to the data. Click 1: We’ll also take a peek at how measurement error can creep in.

4-step Response Process

1. Comprehension of the question
2. Retrieval of information
3. Judgement and estimation
4. Reporting an answer


Presenter

Presentation Notes

1 min Whenever we require the respondent to think, it’s useful to put ourselves in the respondent’s head. We’ll go into more detail in the next few slides. Click 1: When a surveyor asks a question, or a respondent reads a question on a form, they have to understand what the question is asking. Click 2: Once the respondent has understood the question, they have to retrieve the necessary information from memory. Click 3: At this point they may have retrieved many of the necessary facts. Perhaps the answer options are: today, yesterday, before then. Perhaps they’re January, February, March, April… Perhaps it’s a date field. Now they have to place the memory at a particular date. Click 4: Finally, they need to map their memory and calculations onto the answer options given. Even after going through all of this, they may be hesitant to tell me.

Measurement Error: Vagueness

Vague concepts where respondents may interpret the question in different ways.

Example:

Q. Do you live with a teenager?
• Yes
• No

What age range counts as a teenager?

Make sure to define vague concepts

Presenter

Presentation Notes

30 seconds

Measurement Error: Completeness

The response categories do not include all categories that can be expected as a response

Example:

Q. What is the highest level of education completed?
• Basic Education (1st-5th)
• Middle School (6th-8th)
• High School (9th-12th)
• College Degree
• Post Graduate
• Other Professional Degree (e.g. Medical, Law, Teacher)

"No education" and "vocational degree" are not response options

Pilot questions to make sure that categories are exhaustive

Presenter

Presentation Notes

Here, we alter the question slightly to get around the issue of comprehension. And we’ve clarified the definition of But we’re still missing some important categories.

Measurement Error: Negatives

Questions that include negatives can be confusing to the respondent and lead to misinterpretations.

Example:

Q. Do you think that you should not let your children play contact sports?
• Yes
• No

Having a negative might throw some people off

Avoid unnecessary negatives

Presenter

Presentation Notes

30 sec

Measurement Error: Overlapping Categories

The categories overlap each other.

Example:

Q. How many hours a day do you work?
• Less than an hour
• Between one and four hours
• Between three and eight hours
• Between eight and ten hours
• More than ten hours

What would a person who works eight hours a day reply?

Make sure that all categories are mutually exclusive

Presenter

Presentation Notes

30 sec
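For numeric answers, both mutual exclusivity (this slide) and exhaustiveness (the completeness slide) can be checked before fielding a survey. The idea is to try every plausible answer against the categories. The sketch assumes answers are whole hours and that "between X and Y" includes both ends. `check_categories` is a hypothetical helper. Non-numeric categories, like the education example, still need piloting.

```python
def check_categories(categories, possible_answers):
    """Find answers that fit no category (gaps) or several (overlaps).

    categories: (label, low, high) tuples, with low and high inclusive.
    """
    gaps, overlaps = [], {}
    for answer in possible_answers:
        matches = [label for label, low, high in categories if low <= answer <= high]
        if not matches:
            gaps.append(answer)
        elif len(matches) > 1:
            overlaps[answer] = matches
    return gaps, overlaps


# Assumption: answers are whole hours, and "between X and Y" includes both ends.
hours_worked = [
    ("Less than an hour", 0, 0),
    ("Between one and four hours", 1, 4),
    ("Between three and eight hours", 3, 8),
    ("Between eight and ten hours", 8, 10),
    ("More than ten hours", 11, 24),
]
gaps, overlaps = check_categories(hours_worked, range(25))
print("gaps:", gaps)
for answer, labels in overlaps.items():
    print(answer, "hours fits:", labels)
```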

Measurement Error: Presumptions

The question assumes certain things about the respondent

Example:

Q. How would you rate the quality of coffee this morning?
• Very good
• Somewhat good
• Not good

We are assuming that the respondent drank the coffee

Use filters and skip patterns

Presenter

Presentation Notes

30 sec Again, this would be a problem at the response step. The best practice is to use filters and skip patterns.
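A filter question plus a skip pattern can be sketched as a questionnaire in which each question carries an optional condition on earlier answers. The `Question` class and `administer` function below are hypothetical, not the format of any particular survey software. Skipped questions are recorded as `None`, so a skipped rating is never confused with a real one.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Question:
    qid: str
    text: str
    options: list
    # Condition on earlier answers; None means always ask.
    ask_if: Optional[Callable[[dict], bool]] = None


QUESTIONNAIRE = [
    Question("drank_coffee", "Did you drink coffee this morning?", ["Yes", "No"]),
    Question("coffee_quality", "How would you rate the quality of the coffee this morning?",
             ["Very good", "Somewhat good", "Not good"],
             ask_if=lambda answers: answers.get("drank_coffee") == "Yes"),
]


def administer(questionnaire, respond):
    """Walk through the questionnaire, applying skip patterns.

    respond(question) stands in for the respondent and returns an answer.
    """
    answers = {}
    for q in questionnaire:
        if q.ask_if is None or q.ask_if(answers):
            answers[q.qid] = respond(q)
        else:
            answers[q.qid] = None  # skipped by design
    return answers


# Simulated respondent who did not drink coffee.
print(administer(QUESTIONNAIRE, lambda q: "No" if q.qid == "drank_coffee" else "Very good"))
```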

Measurement Error: Framing effect

People react to a particular choice in different ways depending on how it is presented, e.g., they prefer options framed as gains over options framed as losses

Example:

Q. Two new treatments have been developed to treat 600 terminally ill patients. Treatment A will save 200 people, while Treatment B will allow 400 people to die. Which treatment would you prefer?
• Treatment A
• Treatment B

Both treatments have the same outcome (200 saved, 400 die), but respondents will tend to prefer Treatment A because it has been framed as a gain

Try to be neutral when framing questions

Presenter

Presentation Notes

30 seconds

Measurement Error: Recall Bias

People may retrieve recollections regarding events or experiences differently

Example:

Q. How long did you have to wait last time you voted?
• No time (there was no line, or I voted by mail)
• Less than 10 minutes
• Between 10 minutes and 30
• More than 30 minutes but less than an hour
• An hour or more

This experience may be more vivid for some respondents than others.

You can ask respondents to keep a diary or save their receipts

Presenter

Presentation Notes

Recall bias really comes in at the estimation step. For example, respondents in states that voted more recently may have a clearer memory of the wait. Or respondents whose candidate lost the election may be more likely to remember having to wait longer than those whose candidate won, simply because they are angry that they had to wait and may feel that their candidate's vote was suppressed.

Measurement Error: Anchoring Bias

People tend to rely too heavily on the first piece of information they see.

Example:

Q. In Arizona, some voters reported having to wait more than 5 hours to vote. How long did you have to wait last time you voted?
• No time (there was no line, or I voted by mail)
• Less than 10 minutes
• Between 10 and 30 minutes
• More than 30 minutes but less than an hour
• An hour or more

Respondents will be more likely to give an answer at the higher end of the spectrum.

Avoid adding anchors to your questions.


Presentation Notes

“Avoiding anchors” may seem obvious, but sometimes anchors creep in through the ordering of questions. If you ask a respondent a question that elicits a large number, the answer to the next question has been shown in many cases to be biased upward.

Measurement Error: Telescoping Bias

People perceive recent events as being more remote than they are (backward telescoping) and distant events as being more recent than they are (forward telescoping)

Example:

Q. Did you purchase a TV or other electronic item (worth over $500) in the past 12 months?

This will lead to over-reporting, due to forward telescoping of events that happened more than 12 months ago.

Visit once at the beginning of the reference period. Then ask, “Since the last time I visited you, have you…?”



Presentation Notes

This is usually a problem with what we call “lumpy purchases” or “investments”. If you purchased something large a little over a year ago, you may feel you would be doing the survey a disservice by excluding it: you assume the surveyor cares more about whether you purchased something large than about the specific timeframe. But that causes problems for the survey. Say 100% of respondents make a $500 purchase every other year, but all feel compelled to include it when responding about purchases in the last year. Then, as evaluators, we will overestimate the number of large purchases by 100%. One way of dealing with this is to visit once at the beginning of the reference period and once at the end.
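The 100% figure in the notes can be checked with a small deterministic calculation. The sketch assumes, as the notes do, that every respondent makes one large purchase every other year; here purchase dates are spread evenly over the last 24 months (a hypothetical population), and forward telescoping is modeled as respondents including purchases up to 12 extra months old.

```python
from typing import List

def share_reporting(months_since_purchase: List[float], window_months: float,
                    telescoped_months: float = 0.0) -> float:
    """Share of respondents who report a purchase inside the recall window.

    telescoped_months: how far beyond the window respondents still (wrongly)
    include a purchase, i.e. forward telescoping.
    """
    cutoff = window_months + telescoped_months
    return sum(m < cutoff for m in months_since_purchase) / len(months_since_purchase)

# Hypothetical population: one large purchase each in the last 24 months, spread evenly.
months = [m + 0.5 for m in range(24)]
true_share = share_reporting(months, window_months=12)
reported_share = share_reporting(months, window_months=12, telescoped_months=12)
overestimate = reported_share / true_share - 1
print(true_share, reported_share, f"{overestimate:.0%}")
```

Half of this population truly purchased in the last 12 months, but everyone reports doing so, so the reported share is twice the true share.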

Measurement Error: Social Desirability Bias

The tendency of respondents to answer questions in a manner that will be viewed favorably by others, i.e., to emphasize strengths, hide flaws, or avoid stigma.

Example:

Q. Do you beat your wife?
• Yes
• No

Respondents may be reluctant to admit to such behavior.

Ask indirectly, and ensure privacy.
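The advice to "ask indirectly" can be implemented in several ways. One widely used technique, not named in the source, is the list experiment (item-count technique): a random half of respondents gets a list of non-sensitive statements and reports only how many apply; the other half gets the same list plus the sensitive statement. The difference in mean counts estimates the share for whom the sensitive statement is true. The sketch assumes random assignment of the two lists and that adding the sensitive item does not change answers to the other items; the data are hypothetical.

```python
from statistics import mean
from typing import List

def list_experiment_estimate(control_counts: List[int], treatment_counts: List[int]) -> float:
    """Difference-in-means estimate of the share agreeing with the sensitive item."""
    return mean(treatment_counts) - mean(control_counts)

# Hypothetical counts: control saw 3 non-sensitive items; treatment saw the
# same 3 items plus the sensitive one. No one reveals an individual answer.
control = [1, 2, 2, 0, 3, 1, 2, 1]
treatment = [2, 2, 3, 1, 3, 1, 2, 2]
print(list_experiment_estimate(control, treatment))
```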

Key Steps in conducting an experiment

1. Design the study carefully

2. Randomly assign people to treatment or control

3. Collect baseline data

4. Verify that assignment looks random

5. Monitor the process so that the integrity of the experiment is not compromised
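Steps 2 and 4 can be sketched in Python as follows, assuming simple (unstratified) randomization of a fixed list of units and a single baseline covariate. The helper names and the seed are illustrative; in practice one would check several baseline variables and often test them jointly.

```python
import random
from statistics import mean
from typing import Dict, List

def assign_treatment(unit_ids: List[int], seed: int = 2024) -> Dict[int, bool]:
    """Randomly assign half of the units (rounded down) to treatment.

    A fixed seed makes the assignment reproducible and auditable.
    """
    rng = random.Random(seed)
    shuffled = list(unit_ids)
    rng.shuffle(shuffled)
    treated = set(shuffled[: len(shuffled) // 2])
    return {uid: uid in treated for uid in unit_ids}

def baseline_gap(assignment: Dict[int, bool], baseline: Dict[int, float]) -> float:
    """Treatment-minus-control difference in mean baseline values."""
    t = [baseline[u] for u, d in assignment.items() if d]
    c = [baseline[u] for u, d in assignment.items() if not d]
    return mean(t) - mean(c)

units = list(range(100))
baseline = {u: float(u % 10) for u in units}  # hypothetical baseline covariate
assignment = assign_treatment(units)
print(sum(assignment.values()), round(baseline_gap(assignment, baseline), 2))
```

A gap that is small relative to the covariate's spread is what "assignment looks random" means in step 4.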




Key Steps in conducting an experiment (contd.)

6. Collect follow-up data for both the treatment and control groups

7. Estimate program impacts by comparing mean outcomes of the treatment group vs. mean outcomes of the control group

8. Assess whether program impacts are statistically significant and practically significant
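Steps 7 and 8 can be illustrated with the simplest estimator: the difference in mean outcomes, with a standard error that allows unequal variances in the two groups. This is a sketch on hypothetical data. The t-ratio it returns is usually compared with about 1.96 for a 5 percent test in large samples (small samples call for the t distribution), while practical significance is a judgment about whether the estimated impact is large enough to matter, for example relative to cost.

```python
from math import sqrt
from statistics import mean, variance
from typing import List, Tuple

def difference_in_means(treated: List[float], control: List[float]) -> Tuple[float, float, float]:
    """Return (impact estimate, standard error, t-ratio).

    The standard error uses the unequal-variance formula
    sqrt(s_T^2 / n_T + s_C^2 / n_C) with sample variances.
    """
    diff = mean(treated) - mean(control)
    se = sqrt(variance(treated) / len(treated) + variance(control) / len(control))
    return diff, se, diff / se

treated = [12, 15, 14, 16, 13, 15]  # hypothetical follow-up outcomes
control = [11, 12, 13, 12, 10, 12]
diff, se, t = difference_in_means(treated, control)
print(f"impact={diff:.2f}, se={se:.2f}, t={t:.2f}")
```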




J-PAL | THREATS AND ANALYSIS

Core assumptions

• Random assignment of subjects to treatments
 – Receiving treatment is statistically independent of subjects’ potential outcomes

• Non-interference: a subject’s potential outcomes reflect only whether they receive the treatment themselves
 – A subject’s potential outcomes are unaffected by how treatments happened to be allocated

• Excludability: a subject’s potential outcomes respond only to the defined treatment, not to other extraneous factors that may be correlated with treatment
 – Hence the importance of defining treatment precisely and maintaining symmetry between treatment and control groups (e.g., through blinding)


Noncompliance

• Sometimes there is a disjunction between the treatment that is assigned and the treatment that is received
 – Miscommunication and administrative mishaps
 – Subjects may be unreachable
 – Encouragements sometimes don’t work

• Addressing noncompliance requires careful attention to “excludability” assumptions
 – Are outcomes affected only by the treatment? Or by both the assignment and the treatment?

Handling noncompliance

[Diagram: random assignment splits subjects into a treatment group (participants and no-shows) and a control group (non-participants and crossovers).]

What can you do? Can you switch them?

Bad idea: biased


[Diagram: random assignment; treatment group with no-shows; control group with non-participants and crossovers.]

What can you do? Can you drop them?

Bad idea: biased


[Diagram: random assignment; treatment group with no-shows; control group with non-participants and crossovers.]

Inferences should be based solely on comparisons of randomly assigned groups.


Noncompliance: avoiding common errors

• Subjects you fail to treat are NOT part of the control group!

• Do not throw out subjects who fail to comply with their assigned treatment

• Base your estimation strategy on the ORIGINAL treatment and control groups, which were randomly assigned and therefore have comparable potential outcomes
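Following this rule, effects are estimated from the original randomized groups. A standard sketch computes the intent-to-treat (ITT) effect and then scales it by the difference in treatment take-up between the groups to get the complier average causal effect (CACE). It relies on the non-interference and excludability assumptions listed under Core assumptions, plus the assumption that no one takes treatment only because they were assigned to control (no "defiers"). The data below are hypothetical.

```python
from statistics import mean
from typing import List, Tuple

def itt_and_cace(assigned: List[int], received: List[int],
                 outcome: List[float]) -> Tuple[float, float]:
    """Return (ITT, CACE) using the ORIGINAL randomized groups.

    assigned: 1 if randomly assigned to treatment, else 0
    received: 1 if the subject actually received treatment, else 0
    """
    z1 = [i for i, z in enumerate(assigned) if z == 1]
    z0 = [i for i, z in enumerate(assigned) if z == 0]
    # Intent-to-treat: compare groups as randomized, no-shows and crossovers included.
    itt = mean(outcome[i] for i in z1) - mean(outcome[i] for i in z0)
    # Effect of assignment on receipt: share treated in each original group.
    itt_d = mean(received[i] for i in z1) - mean(received[i] for i in z0)
    if itt_d == 0:
        raise ValueError("assignment did not change treatment receipt")
    return itt, itt / itt_d

assigned = [1, 1, 1, 1, 0, 0, 0, 0]
received = [1, 1, 1, 0, 0, 0, 0, 1]  # one no-show (index 3), one crossover (index 7)
outcome = [10, 9, 11, 6, 6, 5, 7, 9]  # hypothetical outcomes
print(itt_and_cace(assigned, received, outcome))
```

Note that the no-show stays in the treatment group and the crossover stays in the control group; neither is dropped or switched.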


Promise of experiments:

Surprisingly positive results

o Miguel and Kremer (2004) showed that deworming treatment (costing 49 cents per child per year) can reduce absenteeism from school by one-quarter

o In terms of increasing attendance, deworming is 20 times as effective as hiring an extra teacher, even though both work in the sense of generating statistically significant improvements

o Economic intuition would not have helped us come to this conclusion

o NGOs were equally uninformed about this comparison

Ask children around the world why they are not in school and you will get many answers: cost, distance, lack of facilities. Very few of them will mention worms: soil-transmitted helminths (STHs) and schistosomes. Until recently, few experts would have mentioned worms as a key barrier to schooling either.

Four hundred million children of school age are chronically infected with intestinal worms. Infected children suffer listlessness, diarrhea, abdominal pain and anemia. These parasites are so widespread that some societies do not recognize infection as a medical problem. Symptoms of worms, such as blood in the stool, are considered a natural part of growing up. So even though safe, cheap, and effective oral medication that can kill 99 percent of worms in the body is available, and the World Health Organization (WHO) recommends mass deworming of school-aged children, only 10 percent of at-risk children get treated.

OCTOBER 2007

Policy Briefcase No. 4

Abdul Latif Jameel Poverty Action Lab, MIT Department of Economics

E60-275

30 Memorial Drive

Cambridge, MA 02142

Voice: 614 324 3852


www.povertyactionlab.org

Mass Deworming: A Best-Buy for Education and Health

For more details on this study, see Miguel and Kremer (2004) and Kremer and Miguel (2007), available at www.povertyactionlab.org

This Briefcase (based on Miguel and Kremer, 2004; and Kremer and Miguel, 2007) reports the results of a randomized impact evaluation of a deworming program in western Kenya. The results show that school-based mass deworming, in which every child in a school is treated, is the most cost-effective way to increase school participation of all the alternatives that have been rigorously evaluated. It is also one of the most cost-effective ways to improve health that we know of.

Similar educational benefits were found when intestinal worms were eradicated from the southern states of the U.S. in 1915 (Bleakley, 2007). Follow-up work found that attempts to make the program self-sustaining, through health education and user fees, led to its collapse. Only long-term funding of a school-based program sustained the benefits.

Summary

What was done: About 30,000 children in 75 primary schools in rural Kenya were treated en masse in schools with drugs for hookworm, whipworm, roundworm, and schistosomiasis (bilharzia).

Key Impacts:

Reduced the incidence of moderate-to-heavy infections by 25 percentage points.

Reduced school absenteeism by 25 percent, with the largest gains among the youngest pupils.

School participation in the area increased by at least 0.14 years of schooling per treated child.

There was no evidence that deworming increased test scores.

Cost Effectiveness

Cost: 50 cents per child per year

Health: US$5 for every Disability Adjusted Life Year (DALY) saved

Education: US$3.50 for each additional year of school participation
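The education figure can be roughly reconstructed from the numbers given. The briefcase does not show its calculation, so the sketch assumes that the 0.14 additional years of school participation correspond to one year of treatment at the stated per-child cost. Under that assumption the 50-cent cost gives about US$3.57, and the 49-cent figure on the "Promise of experiments" slide gives exactly US$3.50.

```python
def cost_per_additional_year(cost_per_child_year: float, extra_years_per_child: float) -> float:
    """US$ spent per additional year of school participation.

    Assumption: the extra participation is bought with one year of treatment.
    """
    return cost_per_child_year / extra_years_per_child

# 0.14 extra years per treated child; 50 cents (Summary) or 49 cents (slide).
for cost in (0.50, 0.49):
    print(f"cost ${cost:.2f} -> ${cost_per_additional_year(cost, 0.14):.2f} per extra year")
```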

Take Action Now: www.dewormtheworld.org

Source: http://www.dewormtheworld.org/?q=node/68

Our Story

Over 400 million children are infected with parasitic worms. Although the harm they cause to children’s health and education has been recognized since the 1980s, deworming was not widespread due to more urgent health sector priorities. However, over two decades later, new groundbreaking research changed how the education sector viewed school-based deworming.

There were three key findings. First, researchers showed that the health impacts of deworming were significantly greater than previously estimated, due to the spillover effects of treatment. Second, they illustrated that mass deworming drastically improved school participation. In fact, it is one of the best returns on investment of any intervention evaluated to increase school attendance. Finally, they conclusively demonstrated that deworming through schools is an efficient and effective way to treat large numbers of children.

Investigators have also since followed up on this research to show the long-run impacts of deworming, which result in increased earnings and workforce participation of adults who received two to three additional years of treatment during school.

This evidence was a breakthrough. School-based deworming was globally recognized as a ‘best buy’ for development, and the benefits and cost-effectiveness of school-based deworming were now clear to both the health and education sectors. However, additional barriers remained, and millions of children continued to go without treatment. Some countries needed access to drugs, while others needed technical assistance and capacity building. In addition, policies needed to be developed or strengthened in order to support school-based deworming programs.

Recognizing the huge opportunity to impact the lives of millions of children, economists Michael Kremer and Esther Duflo shared the evidence with fellow members of the Young Global Leaders Education Task Force, who promptly launched the Deworm the World Initiative in January 2007 at the World Economic Forum Annual Meeting in Davos, Switzerland.

The Deworm the World Initiative is operated as a partnership between Innovations for Poverty Action and Partnership for Child Development. Working together, the Initiative has reached 20 million children in 27 countries by supporting the launch of new country programs and enabling the continued activity of existing ones.


Multiple treatment experiments can be informative

o Duflo, Kremer, and Robinson (2010) reflects an iterative process

o A succession of experiments on fertilizer use was run over a period of several years

o Each set of results prompted a series of new variations to better understand the results of the previous one

Theoretical Motivation

o Experiments designed to assess whether there is demand for commitment products (Ashraf, Karlan, and Yin 2006) came from theoretical motivation

o Karlan and others: experiments are emerging as a powerful tool for testing theories

Biggest Advantage:

The biggest advantage of experiments may be that they take us into terrain where observational approaches are not available.

Objections raised by critics are best viewed as warnings against over-interpreting experimental results.

There are also concerns about what experiments are doing to development economics as a field.

Generalizability

Environmental dependence: the core element of generalizability. Would the same result occur in a different setting?

Effects are not constant across individuals: do they vary systematically with covariates?

Implementer effects and compliance: when the implementer is a small organization (an NGO), the estimated treatment effect may reflect unique characteristics of that implementer.

e.g., some NGOs refuse to randomize, so those that do may not be representative

Randomization Issues

The fact that an experiment is going on might generate selection effects that would not arise in a non-experimental setting (being part of an experiment and being monitored influences participants).

Villagers are not used to private organizations going around offering them things.

It is necessary that individuals not be aware that they are excluded from the program (difficult when randomization is at the individual level, easier when randomization is at the village level).
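Randomizing at the village level means drawing villages, not people, and giving everyone in a village that village's status, so neighbors are not split between treatment and control. A minimal Python sketch, assuming a hypothetical mapping from person ids to village ids:

```python
import random
from typing import Dict

def assign_by_village(people: Dict[str, str], seed: int = 7) -> Dict[str, bool]:
    """Randomize villages, then give each person their village's status.

    people: hypothetical mapping from person id to village id.
    Half of the villages (rounded down) are assigned to treatment.
    """
    villages = sorted(set(people.values()))  # sort so the seed fully determines the draw
    rng = random.Random(seed)
    rng.shuffle(villages)
    treated_villages = set(villages[: len(villages) // 2])
    return {pid: village in treated_villages for pid, village in people.items()}

people = {"ana": "v1", "ben": "v1", "chi": "v2", "dev": "v2", "eli": "v3", "fay": "v4"}
print(assign_by_village(people))
```

Because outcomes tend to be correlated within villages, standard errors for such a design would normally be clustered by village.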

Equilibrium Effects

Program effects from a small study may not generalize when the program is scaled up

e.g.:

Vouchers to go to private school

Students end up with better education and higher incomes

Scale up program to national level

Crowding in private schools (collapse of public schools)

Returns to education fall because of increased supply

Experimental evidence overstates the returns to the voucher program

Notes from: “Instruments, Randomization, and Learning about Development” (Deaton 2010)

The effectiveness of development assistance is a topic of great public interest

Much public debate among non-economists takes it for granted that, if the funds were made available, poverty would be eliminated; among economists, opinion is mixed.

Macro perspective: can foreign assistance raise growth and eliminate poverty?

Micro perspective: what sorts of projects are likely to be effective? Should aid focus on roads, electricity, schools, health clinics?

Answer: we don’t know. How should we go about finding out?

Frustration with aid organizations

Particularly the World Bank

Allegedly failing to learn from its projects and to build up a systematic catalogue of what works and what does not

Movement toward randomized controlled experiments:

Esther Duflo:

“Randomized trials can revolutionize social policy during the 21st century just as they revolutionized medicine during the 20th”

A Lancet editorial headed “The World Bank is finally embracing science”

Deaton argues:

Under ideal circumstances, randomized evaluations of projects are useful for obtaining convincing estimates of the average treatment effect of a program or project

This focus is too narrow and too local to tell us “what works” in development, to design policy, or to advance scientific knowledge about development processes

Work needs to be refocused: not on which projects work, but on why they work

Bigger question:

RCTs allow the investigator to induce variation that might not arise nonexperimentally, but are these the relevant variations?

RCTs of “what works”, even when done without error or contamination, are unlikely to be helpful for policy or to move beyond the local unless they tell us something about why

RCTs are not targeted at, or suited to, these questions

Actual policy will always be different from experiments:

General equilibrium effects that operate on a large scale

Outcomes are different when everyone is covered by treatment rather than a few

Experimental subjects are not representative of the population

Small development projects at the village level do not attract the attention of corrupt politicians

Scientists or experimentalists are more careful than government implementers

Transporting successful experiments?

Mexico’s PROGRESA program

A conditional cash transfer program that pays parents if their children attend schools and clinics

Now in 30 other countries

Is this a good thing?

It cannot simply be exported if countries have:

Pre-existing anti-poverty programs with conditional transfers

No capacity to meet increased demands for education and health care

No political support

It is the combination of mechanism and context that makes for scientific progress

Much of the interest in RCTs, instrumental variables, and other econometric techniques that mimic random allocation comes from skepticism of economic theory and impatience with its ability to deliver structures that seem helpful in interpreting reality

Internal versus external validity:

There is a contrast between the rigor applied to establish internal validity and the looser analysis used to make the results policy relevant

Doing so typically requires some theory or some other information from observables, both of which go against the simplicity of RCTs

Applied and theoretical economists have never been so far apart

Failure to reintegrate the two is not an option

Otherwise there is no chance of long-term scientific progress building on RCTs

RCTs that are not theoretically guided are unlikely to have more than local validity

Running Randomized Evaluations: A Practical Guide (blog), http://runningres.com/blog/2013/12/16/pre-analysis-plans-at-berkeleys-bitss-conference

Pre-analysis plans at Berkeley's BITSS conference

December 16, 2013

On December 12th I attended the annual meeting of the Berkeley Initiative for Transparency in the Social Sciences (BITSS). BITSS brings together economists, political scientists, biostatisticians, and psychologists to think through how to improve the norms and incentives to promote transparency in the social sciences. I was on a panel talking about pre-analysis plans, in which researchers specify in advance how they will analyze their data.

I have now been involved in writing four of these plans, and my thinking about them has evolved, as has the sophistication of the plans. Kate Casey, Ted Miguel and I first wrote one of these plans for our evaluation of a Community Driven Development program in Sierra Leone (see the previous blog). It was exactly the type of evaluation where pre-analysis plans are most useful. We had a large number of outcome variables with no obvious hierarchy of which ones were most important, so we specified how all the outcomes would be grouped into families and tested as a group. While the outcomes were complex, the randomization design was simple (one treatment, one comparison group).

The next case also included multidimensional outcomes: empowerment of adolescent girls in Bangladesh. However, now we had five treatments and a comparison group, with different treatments targeted at different ages. The task of prespecifying was overwhelming and we made mistakes. It was extremely difficult to think through in advance what subsequent analysis would make sense for every combination of results we might get from the different arms. We also failed to take into account that some of our outcomes in a given group were clearly more important than others: we ended up with strong effects on years of schooling and math and literacy scores, but the overall “education” effect was weakened by no or negative effects on indicators like how often a girl read a magazine. We hope that, when we write the paper, people will agree it makes sense to deviate

http://www.povertyactionlab.org/evaluation/community-driven-development-sierra-leone

http://runningres.com/blog/2013/12/5/smackdown-on-community-driven-development

http://www.povertyactionlab.org/evaluation/empowering-girls-rural-bangladesh
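The post describes grouping outcomes into families and testing each family as a group. One common way to do this (assumed here; the post does not specify it) is an equally weighted index in which each outcome is standardized against the control group's mean and standard deviation, in the spirit of the mean-effects approach of Kling, Liebman and Katz. The hypothetical data below show the dilution the author describes: adding an indicator with no effect halves the estimated family effect.

```python
from statistics import mean, pstdev
from typing import Dict, List

def family_index(outcomes: Dict[str, List[float]], treated: List[bool]) -> List[float]:
    """Equally weighted average of outcomes, each standardized by the control group.

    Assumes every outcome is coded so that higher is better and that it
    varies within the control group (nonzero standard deviation).
    """
    k = len(outcomes)
    index = [0.0] * len(treated)
    for values in outcomes.values():
        control = [v for v, t in zip(values, treated) if not t]
        mu, sd = mean(control), pstdev(control)
        for i, v in enumerate(values):
            index[i] += (v - mu) / sd / k
    return index

def effect(values: List[float], treated: List[bool]) -> float:
    """Treatment-minus-control difference in means."""
    return (mean(v for v, t in zip(values, treated) if t)
            - mean(v for v, t in zip(values, treated) if not t))

treated = [True, True, True, False, False, False]
outcomes = {  # hypothetical data
    "years_of_schooling": [8, 9, 10, 6, 7, 8],  # strong effect
    "reads_magazine": [1, 2, 3, 1, 2, 3],       # no effect
}
school_only = family_index({"years_of_schooling": outcomes["years_of_schooling"]}, treated)
both = family_index(outcomes, treated)
print(round(effect(school_only, treated), 3), round(effect(both, treated), 3))
```

Weighting components by importance, or prespecifying a primary outcome within the family, are the kinds of choices the author says they failed to anticipate.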




The Millennium Development Goals call for universal primary education by 2015, but there is little consensus on how to achieve this goal or how much it would cost

One view: attracting additional children to school will be difficult, since most children not in school in developing countries are earning income their families need

Another view: the potential contribution of primary-school-age children to family income is very small, hence modest incentives could significantly increase enrollment

Reducing the Cost of Education

Some argue that school fees prevent many students from attending school, citing dramatic estimates from sub-Saharan Africa: when free schooling was introduced, primary school enrollment reportedly doubled. Often the data used for these estimates are unclear: free schooling is sometimes announced simultaneously with other policy initiatives, and it is often accompanied by programs that replace school fees with per-pupil grants from the central government, which create incentives for schools to over-report enrollment.

Randomized experiments can isolate the impact of reducing costs on the quantity of schooling. Several programs have gone beyond simply reducing school fees by actually paying students to attend school, in the form of either cash grants or school meals. School health programs can also increase the quantity of schooling, but this raises the question of how best to implement such programs. One view is that reliance on external financing of medicine is not sustainable; this view instead advocates health education, water and sanitation improvements, and so forth.

Quality of Education

Notes from “Teacher Absence in India” (Kremer et al.)

The study entails a nationally representative survey of 3,700 schools in India. Three unannounced visits were made to each school.

Absence data come from direct physical verification of the teacher’s presence, not from logbooks, interviews, etc.

A teacher is recorded as absent if the investigator could not find the teacher in the school during regular working hours

Journal of the European Economic Association (Resubmitted version, 11/27/04)


which absence calculations based on a similar methodology are available (Table 1).³ Only 45 percent of teachers were actively engaged in teaching at the time of the visit.⁴

Within India, the absence rate ranged from 15 percent in Maharashtra to 42 percent in Jharkhand (Table 2).⁵ Absence rates are generally higher in low-income states: doubling per capita income is associated with a 4.7 percentage point lower predicted absence. The rates of teaching activity among the teachers who are present are lower in higher-absence states and schools. In some states, only 20 to 25 percent of teachers were engaged in teaching at the time of the visit.

³ Most of these estimates come from other countries covered by the same research project on provider absence in education and health, carried out by the authors of this study and using standardized methodology (Chaudhury and others 2004).

⁴ Even with a generous allowance for the possibility that enumerators’ visits diverted some teachers from teaching, it is unlikely that more than half of the teachers would have been teaching at the time of the visit. See Kremer and others (2004).

⁵ Table 2 includes 19 of the 20 states surveyed. Fieldwork in the twentieth state, Delhi, was delayed for bureaucratic reasons, and the data were received too late to be analyzed here.

TABLE 1: Teacher absence rates by country

Country             Teacher absence (%)
Peru                11
Ecuador             14
Papua New Guinea    15
Bangladesh          16
Zambia              17
Indonesia           19
India               25
Uganda              27

Source: Chaudhury, Hammer, Kremer, Muralidharan, and Rogers (2004) for most countries; Habyarimana and others (2004) for Zambia; World Bank (2004) for Papua New Guinea.

Absence rates are considerably higher than could be accounted for by official non-teaching duties, such as staffing polling stations during elections or conducting immunization campaigns, which are sometimes cited as important causes of absence. Based on the responses of each school’s head teacher or primary respondent, official non-teaching duties account for only about 4 percent of total absences. In other words, on any given day, only about 1 percent of primary teachers are absent because they are carrying out official non-teaching-related duties.⁶ Preliminary calculations by the authors suggest

⁶ While stated reasons for absence should be taken with a grain of salt, there does not appear to be any reason for head teachers to understate this cause of absence.
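The step from "about 4 percent of total absences" to "about 1 percent of primary teachers" is a product of two shares: the absence rate times the fraction of absences due to official duties. A quick check, using the weighted average absence rate of 24.8 percent reported in Table 2:

```python
absence_rate = 0.248           # weighted average teacher absence across states (Table 2)
duty_share_of_absences = 0.04  # share of absences due to official non-teaching duties
share_absent_on_duty = absence_rate * duty_share_of_absences
print(f"{share_absent_on_duty:.1%} of teachers absent on official duty on a given day")
```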

TABLE 2: Teacher absence in public schools by state

State               Absence (%)
Maharashtra         14.6
Gujarat             17.0
Madhya Pradesh      17.6
Kerala              21.2
Himachal Pradesh    21.2
Tamil Nadu          21.3
Haryana             21.7
Karnataka           21.7
Orissa              23.4
Rajasthan           23.7
West Bengal         24.7
Andhra Pradesh      25.3
Uttar Pradesh       26.3
Chhattisgarh        30.6
Uttaranchal         32.8
Assam               33.8
Punjab              34.4
Bihar               37.8
Jharkhand           41.9
Weighted Average    24.8

One in four teachers is absent in a typical primary school in India. Absence rates are generally higher in low-income states. Higher teachers’ salaries do not seem to be associated with lower teacher absence: since nominal teachers’ salaries are very similar across states, relative teachers’ salaries are higher in poorer states, yet poorer states have higher absence rates.

Notes from “Addressing Absence” (Banerjee and Duflo)

The obvious method to fight teacher absence is to monitor more intensively. External control need not always be about monetary incentives. The most common type of control is that someone in the institutional hierarchy (e.g., the headmaster of a school) is given the task of keeping an eye on the teacher and penalizing absences. An alternative is to use some impersonal method, such as a camera, for recording absence. An NGO in rural India experimented with a camera.

In this area, the absence rate was 44%. Most schools are one-teacher schools: when the teacher is absent, children just go back home and lose an entire day of schooling.

120 schools were selected to participate in this study. In 60 randomly selected schools (the treatment schools), the NGO gave the teacher a camera with instructions to take a picture of himself or herself every day at opening time and at closing time.

Figure 1

Figure 2: Impact of the Cameras. Number of Times Schools Were Found Open in Treatment and Comparison Schools (out of 13 visits)

[Histogram: x-axis, attendance frequency x (0 to 13); y-axis, number of teachers present exactly x times; series: Treatment, Control]

Experimental Design

Teachers received a bonus as a function of the number of days they actually attended. Teachers received a salary of Rs. 1,000 per month if they were present at least 21 days in a month. Each additional day carried a bonus of Rs. 50, up to a maximum of Rs. 1,300 per month, and each day missed carried a penalty of Rs. 50. Given how the bonus was set up, the average teacher’s salary remained Rs. 1,000 per month, which was what teachers were paid in the remaining 60 schools (the comparison schools).
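The bonus schedule just described can be written as a short function. The sketch follows the slide's description (Rs. 1,000 at 21 days, plus or minus Rs. 50 per day, capped at Rs. 1,300); the `floor` parameter is a hypothetical addition, since the slide does not state a minimum payment.

```python
def monthly_pay(days_present: int, base: int = 1000, threshold: int = 21,
                per_day: int = 50, cap: int = 1300, floor: int = 0) -> int:
    """Monthly pay in Rs. under the incentive scheme as described on the slide.

    Each day above `threshold` adds `per_day`, each day below subtracts it,
    and pay is capped at `cap`. `floor` is a hypothetical assumption: the
    slide does not say what minimum, if any, applied.
    """
    pay = base + per_day * (days_present - threshold)
    return max(floor, min(pay, cap))

for days in (15, 20, 21, 22, 27, 30):
    print(days, monthly_pay(days))
```

A teacher present exactly 21 days earns the same Rs. 1,000 as a comparison-school teacher, which is why average pay could stay roughly unchanged while the marginal day became worth Rs. 50.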


The program resulted in an immediate improvement in teacher attendance: the absence rate of teachers was cut in half. Given the structure of the payment, the average salary in the treatment schools ended up matching almost exactly the average salary in the comparison schools. The incentives were therefore effective without an increase in teachers’ net pay.

Treatment Control Difference(1) (2) (3)

School Open 0.66 0.64 0.02(0.11)

41 39 80

Number of Students Present 17.71 15.92 1.78(2.31)

27 25 52

Teacher Test Scores 34.99 33.62 1.37(2.01)

53 56 109

Teacher Highest Grade Completed 10.21 9.80 0.41(0.46)

57 54 111

0.83 0.84 0.00(0.09)

27 25 52

0.78 0.72 0.06(0.12)

27 25 52

Blackboards Utilized 0.85 0.89 -0.04(0.11)

20 19 39

Infrastructure Index 3.39 3.20 0.19(0.30)

57 55 112

F-stat (1,110): 1.21; p-value: (0.27)

Table 1: Is School Quality Similar in Treatment and Control Groups Prior to Program?

E. School Infrastructure

Percent of Teachers Interacting with Students

Percentage of Children Sitting Within Classroom

Notes: (1) Teacher Performance Measures from Random Checks only includes schools that were open during the random check. (2) Infrastructure Index: 1-5 points, with one point given if the following school attribute is sufficient: Space for Children to Play, Physical Space for Children in Room, Lighting, Library, Floor Mats

A. Teacher Attendance

B. Student Participation (Random Check)

C. Teacher Qualifications

D. Teacher Performance Measures (Random Check)

Table 2: Are Students Similar Prior To Program?

                                                     Levels                           Normalized by Control
                                      Treatment  Control   Difference       Treatment  Control   Difference
                                         (1)       (2)        (3)              (4)       (5)        (6)
A. Can the Child Write?
  Took Written Exam                     0.17      0.19    -0.02 (0.04)
    Observations                        1136      1094       2230
B. Oral Exam
  Math Score on Oral Exam               7.82      8.12    -0.30 (0.27)       -0.10      0.00    -0.10 (0.09)
    Observations                         940       888       1828              940       888       1828
  Language Score on Oral Exam           3.63      3.74    -0.10 (0.30)       -0.03      0.00    -0.03 (0.08)
    Observations                         940       888       1828              940       888       1828
  Total Score on Oral Exam             11.44     11.95    -0.51 (0.48)       -0.08      0.00    -0.08 (0.07)
    Observations                         940       888       1828              940       888       1828
C. Written Exam
  Math Score on Written Exam            8.62      7.98     0.64 (0.51)        0.23      0.00     0.23 (0.18)
    Observations                         196       206        402              196       206        402
  Language Score on Written Exam        3.62      3.44     0.18 (0.46)        0.08      0.00     0.08 (0.20)
    Observations                         196       206        402              196       206        402
  Total Score on Written Exam          12.17     11.41     0.76 (0.90)        0.16      0.00     0.16 (0.19)
    Observations                         196       206        402              196       206        402

Notes: (1) Children who could write were given a written exam. Children who could not write were given an oral exam. (2) Standard errors are clustered by school.
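Columns (4) to (6) of Table 2 express scores "normalized by control", which is why the control column is 0.00 throughout. A minimal sketch, assuming the common convention of subtracting the control-group mean and dividing by the control-group standard deviation; the function name and the scores used here are hypothetical.

```python
from statistics import mean, stdev


def normalize_by_control(treatment, control):
    """Return both groups' scores in units of control-group standard deviations.

    Assumption: normalization = (score - control mean) / control standard deviation.
    """
    mu = mean(control)
    sigma = stdev(control)
    treated_norm = [(x - mu) / sigma for x in treatment]
    control_norm = [(x - mu) / sigma for x in control]
    return treated_norm, control_norm


# Hypothetical raw scores, for illustration only.
t_norm, c_norm = normalize_by_control(treatment=[8, 10], control=[6, 8, 10])
print(t_norm, c_norm)
```

After this transformation the control group has mean 0 by construction, so a treatment-group mean reads directly as a difference in control standard deviations.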

Table 3: Teacher Attendance, Sept 2003-Feb 2006

                                                          Difference Between Treatment and Control Schools
                    Treatment  Control     Diff         Until Mid-Test   Mid to Post Test   After Post Test
                       (1)       (2)        (3)               (4)               (5)                (6)
A. All Teachers
  Attendance          0.79      0.58    0.21 (0.03)      0.20 (0.04)       0.20 (0.04)        0.23 (0.04)
    Observations      1575      1496       3071              882               660               1529
B. Teachers with Above Median Test Scores
  Attendance          0.78      0.63    0.15 (0.04)      0.15 (0.05)       0.15 (0.05)        0.14 (0.06)
    Observations       843       702       1545              423               327                795
C. Teachers with Below Median Test Scores
  Attendance          0.78      0.53    0.24 (0.04)      0.21 (0.05)       0.14 (0.06)        0.32 (0.06)
    Observations       625       757       1382              412               300                670

Notes: (1) Child learning levels were assessed in a mid-test (April 2004) and a post-test (November 2004). After the post-test, the "official" evaluation period was ended. Random checks continued in both the treatment and control schools. (2) Standard errors are clustered by school. (3) Panels B and C only include the 109 schools where teacher tests were available.

Figure 3: Impact of the Cameras (out of at least 25 visits)
[Histogram: number of teachers present exactly x times (vertical axis, 0-8) against attendance frequency x (horizontal axis, 1-25), treatment vs. control schools.]
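The claim that the absence rate was cut in half can be checked against columns (1) and (2) of Table 3. This sketch assumes the absence rate is simply 1 minus the attendance rate; the names are hypothetical.

```python
# Share of random checks at which the teacher was present (Table 3, columns 1 and 2).
ATTENDANCE = {
    "All teachers": (0.79, 0.58),
    "Above-median test scores": (0.78, 0.63),
    "Below-median test scores": (0.78, 0.53),
}


def absence_ratio(treatment_attendance, control_attendance):
    """Treatment absence rate as a fraction of the control absence rate.

    Assumption: absence rate = 1 - attendance rate.
    """
    return (1 - treatment_attendance) / (1 - control_attendance)


for group, (treat, control) in ATTENDANCE.items():
    print(f"{group}: absence {1 - treat:.2f} vs {1 - control:.2f}, "
          f"ratio {absence_ratio(treat, control):.2f}")
```

For all teachers, absence falls from 0.42 to 0.21, a ratio of 0.50; the same calculation shows a larger proportional drop for teachers with below-median test scores.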


In another experiment, preschool teachers in treatment schools received a prize (a bicycle) if the headmasters marked them present a sufficient number of times.

This experiment had no effect: absence rates were not reduced. This outcome suggests that when human judgment is involved in a system where rules are often bent, incentives may easily be perverted.

How to stop Malaria?

881,000 die each year

91% in Africa

85% under 5


The Case for Bednets

• Malaria is transmitted by mosquitoes, mainly between dusk and dawn.

• Long-lasting insecticide-treated bednets prevent mosquitoes from biting.

Heated policy debate

• Jeff Sachs, WHO: give bednets for free.
  • We know the science; no need to experiment.

• Easterly, Dambisa Moyo, Population Services International: don’t give them for free.
  • We know the economics; no need to experiment!

• The real question, of course, is the extent to which they should be subsidized…

What we need to know

• We need to know:

  • The price elasticity of demand for bednets: if people are willing to purchase one at full cost, subsidies are not needed; if they are not willing to purchase one at ANY positive price, price subsidies may be needed.

  • The immediate effect on use: are people who pay for a bednet more likely to use it? How much do they need to pay?

  • The longer-term effects: will free distribution wreck markets?

    • On people who get nets for free: will they buy nets in the future?

    • On their friends and neighbors: will they hold out for a free bednet?
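The price elasticity mentioned above measures how strongly purchases respond to price. A minimal sketch using the midpoint (arc) formula, which is one standard choice; the purchase rates below are hypothetical, not data from the experiments.

```python
def arc_elasticity(p1, q1, p2, q2):
    """Midpoint (arc) price elasticity of demand between (p1, q1) and (p2, q2)."""
    pct_change_q = (q2 - q1) / ((q1 + q2) / 2)
    pct_change_p = (p2 - p1) / ((p1 + p2) / 2)
    return pct_change_q / pct_change_p


# Hypothetical: 80% buy at $1, 40% buy at $2.
e = arc_elasticity(p1=1.0, q1=0.8, p2=2.0, q2=0.4)
print(f"elasticity: {e:.2f}")
```

An elasticity of about -1 means a 1% rise in price lowers purchases by about 1%. Note that the midpoint formula treats any move from a price of zero as a 200% price change, so comparisons against free distribution are better read directly as differences in purchase rates.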

How can we find out?

• Anecdotes…

Photo: Minakawa et al. 2008, “Unforeseen misuses of bed nets in fishing villages along Lake Victoria,” Malaria Journal.


  • There are certainly plenty, but they usually cut both ways.

• Compare purchase/use at various prices:

  • Some clinics may give them for free; other villages may not have that system, so any bednets there are more likely to have been bought in the market.

  • Do we see fewer in those villages?

  • Do we see that the few we see are used differently?

But the problem is…

• What is the right counterfactual: what would have happened in the other situation?

• For example:

  • Bednets may be distributed for free in areas where malaria is a huge problem.

  • So even if people there had to pay for them, they would have been more likely to get them.

Purchase when bednets are expensive / Purchase when bednets are free
[Figures: purchases in high-malaria vs. low-malaria regions, when bednets are expensive and when they are free.]

True effect of price on purchase
[Figure: purchases when bednets are expensive vs. free, in high- and low-malaria regions.]

Our estimate of effect if we compare low and high malaria regions
[Figure: the estimated effect obtained by comparing purchases across the two regions.]

The bias
[Figure: the estimated effect split into the true effect and the bias.]

Observed demand at various prices
[Figure: purchases (vertical axis) against price, 0 to 30 (horizontal axis).]

Demand we would observe in region with free bed net, if bednets were not free
[Figure: purchases against price, 0 to 30.]

Bias in elasticity
[Figure: purchases against price, showing the bias in the estimated elasticity.]
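The bar-chart argument can be put in numbers. A minimal sketch with made-up purchase rates: `BASELINE` (purchases if nets were expensive everywhere) and `TRUE_EFFECT` are hypothetical values, and, as in the example above, the high-malaria region is the one that gets free nets.

```python
# Hypothetical purchase rates if bednets were expensive in both regions.
BASELINE = {"high_malaria": 0.40, "low_malaria": 0.20}
TRUE_EFFECT = 0.50  # hypothetical rise in purchases when nets are free

# Free nets go to the high-malaria region; the low-malaria region pays.
observed_free = BASELINE["high_malaria"] + TRUE_EFFECT
observed_expensive = BASELINE["low_malaria"]

naive_estimate = observed_free - observed_expensive
# Selection bias: the gap between regions that would exist even at equal prices.
bias = BASELINE["high_malaria"] - BASELINE["low_malaria"]

print(f"naive {naive_estimate:.2f} = effect {TRUE_EFFECT:.2f} + bias {bias:.2f}")
```

The cross-region comparison reports 0.70, but only 0.50 of it is caused by price; the remaining 0.20 reflects the higher demand that high-malaria areas would show at any price.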

Problem and solution

• Problem: what we observe in the world reflects:

  • Selection bias: people’s behavior would differ across places, EVEN IF THE PRICES WERE THE SAME.

  • The actual treatment effect.

  • And we don’t know how to separate those two effects: we do not observe how people would have behaved with a low price in the high-price region (and vice versa).

• Solution: randomly assign different prices within the same region.

  • Now there is no systematic difference between people who face a high price and people who face a low price.

  • Of course, there is still the usual random noise: the sample must be large enough, and there will be some uncertainty around our estimates of the mean effects.
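Randomization removes the bias because the coin flip that sets the price is independent of each person's malaria risk. The simulation below is a sketch under hypothetical parameters (baseline purchase of 0.4 for high-malaria and 0.2 for low-malaria households, a true effect of 0.5); `simulate_randomized_prices` is an illustrative helper, not the study's analysis.

```python
import random
from statistics import mean


def simulate_randomized_prices(n=20_000, true_effect=0.5, seed=0):
    """Difference in purchase rates between randomly assigned free and
    expensive groups within one region of mixed malaria risk.

    All parameters are hypothetical illustrations.
    """
    rng = random.Random(seed)
    purchases = {True: [], False: []}  # keyed by "was offered the free price"
    for _ in range(n):
        high_malaria = rng.random() < 0.5
        baseline = 0.4 if high_malaria else 0.2
        free = rng.random() < 0.5  # coin flip: independent of malaria risk
        p_buy = baseline + (true_effect if free else 0.0)
        purchases[free].append(1 if rng.random() < p_buy else 0)
    return mean(purchases[True]) - mean(purchases[False])


estimate = simulate_randomized_prices()
print(f"estimated effect: {estimate:.3f} (true effect 0.5)")
```

The estimate lands close to 0.5, with no bias term; the small remaining gap is the random noise mentioned above, and it shrinks as `n` grows.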

Dupas’ experiments

• First experiment (with Jessica Cohen)

  • Randomly chose (prenatal) clinics and offered bednets at different prices.

  • Tracked purchases and usage in those clinics.

  • Findings: compare purchase and usage at each price.

Policy Implications

• What is the best price at which to charge for bednets?

• One possible way to ask the question: which price minimizes the cost per malaria death averted?

• Trade-off:

  • Free bednets: more coverage

  • But they cost you money…

• It turns out that in this case, from the policy perspective, the CHEAPEST way to avert malaria deaths is free bednets. Why?
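One way to see how free nets can come out cheapest is to write the cost per death averted explicitly. This is a hedged sketch, not the study's calculation: every number is hypothetical, usage is assumed not to fall when nets are free, and `spillover` is an assumed community-protection effect (each used net averts more deaths when coverage is higher).

```python
NET_COST = 6.0         # hypothetical cost of one net to the funder, in dollars
USE_RATE = 0.7         # hypothetical share of nets used; assumed not to vary with price
DEATHS_PER_NET = 0.01  # hypothetical deaths averted per net used, before spillovers

# Hypothetical demand curve: share of people who take a net at each price.
UPTAKE = {0.0: 0.99, 0.65: 0.60, 1.0: 0.45, 1.6: 0.30, 2.0: 0.20, 3.0: 0.10}


def cost_per_death_averted(price, spillover):
    """Funder's subsidy spending per death averted at a given price."""
    coverage = UPTAKE[price]
    spending = coverage * (NET_COST - price)  # subsidy per person in the population
    deaths_averted = coverage * USE_RATE * DEATHS_PER_NET * (1 + spillover * coverage)
    return spending / deaths_averted


def cheapest_price(spillover):
    """Price in UPTAKE with the lowest cost per death averted."""
    return min(UPTAKE, key=lambda p: cost_per_death_averted(p, spillover))


print("with spillovers:", cheapest_price(spillover=2.0))
print("without spillovers:", cheapest_price(spillover=0.0))
```

With `spillover=0`, every subsidized net averts the same number of deaths, so charging the highest price minimizes the subsidy per death. Free distribution wins here only because higher coverage makes each net more effective, which is why the answer depends on what we know about use and spillovers.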

The controversy

• When Dani Rodrik posted these findings on his website, some people objected. Their main objections were:

  • Pregnant women: all of them really need bednets.

  • The product was already well known in Kenya.

  • Long-term effects may differ from short-term effects.

• These questions are all about external validity: are the results valid outside the specific context of the experiment?

Next step

• What is the next step needed to check these objections?

  • A different country: Uganda, Madagascar

  • Kenya, but not pregnant women

  • A new kind of bednet

  • An experiment for the long-term effects:

    • Entitlement effect

    • Social effects

A New Experiment

• New experimental design by Pascaline Dupas to try to address most of these questions

• Randomization done in the general population (men and women)

• Phase 1: discount vouchers of different values are randomly distributed to individuals for buying a new kind of bednet, available in shops, at various prices.

  • Check purchase, use, and purchase by neighbors.

• Phase 2: after a few months, the new bednet is available at the same price for everyone.
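Phase 1 is a randomized price assignment. A minimal sketch, assuming equal-sized arms and using as voucher prices the price points on the purchase-rate chart (Free, $0.65, $1, $1.60, $2, $3); `assign_vouchers` is a hypothetical helper, not the study's actual procedure. The Phase 2 price of $2 is taken from the "future purchase of net at $2" chart.

```python
import random

PRICES = [0.0, 0.65, 1.0, 1.6, 2.0, 3.0]  # voucher prices, in dollars (Free = 0.0)
PHASE2_PRICE = 2.0                         # common price offered to everyone later


def assign_vouchers(household_ids, seed=0):
    """Randomly assign each household one voucher price, with equal-sized arms."""
    ids = list(household_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)  # random order, so position in the cycle is random
    return {hid: PRICES[i % len(PRICES)] for i, hid in enumerate(ids)}


assignment = assign_vouchers(range(600))
for price in PRICES:
    n = sum(1 for p in assignment.values() if p == price)
    print(f"${price:.2f}: {n} households")
```

Shuffling and then cycling through the prices keeps every arm the same size. In Phase 2, every household faces `PHASE2_PRICE`, so differences in later purchases across arms can be attributed to the Phase 1 price.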

[Map (Google Earth): households offered bednets at full price, with a partial subsidy, or with a full subsidy.]

If people must pay for bednets, will they purchase them?
[Chart: purchase rate (0-100%) by cost: Free, $0.65, $1, $1.60, $2, $3.]

When people get bednets for free, will they use them?
[Chart: purchase rate and use rate (0-100%) by cost: Free, $0.65, $1, $1.60, $2, $3.]

Do free nets discourage future purchases?
[Chart: rate of future purchase of a net at $2 (0-30%) by prior cost: Free, $0.65, $1, $1.60, $2, $3.]

Do neighbors buy nets if others got them for free?
[Chart: purchase of a net at $0.65, $1, $1.60, $2, $3, comparing the average case (33% receive free) with the case where all receive free; the labels 66% and 50% also appear.]

Conclusion

• When we have a policy question, e.g. “what is the optimal price to charge for a bednet?”, we need to start by unpacking the question:

  • What do we need to know to answer the question properly? Let’s not assume any answer, or replace real answers with anecdotes or with observations that may be very misleading.

• We can then design an experiment that will get us the answers to these questions.

• This is what J-PAL (the Abdul Latif Jameel Poverty Action Lab) does…

• Examine critically whether this first experiment is enough: perhaps we need more data to conclude…

• Beyond the answer to the policy question, what are the lessons from the experiments? In particular, what is the key puzzle here that we will need to answer in our section on health?
