Propensity Score Matching and the EMA pilot evaluation. Lorraine Dearden, IoE and Institute for Fiscal Studies. RMP Conference, 22nd November 2007.


The Evaluation Problem

The question we want to answer is:
  – What is the effect of some treatment ($D_i = 1$) on some outcome of interest ($Y_{1i}$), compared to the outcome ($Y_{0i}$) if the treatment had not taken place ($D_i = 0$)?
The problem is that it is impossible to observe both outcomes of interest for the same individual, so the true causal effect is never observed directly

How can we solve this problem?

Randomised experiment
  – Randomly assign people to a treatment group and a control group
  – If the groups are large enough, the distribution of all pre-treatment characteristics in the two groups should be identical, so any difference in outcome can be attributed to the treatment
  – Not generally available
  – Not always a solution
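A minimal simulation sketch of why randomisation works. The outcome model is made up for illustration (the `ability` variable, the baseline of 60 and the true effect of 5 are assumptions, not EMA data). Because assignment is a coin flip that ignores `ability`, the simple difference in mean outcomes should land close to the true effect:

```python
import random
import statistics

def simulate_experiment(n=20000, true_effect=5.0, seed=0):
    """Simulate a randomised experiment; return the difference in mean outcomes."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        ability = rng.gauss(0, 1)                  # pre-treatment characteristic
        y0 = 60 + 10 * ability + rng.gauss(0, 5)   # outcome without treatment
        y1 = y0 + true_effect                      # outcome with treatment
        if rng.random() < 0.5:                     # assignment ignores ability
            treated.append(y1)                     # only Y1 is observed
        else:
            control.append(y0)                     # only Y0 is observed
    return statistics.mean(treated) - statistics.mean(control)

estimate = simulate_experiment()
print(round(estimate, 2))  # should be close to the true effect of 5
```

If assignment instead depended on `ability`, the same difference in means would mix the treatment effect with the pre-existing difference between the groups.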

Propensity Score Matching

Instead, have to rely on non-experimental approaches
Propensity score matching is one such method that is gaining popularity because of its simplicity
Crucial, however, to understand the assumptions underlying the approach (and all approaches)
Again, NOT always appropriate
  – may need to rely on other methods, e.g. instrumental variables, control function

Assumptions

Need to have a treatment group and some type of appropriate non-treated group from which you can select a control group
  – Finding an appropriate and convincing control group is often the most difficult evaluation task
Assume ALL relevant differences between the groups pre-treatment can be captured by observable characteristics in your data (X)
  – Having high quality and extensive pre-treatment observables is crucial!
  – Conditional Independence Assumption (CIA)
Common support (return to this)

What are we trying to measure?

Average treatment effect for the population (ATE)
Average treatment effect on the treated (ATT)
Average treatment effect on the non-treated (ATNT)
Usually interested in ATT:
  $E(Y_1 - Y_0 \mid D=1) = E(Y_1 \mid D=1) - E(Y_0 \mid D=1)$
  – OLS: ATT = ATE = ATNT
  – IV: LATE
  – Matching and control function: ATE, ATT & ATNT
How can we find $E(Y_0 \mid D=1)$?
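A short worked sketch of why the missing term matters and how matching tries to recover it. The first line splits the naive treated-versus-untreated comparison into the ATT plus a selection bias term. The second line holds only under the CIA and common support, and is the standard identification argument rather than anything specific to this evaluation:

```latex
\begin{align*}
\underbrace{E(Y_1 \mid D=1) - E(Y_0 \mid D=0)}_{\text{naive comparison}}
  &= \underbrace{E(Y_1 - Y_0 \mid D=1)}_{\text{ATT}}
   + \underbrace{E(Y_0 \mid D=1) - E(Y_0 \mid D=0)}_{\text{selection bias}} \\[4pt]
\text{If } Y_0 \perp D \mid X: \quad
E(Y_0 \mid D=1) &= E\big[\, E(Y_0 \mid D=0,\, X) \,\big|\, D=1 \,\big]
\end{align*}
```

In words: the unobserved counterfactual for the treated is replaced by outcomes of non-treated individuals with the same X, averaged over the distribution of X in the treated group.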

What is treatment?

Most robust design is Intention to Treat (ITT) analysis: treatment is all individuals who could have taken up the program, whether they did or not
Another approach is the ‘receipt of treatment’ approach, but here it is sometimes much more difficult to find an appropriate control group

Matching

Involves selecting from the non-treated pool a control group in which the distribution of observed variables is as similar as possible to the distribution in the treated group
There are a number of ways of doing this, but they almost always involve calculating the propensity score $p_i(x) \equiv \Pr\{D=1 \mid X=x\}$

The propensity score

The propensity score is the probability of being in the treatment group given you have characteristics X=x
How do you do this?
Use parametric methods (e.g. logit or probit) and estimate the probability of a person being in the treatment group for all individuals in the treatment and non-treatment groups
Rather than matching on the basis of ALL X’s, can match on the basis of this propensity score (Rosenbaum and Rubin (1983))
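A minimal sketch of estimating the propensity score with a logit, using only the Python standard library. The data are synthetic and the helper names (`fit_logit`, `propensity_scores`) are illustrative assumptions. The fit uses plain gradient ascent on the log-likelihood, where in practice a statistics package would be used:

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def linear_index(beta, x):
    # intercept plus the weighted sum of the characteristics
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))

def fit_logit(X, d, lr=0.5, n_iter=1000):
    """Fit Pr(D=1 | X=x) by gradient ascent on the logit log-likelihood."""
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(n_iter):
        grad = [0.0] * (k + 1)
        for x, di in zip(X, d):
            err = di - sigmoid(linear_index(beta, x))
            grad[0] += err
            for j, v in enumerate(x):
                grad[j + 1] += err * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def propensity_scores(X, beta):
    """Predicted probability of treatment for every individual, treated or not."""
    return [sigmoid(linear_index(beta, x)) for x in X]

# Synthetic example: two observed characteristics drive treatment status.
rng = random.Random(1)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(500)]
d = [1 if rng.random() < sigmoid(-0.5 + 1.0 * x[0] - 0.5 * x[1]) else 0 for x in X]

beta = fit_logit(X, d)
scores = propensity_scores(X, beta)
```

Each individual, whatever their characteristics, is now summarised by one number between 0 and 1, which is what the matching step works with.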

How do we match?

Nearest neighbour matching
  – for each person in the treatment group, choose the individual(s) with the closest propensity score to them
  – can do this with (most common) or without replacement
  – not very efficient, as it discards a lot of information about the control group
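A small sketch of one-to-one nearest neighbour matching with replacement, assuming the propensity scores have already been estimated. The function name and the toy numbers are hypothetical:

```python
def nearest_neighbour_att(p_treated, y_treated, p_control, y_control):
    """ATT from one-to-one nearest neighbour matching on the score, with replacement."""
    gaps = []
    for p_t, y_t in zip(p_treated, y_treated):
        # control unit whose propensity score is closest to this treated unit
        j = min(range(len(p_control)), key=lambda k: abs(p_control[k] - p_t))
        gaps.append(y_t - y_control[j])
    return sum(gaps) / len(gaps)

# Hypothetical toy data: scores and a 0/1 outcome (e.g. in full-time education)
p_treated = [0.80, 0.60, 0.55]
y_treated = [1, 1, 0]
p_control = [0.78, 0.58, 0.30, 0.10]
y_control = [1, 0, 0, 1]

att = nearest_neighbour_att(p_treated, y_treated, p_control, y_control)
print(att)  # 1/3: the matched gaps are 0, 1 and 0
```

In this example the control with score 0.58 is used twice (replacement), while the controls at 0.30 and 0.10 are never used, which is the information loss noted above.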

Kernel based matching
  – each person in the treatment group is matched to a weighted sum of individuals who have similar propensity scores, with the greatest weight being given to people with closer scores
  – some kernel based matching methods use ALL people in the non-treated group (e.g. Gaussian kernel), whereas others only use people within a certain user-specified probability bandwidth (e.g. Epanechnikov)
  – choice of bandwidth involves a trade-off of bias with precision
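A minimal sketch of kernel matching on the propensity score, with both kernels mentioned above. The function names, the toy data and the bandwidth of 0.05 are illustrative assumptions. Treated units with no control inside the bandwidth are simply dropped here, which is one possible convention:

```python
import math

def gaussian(u):
    return math.exp(-0.5 * u * u)          # positive weight for every control

def epanechnikov(u):
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0   # zero outside bandwidth

def kernel_att(p_treated, y_treated, p_control, y_control, bandwidth, kernel):
    """ATT from kernel matching; returns (estimate, number of treated units used)."""
    gaps = []
    for p_t, y_t in zip(p_treated, y_treated):
        weights = [kernel((p_c - p_t) / bandwidth) for p_c in p_control]
        total = sum(weights)
        if total == 0.0:
            continue                        # no control within the bandwidth
        # counterfactual outcome: kernel-weighted average of control outcomes
        y0_hat = sum(w * y for w, y in zip(weights, y_control)) / total
        gaps.append(y_t - y0_hat)
    if not gaps:
        raise ValueError("no treated unit has a control within the bandwidth")
    return sum(gaps) / len(gaps), len(gaps)

# Hypothetical toy data
p_treated = [0.80, 0.60]
y_treated = [1.0, 1.0]
p_control = [0.78, 0.62, 0.58, 0.10]
y_control = [1.0, 0.0, 1.0, 0.0]

att_epa, n_epa = kernel_att(p_treated, y_treated, p_control, y_control, 0.05, epanechnikov)
att_gauss, n_gauss = kernel_att(p_treated, y_treated, p_control, y_control, 0.05, gaussian)
print(att_epa, att_gauss)  # both about 0.25 on this toy data
```

A wider bandwidth averages over more controls (more precision) but brings in controls whose scores are further away (more bias).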

Other methods

Radius matching
Caliper matching
Mahalanobis matching
Local linear regression matching
Spline matching…

Imposing Common Support

In order for matching to be valid we need to observe participants and non-participants with the same range of characteristics
  – i.e. for all characteristics X there are treated and non-treated individuals
If this cannot be achieved
  – treated units whose p is larger than the largest p in the non-treated pool are left unmatched
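The rule above can be sketched in a few lines. The function name and the scores are hypothetical; the rule itself is the one stated on the slide (drop treated units whose score exceeds the largest score among the non-treated):

```python
def impose_common_support(p_treated, p_control):
    """Split treated units into those kept and those left unmatched."""
    p_max = max(p_control)   # largest propensity score in the non-treated pool
    kept = [i for i, p in enumerate(p_treated) if p <= p_max]
    dropped = [i for i, p in enumerate(p_treated) if p > p_max]
    return kept, dropped

p_treated = [0.95, 0.70, 0.40]
p_control = [0.85, 0.60, 0.20]
kept, dropped = impose_common_support(p_treated, p_control)
print(kept, dropped)  # [1, 2] [0]
```

The estimated effect then applies only to the treated units that remain, so it is worth reporting how many were dropped.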

How do we get standard errors?

Asymptotics of propensity score matching are hard/impossible to derive
Generally need to ‘bootstrap’ standard errors
Take a random draw from your sample with replacement and re-estimate the effect on it
Repeat this 500 to 1000 times
Standard deviation of these estimates gives you your standard error
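A minimal sketch of the bootstrap steps above. The estimator here is a simple difference in means on synthetic data, used as a stand-in. For matching, the `estimator` argument would be a function that re-runs the whole procedure (propensity score estimation and matching) on each resample. All names are illustrative:

```python
import random
import statistics

def bootstrap_se(data, estimator, n_reps=500, seed=0):
    """Standard error as the standard deviation of the estimator across resamples."""
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(n_reps):
        # random draw of n observations from the sample, with replacement
        resample = [data[rng.randrange(n)] for _ in range(n)]
        estimates.append(estimator(resample))
    return statistics.stdev(estimates)

def diff_in_means(sample):
    """Stand-in estimator: mean outcome of treated minus mean outcome of controls."""
    treated = [y for d, y in sample if d == 1]
    control = [y for d, y in sample if d == 0]
    return statistics.mean(treated) - statistics.mean(control)

# Synthetic data: (treatment indicator, outcome) pairs, unit outcome variance
rng = random.Random(42)
data = [(d, rng.gauss(1.0 if d else 0.0, 1.0)) for d in [1] * 200 + [0] * 200]

se = bootstrap_se(data, diff_in_means)
print(round(se, 3))  # analytic value for this design is about 0.1
```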

What was the EMA pilot?

Education Maintenance Allowance (EMA) pilots involved payment of up to £40 per week for 16-18 year olds who remained in full-time education
4 different variants tested:
  V1 – up to £30 per week, £50 retention and achievement bonus
  V2 – V1 but up to £40 per week
  V3 – V1 but paid to mother
  V4 – V1 but more generous bonuses

Justifications for intervention

Low levels of participation in post-16 education among low income families
Presence of liquidity constraints?
  – need evidence on the returns to education
  – Card (2000), Cameron & Heckman (2001) suggest that these may not be that important
  – Meghir & Palme (1999) find evidence of liquidity constraints using Swedish data

Design of the evaluation

Interviews with young people and parents in 10 EMA pilot areas and 11 control areas
Information collected both among those income-eligible and income-ineligible for the EMA
First survey involved young people who completed Year 11 in 1999 (cohort 1)
Parental questionnaire only in initial survey
Cohort 1 followed up 3 times

The data

Questionnaires have detailed information on:
  – all components of family income
  – household composition
  – GCSE results
  – mother’s and father’s education, occupation and work history
  – early childhood circumstances
  – current activities of young people

Matching approach

Involves taking all eligible individuals in the pilot areas and matching them with a weighted sum of individuals who look like them in control areas
Difference in full-time education outcomes in pilot and control areas in this matched sample is the estimate of the EMA effect (ATT)
Crucial assumption is that we observe everything that determines education participation

How do we do this?

Don’t match on all X’s, but can instead match on the propensity score (Rosenbaum and Rubin, 1983)
Propensity score is just the predicted probability of being in a pilot area given all the observables in our data
Use kernel-based matching (Heckman, Ichimura & Todd, 1998)
We do this matching for each sub-group of interest

Variables we match on:

Family background
  – household composition, housing status, ethnicity, early childhood characteristics, older siblings’ education and parents’ age, education, work status and occupation
Family income
  – current family income, whether on means-tested benefits
Ability (GCSE results)
School variables
Indicators of ward level deprivation

Results Y12: urban men

                Participation in   Unmatched     Linear Prob,   Kernel PSM    Fully Interacted
                Pilot Areas                      OLS                          OLS
FT Education    66.4               5.3 (2.0)     4.7 (2.0)      4.8 (2.3)     5.0 (2.0)
Work            19.7               -1.5 (1.7)    -2.1 (2.0)     -2.9 (2.0)    -2.5 (2.0)
NEET            13.9               -3.8 (1.5)    -2.7 (1.2)     -1.9 (1.7)    -2.4 (1.4)
Sample size                        2,653         2,653          2,647         2,647

Note: Income eligibles only

Results Y12: urban women

                Participation in   Unmatched     Linear Prob,   Kernel PSM    Fully Interacted
                Pilot Areas                      OLS                          OLS
FT Education    71.9               2.5 (1.9)     2.9 (1.7)      4.2 (2.3)     4.0 (1.7)
Work            13.0               0.7 (1.4)     0.4 (1.6)      -0.5 (2.0)    -0.4 (1.4)
NEET            15.1               -3.2 (1.5)    -3.3 (0.9)     -3.7 (1.7)    -3.6 (0.9)
Sample size                        2,662         2,662          2,662         2,662


Results Y13:

                  Males                              Females
                  Partic Pilot Areas   Impact        Partic Pilot Areas   Impact
Ed Y12-Ed Y13     58.7                 8.1 (2.8)     63.4                 4.4 (2.8)
Ed Y12-Oth Y13    13.1                 -3.1 (2.1)    13.8                 -0.9 (2.2)
Oth Y12-Ed Y13    1.7                  -0.5 (0.9)    2.9                  0.8 (0.8)
Oth Y12-Oth Y13   26.4                 -4.5 (2.6)    19.9                 -4.4 (2.3)
Retention rate    81.7                 6.1 (3.0)     82.1                 2.0 (2.8)
Sample size       1,211                              1,295


Results by Eligibility Groups

In Year 12 impact concentrated on those who are fully eligible (6.7-6.9 percentage points)
  – No significant effect for boys or girls on the taper
  – No effect on ineligibles
In Year 13 impact on both groups
  – EMA impacts significantly on retention for those on the taper

Does it matter who EMA is paid to?

No difference if we do not distinguish by eligibility
For the variant where EMA is paid to the child, impact is concentrated on those fully eligible
For the variant where EMA is paid to the mother, impact is on those who are fully and partially eligible

Credit Constraints?

Follow the consumption literature (see Zeldes (1989)) and split the sample by assets, the idea being that those with assets are not liquidity constrained.
  – Compare results for home-owners and non home-owners
The key assumption here is that house ownership in itself does not lead to different responses to financial incentives, other than because it implies different access to funds.

Results

Significant impact for non home-owners of 9.1 percentage points
Insignificant impact for home-owners of 3.8 percentage points
But the difference of 5.3 percentage points is not significant at conventional levels (p-value 12%)

Conclusions

EMA effect around 4.5 percentage points
Plays a role in reducing gender differences in stay-on rates, particularly retention in Year 13
Important to control for local area effects
  – matching on ward level data important

Other conclusions

More effective paying to child rather than parent for those fully eligible
More effective paying to mother for those who are partially eligible
Increase drawn from both work and NEET groups
Some evidence it may be alleviating credit constraints

What else can you do with Matching?

What is the policy question you are interested in?
Is ATT the appropriate measure?
In returns to schooling evaluation we are much more interested in ATNT
What is treatment: ITT versus ‘receipt of treatment’
  – Take-up usually has important policy implications, so it is usually inappropriate (& difficult) to compare actual participants with an appropriate control group, but sometimes there is no choice!
