Propensity Score Matching and the EMA pilot evaluation. Lorraine Dearden, IoE and Institute for Fiscal Studies. RMP Conference, 22nd November 2007.

The Evaluation Problem
The question we want to answer is:
– What is the effect of some treatment (D_i = 1) on some outcome of interest (Y_1i), compared to the outcome (Y_0i) if the treatment had not taken place (D_i = 0)?
The problem is that it is impossible to observe both outcomes for the same individual, so the true causal effect cannot be measured directly

How can we solve this problem?
Randomised experiment
– Randomly assign people to treatment group and control group
– If groups large enough, the distribution of all pre-treatment characteristics in the two groups should be identical, so any difference in outcome can be attributed to the treatment
– Not generally available
– Not always the solution
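
The logic of the randomised experiment can be illustrated with a small simulation. This is only a sketch on made-up data: the outcome model, the name `TRUE_EFFECT` and its value are assumptions for illustration, not anything from the EMA evaluation.

```python
import random
import statistics

rng = random.Random(0)
TRUE_EFFECT = 5.0   # assumed treatment effect, chosen for this illustration
N = 20000

treated_outcomes, control_outcomes = [], []
for _ in range(N):
    ability = rng.gauss(0, 1)                    # pre-treatment characteristic
    y0 = 50 + 10 * ability + rng.gauss(0, 1)     # outcome without treatment
    y1 = y0 + TRUE_EFFECT                        # outcome with treatment
    if rng.random() < 0.5:                       # assignment is independent of ability
        treated_outcomes.append(y1)              # only Y1 is observed for the treated
    else:
        control_outcomes.append(y0)              # only Y0 is observed for controls

# With random assignment the simple difference in means estimates the effect.
estimate = statistics.mean(treated_outcomes) - statistics.mean(control_outcomes)
print(round(estimate, 2))
```

Because assignment does not depend on `ability`, the two groups have the same distribution of pre-treatment characteristics on average, which is exactly what fails in non-experimental data.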

Instead have to rely on non-experimental approaches
Propensity score matching is one such method that is gaining popularity because of its simplicity
Crucial, however, to understand the assumptions underlying the approach (and all approaches)
Again NOT always appropriate
– may need to rely on other methods, e.g. instrumental variables, control function

Assumptions
Need to have a treatment group and some type of appropriate non-treated group from which you can select a control group
– Finding an appropriate and convincing control group is often the most difficult evaluation task
Assume ALL relevant differences between the groups pre-treatment can be captured by observable characteristics in your data (X)
– Having high quality and extensive pre-treatment observables is crucial!
– Conditional Independence Assumption (CIA)
Common support: return to this
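
Stated formally, the two assumptions can be written as follows. This is a standard textbook formulation of what is needed to identify the ATT, added for reference rather than taken from the slides; the symbols Y_0, D and X follow the slides' notation.

```latex
% Conditional independence: given X, treatment status carries no
% information about the untreated outcome
Y_0 \perp D \mid X
\quad\Longrightarrow\quad
E(Y_0 \mid D=1, X) = E(Y_0 \mid D=0, X)

% Common support: every treated individual has comparable non-treated individuals
\Pr(D=1 \mid X=x) < 1 \quad \text{for all } x \text{ observed among the treated}
```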

What are we trying to measure?
Average treatment effect for the population (ATE)
Average treatment effect on the treated (ATT)
Average treatment effect on the non-treated (ATNT)
Usually interested in ATT:
E(Y_1 - Y_0 | D=1) = E(Y_1 | D=1) - E(Y_0 | D=1)
– OLS: ATT = ATE = ATNT
– IV: LATE
– Matching and control function: ATE, ATT & ATNT
How can we find E(Y_0 | D=1)?
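
To see why E(Y_0 | D=1) is the missing piece, it may help to decompose the raw difference in observed outcomes between treated and non-treated. The derivation below is a standard identity, not something shown on the slides; it uses the same Y_1, Y_0 and D as the ATT definition above.

```latex
\begin{aligned}
E(Y \mid D=1) - E(Y \mid D=0)
  &= E(Y_1 \mid D=1) - E(Y_0 \mid D=0) \\
  &= \underbrace{E(Y_1 \mid D=1) - E(Y_0 \mid D=1)}_{\text{ATT}}
   + \underbrace{E(Y_0 \mid D=1) - E(Y_0 \mid D=0)}_{\text{selection bias}}
\end{aligned}
```

The first term is observed in part (E(Y_1 | D=1) is just the mean outcome of the treated); the counterfactual E(Y_0 | D=1) is not, and the selection bias term is zero only if the treated and non-treated would have had the same outcomes without treatment.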

What is treatment?
Most robust design is Intention to Treat (ITT) analysis: the treatment group is all individuals who could have taken up the program, whether they did or not
Another approach is the ‘receipt of treatment’ approach, but here it is sometimes much more difficult to find an appropriate control group

Matching
Involves selecting from the non-treated pool a control group in which the distribution of observed variables is as similar as possible to the distribution in the treated group
There are a number of ways of doing this but they almost always involve calculating the propensity score p_i(x) ≡ Pr{D=1 | X=x}

The propensity score
The propensity score is the probability of being in the treatment group given you have characteristics X=x
How do you do this?
Use parametric methods (e.g. logit or probit) and estimate the probability of a person being in the treatment group for all individuals in the treatment and non-treatment groups
Rather than matching on the basis of ALL X’s, can match on the basis of this propensity score (Rosenbaum and Rubin (1983))
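
The estimation step can be sketched as a logit fitted by plain gradient ascent. This is a minimal, dependency-free illustration on simulated data; the function names `fit_logit` and `propensity`, the learning rate and the data-generating process are all assumptions made here, and in practice a statistics package's logit or probit routine would be used instead.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def propensity(beta, x):
    # Predicted Pr(D=1 | X=x); beta[0] is the intercept.
    return sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], x)))

def fit_logit(X, d, lr=1.0, iters=500):
    # Gradient ascent on the mean log-likelihood of a logit model.
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(iters):
        grad = [0.0] * (k + 1)
        for x, di in zip(X, d):
            err = di - propensity(beta, x)
            grad[0] += err
            for j, v in enumerate(x):
                grad[j + 1] += err * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

# Simulated example: one observable x that raises the chance of treatment.
rng = random.Random(0)
X = [[rng.gauss(0, 1)] for _ in range(500)]
d = [1 if rng.random() < sigmoid(-0.5 + 1.0 * x[0]) else 0 for x in X]

beta = fit_logit(X, d)
scores = [propensity(beta, x) for x in X]   # one score per individual, treated or not
```

The scores are computed for everyone, treated and non-treated alike, so that each treated individual can later be compared with non-treated individuals who had a similar predicted probability of treatment.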

How do we match?
Nearest neighbour matching
– for each person in the treatment group, choose the individual(s) with the closest propensity score to them
– can do this with (most common) or without replacement
– not very efficient as discarding a lot of information about the control group
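
A minimal sketch of one-to-one nearest neighbour matching with replacement is given below. The function name `nearest_neighbour_att` and the toy numbers are hypothetical; the propensity scores are taken as given.

```python
def nearest_neighbour_att(scores, treated, outcome):
    """ATT from 1-to-1 nearest neighbour matching on the propensity score,
    with replacement (a control can be reused for several treated units)."""
    treated_idx = [i for i, d in enumerate(treated) if d == 1]
    control_idx = [i for i, d in enumerate(treated) if d == 0]
    gaps = []
    for i in treated_idx:
        # Control whose score is closest to that of treated unit i.
        j = min(control_idx, key=lambda c: abs(scores[c] - scores[i]))
        gaps.append(outcome[i] - outcome[j])
    return sum(gaps) / len(gaps)

# Toy data: two treated units, three potential controls.
scores  = [0.80, 0.60, 0.75, 0.55, 0.20]
treated = [1,    1,    0,    0,    0]
outcome = [10,   8,    7,    6,    1]

att = nearest_neighbour_att(scores, treated, outcome)
print(att)  # 2.5: (10 - 7 + 8 - 6) / 2
```

Note that the control with score 0.20 is never used, which is the sense in which nearest neighbour matching discards information about the control group.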

Kernel based matching
– each person in the treatment group is matched to a weighted sum of individuals who have similar propensity scores, with greatest weight being given to people with closer scores
– Some kernel based matching methods use ALL people in the non-treated group (e.g. Gaussian kernel) whereas others only use people within a certain user-specified probability bandwidth (e.g. Epanechnikov)
– Choice of bandwidth involves a trade-off of bias with precision
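
The difference between the two kernels can be seen in a short sketch. The function `kernel_att`, the default bandwidth and the toy data are assumptions for illustration only; the kernel formulas themselves are the standard Epanechnikov and Gaussian densities.

```python
import math

def epanechnikov(u):
    # Zero weight outside the bandwidth (|u| >= 1).
    return 0.75 * (1 - u * u) if abs(u) < 1 else 0.0

def gaussian(u):
    # Positive weight for every control, however distant.
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)

def kernel_att(scores, treated, outcome, bandwidth=0.1, kernel=epanechnikov):
    treated_idx = [i for i, d in enumerate(treated) if d == 1]
    control_idx = [i for i, d in enumerate(treated) if d == 0]
    gaps = []
    for i in treated_idx:
        w = [kernel((scores[j] - scores[i]) / bandwidth) for j in control_idx]
        total = sum(w)
        if total == 0:
            continue  # no control within the bandwidth: unit is left unmatched
        counterfactual = sum(wj * outcome[j] for wj, j in zip(w, control_idx)) / total
        gaps.append(outcome[i] - counterfactual)
    if not gaps:
        raise ValueError("no treated unit could be matched")
    return sum(gaps) / len(gaps)

# Toy data: one treated unit, two close controls and one distant control.
scores  = [0.50, 0.48, 0.52, 0.90]
treated = [1,    0,    0,    0]
outcome = [10,   6,    8,    0]

att_epan = kernel_att(scores, treated, outcome, kernel=epanechnikov)
att_gauss = kernel_att(scores, treated, outcome, kernel=gaussian)
```

With the Epanechnikov kernel the distant control (score 0.90) gets zero weight, so the counterfactual is the average of the two close controls. With the Gaussian kernel it gets a small positive weight, which pulls the counterfactual down slightly and the estimate up. Widening the bandwidth brings in more controls (more precision) at the cost of using poorer matches (more bias).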

Other methods
Radius matching
Caliper matching
Mahalanobis matching
Local linear regression matching
Spline matching…

Imposing Common Support
In order for matching to be valid we need to observe participants and non-participants with the same range of characteristics
– i.e. for all characteristics X there are treated and non-treated individuals
If this cannot be achieved
– treated units whose p is larger than the largest p in the non-treated pool are left unmatched
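
One simple way to impose this is a min-max rule, sketched below. The slide only mentions dropping treated units above the largest non-treated score; also dropping those below the smallest non-treated score is an assumption added here, as is the function name `impose_common_support`.

```python
def impose_common_support(scores, treated):
    """Return the indices kept for matching: all non-treated units, plus
    treated units whose score lies inside the non-treated score range."""
    control_scores = [s for s, d in zip(scores, treated) if d == 0]
    lo, hi = min(control_scores), max(control_scores)
    return [i for i, (s, d) in enumerate(zip(scores, treated))
            if d == 0 or lo <= s <= hi]

# Toy data: the first treated unit (0.95) has no comparable non-treated unit.
scores  = [0.95, 0.60, 0.70, 0.30, 0.10]
treated = [1,    1,    0,    0,    0]

keep = impose_common_support(scores, treated)
print(keep)  # [1, 2, 3, 4]
```

The resulting estimate then applies only to treated individuals inside the common support, which matters for interpretation if many treated units are dropped.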

How do we get standard errors?
Asymptotics of propensity score matching hard/impossible to define
Generally need to ‘bootstrap’ standard errors
Take a random draw from your sample with replacement and re-estimate the effect on it
Repeat this 500 to 1000 times
Standard deviation of these estimates gives you your standard error
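
The bootstrap steps above can be sketched as follows. To keep the example short, the estimator is a simple difference in means on simulated data; with matching, the whole procedure (propensity score estimation and matching) would be rerun inside `estimator` on every resample. The names `bootstrap_se` and `diff_in_means` are hypothetical.

```python
import random
import statistics

def bootstrap_se(data, estimator, reps=500, seed=0):
    """Standard error as the standard deviation of the estimator across
    `reps` resamples drawn with replacement from `data`."""
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(reps):
        resample = [data[rng.randrange(n)] for _ in range(n)]
        estimates.append(estimator(resample))
    return statistics.stdev(estimates)

def diff_in_means(sample):
    # Stand-in estimator: each record is (treatment indicator, outcome).
    t = [y for d, y in sample if d == 1]
    c = [y for d, y in sample if d == 0]
    return statistics.mean(t) - statistics.mean(c)

# Simulated data: 200 treated and 200 non-treated, outcome s.d. of 2.
rng = random.Random(1)
data = [(1, rng.gauss(5, 2)) for _ in range(200)] + \
       [(0, rng.gauss(0, 2)) for _ in range(200)]

se = bootstrap_se(data, diff_in_means)
```

For this design the analytical standard error is about sqrt(4/200 + 4/200) = 0.2, so the bootstrap value should land close to that.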

What was the EMA pilot?
EMA pilots involved payment of up to £40 per week for 16-18 year olds who remained in full-time education
4 different variants tested:
V1: up to £30 per week, £50 retention and achievement bonus
V2: V1 but up to £40 per week
V3: V1 but paid to mother
V4: V1 but more generous bonuses

Justifications for intervention
Low levels of participation in post-16 education among low income families
Presence of liquidity constraints?
– need evidence on the returns to education
– Card (2000), Cameron & Heckman (2001) suggest that these may not be that important
– Meghir & Palme (1999) find evidence of liquidity constraints using Swedish data

Design of the evaluation
Interviews with young people and parents in 10 EMA pilot areas and 11 control areas
Information collected both among those income-eligible and income-ineligible for the EMA
First survey involved young people who completed Year 11 in 1999 (cohort 1)
Parental questionnaire only in initial survey
Cohort 1 followed up 3 times

The data
Questionnaires have detailed information on:
– all components of family income
– household composition
– GCSE results
– mother’s and father’s education, occupation and work history
– early childhood circumstances
– current activities of young people

Matching approach
Involves taking all eligible individuals in the pilot areas and matching them with a weighted sum of individuals who look like them in control areas
Difference in full-time education outcomes in pilot and control areas in this matched sample is the estimate of the EMA effect (ATT)
Crucial assumption is that we observe everything that determines education participation
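
In symbols, a matching estimator of this kind can be written in the generic form below. The notation is an assumption introduced here, not the slides': N_1 is the number of eligible individuals in pilot areas, p is the estimated propensity score, K a kernel function and h its bandwidth.

```latex
\widehat{\text{ATT}}
  = \frac{1}{N_1} \sum_{i:\,D_i=1}
    \Bigl[\, Y_i - \sum_{j:\,D_j=0} w_{ij}\, Y_j \Bigr],
\qquad
w_{ij} = \frac{K\!\bigl((p_j - p_i)/h\bigr)}
              {\sum_{k:\,D_k=0} K\!\bigl((p_k - p_i)/h\bigr)},
\qquad
\sum_{j:\,D_j=0} w_{ij} = 1
```

Each pilot-area individual i is compared with a weighted average of control-area individuals j, with weights that fall as the propensity scores move apart.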

How do we do this?
Don’t match on all X’s, but can instead match on the propensity score (Rosenbaum and Rubin, 1983)
Propensity score is just predicted probability of being in a pilot area given all the observables in our data
Use kernel-based matching (Heckman, Ichimura & Todd, 1998)
We do this matching for each sub-group of interest

Variables we match on:
Early childhood characteristics, older siblings’ education and parents’ age, education, work status and occupation
Family income
– current family income, whether on means-tested benefits
Ability (GCSE results)
School variables
Indicators of ward level deprivation

Results Y12: urban men

| Outcome | Participation in Pilot Areas | Unmatched | Linear Prob, OLS | Kernel PSM | Fully Interacted OLS |
|---|---|---|---|---|---|
| FT Education | 66.4 | 5.3 (2.0) | 4.7 (2.0) | 4.8 (2.3) | 5.0 (2.0) |
| Work | 19.7 | -1.5 (1.7) | -2.1 (2.0) | -2.9 (2.0) | -2.5 (2.0) |
| NEET | 13.9 | -3.8 (1.5) | -2.7 (1.2) | -1.9 (1.7) | -2.4 (1.4) |
| Sample size | | 2,653 | 2,653 | 2,647 | 2,647 |

Note: Income eligibles only

Results Y12: urban women

| Outcome | Participation in Pilot Areas | Unmatched | Linear Prob, OLS | Kernel PSM | Fully Interacted OLS |
|---|---|---|---|---|---|
| FT Education | 71.9 | 2.5 (1.9) | 2.9 (1.7) | 4.2 (2.3) | 4.0 (1.7) |
| Work | 13.0 | 0.7 (1.4) | 0.4 (1.6) | -0.5 (2.0) | -0.4 (1.4) |
| NEET | 15.1 | -3.2 (1.5) | -3.3 (0.9) | -3.7 (1.7) | -3.6 (0.9) |
| Sample size | | 2,662 | 2,662 | 2,662 | 2,662 |

Note: Income eligibles only

Results Y13:

| Outcome | Participation in Pilot Areas (males) | Impact Males | Participation in Pilot Areas (females) | Impact Females |
|---|---|---|---|---|
| Ed Y12 - Ed Y13 | 58.7 | 8.1 (2.8) | 63.4 | 4.4 (2.8) |
| Ed Y12 - Oth Y13 | 13.1 | -3.1 (2.1) | 13.8 | -0.9 (2.2) |
| Oth Y12 - Ed Y13 | 1.7 | -0.5 (0.9) | 2.9 | 0.8 (0.8) |
| Oth Y12 - Oth Y13 | 26.4 | -4.5 (2.6) | 19.9 | -4.4 (2.3) |
| Retention rate | 81.7 | 6.1 (3.0) | 82.1 | 2.0 (2.8) |
| Sample size | 1,211 | | 1,295 | |

Note: Income eligibles only
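
The retention rate row appears to be the share of those in full-time education in Year 12 who are still in education in Year 13. That reading is an inference from the figures rather than something stated on the slide; the short check below, with a hypothetical helper `retention_rate`, reproduces the reported values to within rounding.

```python
def retention_rate(ed_ed, ed_oth):
    # Share (in %) of those in education in Y12 who remain in education in Y13.
    return 100 * ed_ed / (ed_ed + ed_oth)

# Participation figures for pilot areas, taken from the Y13 table.
males = retention_rate(ed_ed=58.7, ed_oth=13.1)
females = retention_rate(ed_ed=63.4, ed_oth=13.8)

print(round(males, 2), round(females, 2))
```

The results are close to the reported 81.7 and 82.1; any small gap is consistent with the inputs themselves being rounded to one decimal place.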

Results by Eligibility Groups
In Year 12 impact concentrated on those who are fully eligible (6.7-6.9 % pts)
– No significant effect for boys or girls on taper
– No effect on ineligibles
In Year 13 impact on both groups
– EMA impacts significantly on retention for those on the taper

Does it matter who EMA is paid to?
No difference if we do not distinguish by eligibility
For variant where paid to child, impact is concentrated on those fully eligible
For variant where paid to mother, impact on those who are fully and partially eligible

Credit Constraints?
Follow the consumption literature (see Zeldes (1989)) and split the sample by assets, the idea being that those with assets are not liquidity constrained.
– Compare results for home-owners and non home-owners
The key assumption here is that house ownership in itself does not lead to different responses to financial incentives, other than because it implies different access to funds.

Results
Significant impact for non home-owners of 9.1 percentage points
Insignificant impact for home-owners of 3.8 percentage points
But difference of 5.3 percentage points is not significant at conventional levels (p-value 12%)
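
As a rough back-of-the-envelope reading of these figures, the p-value can be turned into an implied standard error for the difference. This assumes a two-sided test based on the normal distribution, which the slides do not state, so the result is indicative only.

```python
from statistics import NormalDist

impact_non_owners = 9.1   # percentage points, from the slide
impact_owners = 3.8       # percentage points, from the slide
p_value = 0.12            # reported p-value for the difference

difference = impact_non_owners - impact_owners

# z-statistic that would give this p-value in a two-sided normal test (assumption).
z = NormalDist().inv_cdf(1 - p_value / 2)
implied_se = difference / z

# Critical value for significance at the 5% level, for comparison.
critical_5pct = NormalDist().inv_cdf(0.975)
print(round(difference, 1), round(z, 2), round(implied_se, 1), round(critical_5pct, 2))
```

Under that assumption the implied standard error of the difference is roughly 3.4 percentage points, and the z-statistic falls short of the 5% critical value, which is why the two sub-group estimates cannot be statistically distinguished despite the sizeable gap.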

Conclusions
EMA effect around 4.5 percentage points
Plays a role in reducing gender differences in stay-on rates, particularly retention in Year 13
Important to control for local area effects
– matching on ward level data important

Other conclusions
More effective paying to child rather than parent for those fully eligible
More effective paying to mother for those who are partially eligible
Increase drawn from both work and NEET groups
Some evidence it may be alleviating credit constraints

What else can you do with Matching?
What is the policy question you are interested in?
Is ATT the appropriate measure?
In returns to schooling evaluation we are much more interested in ATNT
What is treatment: ITT versus ‘receipt of treatment’
– Take-up usually has important policy implications, so it is usually inappropriate (& difficult) to compare actual participants with an appropriate control group, but sometimes there is no choice!