Why Propensity Scores Should Be Used for Matching
Ben Jann
University of Bern, ben.jann@soz.unibe.ch
2017 German Stata Users Group Meeting
Berlin, June 23, 2017
Contents
1 Potential Outcomes and Causal Inference
2 Matching
3 Propensity Score Matching
4 King and Nielsen's "Why Propensity Scores Should Not Be Used for Matching"
5 Are King and Nielsen right?
6 Illustration using kmatch
7 Conclusions
Counterfactual Causality (see Neyman 1923, Rubin 1974, 1990)
a.k.a. Rubin Causal Model, a.k.a. Potential Outcomes Framework
John Stuart Mill (1806–1873):
"Thus, if a person eats of a particular dish, and dies in consequence, that is, would not have died if he had not eaten of it, people would be apt to say that eating of that dish was the cause of his death." (Mill 2002[1843]: 214)
Treatment variable $D$:
$$ D = \begin{cases} 1 & \text{treatment (eats of a particular dish)} \\ 0 & \text{control (does not eat of a particular dish)} \end{cases} $$
Potential outcomes $Y^1$ and $Y^0$:
- $Y^1$: potential outcome with treatment ($D = 1$)
  - If person $i$ ate of a particular dish, would she die or would she survive?
- $Y^0$: potential outcome without treatment ($D = 0$)
  - If person $i$ did not eat of a particular dish, would she die or would she survive?
Causal effect of the treatment for individual $i$:
causal effect = difference between potential outcomes
$$ \delta_i = Y^1_i - Y^0_i $$
Fundamental Problem of Causal Inference
The causal effect of $D$ on $Y$ for individual $i$ is defined as the difference in potential outcomes: $\delta_i = Y^1_i - Y^0_i$
However, the observed outcome variable is
$$ Y_i = \begin{cases} Y^1_i & \text{if } D_i = 1 \\ Y^0_i & \text{if } D_i = 0 \end{cases} $$
That is, only one of the two potential outcomes will be realized and, hence, only $Y^1_i$ or $Y^0_i$ can be observed, but never both.
Consequence:
The individual treatment effect $\delta_i$ cannot be observed!
Average Treatment Effect
Although individual causal effects cannot be observed, the average causal effect in a population (the so-called "Average Treatment Effect") can be identified by comparing the expected values of $Y^1$ and $Y^0$:
$$ ATE = E[\delta] = E[Y^1 - Y^0] = E[Y^1] - E[Y^0] $$
Some other quantities of interest:
- Average Treatment Effect on the Treated (ATT)
$$ ATT = E[Y^1 - Y^0 \mid D = 1] = E[Y^1 \mid D = 1] - E[Y^0 \mid D = 1] $$
- Average Treatment Effect on the Untreated (ATC)
$$ ATC = E[Y^1 - Y^0 \mid D = 0] = E[Y^1 \mid D = 0] - E[Y^0 \mid D = 0] $$
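The definitions of ATE, ATT, and ATC can be made concrete with a small table of potential outcomes. The following Python sketch uses made-up numbers (the list `people` and all values in it are hypothetical); it pretends that both potential outcomes are known, which is never the case in real data, and then shows what remains once only the realized outcome is observed.

```python
# Hypothetical potential outcomes for six individuals: (D, Y1, Y0).
# In real data only one of Y1, Y0 is ever seen; both are listed here
# purely for illustration.
people = [
    (1, 5.0, 3.0),
    (1, 6.0, 3.0),
    (1, 4.0, 3.0),
    (0, 4.0, 3.0),
    (0, 3.0, 2.0),
    (0, 2.0, 2.0),
]

def mean(values):
    values = list(values)
    return sum(values) / len(values)

# Averages of the individual effects delta_i = Y1_i - Y0_i
ate = mean(y1 - y0 for d, y1, y0 in people)
att = mean(y1 - y0 for d, y1, y0 in people if d == 1)
atc = mean(y1 - y0 for d, y1, y0 in people if d == 0)

# The observed outcome reveals Y1 for the treated and Y0 for the untreated.
observed = [(d, y1 if d == 1 else y0) for d, y1, y0 in people]
naive = (mean(y for d, y in observed if d == 1)
         - mean(y for d, y in observed if d == 0))

print(ate, att, atc, naive)
```

In this made-up table the naive group comparison differs from all three averages, because the treated have higher $Y^0$ values than the untreated to begin with.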
To determine the average effect, unbiased estimates of $E[Y^0]$ and $E[Y^1]$ are required.
If the independence assumption
$$ (Y^0, Y^1) \perp\!\!\!\perp D $$
applies, that is, if $D$ is independent of $Y^0$ and $Y^1$, then
$$ E[Y^0] = E[Y^0 \mid D = 0] $$
$$ E[Y^1] = E[Y^1 \mid D = 1] $$
In this case, the average causal effect can be measured by a simple group comparison (mean difference) between observations without treatment ($D = 0$) and observations with treatment ($D = 1$).
Randomized experiments solve the problem: if the assignment of $D$ is randomized, $D$ is independent of $Y^0$ and $Y^1$ by design.
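A small simulation may help to see why randomization matters. The Python sketch below is an illustration under assumed settings, not part of the original slides: the data-generating process (a single covariate `x`, a constant effect of 2, and assignment probabilities of 0.8 and 0.2) is hypothetical. It compares the simple mean difference under randomized assignment and under assignment that depends on `x`.

```python
import random
from statistics import mean

def simulate(randomized, n=20000, seed=1):
    """Return the naive mean difference between treated and untreated."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y0 = x + rng.gauss(0, 1)   # potential outcome without treatment
        y1 = y0 + 2.0              # potential outcome with treatment (effect = 2)
        if randomized:
            d = rng.random() < 0.5                       # D independent of (Y0, Y1)
        else:
            d = rng.random() < (0.8 if x > 0 else 0.2)   # D depends on x
        # Only the realized potential outcome is recorded.
        (treated if d else control).append(y1 if d else y0)
    return mean(treated) - mean(control)

print(simulate(randomized=True))    # close to the true effect of 2
print(simulate(randomized=False))   # clearly larger than 2
```

Under the non-randomized assignment, individuals with large `x` (and hence large $Y^0$) are more likely to be treated, so the group comparison overstates the effect.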
2 Matching
Conditional Independence / Strong Ignorability
Can causal effects also be identified from "observational" (i.e., non-experimental) data?
Sometimes it can be argued that the independence assumption is valid conditionally (conditional independence, "unconfoundedness"):
$$ (Y^0, Y^1) \perp\!\!\!\perp D \mid X $$
If, in addition, the overlap assumption
$$ 0 < \Pr(D = 1 \mid X = x) < 1 \quad \text{for all } x $$
holds, then the ATE (or ATT or ATC) can be identified by conditioning on $X$.
For example:
$$ ATE = \sum_x \Pr[X = x] \left\{ E[Y \mid D = 1, X = x] - E[Y \mid D = 0, X = x] \right\} $$
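The conditioning formula can be evaluated by hand on a tiny data set. The Python sketch below uses hypothetical data with one binary covariate; the variable names and values are assumptions for illustration only. It computes the stratum-specific mean differences and weights them by $\Pr[X = x]$.

```python
from collections import defaultdict

# Hypothetical observational data: (x, d, y).
data = [
    (0, 1, 3.0), (0, 0, 1.0), (0, 0, 1.0), (0, 0, 1.0),
    (1, 1, 6.0), (1, 1, 6.0), (1, 1, 6.0), (1, 0, 3.0),
]

def mean(values):
    return sum(values) / len(values)

# Collect outcomes by stratum of x and by treatment status.
strata = defaultdict(lambda: {0: [], 1: []})
for x, d, y in data:
    strata[x][d].append(y)

# ATE = sum_x Pr[X = x] * (E[Y | D=1, X=x] - E[Y | D=0, X=x])
# (requires overlap: every stratum contains treated and untreated cases)
ate = 0.0
for x, groups in strata.items():
    share = (len(groups[0]) + len(groups[1])) / len(data)
    ate += share * (mean(groups[1]) - mean(groups[0]))

naive = (mean([y for _, d, y in data if d == 1])
         - mean([y for _, d, y in data if d == 0]))

print(ate, naive)
```

Here the treated are concentrated in the stratum with higher outcomes, so the unconditional comparison is larger than the stratified estimate.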
Matching
Matching is one approach to "condition on $X$" if strong ignorability holds.
Basic idea:
1. For each observation in the treatment group, find "statistical twins" in the control group with the same (or at least very similar) $X$ values (and vice versa).
2. The $Y$ values of these matching observations are then used to compute the counterfactual outcome for the observation at hand.
3. An estimate of the average causal effect can be obtained as the mean of the differences between the observed values and the "imputed" counterfactual values over all observations.
Formally:
$$ \widehat{ATT} = \frac{1}{N^{D=1}} \sum_{i \mid D=1} \left[ Y_i - \hat{Y}^0_i \right] = \frac{1}{N^{D=1}} \sum_{i \mid D=1} \left[ Y_i - \sum_{j \mid D=0} w_{ij} Y_j \right] $$
$$ \widehat{ATC} = \frac{1}{N^{D=0}} \sum_{i \mid D=0} \left[ \hat{Y}^1_i - Y_i \right] = \frac{1}{N^{D=0}} \sum_{i \mid D=0} \left[ \sum_{j \mid D=1} w_{ij} Y_j - Y_i \right] $$
$$ \widehat{ATE} = \frac{N^{D=1}}{N} \cdot \widehat{ATT} + \frac{N^{D=0}}{N} \cdot \widehat{ATC} $$
Different matching algorithms use different definitions of $w_{ij}$.
Exact Matching
Exact matching:
$$ w_{ij} = \begin{cases} 1/k_i & \text{if } X_i = X_j \\ 0 & \text{else} \end{cases} $$
with $k_i$ as the number of observations for which $X_i = X_j$ applies.
The result is equivalent to "perfect stratification" or "subclassification" (see, e.g., Cochran 1968).
Problem: If $X$ contains several variables, there is a large probability that no exact matches can be found for many observations (the "curse of dimensionality").
Multivariate Distance Matching (MDM)
An alternative is to match based on a distance metric that measures the proximity between observations in the multivariate space of $X$.
The idea, then, is to use observations that are "close", but not necessarily equal, as matches.
A common approach is to use
$$ MD(X_i, X_j) = \sqrt{(X_i - X_j)' \Sigma^{-1} (X_i - X_j)} $$
as the distance metric, where $\Sigma$ is an appropriate scaling matrix.
- Mahalanobis matching: $\Sigma$ is the covariance matrix of $X$.
- Euclidean matching: $\Sigma$ is the identity matrix.
- Mahalanobis matching is equivalent to Euclidean matching based on standardized and orthogonalized $X$.
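As an illustration of the distance metric, the following Python sketch evaluates $MD(X_i, X_j)$ for two covariates, inverting the 2×2 scaling matrix by hand. The function name `md`, the covariate values, and the example scaling matrices are assumptions chosen for this sketch; it is not how kmatch computes distances internally.

```python
import math

def md(xi, xj, sigma):
    """Distance sqrt((xi - xj)' Sigma^{-1} (xi - xj)) for two covariates."""
    (a, b), (c, d) = sigma
    det = a * d - b * c
    # Inverse of a 2x2 matrix
    inv = ((d / det, -b / det), (-c / det, a / det))
    u, v = xi[0] - xj[0], xi[1] - xj[1]
    q = (u * (inv[0][0] * u + inv[0][1] * v)
         + v * (inv[1][0] * u + inv[1][1] * v))
    return math.sqrt(q)

# Two hypothetical observations: (education in years, age)
x_i, x_j = (12.0, 30.0), (16.0, 33.0)

identity = ((1.0, 0.0), (0.0, 1.0))   # Euclidean matching
scaling = ((4.0, 0.0), (0.0, 9.0))    # assumed covariance matrix of X

print(md(x_i, x_j, identity))   # Euclidean distance
print(md(x_i, x_j, scaling))    # distance after rescaling by Sigma
```

With a diagonal $\Sigma$, each covariate difference is simply divided by its standard deviation before the Euclidean distance is taken, which is the standardization mentioned above.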
Matching Algorithms
Various matching algorithms can be employed to find potential matches based on $MD$ and to determine the matching weights $w_{ij}$.
Pair matching (one-to-one matching without replacement)
- For each observation $i$ in the treatment group, find the observation $j$ in the control group for which $MD_{ij}$ is smallest. Once observation $j$ is used as a match, do not use it again.
Nearest-neighbor matching
- For each observation $i$ in the treatment group, find the $k$ closest observations in the control group. A single control can be used multiple times as a match. In case of ties (multiple controls with identical $MD$), use all ties as matches. $k$ is set by the researcher.
Caliper matching
- Like nearest-neighbor matching, but only use controls for which $MD$ is smaller than some threshold $c$.
Mahalanobis Matching
Radius matching
- Use all controls as matches for which $MD$ is smaller than some threshold $c$.
Kernel matching
- Like radius matching, but give larger weight to controls for which $MD$ is small (using some kernel function such as the Epanechnikov kernel).
In addition, since matching is no longer exact, it may make sense to refine the estimates by applying regression adjustment to the matched data (also known as "bias adjustment" in the context of nearest-neighbor matching).
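The algorithms differ only in how the weights $w_{ij}$ are derived from the distances. The Python sketch below shows one possible way to write nearest-neighbor (with ties), radius, and Epanechnikov kernel weights and to plug them into the ATT formula. All function names and data are hypothetical, and a one-dimensional absolute difference stands in for $MD$ to keep the example short.

```python
def normalize(w):
    """Scale weights to sum to one; all zeros means no match was found."""
    total = sum(w)
    return [v / total for v in w] if total > 0 else w

def nn_weights(dist, k=1):
    """k nearest controls, keeping all ties at the k-th smallest distance."""
    cutoff = sorted(dist)[k - 1]
    return normalize([1.0 if d <= cutoff else 0.0 for d in dist])

def radius_weights(dist, c):
    """Equal weight for all controls closer than the threshold c."""
    return normalize([1.0 if d < c else 0.0 for d in dist])

def kernel_weights(dist, c):
    """Epanechnikov kernel K(u) = 0.75 * (1 - u^2) for |u| < 1, with u = d / c."""
    return normalize([0.75 * (1 - (d / c) ** 2) if d < c else 0.0 for d in dist])

def att(treated, controls, weights_fn):
    """treated, controls: lists of (x, y); distance is |x_i - x_j|."""
    effects = []
    for xi, yi in treated:
        w = weights_fn([abs(xi - xj) for xj, _ in controls])
        if sum(w) == 0:
            continue  # unmatched treated observations are dropped
        y0_hat = sum(wj * yj for wj, (_, yj) in zip(w, controls))
        effects.append(yi - y0_hat)
    return sum(effects) / len(effects)

# Hypothetical data
treated = [(1.0, 5.0), (2.0, 7.0)]
controls = [(0.5, 2.0), (1.5, 4.0), (2.5, 6.0), (9.0, 0.0)]

att_nn = att(treated, controls, lambda d: nn_weights(d, k=1))
att_radius = att(treated, controls, lambda d: radius_weights(d, c=1.0))
att_kernel = att(treated, controls, lambda d: kernel_weights(d, c=2.0))
print(att_nn, att_radius, att_kernel)
```

Caliper matching would correspond to combining `nn_weights` with the distance threshold of `radius_weights`. Detecting ties by comparing floating-point distances, as done here, is fine for a sketch but fragile in general.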
3 Propensity Score Matching
The Propensity Score Theorem (Rosenbaum and Rubin 1983)
If the conditional independence assumption is true, then
$$ \Pr(D_i = 1 \mid Y^0_i, Y^1_i, X_i) = \Pr(D_i = 1 \mid X_i) = \pi(X_i) $$
where $\pi(X)$ is called the propensity score.
That is,
$$ (Y^0, Y^1) \perp\!\!\!\perp D \mid X $$
implies
$$ (Y^0, Y^1) \perp\!\!\!\perp D \mid \pi(X) $$
so that, under strong ignorability, the average causal effect can be estimated by conditioning on the propensity score $\pi(X)$ instead of $X$.
This is remarkable because the information in $X$, which may include many variables, can be reduced to just one dimension. This greatly simplifies the matching task.
Propensity Score Matching (PSM)
Instead of computing multivariate distances, we can thus simply match on the (one-dimensional) propensity score.
Procedure:
- Step 1: Estimate the propensity score, e.g., using a logit model.
- Step 2: Apply a matching algorithm using differences in the propensity score, $|\pi(X_i) - \pi(X_j)|$, instead of multivariate distances.
PSM is tremendously popular:
- https://scholar.google.ch/scholar?q=propensity+score+AND+(matching+OR+matched+OR+match)
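The two-step procedure can be sketched in a few lines of Python. This is a minimal illustration under assumed settings, not the kmatch implementation: the data are hypothetical (one covariate, outcome constructed as `y = x + 2*d`), the logit model is fitted with a hand-written Newton-Raphson routine (`fit_logit` is a name made up for this sketch), and matching is nearest-neighbor with replacement, keeping ties.

```python
import math

def fit_logit(xs, ds, iterations=25):
    """Newton-Raphson fit of Pr(D=1|x) = 1 / (1 + exp(-(a + b*x)))."""
    a = b = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, d in zip(xs, ds):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            g0 += d - p                # score for the intercept
            g1 += (d - p) * x          # score for the slope
            w = p * (1.0 - p)          # contributions to the information matrix
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

# Hypothetical data: (x, d, y) with y = x + 2*d, so the true effect is 2.
counts = {0: (1, 3), 1: (2, 2), 2: (2, 2), 3: (3, 1)}  # x: (treated, controls)
data = []
for x, (n_t, n_c) in counts.items():
    data += [(x, 1, x + 2.0)] * n_t + [(x, 0, float(x))] * n_c

# Step 1: estimate the propensity score with a logit model.
a, b = fit_logit([x for x, _, _ in data], [d for _, d, _ in data])
ps = [1.0 / (1.0 + math.exp(-(a + b * x))) for x, _, _ in data]

# Step 2: nearest-neighbor matching (with replacement, keeping ties)
# on the absolute difference in propensity scores.
treated = [(p, y) for p, (_, d, y) in zip(ps, data) if d == 1]
controls = [(p, y) for p, (_, d, y) in zip(ps, data) if d == 0]
effects = []
for p_i, y_i in treated:
    dist = [abs(p_i - p_j) for p_j, _ in controls]
    nearest = min(dist)
    matches = [y_j for d_ij, (_, y_j) in zip(dist, controls) if d_ij == nearest]
    effects.append(y_i - sum(matches) / len(matches))
att = sum(effects) / len(effects)

naive = (sum(y for _, y in treated) / len(treated)
         - sum(y for _, y in controls) / len(controls))
print(b, att, naive)
```

Because the propensity score is a monotone function of the single covariate here, matching on it is equivalent to matching on `x`; with several covariates, observations with quite different $X$ can share the same score.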
4 King and Nielsen's "Why Propensity Scores Should Not Be Used for Matching"
King and Nielsen
In 2015/2016, Gary King and Richard Nielsen circulated a paper that created quite some concern among applied researchers.
The basic message of the paper is that PSM is really, really bad and should be discarded.
The paper:
- http://j.mp/1sexgVw
Slides:
- https://gking.harvard.edu/presentations/why-propensity-scores-should-not-be-used-matching-6
Watch it:
- https://www.youtube.com/watch?v=rBv39pK1iEs
The story goes about as follows.
Argument 1
- Model dependence (i.e., dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions, even if they try not to be).
- Matching is good because it reduces model dependence.
Matching to Reduce Model Dependence (Ho, Imai, King, Stuart 2007, fig. 1, Political Analysis)
[Figure, shown in several animation steps: Outcome (0 to 12) plotted against Education (years, 12 to 28) for treated (T) and control (C) units; slide 3/23 of the slides by King and Nielsen]
King and Nielsen
Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

Matching: Finding Hidden Randomized Experiments (slide 6/23 of the slides by King and Nielsen)
Types of Experiments

Balance of covariates   Complete randomization   Fully blocked
Observed                On average               Exact
Unobserved              On average               On average

Fully blocked dominates complete randomization for: imbalance, model dependence, power, efficiency, bias, research costs, robustness. E.g., Imai, King, Nall 2009: SEs 600% smaller.
Goal of Each Matching Method (in Observational Data)
• PSM: complete randomization
• Other methods: fully blocked
• Other matching methods dominate PSM (wait, it gets worse)
Best Case: Mahalanobis Distance Matching
[Figure, shown in two animation steps: Age (20 to 80) plotted against Education (years, 12 to 28) for treated (T) and control (C) units; slide 9/23 of the slides by King and Nielsen]
Best Case: Propensity Score Matching
[Figure, shown in two animation steps: Age (20 to 80) plotted against Education (years, 12 to 28) for treated (T) and control (C) units, with an additional axis for the propensity score (0 to 1); slide 15/23 of the slides by King and Nielsen]
Best Case: Propensity Score Matching is Suboptimal
[Figure: Age (20 to 80) plotted against Education (years, 12 to 28) for treated (T) and control (C) units; slide 15/23 of the slides by King and Nielsen]
King and Nielsen
Argument 3
- Random pruning (deleting observations at random) increases imbalance. This is because the sample size decreases, so that variance increases (large differences become more likely).
- More imbalance/variance means more model dependence and researcher discretion.
- Because PSM approximates complete randomization, it engages in random pruning.
- PSM Paradox ("when you do 'better', you do worse")
  - When matching is made more strict (e.g., by decreasing the size of the caliper), PSM, like other matching methods, typically reduces imbalance. But soon the PSM Paradox kicks in, such that further pruning quickly increases imbalance.
  - If the data are such that there are no big differences between treated and untreated to begin with, the PSM Paradox kicks in almost immediately.
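The first step of the argument, that random pruning increases imbalance, can be checked with a small simulation. The Python sketch below is an illustration under assumed settings (group sizes, number of replications, and the standard normal covariate are arbitrary choices): treated and control units are drawn from the same distribution, a random subset of each group is kept, and the average absolute difference in covariate means is recorded.

```python
import random
from statistics import mean

def expected_imbalance(n_per_group, keep, reps=500, seed=42):
    """Average |mean(x | treated) - mean(x | control)| after random pruning."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(reps):
        # Both groups come from the same distribution: balanced on average.
        treated = [rng.gauss(0, 1) for _ in range(n_per_group)]
        control = [rng.gauss(0, 1) for _ in range(n_per_group)]
        # Random pruning: keep only a random subset of each group.
        t = rng.sample(treated, keep)
        c = rng.sample(control, keep)
        gaps.append(abs(mean(t) - mean(c)))
    return mean(gaps)

full = expected_imbalance(n_per_group=200, keep=200)   # no pruning
pruned = expected_imbalance(n_per_group=200, keep=20)  # 90% pruned at random
print(full, pruned)
```

Nothing systematic changes when observations are deleted at random; the expected imbalance grows only because the means are computed from fewer observations.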
PSM Increases Model Dependence & Bias
[Figure, two panels comparing MDM and PSM by number of units pruned (0 to 160): "Model Dependence" (variance) and "Bias" (maximum coefficient across 512 specifications; true effect = 2). Data-generating process: $Y_i = 2 T_i + X_{1i} + X_{2i} + \epsilon_i$, $\epsilon_i \sim N(0, 1)$; slide 20/23 of the slides by King and Nielsen]
The Propensity Score Paradox in Real Data
[Figure, two panels: imbalance by number of units pruned for CEM, MDM, and PSM, with markers for the raw data, random pruning, and the 1/4 SD caliper; left panel Finkel et al. (JOP 2012), right panel Nielsen et al. (AJPS 2011); slide 21/23 of the slides by King and Nielsen]
Similar pattern for > 20 other real data sets we checked
5 Are King and Nielsen right?
Are King and Nielsen right?
Argument 1
- Model dependence (i.e., dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions, even if they try not to be).
- Matching is good because it reduces model dependence.
I fully agree!
My view, however, may be somewhat less pessimistic. I believe that research results can be credible if researchers are well educated, so that they know what they are doing, and if modeling decisions are made transparent and the robustness of results is evaluated (and documented).
Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.
That fully blocked randomization is more efficient than complete randomization, given the sample size, is of course true (how large the efficiency gains are depends on the strength of the relation between $X$ and $Y$).
However, if blocking reduces the sample size, it is not a priori clear whether estimates from the blocked sample are more efficient than estimates from the full sample (although often they will be).
That PSM approximates complete randomization is only partially true. PSM approximates complete randomization within observations with the same propensity score. Hence, PSM is somewhere between complete randomization and fully blocked randomization.
- If the $X$ variables have no relation to $T$ (treatment), then all observations have the same propensity score. Hence, we end up with complete randomization.
- If the $X$ variables have a strong effect on $T$, there is lots of blocking.
Argument 3
- Random pruning ⇒ imbalance ⇒ more model dependence
- PSM ⇒ complete randomization ⇒ lots of random pruning
- PSM Paradox: "when you do 'better', you do worse"
That random pruning makes things worse is of course true, because it unnecessarily reduces the sample size (without changing anything else).
As argued above, that PSM applies random pruning is only true for $X$ variables unrelated to $T$ (so that we are in a "local" complete randomization situation, although something similar can probably also happen if effects from several $X$'s cancel each other out).
Furthermore, it is only true if you employ a matching algorithm that throws away good matches. King and Nielsen's results seem to be based on the worst possible algorithm: one-to-one matching without replacement.
If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or also nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
- Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
- Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.
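The difference between an algorithm that throws away good matches and one that does not can be seen in the extreme case where all observations share the same propensity score. The Python sketch below is a hypothetical illustration: `pair_match` implements a simple greedy variant of one-to-one matching without replacement (the greedy order is an assumption of this sketch), and `radius_match` uses all controls within a threshold `c`.

```python
def pair_match(ps_treated, ps_control):
    """Greedy one-to-one matching without replacement; returns used control indices."""
    available = set(range(len(ps_control)))
    used = []
    for p in ps_treated:
        if not available:
            break
        j = min(available, key=lambda k: abs(p - ps_control[k]))
        available.remove(j)   # a control can be used only once
        used.append(j)
    return used

def radius_match(ps_treated, ps_control, c):
    """Radius matching; returns indices of all controls used at least once."""
    used = set()
    for p in ps_treated:
        used.update(j for j, q in enumerate(ps_control) if abs(p - q) < c)
    return sorted(used)

# Hypothetical case: X unrelated to treatment, so everyone has the same score.
ps_treated = [0.3] * 3
ps_control = [0.3] * 9

used_pair = pair_match(ps_treated, ps_control)
used_radius = radius_match(ps_treated, ps_control, c=0.05)
print(len(used_pair), len(used_radius))
```

Pair matching keeps three of the nine equally good controls and discards the rest, which amounts to random pruning; radius matching averages over all nine.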
It is true that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more "free" variance in $X$ that can be exploited by modeling decisions).
In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you'd need to apply PSM stratified by subgroups in this case).
6 Illustration using kmatch
The Command
kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).
Some key features:
- Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
- Optional exact matching
- Optional regression adjustment/bias correction
- Kernel matching, ridge matching, or nearest-neighbor matching
- Automatic bandwidth selection for kernel/ridge matching
- Flexible specification of scaling matrix for MDM
- Joint analysis of multiple subgroups and multiple outcome variables
- Various post-estimation commands for balancing and common-support diagnostics
- Computationally efficient implementation
Some Examples
Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.

. sysuse nlsw88, clear
(NLSW, 1988 extract)

. drop if industry==2
(4 observations deleted)

Mahalanobis-distance kernel matching:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs = 1,853
                                               Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched        |       Controls        |  Band-
           |  Yes     No   Total  |  Used  Unused  Total  |  width
   Treated |  432     25     457  |  1105     291   1396  | 1.3394

Treatment-effects estimation

      wage |     Coef.
       ATT |  .6059013
      NATE |  1.432913
Some balancing statistics:

. kmatch summarize
(refitting the model using the generate() option)

             |             Raw               |         Matched (ATT)
       Means |  Treated  Untrea~d    StdDif  |  Treated  Untrea~d    StdDif
    collgrad |  .321663   .224212   .219912  |  .319444   .319444         0
     ttl_exp |  13.2685   12.7323   .117584  |  13.3205   13.1425   .039036
      tenure |  7.89205   6.17658    .29735  |  7.91744   7.58347   .057888
  3.industry |  .006565   .012178  -.058246  |   .00463    .00463         0
  4.industry |  .183807   .166905   .044425  |  .185185   .185185         0
  5.industry |  .105033   .027937   .312944  |  .085648   .085648         0
  6.industry |  .045952   .169771  -.407129  |  .048611   .048611         0
  7.industry |  .019694   .102436  -.350657  |  .020833   .020833         0
  8.industry |  .017505   .035817  -.113785  |  .009259   .009259         0
  9.industry |  .010941   .040115  -.185669  |  .011574   .011574         0
 10.industry |  .004376   .008596  -.052551  |  .002315   .002315         0
 11.industry |  .479212   .356734   .250073  |  .506944   .506944         0
 12.industry |  .122538    .07235   .169707  |   .12037    .12037         0
      2.race |  .330416   .244986   .189418  |    .3125     .3125         0
      3.race |  .017505   .011461   .050566  |  .006944   .006944         0
       south |  .297593   .466332  -.352408  |  .291667   .291667         0

             |             Raw               |         Matched (ATT)
   Variances |  Treated  Untrea~d     Ratio  |  Treated  Untrea~d     Ratio
    collgrad |  .218674   .174066   1.25628  |  .217904   .217904         1
     ttl_exp |  20.5898   21.0001   .980459  |  19.8177   18.2323   1.08696
      tenure |  37.2044   29.3629   1.26706  |  37.0399   34.9543   1.05966
  3.industry |  .006536   .012038   .542928  |  .004619   .004619         1
  4.industry |  .150351   .139148   1.08052  |  .151242   .151242         1
  5.industry |  .094207   .027176   3.46656  |  .078494   .078494         1
  6.industry |  .043936    .14105   .311496  |  .046355   .046355         1
  7.industry |  .019348   .092008   .210287  |  .020447   .020447         1
  8.industry |  .017237   .034559   .498769  |  .009195   .009195         1
  9.industry |  .010845   .038533   .281445  |  .011467   .011467         1
 10.industry |  .004367   .008528   .512039  |  .002315   .002315         1
 11.industry |  .250115   .229639   1.08917  |  .250532   .250532         1
 12.industry |  .107758   .067163   1.60443  |  .106127   .106127         1
      2.race |  .221726     .1851   1.19787  |  .215342   .215342         1
      3.race |  .017237   .011338   1.52025  |  .006912   .006912         1
       south |   .20949   .249045   .841173  |  .207077   .207077         1
Make a graph of the balancing stats:

. mat M = r(M)

. mat V = r(V)

. coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || , ///
>     bylabels("Std. mean difference" "Variance ratio") ///
>     noci nolabels byopts(xrescale)

. addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))

. addplot 2: , xline(1) norescaling

[Figure, two panels: standardized mean difference and variance ratio by covariate (collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, south), Raw vs. Matched]
Propensity-score kernel matching:

. kmatch ps union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage), nate att
(computing bandwidth ... done)

Propensity-score kernel matching               Number of obs = 1,853
                                               Kernel        = epan
Treatment   : union = 1
Covariates  : collgrad ttl_exp tenure i.industry i.race south
PS model    : logit (pr)

Matching statistics

           |       Matched        |       Controls        |  Band-
           |  Yes     No   Total  |  Used  Unused  Total  |  width
   Treated |  431     26     457  |  1214     182   1396  | .00188

Treatment-effects estimation

      wage |     Coef.
       ATT |  .3887224
      NATE |  1.432913
Kernel density balancing plot:

. kmatch density, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)
(applying 0-1 boundary correction to density estimation of propensity score)
(bandwidth for propensity score = .06803989)

[Figure, two panels (Raw, Matched (ATT)): density of the propensity score for untreated and treated]
Cumulative distribution balancing plot:

. kmatch cumul, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)

[Figure, two panels (Raw, Matched (ATT)): cumulative probability of the propensity score for untreated and treated]
Balancing box plot:

. kmatch box
(refitting the model using the generate() option)

[Figure, two panels (Raw, Matched (ATT)): box plots of the propensity score for untreated and treated]
Standard errors:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching          Number of obs = 1,853
                                               Replications  = 50
                                               Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched        |       Controls        |  Band-
           |  Yes     No   Total  |  Used  Unused  Total  |  width
   Treated |  432     25     457  |  1105     291   1396  | 1.3394
 Untreated | 1386     10    1396  |   455       2    457  | 3.3975
  Combined | 1818     35    1853  |  1560     293   1853  |

Treatment-effects estimation

           |  Observed   Bootstrap                        Normal-based
      wage |     Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
       ATE |  .4095729    .1920853    2.13   0.033     .0330928   .7860531
       ATT |  .6059013    .2472069    2.45   0.014     .1213846   1.090418
       ATC |  .3483797    .1893653    1.84   0.066    -.0227695   .7195289
      NATE |  1.432913    .2333282    6.14   0.000     .9755981   1.890228
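The numbers in this output can be cross-checked with a little arithmetic. The Python sketch below reads the reported values as ATT = .6059013 and ATC = .3483797 and recomputes the ATE as a weighted average, then recomputes the z statistic and the normal-based confidence interval for the ATT. Weighting by the numbers of matched observations (432 and 1386) rather than by the full group sizes is an assumption of this sketch; it happens to reproduce the reported ATE.

```python
# Values read from the kmatch output above.
att, n_matched_treated = 0.6059013, 432
atc, n_matched_untreated = 0.3483797, 1386

# ATE as a weighted average of ATT and ATC; weighting by the number of
# matched observations is an assumption that reproduces the reported ATE.
n = n_matched_treated + n_matched_untreated
ate = (n_matched_treated * att + n_matched_untreated * atc) / n

# z statistic and normal-based 95% confidence interval for the ATT
se_att = 0.2472069
z = att / se_att
ci = (att - 1.959964 * se_att, att + 1.959964 * se_att)

print(round(ate, 7), round(z, 2), ci)
```

The same steps applied to the other rows (coefficient divided by bootstrap standard error, coefficient plus or minus 1.96 standard errors) should reproduce their z values and intervals as well.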
Do some tests:

. lincom ATT-NATE

 ( 1)  ATT - NATE = 0

      wage |     Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
       (1) | -.8270117    .1810415   -4.57   0.000    -1.181847  -.4721768

. test ATT = ATC

 ( 1)  ATT - ATC = 0

           chi2(  1) =    2.42
         Prob > chi2 =    0.1200
Some Examples

Nearest-neighbor matching (1 neighbor)

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

    Multivariate-distance nearest-neighbor matching
                                               Number of obs  = 1853
                                               Neighbors: min = 1
    Treatment   : union = 1                               max = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

                       Matched                   Controls           Band-
                  Yes     No  Total      Used  Unused   Total       width
    Treated       457      0    457       328    1068    1396

    Treatment-effects estimation

    wage        Coef.
    ATT      .7246969

    . teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

    Treatment-effects estimation                  Number of obs      = 1853
    Estimator      : nearest-neighbor matching    Matches: requested = 1
    Outcome model  : matching                                    min = 1
    Distance metric: Mahalanobis                                 max = 1

                                     AI Robust
    wage                     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
    ATET
    union
    (union vs nonunion)   .7246969   .2942952    2.46   0.014     .147889   1.301505
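To make the mechanics of nearest-neighbor matching concrete, here is a minimal sketch on toy data. It matches on a single covariate using absolute distance rather than the Mahalanobis metric, uses controls with replacement, and keeps all ties. The data and the function name nn_match_att are hypothetical; this is not the implementation used by kmatch or teffects.

```python
def nn_match_att(treated, controls, k=1):
    """ATT from k-nearest-neighbor matching with replacement.

    treated, controls: lists of (x, y) pairs with a scalar covariate x.
    All controls tied at the k-th smallest distance are used as matches.
    """
    effects = []
    for x_i, y_i in treated:
        dists = sorted(abs(x_i - x_j) for x_j, _ in controls)
        cutoff = dists[min(k, len(dists)) - 1]  # distance of the k-th neighbor
        matches = [y_j for x_j, y_j in controls if abs(x_i - x_j) <= cutoff]
        y0_hat = sum(matches) / len(matches)    # imputed counterfactual outcome
        effects.append(y_i - y0_hat)
    return sum(effects) / len(effects)


# Hypothetical toy data: (covariate, outcome)
treated = [(1.0, 5.0), (3.0, 8.0)]
controls = [(1.5, 4.0), (2.5, 6.0), (3.5, 7.0), (8.0, 9.0)]

att_1nn = nn_match_att(treated, controls, k=1)
att_2nn = nn_match_att(treated, controls, k=2)
print(att_1nn, att_2nn)
```

In the toy data the second treated unit has two equally close controls, so both are used even with k = 1. Ties of this kind are also why teffects can report more matches than requested.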
Some Examples

Nearest-neighbor matching (5 neighbors)

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

    Multivariate-distance nearest-neighbor matching
                                               Number of obs  = 1853
                                               Neighbors: min = 5
    Treatment   : union = 1                               max = 5
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

                       Matched                   Controls           Band-
                  Yes     No  Total      Used  Unused   Total       width
    Treated       457      0    457       870     526    1396

    Treatment-effects estimation

    wage        Coef.
    ATT      .5590823

    . teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

    Treatment-effects estimation                  Number of obs      = 1853
    Estimator      : nearest-neighbor matching    Matches: requested = 5
    Outcome model  : matching                                    min = 5
    Distance metric: Mahalanobis                                 max = 6

                                     AI Robust
    wage                     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
    ATET
    union
    (union vs nonunion)   .5590823   .2381752    2.35   0.019    .0922675   1.025897
Some Examples

Bias-correction / regression adjustment

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south
    >     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

    Multivariate-distance nearest-neighbor matching
                                               Number of obs  = 1853
                                               Neighbors: min = 5
    Treatment   : union = 1                               max = 5
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

                       Matched                   Controls           Band-
                  Yes     No  Total      Used  Unused   Total       width
    Treated       457      0    457       870     526    1396

    Treatment-effects estimation

    wage        Coef.
    ATT      .5288023
    adjusted for collgrad ttl_exp tenure i.industry i.race south

    . teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south)
    >     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

    Treatment-effects estimation                  Number of obs      = 1853
    Estimator      : nearest-neighbor matching    Matches: requested = 5
    Outcome model  : matching                                    min = 5
    Distance metric: Mahalanobis                                 max = 6

                                     AI Robust
    wage                     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
    ATET
    union
    (union vs nonunion)   .5288023   .2420635    2.18   0.029    .0543666   1.003238
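The idea behind the bias correction can be sketched as follows: when a matched control differs from the treated unit in X, its outcome is shifted by the predicted outcome difference from a regression fitted on the controls. The example below is a simplified illustration with one covariate, one neighbor, and a regression fitted on all controls. The toy data and the function names are hypothetical, and the code is not meant to reproduce the exact estimators of kmatch or teffects.

```python
def ols_slope(xs, ys):
    """Slope of a simple least-squares regression of ys on xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def att_1nn(treated, controls, slope=0.0):
    """1-nearest-neighbor ATT; a nonzero slope applies the adjustment."""
    effects = []
    for x_i, y_i in treated:
        x_j, y_j = min(controls, key=lambda c: abs(c[0] - x_i))
        # Shift the matched outcome by the predicted difference due to X
        y0_hat = y_j + slope * (x_i - x_j)
        effects.append(y_i - y0_hat)
    return sum(effects) / len(effects)


# Hypothetical toy data: Y0 = 2 * x exactly, true effect = 1 for all treated
controls = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
treated = [(1.25, 3.5), (2.25, 5.5)]

b = ols_slope([x for x, _ in controls], [y for _, y in controls])
raw = att_1nn(treated, controls)                 # biased by inexact matches
adjusted = att_1nn(treated, controls, slope=b)   # bias removed in this toy case
print(b, raw, adjusted)
```

Because the matches are inexact (each treated unit lies 0.25 above its nearest control), the raw estimate overstates the effect; the adjustment removes this discrepancy when the outcome model is correct.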
Some Examples

Mahalanobis-distance and propensity-score matching combined

    . kmatch md union collgrad ttl_exp tenure (wage), att
    >     psvars(i.industry i.race south) psweight(3)
    (computing bandwidth done)

    Multivariate-distance kernel matching      Number of obs = 1853
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis (modified)
    Covariates  : collgrad ttl_exp tenure
    PS model    : logit (pr)
    PS covars   : i.industry i.race south

    Matching statistics

                       Matched                   Controls           Band-
                  Yes     No  Total      Used  Unused   Total       width
    Treated       439     18    457      1258     138    1396      .83886

    Treatment-effects estimation

    wage        Coef.
    ATT      .6408443
Some Examples

Exact matching

    . kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
    (computing bandwidth done)

    Multivariate-distance kernel matching      Number of obs = 1853
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure
    Exact       : industry race south

    Matching statistics

                       Matched                   Controls           Band-
                  Yes     No  Total      Used  Unused   Total       width
    Treated       432     25    457      1103     293    1396      1.3013

    Treatment-effects estimation

    wage        Coef.
    ATT      .6047374
Some Examples

Bandwidth selection: the default (based on the distribution of distances in one-nearest-neighbor matching)

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
    (computing bandwidth done)

    Multivariate-distance kernel matching      Number of obs = 1853
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

                       Matched                   Controls           Band-
                  Yes     No  Total      Used  Unused   Total       width
    Treated       432     25    457      1105     291    1396      1.3394

    Treatment-effects estimation

    wage        Coef.
    ATT      .6059013
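The role of the bandwidth in kernel matching can be illustrated with a small sketch. Each control within the bandwidth of a treated unit receives an Epanechnikov weight that declines with distance; treated units with no control inside the bandwidth remain unmatched. The code below uses a scalar distance and hypothetical toy data, and the function names are made up for illustration. It is a simplified sketch, not the kmatch implementation.

```python
def epanechnikov(u):
    """Epanechnikov kernel; nonzero only for |u| < 1."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1 else 0.0


def kernel_match_att(treated, controls, bandwidth):
    """ATT from kernel matching on a scalar covariate.

    Returns (att, n_matched). Treated units without any control inside
    the bandwidth are unmatched and excluded from the estimate.
    """
    effects = []
    for x_i, y_i in treated:
        w = [epanechnikov(abs(x_i - x_j) / bandwidth) for x_j, _ in controls]
        total = sum(w)
        if total == 0:
            continue  # no control within the bandwidth: lack of common support
        y0_hat = sum(w_j * y_j for w_j, (_, y_j) in zip(w, controls)) / total
        effects.append(y_i - y0_hat)
    if not effects:
        return None, 0
    return sum(effects) / len(effects), len(effects)


# Hypothetical toy data: (covariate, outcome)
controls = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0)]
treated = [(1.0, 4.0), (10.0, 9.0)]

att_wide, n_wide = kernel_match_att(treated, controls, bandwidth=2.0)
att_narrow, n_narrow = kernel_match_att(treated, controls, bandwidth=0.5)
print(att_wide, n_wide, att_narrow, n_narrow)
```

The second treated unit is far from all controls and stays unmatched under both bandwidths, which corresponds to the "Matched: No" column in the matching statistics.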
Some Examples

Bandwidth selection: cross validation with respect to X

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
    >     att bwidth(cv)
    (computing bandwidth done)

    Multivariate-distance kernel matching      Number of obs = 1853
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

                       Matched                   Controls           Band-
                  Yes     No  Total      Used  Unused   Total       width
    Treated       448      9    457      1184     212    1396      1.8888

    Treatment-effects estimation

    wage        Coef.
    ATT      .6651578

    . kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot; x-axis: bandwidth; y-axis: MSE; points labeled with the index of the search step]
Some Examples

Bandwidth selection: cross validation with respect to Y

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
    >     att bwidth(cv wage)
    (computing bandwidth done)

    Multivariate-distance kernel matching      Number of obs = 1853
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

                       Matched                   Controls           Band-
                  Yes     No  Total      Used  Unused   Total       width
    Treated       453      4    457      1289     107    1396       2.433

    Treatment-effects estimation

    wage        Coef.
    ATT      .6928956

    . kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot; x-axis: bandwidth; y-axis: MISE; points labeled with the index of the search step]
Some Examples

Bandwidth selection: weighted cross validation with respect to Y

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
    >     att bwidth(cv wage, weighted)
    (computing bandwidth done)

    Multivariate-distance kernel matching      Number of obs = 1853
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

                       Matched                   Controls           Band-
                  Yes     No  Total      Used  Unused   Total       width
    Treated       455      2    457      1356      40    1396      2.7626

    Treatment-effects estimation

    wage        Coef.
    ATT      .7308166

    . kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot; x-axis: bandwidth; y-axis: weighted MISE; points labeled with the index of the search step]
Some Examples

Common-support diagnostics

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
    >     att bwidth(0.5)

    Multivariate-distance kernel matching      Number of obs = 1853
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

                       Matched                   Controls           Band-
                  Yes     No  Total      Used  Unused   Total       width
    Treated       366     91    457       701     695    1396          .5

    Treatment-effects estimation

    wage        Coef.
    ATT      .3303161

    . kmatch csummarize
    (refitting the model using the generate() option)

    Common support (treated)                  Standardized difference
    Means          Matched  Unmatc~d     Total    (1)-(3)   (2)-(3)   (1)-(2)
    collgrad       .322404   .318681   .321663    .001585  -.006376   .007962
    ttl_exp        13.3929   12.7682   13.2685    .027413  -.110253   .137666
    tenure         8.12614   6.95055   7.89205    .038378  -.154356   .192734
    3.industry     .002732   .021978   .006565   -.047404   .190657  -.238061
    4.industry     .191257   .153846   .183807    .019212  -.077269   .096481
    5.industry     .062842   .274725   .105033   -.137462   .552867  -.690329
    6.industry     .057377         0   .045952    .054507  -.219225   .273732
    7.industry     .019126   .021978   .019694   -.004083   .016423  -.020506
    8.industry     .005464   .065934   .017505   -.091714   .368871  -.460585
    9.industry     .010929   .010989   .010941   -.000115   .000462  -.000577
    10.industry          0   .021978   .004376   -.066227   .266363  -.332589
    11.industry    .554645   .175824   .479212     .15083  -.606636   .757467
    12.industry    .092896   .241758   .122538   -.090299   .363181   -.45348
    2.race         .243169   .681319   .330416   -.185284   .745209  -.930494
    3.race         .002732   .076923   .017505   -.112525   .452572  -.565097
    south           .29235   .318681   .297593   -.011456   .046074   -.05753

    Common support (treated)                            Ratio
    Variances      Matched  Unmatc~d     Total    (1)/(3)   (2)/(3)   (1)/(2)
    collgrad       .219058   .219536   .218674    1.00176   1.00394   .997824
    ttl_exp        19.4198   25.2474   20.5898    .943177   1.22621    .76918
    tenure         38.3324   31.9242   37.2044    1.03032   .858076   1.20073
    3.industry     .002732   .021734   .006536    .418045   3.32537   .125714
    4.industry     .155101   .131624   .150351    1.03159   .875443   1.17837
    5.industry     .059054   .201465   .094207    .626851   2.13854   .293122
    6.industry     .054233         0   .043936    1.23435         0         .
    7.industry     .018811   .021734   .019348    .972252    1.1233   .865531
    8.industry      .00545   .062271   .017237    .316157   3.61269   .087513
    9.industry     .010839   .010989   .010845    .999464   1.01328   .986361
    10.industry          0   .021734   .004367          0   4.97709         0
    11.industry    .247691    .14652   .250115    .990307   .585811   1.69049
    12.industry    .084497   .185348   .107758    .784137   1.72003   .455885
    2.race         .184542   .219536   .221726    .832297   .990121   .840601
    3.race         .002732   .071795   .017237    .158513   4.16522   .038056
    south          .207448   .219536    .20949    .990254   1.04796   .944939

    (1) matched, (2) unmatched, (3) total
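The common-support table can be read with a short calculation. For tenure, the reported standardized differences are reproduced when each mean difference is divided by the standard deviation of the full treated group (column 3), and the ratios are plain variance ratios. The scaling by the total standard deviation is inferred from the reported numbers, not taken from the kmatch documentation; the function name std_diff is hypothetical.

```python
import math


def std_diff(mean_a, mean_b, var_ref):
    """Mean difference scaled by the standard deviation of a reference group."""
    return (mean_a - mean_b) / math.sqrt(var_ref)


# tenure, copied from the csummarize output:
# (1) matched treated, (2) unmatched treated, (3) all treated
mean = {"matched": 8.12614, "unmatched": 6.95055, "total": 7.89205}
var = {"matched": 38.3324, "unmatched": 31.9242, "total": 37.2044}

d13 = std_diff(mean["matched"], mean["total"], var["total"])
d23 = std_diff(mean["unmatched"], mean["total"], var["total"])
d12 = std_diff(mean["matched"], mean["unmatched"], var["total"])
ratio13 = var["matched"] / var["total"]

print(round(d13, 6), round(d23, 6), round(d12, 6), round(ratio13, 5))
```

Values of (1)-(3) close to zero and ratios (1)/(3) close to one indicate that dropping the unmatched treated observations changes the composition of the treatment group only slightly.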
Some Examples

    . // make a graph of the common-support stats
    . mat M = r(M)
    . coefplot matrix(M[,4]), title(Std. difference) noci nolabels xline(0)

[Figure: coefficient plot titled "Std. difference" with one point per covariate (collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, south) and a reference line at 0]
Some Examples

Multiple outcome variables

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south
    >     (wage hours), nate att
    (computing bandwidth done)

    Multivariate-distance kernel matching      Number of obs = 1852
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

                       Matched                   Controls           Band-
                  Yes     No  Total      Used  Unused   Total       width
    Treated       432     25    457      1104     291    1395      1.3392

    Treatment-effects estimation

                 Coef.
    wage
      ATT     .6021049
      NATE    1.430823
    hours
      ATT     1.263759
      NATE    1.450303
Some Examples

Multiple outcome variables with different regression-adjustment equations

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south
    >     (wage = collgrad ttl_exp tenure)
    >     (hours = i.industry i.race), nate att
    (computing bandwidth done)

    Multivariate-distance kernel matching      Number of obs = 1852
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

                       Matched                   Controls           Band-
                  Yes     No  Total      Used  Unused   Total       width
    Treated       432     25    457      1104     291    1395      1.3392

    Treatment-effects estimation

                 Coef.
    wage
      ATT     .5152752
      NATE    1.430823
    hours
      ATT     1.263759
      NATE    1.450303
    wage: adjusted for collgrad ttl_exp tenure
    hours: adjusted for i.industry i.race
Some Examples

Treatment effects by subpopulation

    . kmatch md union collgrad ttl_exp tenure i.industry i.race (wage),
    >     att vce(boot) over(south)
    (south=0: computing bandwidth done)
    (south=1: computing bandwidth done)
    (running kmatch on estimation sample)

    Bootstrap replications (50)

    Multivariate-distance kernel matching      Number of obs = 1853
                                               Replications  = 50
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race

    0: south = 0
    1: south = 1

    Matching statistics

                       Matched                   Controls           Band-
                  Yes     No  Total      Used  Unused   Total       width
    0
      Treated     306     15    321       625     120     745      1.3199
    1
      Treated     126     10    136       473     178     651      1.3398

    Treatment-effects estimation

             Observed   Bootstrap                      Normal-based
    wage        Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
    0
      ATT    .4586332   .2763358    1.66   0.097    -.082975   1.000241
    1
      ATT    .9518705    .406903    2.34   0.019    .1543553   1.749386

    . test [0]ATT = [1]ATT

     ( 1)  [0]ATT - [1]ATT = 0

               chi2(  1) =    1.23
             Prob > chi2 =  0.2679

    . lincom [1]ATT - [0]ATT

     ( 1)  - [0]ATT + [1]ATT = 0

    wage        Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
    (1)      .4932373   .4452343    1.11   0.268    -.379406   1.365881
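For a single linear restriction, the Wald test and the lincom z test are the same test: the chi-squared statistic is the squared z statistic, and both give the same p-value. The sketch below checks this with the reported difference between the two subpopulation ATTs (.4932373) and its standard error (.4452343). The function name wald_chi2_1df is hypothetical.

```python
import math


def wald_chi2_1df(diff, se):
    """z statistic, chi-squared(1) statistic, and p-value for H0: diff = 0."""
    z = diff / se
    chi2 = z * z
    # Two-sided normal p-value, identical to the chi-squared(1) upper tail
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, chi2, p


z, chi2, p = wald_chi2_1df(0.4932373, 0.4452343)
print(round(z, 2), round(chi2, 2), round(p, 3))
```

The results agree with the reported z = 1.11, chi2(1) = 1.23, and p-value of about 0.268.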
Simulation

- Population data from the Swiss census of 2000.
- Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using the command iskotrei by Hendrickx 2002; values from 6 to 78, mean 44).
- Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.
- Control variables: gender, age, and highest educational degree.
- Population restricted to people between 24 and 60 years old who are working.
- 2'308'006 individuals, of which 17.5% belong to the treatment group.
- Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.
Simulation

- Substantial differences between resident aliens and Swiss nationals on all three covariates.
- Propensity score in the population (computed from fully stratified data); McFadden R2 = 0.121.

[Figure: density of the propensity score for untreated and treated; x-axis: propensity score; y-axis: density]
Simulation

- Raw mean difference in occupational prestige (NATE): -4.79
- Population ATT (computed from fully stratified data): -3.96
- There is some treatment effect heterogeneity (ATE = -3.51, ATC = -3.41).

[Figure, left panel: mean outcome by propensity score for untreated and treated; x-axis: propensity score; y-axis: outcome]
[Figure, right panel: treatment effect by propensity score; x-axis: propensity score; y-axis: treatment effect]
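The population quantities can be cross-checked with the identity ATE = Pr(D=1) * ATT + Pr(D=0) * ATC. The sketch below assumes a treated share of 17.5% and uses the ATT, ATC, and NATE reported for the population (-3.96, -3.41, and -4.79). The variable names are chosen for illustration.

```python
# Population quantities as reported on the simulation slides
share_treated = 0.175   # share of resident aliens (treatment group)
att = -3.96             # average treatment effect on the treated
atc = -3.41             # average treatment effect on the untreated
nate = -4.79            # raw mean difference (naive estimate)

# ATE as the share-weighted average of ATT and ATC
ate = share_treated * att + (1 - share_treated) * atc

# Bias of the naive comparison when the ATT is the estimand
naive_bias = nate - att

print(round(ate, 2), round(naive_bias, 2))
```

The weighted average reproduces the reported ATE of -3.51 up to rounding, and the naive comparison overstates the disadvantage of the treatment group by about 0.83 prestige points.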
Results: Variance

[Figure: variance of the estimates by matching algorithm; rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y); panels: N = 500 and N = 5000; series: MDM and PSM, each with and without bias correction]

In this slide we can see that, for the same algorithm, PSM is typically somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are small; additional post-matching regression adjustment further reduces the differences.
Results: Bias reduction (in percent)

[Figure: bias reduction in percent by matching algorithm; rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y); panels: N = 500 and N = 5000; series: MDM and PSM, each with and without bias correction]

Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.
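The slides do not spell out how "bias reduction in percent" is computed. One common definition, used here purely as an assumption, expresses the remaining bias of an estimator relative to the bias of the naive comparison: 100 * (1 - (estimate - ATT) / (NATE - ATT)). Under this definition, 100 means the bias is removed completely, and values above 100 mean the estimator overcorrects. The function name and the example estimate of -4.04 are hypothetical.

```python
def bias_reduction_pct(estimate, att, nate):
    """Percent of the naive bias removed by an estimator.

    Assumed definition (not taken from the slides):
    100 * (1 - remaining bias / naive bias).
    """
    return 100.0 * (1.0 - (estimate - att) / (nate - att))


ATT, NATE = -3.96, -4.79   # population values reported for the simulation

full = bias_reduction_pct(ATT, ATT, NATE)       # unbiased estimator
none = bias_reduction_pct(NATE, ATT, NATE)      # naive estimator
partial = bias_reduction_pct(-4.04, ATT, NATE)  # hypothetical mean estimate

print(full, none, round(partial, 1))
```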
Results: Mean squared error

[Figure: mean squared error by matching algorithm; rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y); panels: N = 500 and N = 5000; series: MDM and PSM, each with and without bias correction]
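The three performance measures are linked by the decomposition MSE = variance + bias squared. The following minimal sketch computes them across simulation replications; the list of estimates and the true value are hypothetical numbers chosen so that the arithmetic is easy to follow.

```python
def summarize(estimates, truth):
    """Bias, variance, and mean squared error across replications."""
    r = len(estimates)
    mean = sum(estimates) / r
    bias = mean - truth
    variance = sum((e - mean) ** 2 for e in estimates) / r
    mse = sum((e - truth) ** 2 for e in estimates) / r
    return bias, variance, mse


# Hypothetical estimates from four replications and a hypothetical true ATT
estimates = [-4.5, -3.5, -4.0, -3.0]
bias, variance, mse = summarize(estimates, truth=-3.5)

print(bias, variance, mse)
```

An estimator with a small bias but a large variance, such as one-nearest-neighbor matching, can therefore have a larger MSE than a slightly biased but more stable estimator.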
Results: Relative standard error

[Figure: relative standard error by matching algorithm; rows: nearest-neighbor matching with teffects standard errors (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap standard errors (1 neighbor, 5 neighbors), and kernel matching with bootstrap standard errors (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y); panels: N = 500 and N = 5000; series: MDM and PSM, each with and without bias correction]
Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals by matching algorithm; rows: nearest-neighbor matching with teffects standard errors (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap standard errors (1 neighbor, 5 neighbors), and kernel matching with bootstrap standard errors (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y); panels: N = 500 and N = 5000; series: MDM and PSM, each with and without bias correction]
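The two inference diagnostics can be sketched as follows. Coverage is the share of replications whose 95% interval contains the true value. For the relative standard error, the sketch assumes the definition "average estimated standard error divided by the standard deviation of the estimates across replications", so that a value of 1 indicates unbiased standard errors; the slides do not state the definition, and all numbers below are hypothetical.

```python
def coverage(estimates, ses, truth, z=1.959964):
    """Share of normal-based 95% intervals that contain the true value."""
    hits = sum(1 for e, s in zip(estimates, ses)
               if e - z * s <= truth <= e + z * s)
    return hits / len(estimates)


def relative_se(estimates, ses):
    """Average estimated SE relative to the empirical SD of the estimates.

    Assumed definition; values below 1 mean the SEs are too small on average.
    """
    r = len(estimates)
    mean = sum(estimates) / r
    sd = (sum((e - mean) ** 2 for e in estimates) / (r - 1)) ** 0.5
    return (sum(ses) / r) / sd


# Hypothetical replications: estimates with their estimated standard errors
estimates = [-4.5, -3.5, -4.0, -3.0]
ses = [0.4, 0.4, 0.4, 0.4]

cov = coverage(estimates, ses, truth=-3.5)
rel = relative_se(estimates, ses)
print(cov, round(rel, 2))
```

In this toy example the standard errors are too small relative to the spread of the estimates, and the coverage falls below the nominal 95% accordingly.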
Conclusions

- The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).
- Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.
Conclusions

Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research the choice may not be that clear.

Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.

Disadvantages of MDM:
- The choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching! (But don't use pair matching anyhow.)
Conclusions

Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly ok for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.
References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2): 295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1): 77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91: 279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4): 465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70: 41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5): 688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4): 472–480.
Contents
1 Potential Outcomes and Causal Inference
2 Matching
3 Propensity Score Matching
4 King and Nielsenrsquos ldquoWhy Propensity Scores Should Not Be Used forMatchingrdquo
5 Are King and Nielsen right
6 Illustration using kmatch
7 Conclusions
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 2
Counterfactual Causality (see Neyman 1923 Rubin 1974 1990)
aka Rubin Causal Model aka Potential Outcomes Framework
John Stuart Mill (1806ndash1873)
Thus if a person eats of a particulardish and dies in consequence that iswould not have died if he had not eatenof it people would be apt to say thateating of that dish was the cause of hisdeath (Mill 2002[1843]214)
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 3
Counterfactual Causality (see Neyman 1923 Rubin 1974 1990)
aka Rubin Causal Model aka Potential Outcomes Framework
Treatment variable D
D =
1 treatment (eats of a particular dish)
0 control (does not eat of a particular dish)
Potential outcomes Y 1 and Y 0
I Y 1 potential outcome with treatment (D = 1)F If person i would eat of a particular dish would she die or would she
surviveI Y 0 potential outcome without treatment (D = 0)
F If person i would not eat of a particular dish would she die or wouldshe survive
Causal effect of the treatment for individual i
causal effect = difference between potential outcomes
δi = Y 1i minus Y 0
i
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 4
Fundamental Problem of Causal Inference
The causal effect of D on Y for individual i is defined as thedifference in potential outcomes δi = Y 1
i minus Y 0i
However the observed outcome variable is
Yi =
Y 1
i if Di = 1
Y 0i if Di = 0
That is only one of the two potential outcomes will be realized andhence only Y 1
i or Y 0i can be observed but never both
Consequence
The individual treatment effect δi cannot be observed
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 5
Average Treatment Effect
Although individual causal effects cannot be observed the averagecausal effect in a population (the so-called ldquoAverage TreatmentEffectrdquo) can be identified comparing the expected values of Y 1 andY 0
ATE = E [δ] = E [Y 1 minus Y 0] = E [Y 1]minus E [Y 0]
Some other quantities of interestI Average Treatment Effect on the Treated (ATT)
ATT = E [Y 1 minus Y 0|D = 1] = E [Y 1|D = 1]minus E [Y 0|D = 1]
I Average Treatment Effect on the Untreated (ATC)
ATC = E [Y 1 minus Y 0|D = 0] = E [Y 1|D = 0]minus E [Y 0|D = 0]
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 6
Average Treatment Effect
To determine the average effect unbiased estimates of E [Y 0] andE [Y 1] are required
If the independence assumption
(Y 0Y 1) perpperp D
applies that is if D is independent from Y 0 and Y 1 then
E [Y 0] = E [Y 0|D = 0]
E [Y 1] = E [Y 1|D = 1]
In this case the average causal effect can be be measured by asimple group comparison (mean difference) of observations withouttreatment (D = 0) and observations with treatment (D = 1)
Randomized experiments solve the problem If the assignment of Dis randomized D is independent from Y 0 and Y 1 by design
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 7
1 Potential Outcomes and Causal Inference
2 Matching
3 Propensity Score Matching
4 King and Nielsenrsquos ldquoWhy Propensity Scores Should Not Be Used forMatchingrdquo
5 Are King and Nielsen right
6 Illustration using kmatch
7 Conclusions
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 8
Conditional Independence Strong Ignorability
Can causal effects also be identified from ldquoobservationalrdquo (ienon-experimental) data
Sometimes it can be argued that the independence assumption isvalid conditionally (conditional independence ldquounconfoundednessrdquo)
(Y 0Y 1) perpperp D |X
If in addition the overlap assumption
0 lt Pr(D = 1|X = x) lt 1 for all x
is given then the ATE (or ATT or ATC) can be identified byconditioning on X
For example
ATE =sumx
Pr[X = x ] E [Y |D = 1X = x ]minus E [Y |D = 0X = x ]
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 9
Matching
Matching is one approach to ldquocondition on X rdquo if strong ignorabilityholds
Basic idea1 For each observation in the treatment group find ldquostatistical twinsrdquo in
the control group with the same (or at least very similar) X values(and vice versa)
2 The Y values of these matching observations are then used tocompute the counterfactual outcome for the observation at hand
3 An estimate for the average causal effect can be obtained as themean of the differences between the observed values and theldquoimputedrdquo counterfactual values over all observations
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 10
Matching
Formally
ATT =1
ND=1
sumi |D=1
[Yi minus Y 0
i
]=
1ND=1
sumi |D=1
Yi minussum
j |D=0
wijYj
ATC =
1ND=0
sumi |D=0
[Y 1
i minus Yi
]=
1ND=0
sumi |D=0
sumj |D=1
wijYj minus Yi
ATE =
ND=1
Nmiddot ATT +
ND=0
Nmiddot ATC
Different matching algorithms use different definitions of wij
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 11
Exact Matching
Exact matchingwij =
1ki if Xi = Xj
0 else
with ki as the number of observations for which Xi = Xj applies
The result equivalent to ldquoperfect stratificationrdquo or ldquosubclassificationrdquo(see eg Cochran 1968)
Problem If X contains several variables there is a large probabilitythat no exact matches can be found for many observations (theldquocurse of dimensionalityrdquo)
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 12
Multivariate Distance Matching (MDM)
An alternative is to match based on a distance metric that measuresthe proximity between observations in the multivariate space of X
The idea then is to use observations that are ldquocloserdquo but notnecessarily equal as matches
A common approach is to use
MD(Xi Xj) =radic
(Xi minus Xj)primeΣminus1(Xi minus Xj)
as distance metric where Σ is an appropriate scaling matrix
I Mahalanobis matching Σ is the covariance matrix of X I Euclidean matching Σ is the identity matrixI Mahalanobis matching is equivalent to Euclidean matching based onstandardized and orthogonalized X
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 13
Matching Algorithms
Various matching algorithms can be employed to find potentialmatches based on MD and determine the matching weights wij
Pair matching (one-to-one matching without replacement)I For each observation i in the treatment group find observation j inthe control group for which MDij is smallest Once observation j isused as a match do not use it again
Nearest-neighbor matchingI For each observation i in the treatment group find the k closestobservations in the control group A single control can be usedmultiple times as a match In case of ties (multiple controls withidentical MD) use all ties as matches k is set by the researcher
Caliper matchingI Like nearest-neighbor matching but only use controls for which MDis smaller than some threshold c
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 14
Mahalanobis Matching
Radius matchingI Use all controls as matches for which MD is smaller than somethreshold c
Kernel matchingI Like radius matching but give larger weight to controls for which MDis small (using some kernel function such as eg the Epanechnikovkernel)
In addition since matching is no longer exact it may make sense torefine the estimates by applying regression-adjustment to thematched data (also known as ldquobias-adjustmentrdquo in the context ofnearest-neighbor matching)
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 15
1 Potential Outcomes and Causal Inference
2 Matching
3 Propensity Score Matching
4 King and Nielsenrsquos ldquoWhy Propensity Scores Should Not Be Used forMatchingrdquo
5 Are King and Nielsen right
6 Illustration using kmatch
7 Conclusions
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 16
The Propensity Score Theorem (Rosenbaum and Rubin 1983)
If the conditional independence assumption is true then
Pr(Di = 1|Y 0i Y
1i Xi) = Pr(Di = 1|Xi) = π(Xi)
where π(X ) is called the propensity score
That is(Y 0Y 1) perpperp D |X
implies(Y 0Y 1) perpperp D |π(X )
so that under strong ignorability the average causal effect can beestimated by conditioning on the propensity score π(X ) instead of X
This is remarkable because the information in X which may includemany variables can be reduced to just one dimension This greatlysimplifies the matching task
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 17
Propensity Score Matching (PSM)
Instead of computing multivariate distances we can thus simplymatch on the (one-dimensional) propensity score
ProcedureI Step 1 Estimate the propensity score eg using a Logit modelI Step 2 Apply a matching algorithm using differences in thepropensity score |π(Xi )minus π(Xj)| instead of multivariate distances
PSM is tremendously popularI httpsscholargooglechscholarq=propensity+score+AND+(matching+OR+matched+OR+match)
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 18
1 Potential Outcomes and Causal Inference
2 Matching
3 Propensity Score Matching
4 King and Nielsenrsquos ldquoWhy Propensity Scores Should Not Be Used forMatchingrdquo
5 Are King and Nielsen right
6 Illustration using kmatch
7 Conclusions
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 19
King and Nielsen
In 20152016 Gary King and Richard Nielsen circulated a paper thatcreated quite some concern among applied researchers
The basic message of the paper is that PSM is really really bad andshould be discarded
The paperI httpjmp1sexgVw
SlidesI httpsgkingharvardedupresentationswhy-propensity-scores-should-not-be-used-matching-6
Watch itI httpswwwyoutubecomwatchv=rBv39pK1iEs
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 20
King and Nielsen
The story goes about as follows
Argument 1I Model dependence (ie dependence of results on modeling decisionsmade by the researcher) is bad because it leads to bias (people areselective in their decisions even if they try not to be)
I Matching is good because it reduces model dependence
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 21
Matching to Reduce Model Dependence (Ho, Imai, King, Stuart 2007, fig. 1, Political Analysis)
[Figure, shown in several animation steps: scatter plot of Outcome (0 to 12) against Education (years, 12 to 28) for treated (T) and control (C) units; slide 3/23 of the slides by King and Nielsen]
King and Nielsen
Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

Matching: Finding Hidden Randomized Experiments (slide 6/23 of the slides by King and Nielsen)
Types of Experiments

Balance of covariates   Complete randomization   Fully blocked
Observed                On average               Exact
Unobserved              On average               On average

Fully blocked dominates complete randomization for: imbalance, model dependence, power, efficiency, bias, research costs, robustness. E.g., Imai, King, Nall 2009: SEs 600% smaller.
Goal of Each Matching Method (in Observational Data)
- PSM: complete randomization
- Other methods: fully blocked
- Other matching methods dominate PSM (wait, it gets worse)
Best Case: Mahalanobis Distance Matching
[Figure, shown before and after pruning: treated (T) and control (C) units plotted by Education (years, 12 to 28) and Age (20 to 80); slide 9/23 of the slides by King and Nielsen]
Best Case: Propensity Score Matching
[Figure: the same units plotted by Education (years) and Age, with an additional Propensity Score axis running from 0 to 1; slide 15/23 of the slides by King and Nielsen]
Best Case: Propensity Score Matching is Suboptimal
[Figure: the treated (T) units and the controls (C) retained by propensity score matching, plotted by Education (years) and Age; slide 15/23 of the slides by King and Nielsen]
King and Nielsen
Argument 3
- Random pruning (deleting observations at random) increases imbalance. This is because the sample size decreases, so that variance increases (large differences become more likely).
- More imbalance/variance means more model dependence and researcher discretion.
- Because PSM approximates complete randomization, it engages in random pruning.
- PSM Paradox (“when you do ‘better,’ you do worse”):
  - When matching is made more strict (e.g. by decreasing the size of the caliper), PSM, like other matching methods, typically reduces imbalance. But soon the PSM Paradox kicks in, such that further pruning quickly increases imbalance.
  - If the data are such that there are no big differences between treated and untreated to begin with, the PSM Paradox kicks in almost immediately.
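The claim that random pruning increases imbalance can be checked exactly on a small example. The sketch below is an illustration only: the covariate values are made up, and imbalance is measured as the squared difference in covariate means, which is an assumption made here for simplicity (King and Nielsen use other imbalance measures). It enumerates every possible way of keeping `keep` units per group and averages the resulting imbalance.

```python
from itertools import combinations
from statistics import mean

def expected_sq_imbalance(treated, controls, keep):
    """Average squared mean difference over all ways of keeping `keep` units per group."""
    total, count = 0.0, 0
    for t_kept in combinations(treated, keep):
        for c_kept in combinations(controls, keep):
            total += (mean(t_kept) - mean(c_kept)) ** 2
            count += 1
    return total / count

# Hypothetical covariate values for 6 treated and 6 control units
treated = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
controls = [1.5, 2.5, 3.5, 4.5, 5.5, 6.5]

# keep=6 is the unpruned sample; keep=5 and keep=3 prune at random
imbalance = {keep: expected_sq_imbalance(treated, controls, keep) for keep in (6, 5, 3)}
for keep in (6, 5, 3):
    print(keep, imbalance[keep])
```

The systematic part of the imbalance (the squared difference of the full-sample means) stays the same; what grows with pruning is the sampling variance of the two group means.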
PSM Increases Model Dependence & Bias
[Figure with two panels comparing MDM and PSM as a function of the number of units pruned (0 to 160). Left panel, “Model Dependence”: variance. Right panel, “Bias”: maximum coefficient across 512 specifications, with the true effect = 2. Simulated data: Yi = 2Ti + X1i + X2i + εi, εi ~ N(0, 1); slide 20/23 of the slides by King and Nielsen]
The Propensity Score Paradox in Real Data
[Figure with two panels, Finkel et al. (JOP 2012) and Nielsen et al. (AJPS 2011): imbalance against number of units pruned for CEM, MDM, and PSM, with markers for the 1/4 SD caliper, the raw data, and random pruning; slide 21/23 of the slides by King and Nielsen]
Similar pattern for > 20 other real data sets we checked.
5 Are King and Nielsen right?
Are King and Nielsen right?
Argument 1
- Model dependence (i.e. dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions even if they try not to be).
- Matching is good because it reduces model dependence.
I fully agree.
My view, however, may be somewhat less pessimistic. I believe that research results can be credible if researchers are well educated, so that they know what they are doing, and if modeling decisions are made transparent and robustness of results is evaluated (and documented).
Are King and Nielsen right?
Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.
That fully blocked randomization is more efficient than complete randomization, given the sample size, is of course true (how large the efficiency gains are depends on the strength of the relation between X and Y).
However, if blocking reduces the sample size, it is not a priori clear whether estimates from the blocked sample are more efficient than estimates from the full sample (although often they will be).
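The efficiency gain from blocking at a given sample size can be made concrete by enumerating all possible treatment assignments for a tiny, made-up sample. In the sketch below the outcomes are hypothetical, there is no treatment effect, and units are grouped into pairs with similar outcomes (standing in for similar X); the only thing compared is the variance of the difference-in-means estimator under the two designs.

```python
from itertools import combinations, product
from statistics import mean, pvariance

# Hypothetical outcomes for 8 units; the true treatment effect is zero
y = [1.0, 1.2, 2.1, 1.9, 3.0, 3.3, 4.2, 3.9]
# Blocks: pairs of units that are similar on X (and hence on Y)
pairs = [(0, 1), (2, 3), (4, 5), (6, 7)]

def diff_in_means(treated_idx):
    """Difference between treated and control outcome means for one assignment."""
    treated = set(treated_idx)
    y1 = [y[i] for i in range(len(y)) if i in treated]
    y0 = [y[i] for i in range(len(y)) if i not in treated]
    return mean(y1) - mean(y0)

# Complete randomization: any 4 of the 8 units may end up treated
complete = [diff_in_means(t) for t in combinations(range(len(y)), 4)]
# Fully blocked randomization: exactly one unit in each pair is treated
blocked = [diff_in_means(t) for t in product(*pairs)]

var_complete = pvariance(complete)
var_blocked = pvariance(blocked)
print(var_complete, var_blocked)
```

How large the gap is depends on how strongly the blocking variable predicts the outcome; with blocks that are unrelated to Y the two variances would be about the same.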
That PSM approximates complete randomization is only partially true. PSM approximates complete randomization within observations with the same propensity score. Hence, PSM is somewhere between complete randomization and fully blocked randomization.
- If the X variables have no relation to T (treatment), then all observations have the same propensity score. Hence, we end up with complete randomization.
- If the X variables have a strong effect on T, there is lots of blocking.
Are King and Nielsen right?
Argument 3
- Random pruning ⇒ imbalance ⇒ more model dependence
- PSM ⇒ complete randomization ⇒ lots of random pruning
- PSM Paradox: “when you do ‘better,’ you do worse”
That random pruning makes things worse is of course true, because it unnecessarily reduces the sample size (without changing anything else).
As argued above, that PSM applies random pruning is only true for X variables unrelated to T (so that we are in a “local” complete randomization situation, although something similar can probably also happen if effects from several X’s cancel each other out).
Furthermore, it is only true if you employ a matching algorithm that throws away good matches. King and Nielsen’s results seem to be based on the worst possible algorithm: one-to-one matching without replacement.
If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or also nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
- Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
- Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.
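How a kernel matching estimator "averages where pruning is not necessary" can be sketched as follows. This is a simplified illustration, not kmatch's code: the propensity scores and outcomes are made up, the function names are hypothetical, and the Epanechnikov kernel is chosen only because it is the kernel reported in the kmatch output (`Kernel = epan`). All controls within the bandwidth contribute to the counterfactual, with weights that decline with distance; controls outside the bandwidth get weight zero.

```python
def epanechnikov(u):
    """Epanechnikov kernel: positive only for |u| < 1."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0

def kernel_counterfactual(ps_treated, ps_controls, y_controls, bandwidth):
    """Kernel-weighted average of control outcomes for one treated unit."""
    weights = [epanechnikov((ps_treated - p) / bandwidth) for p in ps_controls]
    total = sum(weights)
    if total == 0.0:
        # No control within the bandwidth: the treated unit stays unmatched
        return None, weights
    y0_hat = sum(w * y for w, y in zip(weights, y_controls)) / total
    return y0_hat, [w / total for w in weights]

# Hypothetical propensity scores and outcomes of five controls
ps_controls = [0.35, 0.48, 0.50, 0.53, 0.70]
y_controls = [1.0, 2.0, 3.0, 4.0, 10.0]

# Treated unit with propensity score 0.50 and bandwidth 0.10
y0_hat, weights = kernel_counterfactual(0.50, ps_controls, y_controls, bandwidth=0.10)
print(y0_hat, weights)

# Treated unit far away from all controls: no match
unmatched, _ = kernel_counterfactual(0.90, ps_controls, y_controls, bandwidth=0.10)
print(unmatched)
```

In contrast to one-to-one matching without replacement, the three nearby controls are all used, so no good match is discarded.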
What is true is that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more “free” variance in X that can be exploited by modeling decisions).
In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you’d need to apply PSM stratified by subgroups in this case).
6 Illustration using kmatch
The Command
kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).
Some key features:
- Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
- Optional exact matching
- Optional regression-adjustment bias-correction
- Kernel matching, ridge matching, or nearest-neighbor matching
- Automatic bandwidth selection for kernel/ridge matching
- Flexible specification of scaling matrix for MDM
- Joint analysis of multiple subgroups and multiple outcome variables
- Various post-estimation commands for balancing and common-support diagnostics
- Computationally efficient implementation
Some Examples
Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.

. sysuse nlsw88, clear
(NLSW, 1988 extract)

. drop if industry==2
(4 observations deleted)

Mahalanobis-distance kernel matching

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |      Matched       |       Controls        |  Band-
           |  Yes    No   Total |  Used  Unused   Total |  width
   Treated |  432    25     457 |  1105     291    1396 | 1.3394

Treatment-effects estimation

      wage |     Coef.
       ATT |  .6059013
      NATE |  1.432913
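What a Mahalanobis distance does can be sketched for two covariates. The code below is an illustration with made-up data points (education in years, age in years); the function names are hypothetical, and the use of the plain sample covariance matrix as scaling matrix is an assumption (kmatch lets the scaling matrix be specified flexibly). The point is that differences are measured relative to the spread of each covariate, so variables on different scales become comparable.

```python
import math

def covariance_matrix(points):
    """Sample covariance matrix of 2-D points, returned as (s11, s12, s22)."""
    n = len(points)
    m1 = sum(p[0] for p in points) / n
    m2 = sum(p[1] for p in points) / n
    s11 = sum((p[0] - m1) ** 2 for p in points) / (n - 1)
    s22 = sum((p[1] - m2) ** 2 for p in points) / (n - 1)
    s12 = sum((p[0] - m1) * (p[1] - m2) for p in points) / (n - 1)
    return s11, s12, s22

def mahalanobis(u, v, cov):
    """sqrt((u-v)' S^{-1} (u-v)) with the 2x2 inverse written out."""
    s11, s12, s22 = cov
    det = s11 * s22 - s12 * s12
    dx, dy = u[0] - v[0], u[1] - v[1]
    q = (s22 * dx * dx - 2.0 * s12 * dx * dy + s11 * dy * dy) / det
    return math.sqrt(q)

# Hypothetical units: (education in years, age in years)
points = [(12, 20), (16, 40), (12, 40), (16, 20)]
cov = covariance_matrix(points)

d_educ = mahalanobis((12, 20), (16, 20), cov)   # differ by 4 years of education
d_age = mahalanobis((12, 20), (12, 40), cov)    # differ by 20 years of age
d_both = mahalanobis((12, 20), (16, 40), cov)   # differ on both
print(d_educ, d_age, d_both)
```

In this toy sample a 4-year gap in education counts exactly as much as a 20-year gap in age, because each gap is the same multiple of the covariate's standard deviation.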
Some Examples
Some balancing statistics

. kmatch summarize
(refitting the model using the generate() option)

             |             Raw              |        Matched (ATT)
       Means |  Treated  Untrea~d    StdDif |  Treated  Untrea~d    StdDif
    collgrad |  .321663   .224212   .219912 |  .319444   .319444         0
     ttl_exp |  13.2685   12.7323   .117584 |  13.3205   13.1425   .039036
      tenure |  7.89205   6.17658    .29735 |  7.91744   7.58347   .057888
  3.industry |  .006565   .012178  -.058246 |   .00463    .00463         0
  4.industry |  .183807   .166905   .044425 |  .185185   .185185         0
  5.industry |  .105033   .027937   .312944 |  .085648   .085648         0
  6.industry |  .045952   .169771  -.407129 |  .048611   .048611         0
  7.industry |  .019694   .102436  -.350657 |  .020833   .020833         0
  8.industry |  .017505   .035817  -.113785 |  .009259   .009259         0
  9.industry |  .010941   .040115  -.185669 |  .011574   .011574         0
 10.industry |  .004376   .008596  -.052551 |  .002315   .002315         0
 11.industry |  .479212   .356734   .250073 |  .506944   .506944         0
 12.industry |  .122538    .07235   .169707 |   .12037    .12037         0
      2.race |  .330416   .244986   .189418 |    .3125     .3125         0
      3.race |  .017505   .011461   .050566 |  .006944   .006944         0
       south |  .297593   .466332  -.352408 |  .291667   .291667         0

             |             Raw              |        Matched (ATT)
   Variances |  Treated  Untrea~d     Ratio |  Treated  Untrea~d     Ratio
    collgrad |  .218674   .174066   1.25628 |  .217904   .217904         1
     ttl_exp |  20.5898   21.0001   .980459 |  19.8177   18.2323   1.08696
      tenure |  37.2044   29.3629   1.26706 |  37.0399   34.9543   1.05966
  3.industry |  .006536   .012038   .542928 |  .004619   .004619         1
  4.industry |  .150351   .139148   1.08052 |  .151242   .151242         1
  5.industry |  .094207   .027176   3.46656 |  .078494   .078494         1
  6.industry |  .043936    .14105   .311496 |  .046355   .046355         1
  7.industry |  .019348   .092008   .210287 |  .020447   .020447         1
  8.industry |  .017237   .034559   .498769 |  .009195   .009195         1
  9.industry |  .010845   .038533   .281445 |  .011467   .011467         1
 10.industry |  .004367   .008528   .512039 |  .002315   .002315         1
 11.industry |  .250115   .229639   1.08917 |  .250532   .250532         1
 12.industry |  .107758   .067163   1.60443 |  .106127   .106127         1
      2.race |  .221726     .1851   1.19787 |  .215342   .215342         1
      3.race |  .017237   .011338   1.52025 |  .006912   .006912         1
       south |   .20949   .249045   .841173 |  .207077   .207077         1
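The two balance measures in the table can be recomputed by hand. The sketch below assumes that the standardized difference is the difference in means divided by the square root of the average of the two group variances, and that the ratio is the treated variance over the untreated variance; these definitions are assumptions, although they do reproduce the reported figures. The inputs are the raw-sample means and variances of tenure, read with their decimal points (7.89205 and 6.17658; 37.2044 and 29.3629).

```python
import math

def std_mean_difference(mean_t, mean_c, var_t, var_c):
    """Mean difference scaled by the square root of the average group variance."""
    return (mean_t - mean_c) / math.sqrt((var_t + var_c) / 2.0)

def variance_ratio(var_t, var_c):
    """Variance among the treated relative to the variance among the untreated."""
    return var_t / var_c

# tenure, raw sample: means and variances of treated and untreated
smd = std_mean_difference(7.89205, 6.17658, 37.2044, 29.3629)
ratio = variance_ratio(37.2044, 29.3629)
print(round(smd, 5), round(ratio, 5))
```

A standardized difference near 0 and a variance ratio near 1 indicate good balance, which is what the matched columns show for most covariates.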
Some Examples
Make a graph of the balancing stats

. mat M = r(M)
. mat V = r(V)
. coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || , ///
>     bylabels("Std. mean difference" "Variance ratio") ///
>     noci nolabels byopts(xrescale)
. addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))
. addplot 2: , xline(1) norescaling

[Figure: coefficient plot with two panels, “Std. mean difference” (−.4 to .4) and “Variance ratio” (0 to 4), showing Raw and Matched values for collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, and south]
Some Examples
Propensity-score kernel matching

. kmatch ps union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage), nate att
(computing bandwidth ... done)

Propensity-score kernel matching           Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Covariates  : collgrad ttl_exp tenure i.industry i.race south
PS model    : logit (pr)

Matching statistics

           |      Matched       |       Controls        |  Band-
           |  Yes    No   Total |  Used  Unused   Total |  width
   Treated |  431    26     457 |  1214     182    1396 | .00188

Treatment-effects estimation

      wage |     Coef.
       ATT |  .3887224
      NATE |  1.432913
Some Examples
Kernel density balancing plot

. kmatch density, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)
(applying 0-1 boundary correction to density estimation of propensity score)
(bandwidth for propensity score = .06803989)

[Figure: kernel density of the propensity score (0 to .8) for Untreated and Treated, in two panels: Raw and Matched (ATT)]
Some Examples
Cumulative distribution balancing plot

. kmatch cumul, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)

[Figure: cumulative probability (0 to 1) of the propensity score (0 to .8) for Untreated and Treated, in two panels: Raw and Matched (ATT)]
Some Examples
Balancing box plot

. kmatch box
(refitting the model using the generate() option)

[Figure: box plots of the propensity score (0 to .8) for Untreated and Treated, in two panels: Raw and Matched (ATT)]
Some Examples
Standard errors

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Replications  = 50
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |      Matched       |       Controls        |  Band-
           |  Yes    No   Total |  Used  Unused   Total |  width
   Treated |  432    25     457 |  1105     291    1396 | 1.3394
 Untreated | 1386    10    1396 |   455       2     457 | 3.3975
  Combined | 1818    35    1853 |  1560     293    1853 |

Treatment-effects estimation

           |  Observed   Bootstrap                       Normal-based
      wage |     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
       ATE |  .4095729   .1920853    2.13   0.033    .0330928   .7860531
       ATT |  .6059013   .2472069    2.45   0.014    .1213846   1.090418
       ATC |  .3483797   .1893653    1.84   0.066   -.0227695   .7195289
      NATE |  1.432913   .2333282    6.14   0.000    .9755981   1.890228
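The z statistic, p-value, and normal-based confidence interval in the table follow directly from the coefficient and its bootstrap standard error. The sketch below recomputes them for the ATT row (coefficient .6059013, standard error .2472069); the function name is hypothetical, and small deviations in the last digits are to be expected because the inputs are rounded.

```python
from statistics import NormalDist

def normal_inference(coef, se, level=0.95):
    """z statistic, two-sided p-value, and normal-based confidence interval."""
    z = coef / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)   # about 1.96 for 95%
    return z, p, (coef - crit * se, coef + crit * se)

# ATT row of the bootstrap table
z, p, (lower, upper) = normal_inference(0.6059013, 0.2472069)
print(round(z, 2), round(p, 3), lower, upper)
```

The interval relies on the normal approximation; the bootstrap is only used to obtain the standard error.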
Some Examples
Do some tests

. lincom ATT-NATE

 ( 1)  ATT - NATE = 0

      wage |     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
       (1) | -.8270117   .1810415   -4.57   0.000   -1.181847  -.4721768

. test ATT = ATC

 ( 1)  ATT - ATC = 0

           chi2(  1) =    2.42
         Prob > chi2 =    0.1200
Some Examples
Nearest-neighbor matching (1 neighbor)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

Multivariate-distance nearest-neighbor matching

                                           Number of obs = 1,853
                                           Neighbors: min = 1
                                                      max = 1
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |      Matched       |       Controls        |  Band-
           |  Yes    No   Total |  Used  Unused   Total |  width
   Treated |  457     0     457 |   328    1068    1396 |

Treatment-effects estimation

      wage |     Coef.
       ATT |  .7246969

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

Treatment-effects estimation               Number of obs  = 1,853
Estimator      : nearest-neighbor matching Matches: requested = 1
Outcome model  : matching                                 min = 1
Distance metric: Mahalanobis                              max = 1

                      |             AI Robust
                 wage |     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET                  |
  union               |
  (union vs nonunion) |  .7246969   .2942952    2.46   0.014     .147889   1.301505
Some Examples
Nearest-neighbor matching (5 neighbors)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching

                                           Number of obs = 1,853
                                           Neighbors: min = 5
                                                      max = 5
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |      Matched       |       Controls        |  Band-
           |  Yes    No   Total |  Used  Unused   Total |  width
   Treated |  457     0     457 |   870     526    1396 |

Treatment-effects estimation

      wage |     Coef.
       ATT |  .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation               Number of obs  = 1,853
Estimator      : nearest-neighbor matching Matches: requested = 5
Outcome model  : matching                                 min = 5
Distance metric: Mahalanobis                              max = 6

                      |             AI Robust
                 wage |     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET                  |
  union               |
  (union vs nonunion) |  .5590823   .2381752    2.35   0.019    .0922675   1.025897
Some Examples
Bias-correction / regression adjustment

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching

                                           Number of obs = 1,853
                                           Neighbors: min = 5
                                                      max = 5
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |      Matched       |       Controls        |  Band-
           |  Yes    No   Total |  Used  Unused   Total |  width
   Treated |  457     0     457 |   870     526    1396 |

Treatment-effects estimation

      wage |     Coef.
       ATT |  .5288023
adjusted for: collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) ///
>     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation               Number of obs  = 1,853
Estimator      : nearest-neighbor matching Matches: requested = 5
Outcome model  : matching                                 min = 5
Distance metric: Mahalanobis                              max = 6

                      |             AI Robust
                 wage |     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET                  |
  union               |
  (union vs nonunion) |  .5288023   .2420635    2.18   0.029    .0543666   1.003238
Some Examples
Mahalanobis-distance and propensity-score matching combined

. kmatch md union collgrad ttl_exp tenure (wage), att ///
>     psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis (modified)
Covariates  : collgrad ttl_exp tenure
PS model    : logit (pr)
PS covars   : i.industry i.race south

Matching statistics

           |      Matched       |       Controls        |  Band-
           |  Yes    No   Total |  Used  Unused   Total |  width
   Treated |  439    18     457 |  1258     138    1396 | .83886

Treatment-effects estimation

      wage |     Coef.
       ATT |  .6408443
Some Examples
Exact matching

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure
Exact       : industry race south

Matching statistics

           |      Matched       |       Controls        |  Band-
           |  Yes    No   Total |  Used  Unused   Total |  width
   Treated |  432    25     457 |  1103     293    1396 | 1.3013

Treatment-effects estimation

      wage |     Coef.
       ATT |  .6047374
Some Examples
Bandwidth selection: the default (based on distribution of distances in one-nearest-neighbor matching)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |      Matched       |       Controls        |  Band-
           |  Yes    No   Total |  Used  Unused   Total |  width
   Treated |  432    25     457 |  1105     291    1396 | 1.3394

Treatment-effects estimation

      wage |     Coef.
       ATT |  .6059013
Some Examples
Bandwidth selection: cross validation with respect to X

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |      Matched       |       Controls        |  Band-
           |  Yes    No   Total |  Used  Unused   Total |  width
   Treated |  448     9     457 |  1184     212    1396 | 1.8888

Treatment-effects estimation

      wage |     Coef.
       ATT |  .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MSE against Bandwidth (1.5 to 3), with the 15 evaluation points labeled by their index]
Some Examples
Bandwidth selection: cross validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |      Matched       |       Controls        |  Band-
           |  Yes    No   Total |  Used  Unused   Total |  width
   Treated |  453     4     457 |  1289     107    1396 |  2.433

Treatment-effects estimation

      wage |     Coef.
       ATT |  .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MISE against Bandwidth (1.5 to 3), with the 15 evaluation points labeled by their index]
Some Examples
Bandwidth selection: weighted cross validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |      Matched       |       Controls        |  Band-
           |  Yes    No   Total |  Used  Unused   Total |  width
   Treated |  455     2     457 |  1356      40    1396 | 2.7626

Treatment-effects estimation

      wage |     Coef.
       ATT |  .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of Weighted MISE against Bandwidth (1 to 5), with the 15 evaluation points labeled by their index]
Some Examples
Common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(0.5)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |      Matched       |       Controls        |  Band-
           |  Yes    No   Total |  Used  Unused   Total |  width
   Treated |  366    91     457 |   701     695    1396 |     .5

Treatment-effects estimation

      wage |     Coef.
       ATT |  .3303161

. kmatch csummarize
(refitting the model using the generate() option)

             |   Common support (treated)   |    Standardized difference
       Means |  Matched  Unmatc~d     Total |  (1)-(3)   (2)-(3)   (1)-(2)
    collgrad |  .322404   .318681   .321663 |  .001585  -.006376   .007962
     ttl_exp |  13.3929   12.7682   13.2685 |  .027413  -.110253   .137666
      tenure |  8.12614   6.95055   7.89205 |  .038378  -.154356   .192734
  3.industry |  .002732   .021978   .006565 | -.047404   .190657  -.238061
  4.industry |  .191257   .153846   .183807 |  .019212  -.077269   .096481
  5.industry |  .062842   .274725   .105033 | -.137462   .552867  -.690329
  6.industry |  .057377         0   .045952 |  .054507  -.219225   .273732
  7.industry |  .019126   .021978   .019694 | -.004083   .016423  -.020506
  8.industry |  .005464   .065934   .017505 | -.091714   .368871  -.460585
  9.industry |  .010929   .010989   .010941 | -.000115   .000462  -.000577
 10.industry |        0   .021978   .004376 | -.066227   .266363  -.332589
 11.industry |  .554645   .175824   .479212 |   .15083  -.606636   .757467
 12.industry |  .092896   .241758   .122538 | -.090299   .363181   -.45348
      2.race |  .243169   .681319   .330416 | -.185284   .745209  -.930494
      3.race |  .002732   .076923   .017505 | -.112525   .452572  -.565097
       south |   .29235   .318681   .297593 | -.011456   .046074   -.05753

             |   Common support (treated)   |            Ratio
   Variances |  Matched  Unmatc~d     Total |  (1)/(3)   (2)/(3)   (1)/(2)
    collgrad |  .219058   .219536   .218674 |  1.00176   1.00394   .997824
     ttl_exp |  19.4198   25.2474   20.5898 |  .943177   1.22621    .76918
      tenure |  38.3324   31.9242   37.2044 |  1.03032   .858076   1.20073
  3.industry |  .002732   .021734   .006536 |  .418045   3.32537   .125714
  4.industry |  .155101   .131624   .150351 |  1.03159   .875443   1.17837
  5.industry |  .059054   .201465   .094207 |  .626851   2.13854   .293122
  6.industry |  .054233         0   .043936 |  1.23435         0         .
  7.industry |  .018811   .021734   .019348 |  .972252    1.1233   .865531
  8.industry |   .00545   .062271   .017237 |  .316157   3.61269   .087513
  9.industry |  .010839   .010989   .010845 |  .999464   1.01328   .986361
 10.industry |        0   .021734   .004367 |        0   4.97709         0
 11.industry |  .247691    .14652   .250115 |  .990307   .585811   1.69049
 12.industry |  .084497   .185348   .107758 |  .784137   1.72003   .455885
      2.race |  .184542   .219536   .221726 |  .832297   .990121   .840601
      3.race |  .002732   .071795   .017237 |  .158513   4.16522   .038056
       south |  .207448   .219536    .20949 |  .990254   1.04796   .944939

(1) matched, (2) unmatched, (3) total
Some Examples
Make a graph of the common-support stats

. mat M = r(M)
. coefplot matrix(M[,4]), title("Std. difference") noci nolabels xline(0)

[Figure: coefficient plot titled “Std. difference” (−.2 to .2) for collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, and south]
Some Examples
Multiple outcome variables

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,852
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |      Matched       |       Controls        |  Band-
           |  Yes    No   Total |  Used  Unused   Total |  width
   Treated |  432    25     457 |  1104     291    1395 | 1.3392

Treatment-effects estimation

           |     Coef.
wage       |
       ATT |  .6021049
      NATE |  1.430823
hours      |
       ATT |  1.263759
      NATE |  1.450303
Some Examples
Multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage = collgrad ttl_exp tenure) ///
>     (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,852
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |      Matched       |       Controls        |  Band-
           |  Yes    No   Total |  Used  Unused   Total |  width
   Treated |  432    25     457 |  1104     291    1395 | 1.3392

Treatment-effects estimation

           |     Coef.
wage       |
       ATT |  .5152752
      NATE |  1.430823
hours      |
       ATT |  1.263759
      NATE |  1.450303
wage: adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race
Some Examples
Treatment effects by subpopulation
. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage), ///
>     att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)
Bootstrap replications (50)

Multivariate-distance kernel matching    Number of obs = 1,853
                                         Replications  = 50
                                         Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race
0: south = 0
1: south = 1

Matching statistics
                 Matched              Controls           Band-
             Yes    No  Total    Used  Unused  Total     width
0: Treated   306    15    321     625     120    745    1.3199
1: Treated   126    10    136     473     178    651    1.3398

Treatment-effects estimation
          Observed   Bootstrap                     Normal-based
wage         Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
0: ATT    .4586332   .2763358    1.66    0.097    -.082975    1.000241
1: ATT    .9518705    .406903    2.34    0.019    .1543553    1.749386

. test [0]ATT = [1]ATT
 ( 1)  [0]ATT - [1]ATT = 0
       chi2(1)     = 1.23
       Prob > chi2 = 0.2679

. lincom [1]ATT - [0]ATT
 ( 1)  - [0]ATT + [1]ATT = 0
wage         Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
(1)       .4932373   .4452343    1.11    0.268    -.379406    1.365881
Simulation
Population data from the Swiss census of 2000.
Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using command iskotrei by Hendrickx 2002; values from 6 to 78, mean 44).
Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.
Control variables: gender, age, and highest educational degree.
Population restricted to people between 24 and 60 years old who are working.
2'308'006 individuals, of which 17.5% belong to the treatment group.
Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.
Simulation
Substantial differences between resident aliens and Swiss nationals on all three covariates.
Propensity score in population (computed from fully stratified data):
[Figure: density of the propensity score for untreated and treated (x-axis: Propensity score; y-axis: Density)]
McFadden R2 = 0.121
Simulation
Raw mean difference in occupational prestige (NATE): -4.79
Population ATT (computed from fully stratified data): -3.96
There is some treatment effect heterogeneity (ATE = -3.51, ATC = -3.41).
[Figure, left panel: outcome by propensity score for untreated and treated (x-axis: Propensity score; y-axis: Outcome)]
[Figure, right panel: treatment effect by propensity score (x-axis: Propensity score; y-axis: Treatment effect)]
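As a quick consistency check on the population figures above (a worked equation, assuming a treated share of p = 0.175, i.e. the 17.5% reported for the population), the ATE should be the share-weighted average of ATT and ATC:

```latex
\[
\mathrm{ATE} = p \cdot \mathrm{ATT} + (1 - p) \cdot \mathrm{ATC}
             = 0.175 \cdot (-3.96) + 0.825 \cdot (-3.41)
             \approx -0.69 - 2.81 \approx -3.51
\]
```

The result agrees with the reported ATE of -3.51 up to rounding.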
Results: Variance
[Figure: variance of the estimators for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction]
Notes (2017-06-24):
In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.
For kernel matching, the efficiency differences between PSM and MDM are small; additional post-matching regression adjustment further reduces the differences.
Results: Bias reduction (in percent)
[Figure: bias reduction in percent for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction]
Notes (2017-06-24):
Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.
The bias also vanishes once post-matching regression adjustment is applied.
Results: Mean squared error
[Figure: mean squared error of the estimators for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction]
Results: Relative standard error
[Figure: relative standard error for N = 500 and N = 5000. Rows: nearest-neighbor matching with teffects (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap (1 neighbor, 5 neighbors), and kernel matching with bootstrap (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction]
Results: Coverage of 95% CIs
[Figure: coverage of 95% confidence intervals for N = 500 and N = 5000. Same rows and series as for the relative standard error: nearest-neighbor matching (teffects), nearest-neighbor matching (bootstrap), and kernel matching (bootstrap); MDM and PSM, each with and without bias correction]
Conclusions
The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).
Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.
Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research the choice may not be that clear.
Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.
Disadvantages of MDM:
- The choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.
One clear conclusion we can draw, however, is:
Do not use propensity scores for pair matching. (But don't use pair matching anyhow.)
Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly ok for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.
To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.
References
Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2):295–313.
Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1):77–90.
Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91:279–290.
Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.
King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.
Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.
Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4):465–472.
Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70:41–55.
Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5):688–701.
Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4):472–480.
Counterfactual Causality (see Neyman 1923; Rubin 1974, 1990)
a.k.a. Rubin Causal Model, a.k.a. Potential Outcomes Framework
John Stuart Mill (1806–1873):
"Thus, if a person eats of a particular dish, and dies in consequence, that is, would not have died if he had not eaten of it, people would be apt to say that eating of that dish was the cause of his death." (Mill 2002[1843]:214)
Treatment variable D:
$$D = \begin{cases} 1 & \text{treatment (eats of a particular dish)} \\ 0 & \text{control (does not eat of a particular dish)} \end{cases}$$
Potential outcomes $Y^1$ and $Y^0$:
- $Y^1$: potential outcome with treatment (D = 1)
  - If person i would eat of a particular dish, would she die or would she survive?
- $Y^0$: potential outcome without treatment (D = 0)
  - If person i would not eat of a particular dish, would she die or would she survive?
Causal effect of the treatment for individual i:
causal effect = difference between potential outcomes
$$\delta_i = Y_i^1 - Y_i^0$$
Fundamental Problem of Causal Inference
The causal effect of D on Y for individual i is defined as the difference in potential outcomes, $\delta_i = Y_i^1 - Y_i^0$.
However, the observed outcome variable is
$$Y_i = \begin{cases} Y_i^1 & \text{if } D_i = 1 \\ Y_i^0 & \text{if } D_i = 0 \end{cases}$$
That is, only one of the two potential outcomes will be realized, and hence only $Y_i^1$ or $Y_i^0$ can be observed, but never both.
Consequence:
The individual treatment effect $\delta_i$ cannot be observed.
Average Treatment Effect
Although individual causal effects cannot be observed, the average causal effect in a population (the so-called "Average Treatment Effect") can be identified by comparing the expected values of $Y^1$ and $Y^0$:
$$ATE = E[\delta] = E[Y^1 - Y^0] = E[Y^1] - E[Y^0]$$
Some other quantities of interest:
- Average Treatment Effect on the Treated (ATT)
$$ATT = E[Y^1 - Y^0 | D = 1] = E[Y^1 | D = 1] - E[Y^0 | D = 1]$$
- Average Treatment Effect on the Untreated (ATC)
$$ATC = E[Y^1 - Y^0 | D = 0] = E[Y^1 | D = 0] - E[Y^0 | D = 0]$$
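The definitions above can be made concrete with a small sketch. The data are hypothetical toy numbers in which both potential outcomes are known (which is never the case in real data), and Python with NumPy is used purely for illustration; it is not part of the talk's Stata workflow.

```python
import numpy as np

# Hypothetical toy data: both potential outcomes are known here only because
# we made them up; in real data one of the two is always missing.
y0 = np.array([3.0, 5.0, 4.0, 6.0, 2.0, 7.0])   # outcome without treatment
y1 = np.array([5.0, 6.0, 7.0, 7.0, 5.0, 8.0])   # outcome with treatment
d = np.array([1, 1, 1, 0, 0, 0])                # treatment indicator

delta = y1 - y0                  # individual causal effects
ate = delta.mean()               # E[Y1 - Y0]
att = delta[d == 1].mean()       # E[Y1 - Y0 | D = 1]
atc = delta[d == 0].mean()       # E[Y1 - Y0 | D = 0]

# What we would actually observe: Y = Y1 if D = 1, Y0 if D = 0
y = np.where(d == 1, y1, y0)
nate = y[d == 1].mean() - y[d == 0].mean()   # naive group comparison

print(ate, att, atc, nate)
```

In this made-up example the naive group comparison differs from all three causal quantities, because the treated and untreated units differ in their potential outcomes.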
To determine the average effect, unbiased estimates of $E[Y^0]$ and $E[Y^1]$ are required.
If the independence assumption
$$(Y^0, Y^1) \perp\!\!\!\perp D$$
applies, that is, if D is independent of $Y^0$ and $Y^1$, then
$$E[Y^0] = E[Y^0 | D = 0]$$
$$E[Y^1] = E[Y^1 | D = 1]$$
In this case the average causal effect can be measured by a simple group comparison (mean difference) of observations without treatment (D = 0) and observations with treatment (D = 1).
Randomized experiments solve the problem: if the assignment of D is randomized, D is independent of $Y^0$ and $Y^1$ by design.
Conditional Independence / Strong Ignorability
Can causal effects also be identified from "observational" (i.e., non-experimental) data?
Sometimes it can be argued that the independence assumption is valid conditionally (conditional independence, "unconfoundedness"):
$$(Y^0, Y^1) \perp\!\!\!\perp D \mid X$$
If, in addition, the overlap assumption
$$0 < \Pr(D = 1 | X = x) < 1 \quad \text{for all } x$$
holds, then the ATE (or ATT or ATC) can be identified by conditioning on X.
For example:
$$ATE = \sum_x \Pr[X = x] \left\{ E[Y | D = 1, X = x] - E[Y | D = 0, X = x] \right\}$$
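The stratification formula for the ATE can be sketched in a few lines. The function name `ate_by_stratification` and the data are hypothetical, chosen only to illustrate the formula for a single discrete covariate; this is not code from the talk.

```python
import numpy as np

def ate_by_stratification(y, d, x):
    """ATE = sum_x Pr[X = x] * (E[Y | D=1, X=x] - E[Y | D=0, X=x])."""
    y, d, x = map(np.asarray, (y, d, x))
    ate = 0.0
    for value in np.unique(x):
        stratum = x == value
        treated = stratum & (d == 1)
        control = stratum & (d == 0)
        if not treated.any() or not control.any():
            raise ValueError(f"overlap violated in stratum X = {value}")
        ate += stratum.mean() * (y[treated].mean() - y[control].mean())
    return ate

# Hypothetical data with a single binary covariate X
x = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
d = np.array([1, 0, 0, 0, 1, 1, 1, 1, 0, 0])
y = np.array([4.0, 2.0, 3.0, 1.0, 9.0, 8.0, 10.0, 9.0, 6.0, 8.0])

naive = y[d == 1].mean() - y[d == 0].mean()
adjusted = ate_by_stratification(y, d, x)
print(naive, adjusted)
```

The explicit overlap check mirrors the overlap assumption: a stratum without both treated and untreated observations has no within-stratum comparison to contribute.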
Matching
Matching is one approach to "condition on X" if strong ignorability holds.
Basic idea:
1. For each observation in the treatment group, find "statistical twins" in the control group with the same (or at least very similar) X values (and vice versa).
2. The Y values of these matching observations are then used to compute the counterfactual outcome for the observation at hand.
3. An estimate for the average causal effect can be obtained as the mean of the differences between the observed values and the "imputed" counterfactual values over all observations.
Formally:
$$\widehat{ATT} = \frac{1}{N_{D=1}} \sum_{i|D=1} \left[ Y_i - \hat{Y}_i^0 \right] = \frac{1}{N_{D=1}} \sum_{i|D=1} \left[ Y_i - \sum_{j|D=0} w_{ij} Y_j \right]$$
$$\widehat{ATC} = \frac{1}{N_{D=0}} \sum_{i|D=0} \left[ \hat{Y}_i^1 - Y_i \right] = \frac{1}{N_{D=0}} \sum_{i|D=0} \left[ \sum_{j|D=1} w_{ij} Y_j - Y_i \right]$$
$$\widehat{ATE} = \frac{N_{D=1}}{N} \cdot \widehat{ATT} + \frac{N_{D=0}}{N} \cdot \widehat{ATC}$$
Different matching algorithms use different definitions of $w_{ij}$.
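A minimal sketch of the ATT formula with an explicit weight matrix may help. The helper names (`att_from_weights`, `nearest_neighbor_weights`) and the data are hypothetical; the weights shown are one-nearest-neighbor weights on a single covariate, just one of many possible definitions of the weights.

```python
import numpy as np

def att_from_weights(y, d, w):
    """ATT = mean over treated i of (Y_i - sum_j w_ij * Y_j), j in controls.

    w has one row per treated unit and one column per control unit;
    each row is assumed to sum to 1.
    """
    y, d = np.asarray(y, dtype=float), np.asarray(d)
    y_treated, y_control = y[d == 1], y[d == 0]
    y0_hat = w @ y_control            # imputed counterfactual outcomes
    return np.mean(y_treated - y0_hat)

def nearest_neighbor_weights(x_treated, x_control):
    """One-nearest-neighbor weights on a scalar X (with replacement, ties shared)."""
    dist = np.abs(x_treated[:, None] - x_control[None, :])
    is_nearest = np.isclose(dist, dist.min(axis=1, keepdims=True))
    return is_nearest / is_nearest.sum(axis=1, keepdims=True)

# Hypothetical data: three treated units, four controls
x = np.array([1.0, 2.0, 4.0, 1.1, 2.5, 3.9, 8.0])
d = np.array([1, 1, 1, 0, 0, 0, 0])
y = np.array([5.0, 6.0, 9.0, 4.0, 5.0, 7.0, 20.0])

w = nearest_neighbor_weights(x[d == 1], x[d == 0])
att = att_from_weights(y, d, w)
print(att)
```

Swapping in a different weight function changes the matching algorithm without touching the estimator itself, which is the point of writing the estimator in terms of the weights.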
Exact Matching
Exact matching:
$$w_{ij} = \begin{cases} 1/k_i & \text{if } X_i = X_j \\ 0 & \text{else} \end{cases}$$
with $k_i$ as the number of observations for which $X_i = X_j$ applies.
The result is equivalent to "perfect stratification" or "subclassification" (see, e.g., Cochran 1968).
Problem: If X contains several variables, there is a large probability that no exact matches can be found for many observations (the "curse of dimensionality").
Multivariate Distance Matching (MDM)
An alternative is to match based on a distance metric that measures the proximity between observations in the multivariate space of X.
The idea then is to use observations that are "close", but not necessarily equal, as matches.
A common approach is to use
$$MD(X_i, X_j) = \sqrt{(X_i - X_j)' \Sigma^{-1} (X_i - X_j)}$$
as distance metric, where $\Sigma$ is an appropriate scaling matrix.
- Mahalanobis matching: $\Sigma$ is the covariance matrix of X.
- Euclidean matching: $\Sigma$ is the identity matrix.
- Mahalanobis matching is equivalent to Euclidean matching based on standardized and orthogonalized X.
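The equivalence between Mahalanobis matching and Euclidean matching on standardized and orthogonalized X can be checked numerically. The following is an illustrative sketch with simulated covariates; the function name `pairwise_md` is hypothetical, and the Cholesky factor is just one of several ways to standardize and orthogonalize.

```python
import numpy as np

def pairwise_md(a, b, sigma):
    """MD(a_i, b_j) = sqrt((a_i - b_j)' Sigma^{-1} (a_i - b_j)) for all pairs."""
    diff = a[:, None, :] - b[None, :, :]
    sigma_inv = np.linalg.inv(sigma)
    return np.sqrt(np.einsum("ijk,kl,ijl->ij", diff, sigma_inv, diff))

rng = np.random.default_rng(0)
# Hypothetical covariates: two correlated variables on different scales
x = rng.normal(size=(50, 2)) @ np.array([[1.0, 0.0], [0.8, 10.0]])
sigma = np.cov(x, rowvar=False)

mahalanobis = pairwise_md(x, x, sigma)
euclidean = pairwise_md(x, x, np.eye(2))

# Standardize and orthogonalize X, then take plain Euclidean distances
chol = np.linalg.cholesky(sigma)              # sigma = chol @ chol.T
z = np.linalg.solve(chol, x.T).T              # z has identity covariance
euclidean_on_z = pairwise_md(z, z, np.eye(2))

print(np.allclose(mahalanobis, euclidean_on_z))
```

Euclidean distances on the raw X are dominated by the variable with the larger scale, which is why some scaling matrix is needed in the first place.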
Matching Algorithms
Various matching algorithms can be employed to find potential matches based on MD and determine the matching weights $w_{ij}$.
Pair matching (one-to-one matching without replacement)
- For each observation i in the treatment group, find observation j in the control group for which $MD_{ij}$ is smallest. Once observation j is used as a match, do not use it again.
Nearest-neighbor matching
- For each observation i in the treatment group, find the k closest observations in the control group. A single control can be used multiple times as a match. In case of ties (multiple controls with identical MD), use all ties as matches. k is set by the researcher.
Caliper matching
- Like nearest-neighbor matching, but only use controls for which MD is smaller than some threshold c.
Mahalanobis Matching
Radius matching
- Use all controls as matches for which MD is smaller than some threshold c.
Kernel matching
- Like radius matching, but give larger weight to controls for which MD is small (using some kernel function such as, e.g., the Epanechnikov kernel).
In addition, since matching is no longer exact, it may make sense to refine the estimates by applying regression adjustment to the matched data (also known as "bias adjustment" in the context of nearest-neighbor matching).
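The difference between radius and kernel matching comes down to how the weights are computed from the distances. The sketch below uses a hypothetical distance matrix and hypothetical helper names; it only illustrates the weighting idea and does not reproduce the bandwidth selection or other details of any particular implementation.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) for |u| < 1, else 0."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) < 1, 0.75 * (1 - u**2), 0.0)

def _normalize(k):
    """Scale rows to sum to 1; rows without any match stay all zero."""
    total = k.sum(axis=1, keepdims=True)
    return np.divide(k, total, out=np.zeros_like(k), where=total > 0)

def radius_weights(dist, c):
    """Equal weight for every control with distance below the threshold c."""
    return _normalize((dist < c).astype(float))

def kernel_weights(dist, h):
    """Kernel weights: closer controls get larger weight; h is the bandwidth."""
    return _normalize(epanechnikov(dist / h))

# Hypothetical distances between 2 treated units (rows) and 4 controls (columns)
dist = np.array([[0.0, 0.5, 0.9, 2.0],
                 [3.0, 2.5, 4.0, 5.0]])

rw = radius_weights(dist, c=1.0)
kw = kernel_weights(dist, h=1.0)
print(rw)
print(kw)
```

The second treated unit has no control within the threshold, so its row of weights is all zero; such observations remain unmatched and drop out of the estimate.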
The Propensity Score Theorem (Rosenbaum and Rubin 1983)
If the conditional independence assumption is true, then
$$\Pr(D_i = 1 | Y_i^0, Y_i^1, X_i) = \Pr(D_i = 1 | X_i) = \pi(X_i)$$
where $\pi(X)$ is called the propensity score.
That is,
$$(Y^0, Y^1) \perp\!\!\!\perp D \mid X$$
implies
$$(Y^0, Y^1) \perp\!\!\!\perp D \mid \pi(X)$$
so that, under strong ignorability, the average causal effect can be estimated by conditioning on the propensity score $\pi(X)$ instead of X.
This is remarkable, because the information in X, which may include many variables, can be reduced to just one dimension. This greatly simplifies the matching task.
Propensity Score Matching (PSM)
Instead of computing multivariate distances, we can thus simply match on the (one-dimensional) propensity score.
Procedure
- Step 1: Estimate the propensity score, e.g., using a Logit model.
- Step 2: Apply a matching algorithm using differences in the propensity score, $|\pi(X_i) - \pi(X_j)|$, instead of multivariate distances.
PSM is tremendously popular.
- https://scholar.google.ch/scholar?q=propensity+score+AND+(matching+OR+matched+OR+match)
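The two-step procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the talk: the data are simulated, the logit model is fitted with a hand-written Newton-Raphson loop, step 2 uses one-nearest-neighbor matching with replacement, and all function names are hypothetical.

```python
import numpy as np

def fit_logit(x, d, iterations=25):
    """Step 1: logit model for Pr(D = 1 | X), fitted by Newton-Raphson."""
    design = np.column_stack([np.ones(len(x)), x])
    beta = np.zeros(design.shape[1])
    for _ in range(iterations):
        p = 1.0 / (1.0 + np.exp(-design @ beta))
        gradient = design.T @ (d - p)
        hessian = design.T @ (design * (p * (1 - p))[:, None])
        beta = beta + np.linalg.solve(hessian, gradient)
    return 1.0 / (1.0 + np.exp(-design @ beta))   # estimated propensity scores

def psm_att(y, d, pscore):
    """Step 2: one-nearest-neighbor matching (with replacement) on the score."""
    treated, control = d == 1, d == 0
    dist = np.abs(pscore[treated][:, None] - pscore[control][None, :])
    nearest = dist.argmin(axis=1)
    return np.mean(y[treated] - y[control][nearest])

# Simulated data (assumption): treatment depends on X, true effect is 2
rng = np.random.default_rng(42)
n = 2000
x = rng.normal(size=(n, 2))
true_score = 1.0 / (1.0 + np.exp(-(0.8 * x[:, 0] + 0.4 * x[:, 1])))
d = (rng.uniform(size=n) < true_score).astype(int)
y = 2.0 * d + x[:, 0] + x[:, 1] + rng.normal(size=n)

pscore = fit_logit(x, d)
naive = y[d == 1].mean() - y[d == 0].mean()
att_psm = psm_att(y, d, pscore)
print(naive, att_psm)
```

In this setup the naive mean difference overstates the effect because X raises both the treatment probability and the outcome, whereas the matched estimate should land much closer to the true effect of 2.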
King and Nielsen
In 2015/2016, Gary King and Richard Nielsen circulated a paper that created quite some concern among applied researchers.
The basic message of the paper is that PSM is really, really bad and should be discarded.
The paper:
- http://j.mp/1sexgVw
Slides:
- https://gking.harvard.edu/presentations/why-propensity-scores-should-not-be-used-matching-6
Watch it:
- https://www.youtube.com/watch?v=rBv39pK1iEs
The story goes about as follows.
Argument 1:
- Model dependence (i.e., dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions, even if they try not to be).
- Matching is good because it reduces model dependence.
Matching to Reduce Model Dependence (Ho, Imai, King, Stuart 2007: fig. 1, Political Analysis)
[Figure, shown in several overlay steps: scatter plot of Outcome against Education (years) for treated (T) and control (C) units; slide 3/23 of the slides by King and Nielsen]
King and Nielsen
Argument 2:
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.
Matching: Finding Hidden Randomized Experiments (slide 6/23 of the slides by King and Nielsen)
Types of Experiments
Balance
Covariates    Complete Randomization    Fully Blocked
Observed      On average                Exact
Unobserved    On average                On average
Fully blocked dominates complete randomization for: imbalance, model dependence, power, efficiency, bias, research costs, robustness. E.g., Imai, King, Nall 2009: SEs 600% smaller.
Goal of Each Matching Method (in Observational Data)
- PSM: complete randomization
- Other methods: fully blocked
- Other matching methods dominate PSM (wait, it gets worse)
Best Case: Mahalanobis Distance Matching
[Figure, shown in overlay steps: scatter plot of Age against Education (years) for treated (T) and control (C) units, before and after pruning unmatched controls; slide 9/23 of the slides by King and Nielsen]
Best Case: Propensity Score Matching
[Figure, shown in overlay steps: scatter plot of Age against Education (years) for treated (T) and control (C) units, with a propensity score axis running from 0 to 1; slide 15/23 of the slides by King and Nielsen]
Best Case: Propensity Score Matching is Suboptimal
[Figure: scatter plot of Age against Education (years) for the treated (T) units and the matched control (C) units; slide 15/23 of the slides by King and Nielsen]
King and Nielsen
Argument 3:
- Random pruning (deleting observations at random) increases imbalance. This is because the sample size decreases, so that variance increases (large differences become more likely).
- More imbalance/variance means more model dependence and researcher discretion.
- Because PSM approximates complete randomization, it engages in random pruning.
- PSM Paradox ("when you do 'better', you do worse"):
  - When matching is made more strict (e.g., by decreasing the size of the caliper), PSM, like other matching methods, typically reduces imbalance. But soon the PSM Paradox kicks in, such that further pruning quickly increases imbalance.
  - If the data are such that there are no big differences between treated and untreated to begin with, the PSM Paradox kicks in almost immediately.
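The first step of the argument (random pruning increases imbalance) can be illustrated with a small simulation. This is a sketch under simple assumptions: a single standard normal covariate that is unrelated to treatment, equal group sizes, and hypothetical function names. It is not a replication of King and Nielsen's simulations.

```python
import numpy as np

rng = np.random.default_rng(1)

def imbalance(x, d):
    """Absolute difference in covariate means between treated and controls."""
    return abs(x[d == 1].mean() - x[d == 0].mean())

def prune_at_random(x, d, n_keep_per_group):
    """Keep a random subset of n_keep_per_group units in each group."""
    keep = np.concatenate([
        rng.choice(np.flatnonzero(d == g), size=n_keep_per_group, replace=False)
        for g in (0, 1)
    ])
    return x[keep], d[keep]

n_per_group, replications = 200, 500
keep_sizes = [200, 100, 50, 25]
average_imbalance = []
for n_keep in keep_sizes:
    results = []
    for _ in range(replications):
        # Completely randomized "experiment": X is unrelated to treatment
        x = rng.normal(size=2 * n_per_group)
        d = np.repeat([0, 1], n_per_group)
        results.append(imbalance(*prune_at_random(x, d, n_keep)))
    average_imbalance.append(float(np.mean(results)))

for n_keep, value in zip(keep_sizes, average_imbalance):
    print(n_keep, round(value, 3))
```

Because nothing but the sample size changes, the average imbalance grows as more units are pruned, which is the mechanism behind the claim that random pruning makes things worse.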
PSM Increases Model Dependence & Bias
[Figure, left panel "Model Dependence": Variance against Number of Units Pruned, for MDM and PSM]
[Figure, right panel "Bias": Maximum Coefficient across 512 Specifications against Number of Units Pruned, for MDM and PSM; True effect = 2]
$$Y_i = 2 T_i + X_{1i} + X_{2i} + \epsilon_i, \quad \epsilon_i \sim N(0, 1)$$
(slide 20/23 of the slides by King and Nielsen)
The Propensity Score Paradox in Real Data
[Figure, left panel "Finkel et al. (JOP, 2012)": Imbalance against Number of units pruned, for CEM, MDM, PSM, and random pruning; markers for the raw data and the 1/4 SD caliper]
[Figure, right panel "Nielsen et al. (AJPS, 2011)": Imbalance against Number of units pruned, for CEM, MDM, PSM, and random pruning; markers for the raw data and the 1/4 SD caliper]
Similar pattern for > 20 other real data sets we checked.
(slide 21/23 of the slides by King and Nielsen)
Are King and Nielsen right?
Argument 1:
- Model dependence (i.e., dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions, even if they try not to be).
- Matching is good because it reduces model dependence.
I fully agree.
My view, however, may be somewhat less pessimistic. I believe that research results can be credible if researchers are well educated, so that they know what they are doing, and if modeling decisions are made transparent and robustness of results is evaluated (and documented).
Are King and Nielsen right
Argument 2I PSM approximates complete randomizationI Better are matching approaches that approximate fully blockedrandomization such as Mahalanobis matching because completerandomization is less efficient than fully blocked randomization
That fully blocked randomization is more efficient than completerandomization ndash given the sample size ndash is of course true (how largethe efficiency gains are depends on the strength of the relationbetween X and Y )
However if blocking reduces the sample size it is not a priori clearwhether estimates from the blocked sample are more efficient thanestimates from the full sample (although often they will be)
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 29
Are King and Nielsen right
Argument 2I PSM approximates complete randomizationI Better are matching approaches that approximate fully blockedrandomization such as Mahalanobis matching because completerandomization is less efficient than fully blocked randomization
That PSM approximates complete randomization is only partiallytrue PSM approximates complete randomization withinobservations with the same propensity score Hence PSM issomewhere between complete randomization and fully blockedrandomizationI If the X variables have no relation to T (treatment) then allobservations have the same propensity score Hence we end up withcomplete randomization
I If the X variables have a strong effect on T there is lots of blocking
Are King and Nielsen right?

Argument 3
- Random pruning ⇒ imbalance ⇒ more model dependence.
- PSM ⇒ complete randomization ⇒ lots of random pruning.
- PSM paradox: "when you do 'better', you do worse".

That random pruning makes things worse is of course true, because it unnecessarily reduces the sample size (without changing anything else).

As argued above, that PSM applies random pruning is only true for X variables unrelated to T (so that we are in a "local" complete randomization situation; something similar can probably also happen if effects from several X's cancel each other out).

Furthermore, it is only true if you employ a matching algorithm that throws away good matches. King and Nielsen's results seem to be based on the worst possible algorithm: one-to-one matching without replacement.
If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
- Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
- Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.
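The contrast between averaging and pruning can be sketched in a few lines. The following is an illustration only, not kmatch's implementation: the data are made up, the bandwidth of 0.05 is arbitrary, and the function names are hypothetical. Kernel matching averages over all controls within the bandwidth, whereas greedy one-to-one matching without replacement keeps a single control per treated unit and discards the rest, even if they are equally good matches.

```python
def epan(u):
    """Epanechnikov kernel; zero outside |u| < 1."""
    return 0.75 * (1 - u * u) if abs(u) < 1 else 0.0

def kernel_att(treated, controls, bw):
    """treated/controls: lists of (pscore, outcome).
    Returns (ATT estimate, matched treated units, controls used)."""
    effects, used = [], set()
    for p_t, y_t in treated:
        w = [epan((p_c - p_t) / bw) for p_c, _ in controls]
        if sum(w) == 0:
            continue  # no control within the bandwidth: off support
        used.update(j for j, wj in enumerate(w) if wj > 0)
        y_hat = sum(wj * y_c for wj, (_, y_c) in zip(w, controls)) / sum(w)
        effects.append(y_t - y_hat)
    return sum(effects) / len(effects), len(effects), len(used)

def pair_att(treated, controls):
    """Greedy one-to-one nearest-neighbor matching without replacement."""
    free = list(range(len(controls)))
    effects = []
    for p_t, y_t in treated:
        j = min(free, key=lambda k: abs(controls[k][0] - p_t))
        free.remove(j)
        effects.append(y_t - controls[j][1])
    return sum(effects) / len(effects), len(effects), len(effects)

# Made-up data: (propensity score, outcome)
treated = [(0.30, 12.0), (0.50, 14.0)]
controls = [(0.29, 10.0), (0.32, 11.0), (0.49, 12.0), (0.52, 13.0), (0.90, 20.0)]

att_k, n_k, used_k = kernel_att(treated, controls, bw=0.05)
att_p, n_p, used_p = pair_att(treated, controls)
print(f"kernel: ATT={att_k:.3f}, controls used={used_k}")
print(f"pair:   ATT={att_p:.3f}, controls used={used_p}")
```

In this toy example the kernel estimator uses four controls and prunes only the one far from any treated unit; pair matching uses two and throws away two close matches.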
It is true that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more "free" variance in X that can be exploited by modeling decisions).

In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you'd need to apply PSM stratified by subgroups in this case).
6 Illustration using kmatch
The Command

kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).

Some key features:
- Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
- Optional exact matching
- Optional regression-adjustment bias correction
- Kernel matching, ridge matching, or nearest-neighbor matching
- Automatic bandwidth selection for kernel/ridge matching
- Flexible specification of the scaling matrix for MDM
- Joint analysis of multiple subgroups and multiple outcome variables
- Various post-estimation commands for balancing and common-support diagnostics
- Computationally efficient implementation
Some Examples

Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.

    . sysuse nlsw88, clear
    (NLSW, 1988 extract)

    . drop if industry==2
    (4 observations deleted)

Mahalanobis-distance kernel matching

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south
    >     (wage), nate att
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1,853
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
               |     Matched        |      Controls        | Band-
               |  Yes    No   Total |  Used  Unused  Total | width
       Treated |  432    25     457 |  1105     291   1396 | 1.3394

    Treatment-effects estimation
            wage |     Coef.
             ATT |  .6059013
            NATE |  1.432913
Some Examples: some balancing statistics

    . kmatch summarize
    (refitting the model using the generate() option)

                 |            Raw              |        Matched (ATT)
           Means |  Treated  Untrea~d   StdDif |  Treated  Untrea~d   StdDif
        collgrad |  .321663   .224212  .219912 |  .319444   .319444        0
         ttl_exp |  13.2685   12.7323  .117584 |  13.3205   13.1425  .039036
          tenure |  7.89205   6.17658   .29735 |  7.91744   7.58347  .057888
      3.industry |  .006565   .012178 -.058246 |   .00463    .00463        0
      4.industry |  .183807   .166905  .044425 |  .185185   .185185        0
      5.industry |  .105033   .027937  .312944 |  .085648   .085648        0
      6.industry |  .045952   .169771 -.407129 |  .048611   .048611        0
      7.industry |  .019694   .102436 -.350657 |  .020833   .020833        0
      8.industry |  .017505   .035817 -.113785 |  .009259   .009259        0
      9.industry |  .010941   .040115 -.185669 |  .011574   .011574        0
     10.industry |  .004376   .008596 -.052551 |  .002315   .002315        0
     11.industry |  .479212   .356734  .250073 |  .506944   .506944        0
     12.industry |  .122538    .07235  .169707 |   .12037    .12037        0
          2.race |  .330416   .244986  .189418 |    .3125     .3125        0
          3.race |  .017505   .011461  .050566 |  .006944   .006944        0
           south |  .297593   .466332 -.352408 |  .291667   .291667        0

                 |            Raw              |        Matched (ATT)
       Variances |  Treated  Untrea~d    Ratio |  Treated  Untrea~d    Ratio
        collgrad |  .218674   .174066  1.25628 |  .217904   .217904        1
         ttl_exp |  20.5898   21.0001  .980459 |  19.8177   18.2323  1.08696
          tenure |  37.2044   29.3629  1.26706 |  37.0399   34.9543  1.05966
      3.industry |  .006536   .012038  .542928 |  .004619   .004619        1
      4.industry |  .150351   .139148  1.08052 |  .151242   .151242        1
      5.industry |  .094207   .027176  3.46656 |  .078494   .078494        1
      6.industry |  .043936    .14105  .311496 |  .046355   .046355        1
      7.industry |  .019348   .092008  .210287 |  .020447   .020447        1
      8.industry |  .017237   .034559  .498769 |  .009195   .009195        1
      9.industry |  .010845   .038533  .281445 |  .011467   .011467        1
     10.industry |  .004367   .008528  .512039 |  .002315   .002315        1
     11.industry |  .250115   .229639  1.08917 |  .250532   .250532        1
     12.industry |  .107758   .067163  1.60443 |  .106127   .106127        1
          2.race |  .221726     .1851  1.19787 |  .215342   .215342        1
          3.race |  .017237   .011338  1.52025 |  .006912   .006912        1
           south |   .20949   .249045  .841173 |  .207077   .207077        1
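As a reading aid for the StdDif and Ratio columns, the sketch below recomputes both statistics for collgrad in the raw sample. The formula for the standardized difference (mean difference divided by the square root of the average of the two group variances) is an assumption; it reproduces the reported raw values, but the kmatch documentation is the authority on the exact definition.

```python
from math import sqrt

def std_diff(mean_t, mean_c, var_t, var_c):
    """Standardized mean difference (assumed definition, see lead-in)."""
    return (mean_t - mean_c) / sqrt((var_t + var_c) / 2)

def var_ratio(var_t, var_c):
    """Variance of the treated divided by variance of the untreated."""
    return var_t / var_c

# Raw means and variances of collgrad as reported in the tables above
sd = std_diff(0.321663, 0.224212, 0.218674, 0.174066)
vr = var_ratio(0.218674, 0.174066)
print(round(sd, 4), round(vr, 4))
```

A standardized difference of 0 and a variance ratio of 1 indicate perfect balance on the first two moments, which is what the matched columns show for the categorical covariates.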
Some Examples: make a graph of the balancing statistics

    . mat M = r(M)
    . mat V = r(V)
    . coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || ,
    >     bylabels("Std. mean difference" "Variance ratio")
    >     noci nolabels byopts(xrescale)
    . addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))
    . addplot 2: , xline(1) norescaling

[Figure: coefficient plot of the balancing statistics by covariate. Panels: "Std. mean difference" and "Variance ratio". Series: Raw, Matched.]
Some Examples: propensity-score kernel matching

    . kmatch ps union collgrad ttl_exp tenure i.industry i.race south
    >     (wage), nate att
    (computing bandwidth ... done)

    Propensity-score kernel matching           Number of obs = 1,853
                                               Kernel        = epan
    Treatment   : union = 1
    Covariates  : collgrad ttl_exp tenure i.industry i.race south
    PS model    : logit (pr)

    Matching statistics
               |     Matched        |      Controls        | Band-
               |  Yes    No   Total |  Used  Unused  Total | width
       Treated |  431    26     457 |  1214     182   1396 | .00188

    Treatment-effects estimation
            wage |     Coef.
             ATT |  .3887224
            NATE |  1.432913
Some Examples: kernel density balancing plot

    . kmatch density, lw(*6 *2) lc(*.5 *1)
    (refitting the model using the generate() option)
    (applying 0-1 boundary correction to density estimation of propensity score)
    (bandwidth for propensity score = .06803989)

[Figure: kernel density of the propensity score for Untreated and Treated. Panels: Raw and Matched (ATT). x-axis: Propensity score; y-axis: Density.]
Some Examples: cumulative distribution balancing plot

    . kmatch cumul, lw(*6 *2) lc(*.5 *1)
    (refitting the model using the generate() option)

[Figure: cumulative distribution of the propensity score for Untreated and Treated. Panels: Raw and Matched (ATT). x-axis: Propensity score; y-axis: Cumulative probability.]
Some Examples: balancing box plot

    . kmatch box
    (refitting the model using the generate() option)

[Figure: box plots of the propensity score for Untreated and Treated. Panels: Raw and Matched (ATT). y-axis: Propensity score.]
Some Examples: standard errors

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south
    >     (wage), nate ate att atc vce(bootstrap)
    (computing bandwidth for treated ... done)
    (computing bandwidth for untreated ... done)
    (running kmatch on estimation sample)

    Bootstrap replications (50)
    ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
    ..................................................    50

    Multivariate-distance kernel matching      Number of obs = 1,853
                                               Replications  =    50
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
               |     Matched        |      Controls        | Band-
               |  Yes    No   Total |  Used  Unused  Total | width
       Treated |  432    25     457 |  1105     291   1396 | 1.3394
     Untreated | 1386    10    1396 |   455       2    457 | 3.3975
      Combined | 1818    35    1853 |  1560     293   1853 |

    Treatment-effects estimation
         |  Observed  Bootstrap                        Normal-based
    wage |     Coef.  Std. Err.     z   P>|z|    [95% Conf. Interval]
     ATE |  .4095729   .1920853  2.13   0.033    .0330928   .7860531
     ATT |  .6059013   .2472069  2.45   0.014    .1213846   1.090418
     ATC |  .3483797   .1893653  1.84   0.066   -.0227695   .7195289
    NATE |  1.432913   .2333282  6.14   0.000    .9755981   1.890228
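The z, P>|z|, and interval columns follow from the coefficient and its bootstrap standard error via the normal approximation. A minimal sketch (the helper name normal_ci is hypothetical) that recomputes the ATT row from the two reported numbers:

```python
from statistics import NormalDist

def normal_ci(coef, se, level=0.95):
    """z statistic, two-sided p-value, and normal-based confidence interval."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    z = coef / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p, coef - z_crit * se, coef + z_crit * se

# ATT and its bootstrap standard error as reported above
z, p, lo, hi = normal_ci(0.6059013, 0.2472069)
print(f"z={z:.2f} p={p:.3f} CI=[{lo:.7f}, {hi:.6f}]")
```

The interval is only as good as the standard error that goes into it; the simulation results later in the talk address how well bootstrap standard errors perform for the different matching algorithms.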
Some Examples: do some tests

    . lincom ATT-NATE

     ( 1)  ATT - NATE = 0

    wage |     Coef.  Std. Err.      z   P>|z|    [95% Conf. Interval]
     (1) | -.8270117   .1810415  -4.57   0.000   -1.181847  -.4721768

    . test ATT = ATC

     ( 1)  ATT - ATC = 0

               chi2(  1) =    2.42
             Prob > chi2 =  0.1200
Some Examples: nearest-neighbor matching (1 neighbor)

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

    Multivariate-distance nearest-neighbor matching
                                               Number of obs  = 1,853
                                               Neighbors: min =     1
    Treatment   : union = 1                               max =     1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
               |     Matched        |      Controls        | Band-
               |  Yes    No   Total |  Used  Unused  Total | width
       Treated |  457     0     457 |   328    1068   1396 |

    Treatment-effects estimation
            wage |     Coef.
             ATT |  .7246969

    . teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

    Treatment-effects estimation               Number of obs      = 1,853
    Estimator      : nearest-neighbor matching Matches: requested =     1
    Outcome model  : matching                                 min =     1
    Distance metric: Mahalanobis                              max =     1

                         |            AI Robust
                    wage |     Coef.  Std. Err.     z   P>|z|   [95% Conf. Interval]
    ATET                 |
      union              |
     (union vs nonunion) |  .7246969   .2942952  2.46   0.014    .147889   1.301505
Some Examples: nearest-neighbor matching (5 neighbors)

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

    Multivariate-distance nearest-neighbor matching
                                               Number of obs  = 1,853
                                               Neighbors: min =     5
    Treatment   : union = 1                               max =     5
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
               |     Matched        |      Controls        | Band-
               |  Yes    No   Total |  Used  Unused  Total | width
       Treated |  457     0     457 |   870     526   1396 |

    Treatment-effects estimation
            wage |     Coef.
             ATT |  .5590823

    . teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

    Treatment-effects estimation               Number of obs      = 1,853
    Estimator      : nearest-neighbor matching Matches: requested =     5
    Outcome model  : matching                                 min =     5
    Distance metric: Mahalanobis                              max =     6

                         |            AI Robust
                    wage |     Coef.  Std. Err.     z   P>|z|   [95% Conf. Interval]
    ATET                 |
      union              |
     (union vs nonunion) |  .5590823   .2381752  2.35   0.019   .0922675   1.025897
Some Examples: bias-correction / regression adjustment

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south
    >     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

    Multivariate-distance nearest-neighbor matching
                                               Number of obs  = 1,853
                                               Neighbors: min =     5
    Treatment   : union = 1                               max =     5
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
               |     Matched        |      Controls        | Band-
               |  Yes    No   Total |  Used  Unused  Total | width
       Treated |  457     0     457 |   870     526   1396 |

    Treatment-effects estimation
            wage |     Coef.
             ATT |  .5288023
    adjusted for: collgrad ttl_exp tenure i.industry i.race south

    . teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south)
    >     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

    Treatment-effects estimation               Number of obs      = 1,853
    Estimator      : nearest-neighbor matching Matches: requested =     5
    Outcome model  : matching                                 min =     5
    Distance metric: Mahalanobis                              max =     6

                         |            AI Robust
                    wage |     Coef.  Std. Err.     z   P>|z|   [95% Conf. Interval]
    ATET                 |
      union              |
     (union vs nonunion) |  .5288023   .2420635  2.18   0.029   .0543666   1.003238
Some Examples: Mahalanobis-distance and propensity-score matching combined

    . kmatch md union collgrad ttl_exp tenure (wage), att
    >     psvars(i.industry i.race south) psweight(3)
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1,853
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis (modified)
    Covariates  : collgrad ttl_exp tenure
    PS model    : logit (pr)
    PS covars   : i.industry i.race south

    Matching statistics
               |     Matched        |      Controls        | Band-
               |  Yes    No   Total |  Used  Unused  Total | width
       Treated |  439    18     457 |  1258     138   1396 | .83886

    Treatment-effects estimation
            wage |     Coef.
             ATT |  .6408443
Some Examples: exact matching

    . kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1,853
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure
    Exact       : industry race south

    Matching statistics
               |     Matched        |      Controls        | Band-
               |  Yes    No   Total |  Used  Unused  Total | width
       Treated |  432    25     457 |  1103     293   1396 | 1.3013

    Treatment-effects estimation
            wage |     Coef.
             ATT |  .6047374
Some Examples: bandwidth selection, the default (based on the distribution of distances in one-nearest-neighbor matching)

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1,853
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
               |     Matched        |      Controls        | Band-
               |  Yes    No   Total |  Used  Unused  Total | width
       Treated |  432    25     457 |  1105     291   1396 | 1.3394

    Treatment-effects estimation
            wage |     Coef.
             ATT |  .6059013
Some Examples: bandwidth selection, cross-validation with respect to X

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
    >     att bwidth(cv)
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1,853
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
               |     Matched        |      Controls        | Band-
               |  Yes    No   Total |  Used  Unused  Total | width
       Treated |  448     9     457 |  1184     212   1396 | 1.8888

    Treatment-effects estimation
            wage |     Coef.
             ATT |  .6651578

    . kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot; points labeled by search step. x-axis: Bandwidth; y-axis: MSE.]
Some Examples: bandwidth selection, cross-validation with respect to Y

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
    >     att bwidth(cv wage)
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1,853
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
               |     Matched        |      Controls        | Band-
               |  Yes    No   Total |  Used  Unused  Total | width
       Treated |  453     4     457 |  1289     107   1396 |  2.433

    Treatment-effects estimation
            wage |     Coef.
             ATT |  .6928956

    . kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot; points labeled by search step. x-axis: Bandwidth; y-axis: MISE.]
Some Examples: bandwidth selection, weighted cross-validation with respect to Y

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
    >     att bwidth(cv wage, weighted)
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1,853
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
               |     Matched        |      Controls        | Band-
               |  Yes    No   Total |  Used  Unused  Total | width
       Treated |  455     2     457 |  1356      40   1396 | 2.7626

    Treatment-effects estimation
            wage |     Coef.
             ATT |  .7308166

    . kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot; points labeled by search step. x-axis: Bandwidth; y-axis: Weighted MISE.]
Some Examples: common-support diagnostics

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
    >     att bwidth(0.5)

    Multivariate-distance kernel matching      Number of obs = 1,853
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
               |     Matched        |      Controls        | Band-
               |  Yes    No   Total |  Used  Unused  Total | width
       Treated |  366    91     457 |   701     695   1396 |     .5

    Treatment-effects estimation
            wage |     Coef.
             ATT |  .3303161

    . kmatch csummarize
    (refitting the model using the generate() option)

                 |  Common support (treated)   |   Standardized difference
           Means |  Matched  Unmatc~d    Total |  (1)-(3)   (2)-(3)   (1)-(2)
        collgrad |  .322404   .318681  .321663 |  .001585  -.006376   .007962
         ttl_exp |  13.3929   12.7682  13.2685 |  .027413  -.110253   .137666
          tenure |  8.12614   6.95055  7.89205 |  .038378  -.154356   .192734
      3.industry |  .002732   .021978  .006565 | -.047404   .190657  -.238061
      4.industry |  .191257   .153846  .183807 |  .019212  -.077269   .096481
      5.industry |  .062842   .274725  .105033 | -.137462   .552867  -.690329
      6.industry |  .057377         0  .045952 |  .054507  -.219225   .273732
      7.industry |  .019126   .021978  .019694 | -.004083   .016423  -.020506
      8.industry |  .005464   .065934  .017505 | -.091714   .368871  -.460585
      9.industry |  .010929   .010989  .010941 | -.000115   .000462  -.000577
     10.industry |        0   .021978  .004376 | -.066227   .266363  -.332589
     11.industry |  .554645   .175824  .479212 |   .15083  -.606636   .757467
     12.industry |  .092896   .241758  .122538 | -.090299   .363181   -.45348
          2.race |  .243169   .681319  .330416 | -.185284   .745209  -.930494
          3.race |  .002732   .076923  .017505 | -.112525   .452572  -.565097
           south |   .29235   .318681  .297593 | -.011456   .046074   -.05753

                 |  Common support (treated)   |            Ratio
       Variances |  Matched  Unmatc~d    Total |  (1)/(3)   (2)/(3)   (1)/(2)
        collgrad |  .219058   .219536  .218674 |  1.00176   1.00394   .997824
         ttl_exp |  19.4198   25.2474  20.5898 |  .943177   1.22621    .76918
          tenure |  38.3324   31.9242  37.2044 |  1.03032   .858076   1.20073
      3.industry |  .002732   .021734  .006536 |  .418045   3.32537   .125714
      4.industry |  .155101   .131624  .150351 |  1.03159   .875443   1.17837
      5.industry |  .059054   .201465  .094207 |  .626851   2.13854   .293122
      6.industry |  .054233         0  .043936 |  1.23435         0         .
      7.industry |  .018811   .021734  .019348 |  .972252    1.1233   .865531
      8.industry |   .00545   .062271  .017237 |  .316157   3.61269   .087513
      9.industry |  .010839   .010989  .010845 |  .999464   1.01328   .986361
     10.industry |        0   .021734  .004367 |        0   4.97709         0
     11.industry |  .247691    .14652  .250115 |  .990307   .585811   1.69049
     12.industry |  .084497   .185348  .107758 |  .784137   1.72003   .455885
          2.race |  .184542   .219536  .221726 |  .832297   .990121   .840601
          3.race |  .002732   .071795  .017237 |  .158513   4.16522   .038056
           south |  .207448   .219536   .20949 |  .990254   1.04796   .944939

    (1) matched, (2) unmatched, (3) total
Some Examples: make a graph of the common-support statistics

    . mat M = r(M)
    . coefplot matrix(M[,4]), title(Std. difference) noci nolabels xline(0)

[Figure: coefficient plot of the standardized difference between matched treated and all treated, by covariate. x-axis: Std. difference.]
Some Examples: multiple outcome variables

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south
    >     (wage hours), nate att
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1,852
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
               |     Matched        |      Controls        | Band-
               |  Yes    No   Total |  Used  Unused  Total | width
       Treated |  432    25     457 |  1104     291   1395 | 1.3392

    Treatment-effects estimation
                 |     Coef.
    wage         |
             ATT |  .6021049
            NATE |  1.430823
    hours        |
             ATT |  1.263759
            NATE |  1.450303
Some Examples: multiple outcome variables with different regression-adjustment equations

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south
    >     (wage = collgrad ttl_exp tenure)
    >     (hours = i.industry i.race), nate att
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1,852
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
               |     Matched        |      Controls        | Band-
               |  Yes    No   Total |  Used  Unused  Total | width
       Treated |  432    25     457 |  1104     291   1395 | 1.3392

    Treatment-effects estimation
                 |     Coef.
    wage         |
             ATT |  .5152752
            NATE |  1.430823
    hours        |
             ATT |  1.263759
            NATE |  1.450303
    wage: adjusted for collgrad ttl_exp tenure
    hours: adjusted for i.industry i.race
Some Examples: treatment effects by subpopulation

    . kmatch md union collgrad ttl_exp tenure i.industry i.race (wage),
    >     att vce(boot) over(south)
    (south=0: computing bandwidth ... done)
    (south=1: computing bandwidth ... done)
    (running kmatch on estimation sample)

    Bootstrap replications (50)
    ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
    ..................................................    50

    Multivariate-distance kernel matching      Number of obs = 1,853
                                               Replications  =    50
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race

    0: south = 0
    1: south = 1

    Matching statistics
               |     Matched        |      Controls        | Band-
               |  Yes    No   Total |  Used  Unused  Total | width
    0  Treated |  306    15     321 |   625     120    745 | 1.3199
    1  Treated |  126    10     136 |   473     178    651 | 1.3398

    Treatment-effects estimation
           |  Observed  Bootstrap                        Normal-based
      wage |     Coef.  Std. Err.     z   P>|z|    [95% Conf. Interval]
    0  ATT |  .4586332   .2763358  1.66   0.097    -.082975   1.000241
    1  ATT |  .9518705    .406903  2.34   0.019    .1543553   1.749386

    . test [0]ATT = [1]ATT

     ( 1)  [0]ATT - [1]ATT = 0

               chi2(  1) =    1.23
             Prob > chi2 =  0.2679

    . lincom [1]ATT - [0]ATT

     ( 1)  - [0]ATT + [1]ATT = 0

      wage |     Coef.  Std. Err.     z   P>|z|    [95% Conf. Interval]
       (1) |  .4932373   .4452343  1.11   0.268    -.379406   1.365881
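The test and lincom results above are two views of the same Wald test: with one restriction, the chi-squared statistic equals the squared z statistic. A short check using only numbers from the output (the standard error of the difference is taken from the lincom row, not recomputed, because it depends on the bootstrap covariance of the two estimates):

```python
from statistics import NormalDist

att_south0 = 0.4586332
att_south1 = 0.9518705
se_diff = 0.4452343            # bootstrap SE of the difference, from lincom

diff = att_south1 - att_south0
z = diff / se_diff
chi2 = z ** 2                  # Wald statistic with 1 degree of freedom
p = 2 * (1 - NormalDist().cdf(abs(z)))
print(f"diff={diff:.7f} z={z:.2f} chi2={chi2:.2f} p={p:.4f}")
```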
Simulation

- Population data from the Swiss census of 2000.
- Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using the command iskotrei by Hendrickx 2002); values from 6 to 78, mean 44.
- Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.
- Control variables: gender, age, and highest educational degree.
- Population restricted to people aged 24 to 60 who are working.
- 2'308'006 individuals, of which 17.5% belong to the treatment group.
- Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.
Simulation

Substantial differences between resident aliens and Swiss nationals on all three covariates.

Propensity score in the population (computed from fully stratified data):

[Figure: density of the propensity score for Untreated and Treated. x-axis: Propensity score (0 to 1); y-axis: Density.]

McFadden R2 = 0.121
Simulation

- Raw mean difference in occupational prestige (NATE): −4.79
- Population ATT (computed from fully stratified data): −3.96
- There is some treatment effect heterogeneity (ATE = −3.51, ATC = −3.41).

[Figure, left panel: mean outcome by propensity score for Untreated and Treated. x-axis: Propensity score; y-axis: Outcome.]
[Figure, right panel: treatment effect by propensity score. x-axis: Propensity score; y-axis: Treatment effect.]
Results: Variance

[Figure: variance of the estimators. Panels: N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each without and with bias correction.]
Notes on "Results: Variance"

In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.
Results: Bias reduction (in percent)

[Figure: bias reduction in percent. Panels: N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each without and with bias correction.]
Notes on "Results: Bias reduction (in percent)"

Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.
Results: Mean squared error

[Figure: mean squared error of the estimators. Panels: N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each without and with bias correction.]
Results: Relative standard error

[Figure: relative standard error. Panels: N = 500 and N = 5000. Rows: nearest-neighbor matching with teffects standard errors (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap standard errors (1 neighbor, 5 neighbors), and kernel matching with bootstrap standard errors (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each without and with bias correction.]
Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals. Panels: N = 500 and N = 5000. Rows: nearest-neighbor matching with teffects standard errors (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap standard errors (1 neighbor, 5 neighbors), and kernel matching with bootstrap standard errors (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each without and with bias correction.]
7 Conclusions
Conclusions

The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).

Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.
Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research, the choice may not be that clear.

Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.

Disadvantages of MDM:
- The choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching! (But don't use pair matching anyhow.)
Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error / confidence interval estimation seems to be mostly OK for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.
References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2):295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1):77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91:279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4):465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70:41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5):688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4):472–480.
Counterfactual Causality (see Neyman 1923, Rubin 1974, 1990)

a.k.a. Rubin Causal Model, a.k.a. Potential Outcomes Framework

Treatment variable D:

$$D = \begin{cases} 1 & \text{treatment (eats of a particular dish)} \\ 0 & \text{control (does not eat of a particular dish)} \end{cases}$$

Potential outcomes $Y^1$ and $Y^0$:
- $Y^1$: potential outcome with treatment (D = 1)
  - If person i would eat of a particular dish, would she die or would she survive?
- $Y^0$: potential outcome without treatment (D = 0)
  - If person i would not eat of a particular dish, would she die or would she survive?

Causal effect of the treatment for individual i:

causal effect = difference between potential outcomes

$$\delta_i = Y_i^1 - Y_i^0$$
Fundamental Problem of Causal Inference
The causal effect of $D$ on $Y$ for individual $i$ is defined as the difference in potential outcomes: $\delta_i = Y_i^1 - Y_i^0$
However, the observed outcome variable is
$$Y_i = \begin{cases} Y_i^1 & \text{if } D_i = 1 \\ Y_i^0 & \text{if } D_i = 0 \end{cases}$$
That is, only one of the two potential outcomes will be realized and, hence, only $Y_i^1$ or $Y_i^0$ can be observed, but never both.
Consequence:
The individual treatment effect $\delta_i$ cannot be observed.
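The switching rule for the observed outcome can be sketched in a few lines of Python. The four units and their potential outcomes are invented purely for illustration; in real data only the `observed` list would be available.

```python
# Hypothetical toy data (D, Y0, Y1). Both potential outcomes are listed only
# because the example is made up; real data never contain both for one unit.
units = [
    (1, 3.0, 5.0),
    (0, 2.0, 2.5),
    (1, 4.0, 4.0),
    (0, 1.0, 3.0),
]


def observed_outcome(d, y0, y1):
    """Return Y = Y1 if D = 1 and Y = Y0 if D = 0."""
    return y1 if d == 1 else y0


observed = [observed_outcome(d, y0, y1) for d, y0, y1 in units]
# The individual effects are computable here only because Y0 and Y1 are both invented.
individual_effects = [y1 - y0 for _, y0, y1 in units]

print(observed)
print(individual_effects)
```

Each unit contributes exactly one of its two potential outcomes to `observed`, which is why `individual_effects` has no empirical counterpart.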
Average Treatment Effect
Although individual causal effects cannot be observed, the average causal effect in a population (the so-called "Average Treatment Effect") can be identified by comparing the expected values of $Y^1$ and $Y^0$:
$$ATE = E[\delta] = E[Y^1 - Y^0] = E[Y^1] - E[Y^0]$$
Some other quantities of interest:
- Average Treatment Effect on the Treated (ATT)
$$ATT = E[Y^1 - Y^0 \mid D = 1] = E[Y^1 \mid D = 1] - E[Y^0 \mid D = 1]$$
- Average Treatment Effect on the Untreated (ATC)
$$ATC = E[Y^1 - Y^0 \mid D = 0] = E[Y^1 \mid D = 0] - E[Y^0 \mid D = 0]$$
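As a small worked example of these three quantities, the following Python sketch computes ATE, ATT, and ATC from invented potential outcomes (the data are hypothetical, and both outcomes per unit are known only because they are made up). It also checks that the ATE is the share-weighted average of ATT and ATC.

```python
# Hypothetical units (D, Y0, Y1); invented for illustration only.
units = [(1, 3.0, 5.0), (0, 2.0, 2.5), (1, 4.0, 4.0), (0, 1.0, 3.0)]


def average(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


ate = average(y1 - y0 for _, y0, y1 in units)             # E[Y1 - Y0]
att = average(y1 - y0 for d, y0, y1 in units if d == 1)   # E[Y1 - Y0 | D = 1]
atc = average(y1 - y0 for d, y0, y1 in units if d == 0)   # E[Y1 - Y0 | D = 0]
share_treated = average(d for d, _, _ in units)           # Pr(D = 1)

print(ate, att, atc)
print(share_treated * att + (1 - share_treated) * atc)    # equals the ATE
```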
Average Treatment Effect
To determine the average effect, unbiased estimates of $E[Y^0]$ and $E[Y^1]$ are required.
If the independence assumption
$$(Y^0, Y^1) \perp\!\!\!\perp D$$
applies, that is, if $D$ is independent of $Y^0$ and $Y^1$, then
$$E[Y^0] = E[Y^0 \mid D = 0]$$
$$E[Y^1] = E[Y^1 \mid D = 1]$$
In this case, the average causal effect can be measured by a simple group comparison (mean difference) between observations without treatment ($D = 0$) and observations with treatment ($D = 1$).
Randomized experiments solve the problem: if the assignment of $D$ is randomized, $D$ is independent of $Y^0$ and $Y^1$ by design.
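A minimal simulation sketch of this point, under assumptions chosen for illustration only (a constant true effect of 2, standard normal $Y^0$, and a deliberately extreme form of self-selection in which units take the treatment whenever $Y^0 > 0$): the simple mean difference recovers the effect under randomization and is badly biased under self-selection.

```python
import random


def mean(values):
    return sum(values) / len(values)


def naive_estimate(randomized, n=20000, seed=1):
    """Mean difference between treated and controls in simulated data."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        y0 = rng.gauss(0.0, 1.0)
        y1 = y0 + 2.0                      # assumed constant effect of 2
        if randomized:
            d = rng.random() < 0.5         # D independent of (Y0, Y1)
        else:
            d = y0 > 0.0                   # D depends on Y0 (self-selection)
        if d:
            treated.append(y1)             # only Y1 is observed for treated
        else:
            control.append(y0)             # only Y0 is observed for controls
    return mean(treated) - mean(control)


print(naive_estimate(randomized=True))     # close to the true effect of 2
print(naive_estimate(randomized=False))    # clearly larger than 2
```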
2 Matching
Conditional Independence / Strong Ignorability
Can causal effects also be identified from "observational" (i.e., non-experimental) data?
Sometimes it can be argued that the independence assumption is valid conditionally (conditional independence, "unconfoundedness"):
$$(Y^0, Y^1) \perp\!\!\!\perp D \mid X$$
If, in addition, the overlap assumption
$$0 < \Pr(D = 1 \mid X = x) < 1 \quad \text{for all } x$$
holds, then the ATE (or ATT or ATC) can be identified by conditioning on $X$.
For example:
$$ATE = \sum_x \Pr[X = x] \, \bigl\{ E[Y \mid D = 1, X = x] - E[Y \mid D = 0, X = x] \bigr\}$$
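The conditioning formula for the ATE can be sketched for a single discrete covariate as follows. The function name `ate_by_stratification` and the eight data points are hypothetical; the sketch raises an error when the overlap assumption fails in a stratum.

```python
from collections import defaultdict


def mean(values):
    return sum(values) / len(values)


def ate_by_stratification(data):
    """data: list of (x, d, y) with discrete x.

    Computes sum_x Pr[X = x] * (E[Y | D = 1, X = x] - E[Y | D = 0, X = x]).
    """
    strata = defaultdict(lambda: {0: [], 1: []})
    for x, d, y in data:
        strata[x][d].append(y)
    n = len(data)
    ate = 0.0
    for x, groups in strata.items():
        if not groups[0] or not groups[1]:
            raise ValueError(f"overlap violated in stratum X = {x!r}")
        share = (len(groups[0]) + len(groups[1])) / n   # Pr[X = x]
        ate += share * (mean(groups[1]) - mean(groups[0]))
    return ate


# Invented example: treatment is more common in the "high" stratum,
# where outcomes are higher anyway.
data = [
    ("low", 1, 4.0), ("low", 0, 2.0), ("low", 0, 3.0), ("low", 0, 1.0),
    ("high", 1, 9.0), ("high", 1, 7.0), ("high", 1, 8.0), ("high", 0, 7.0),
]
naive = mean([y for _, d, y in data if d == 1]) - mean([y for _, d, y in data if d == 0])
print(naive)                        # raw mean difference, confounded by X
print(ate_by_stratification(data))  # effect after conditioning on X
```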
Matching
Matching is one approach to "condition on X" if strong ignorability holds.
Basic idea:
1. For each observation in the treatment group, find "statistical twins" in the control group with the same (or at least very similar) X values (and vice versa).
2. The Y values of these matching observations are then used to compute the counterfactual outcome for the observation at hand.
3. An estimate for the average causal effect can be obtained as the mean of the differences between the observed values and the "imputed" counterfactual values over all observations.
Matching
Formally:
$$\widehat{ATT} = \frac{1}{N_{D=1}} \sum_{i \mid D=1} \left[ Y_i - \hat{Y}_i^0 \right] = \frac{1}{N_{D=1}} \sum_{i \mid D=1} \left[ Y_i - \sum_{j \mid D=0} w_{ij} Y_j \right]$$
$$\widehat{ATC} = \frac{1}{N_{D=0}} \sum_{i \mid D=0} \left[ \hat{Y}_i^1 - Y_i \right] = \frac{1}{N_{D=0}} \sum_{i \mid D=0} \left[ \sum_{j \mid D=1} w_{ij} Y_j - Y_i \right]$$
$$\widehat{ATE} = \frac{N_{D=1}}{N} \cdot \widehat{ATT} + \frac{N_{D=0}}{N} \cdot \widehat{ATC}$$
Different matching algorithms use different definitions of $w_{ij}$.
Exact Matching
Exact matching:
$$w_{ij} = \begin{cases} 1/k_i & \text{if } X_i = X_j \\ 0 & \text{else} \end{cases}$$
with $k_i$ as the number of observations for which $X_i = X_j$ applies.
The result is equivalent to "perfect stratification" or "subclassification" (see, e.g., Cochran 1968).
Problem: If X contains several variables, there is a large probability that no exact matches can be found for many observations (the "curse of dimensionality").
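The matching estimator of the ATT with exact-matching weights $w_{ij} = 1/k_i$ can be sketched as below. The function name `att_exact_matching` and the data are hypothetical; treated observations without an exact control twin are dropped and counted, which is how the curse of dimensionality shows up in practice.

```python
def att_exact_matching(data):
    """data: list of (x, d, y), where x is a hashable covariate profile.

    Returns (att, n_matched, n_unmatched) for the treated observations.
    """
    controls = {}
    for x, d, y in data:
        if d == 0:
            controls.setdefault(x, []).append(y)

    diffs, unmatched = [], 0
    for x, d, y in data:
        if d != 1:
            continue
        twins = controls.get(x)
        if not twins:                     # no control with X_j = X_i
            unmatched += 1
            continue
        k_i = len(twins)
        y0_hat = sum(y_j / k_i for y_j in twins)   # sum_j w_ij * Y_j
        diffs.append(y - y0_hat)
    if not diffs:
        raise ValueError("no treated observation has an exact match")
    return sum(diffs) / len(diffs), len(diffs), unmatched


# Invented example with covariate profiles (sex, years of education).
data = [
    (("f", 12), 1, 10.0),
    (("f", 12), 0, 8.0),
    (("f", 12), 0, 6.0),
    (("m", 16), 1, 12.0),
    (("m", 16), 0, 11.0),
    (("m", 18), 1, 20.0),   # treated unit without an exact control twin
]
att, n_matched, n_unmatched = att_exact_matching(data)
print(att, n_matched, n_unmatched)
```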
Multivariate Distance Matching (MDM)
An alternative is to match based on a distance metric that measures the proximity between observations in the multivariate space of X.
The idea then is to use observations that are "close", but not necessarily equal, as matches.
A common approach is to use
$$MD(X_i, X_j) = \sqrt{(X_i - X_j)' \, \Sigma^{-1} (X_i - X_j)}$$
as distance metric, where $\Sigma$ is an appropriate scaling matrix.
- Mahalanobis matching: $\Sigma$ is the covariance matrix of X.
- Euclidean matching: $\Sigma$ is the identity matrix.
- Mahalanobis matching is equivalent to Euclidean matching based on standardized and orthogonalized X.
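A minimal sketch of the distance metric for two covariates, written without external libraries (the helper names and the four data points are assumptions made for illustration). It shows that the Euclidean variant depends on the measurement units, whereas the Mahalanobis variant does not change when age is recorded in months instead of years.

```python
import math


def covariance_2d(points):
    """Sample covariance matrix of a list of (x1, x2) points."""
    n = len(points)
    m0 = sum(p[0] for p in points) / n
    m1 = sum(p[1] for p in points) / n
    s00 = sum((p[0] - m0) ** 2 for p in points) / (n - 1)
    s11 = sum((p[1] - m1) ** 2 for p in points) / (n - 1)
    s01 = sum((p[0] - m0) * (p[1] - m1) for p in points) / (n - 1)
    return [[s00, s01], [s01, s11]]


def invert_2x2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    if det == 0:
        raise ValueError("scaling matrix is singular")
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


def md(xi, xj, sigma_inv):
    """sqrt((xi - xj)' Sigma^{-1} (xi - xj)) for two-dimensional points."""
    d0, d1 = xi[0] - xj[0], xi[1] - xj[1]
    q = (d0 * (sigma_inv[0][0] * d0 + sigma_inv[0][1] * d1)
         + d1 * (sigma_inv[1][0] * d0 + sigma_inv[1][1] * d1))
    return math.sqrt(q)


IDENTITY = [[1.0, 0.0], [0.0, 1.0]]                  # Euclidean matching
points = [(12.0, 30.0), (16.0, 40.0), (14.0, 50.0), (18.0, 60.0)]  # (education, age)
months = [(edu, 12.0 * age) for edu, age in points]  # same data, age in months

euclid = md(points[0], points[1], IDENTITY)
mahal = md(points[0], points[1], invert_2x2(covariance_2d(points)))
mahal_months = md(months[0], months[1], invert_2x2(covariance_2d(months)))
print(euclid, mahal, mahal_months)
```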
Matching Algorithms
Various matching algorithms can be employed to find potential matches based on MD and determine the matching weights $w_{ij}$.
Pair matching (one-to-one matching without replacement)
- For each observation i in the treatment group, find observation j in the control group for which $MD_{ij}$ is smallest. Once observation j is used as a match, do not use it again.
Nearest-neighbor matching
- For each observation i in the treatment group, find the k closest observations in the control group. A single control can be used multiple times as a match. In case of ties (multiple controls with identical MD), use all ties as matches. k is set by the researcher.
Caliper matching
- Like nearest-neighbor matching, but only use controls for which MD is smaller than some threshold c.
Mahalanobis Matching
Radius matching
- Use all controls as matches for which MD is smaller than some threshold c.
Kernel matching
- Like radius matching, but give larger weight to controls for which MD is small (using some kernel function such as, e.g., the Epanechnikov kernel).
In addition, since matching is no longer exact, it may make sense to refine the estimates by applying regression adjustment to the matched data (also known as "bias adjustment" in the context of nearest-neighbor matching).
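How these algorithms translate distances into weights $w_{ij}$ for a single treated observation can be sketched as follows. The function names and the distances are hypothetical, pair matching without replacement is left out because it needs bookkeeping across treated observations, and the sketch is not meant to mirror any particular software implementation.

```python
def nearest_neighbor_weights(dist, k=1, caliper=None):
    """dist: {control_id: distance}. Equal weights for the k nearest controls.

    All controls tied with the k-th nearest are kept; an optional caliper
    removes controls whose distance is not below the threshold.
    """
    eligible = {j: d for j, d in dist.items() if caliper is None or d < caliper}
    if not eligible:
        return {}
    ordered = sorted(eligible.values())
    cutoff = ordered[min(k, len(ordered)) - 1]
    matches = [j for j, d in eligible.items() if d <= cutoff]
    return {j: 1.0 / len(matches) for j in matches}


def radius_weights(dist, c):
    """Equal weights for all controls with distance below c."""
    matches = [j for j, d in dist.items() if d < c]
    return {j: 1.0 / len(matches) for j in matches}


def epanechnikov(u):
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def kernel_weights(dist, bandwidth):
    """Weights proportional to K(distance / bandwidth)."""
    raw = {j: epanechnikov(d / bandwidth) for j, d in dist.items()}
    total = sum(raw.values())
    if total == 0.0:
        return {}
    return {j: w / total for j, w in raw.items() if w > 0.0}


# Invented distances between one treated observation and four controls.
dist = {"a": 0.2, "b": 0.2, "c": 0.5, "d": 1.4}
print(nearest_neighbor_weights(dist, k=1))               # a and b are tied
print(nearest_neighbor_weights(dist, k=3, caliper=0.4))  # c is outside the caliper
print(radius_weights(dist, c=1.0))
print(kernel_weights(dist, bandwidth=1.0))               # closer controls weigh more
```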
3 Propensity Score Matching
The Propensity Score Theorem (Rosenbaum and Rubin 1983)
If the conditional independence assumption is true, then
$$\Pr(D_i = 1 \mid Y_i^0, Y_i^1, X_i) = \Pr(D_i = 1 \mid X_i) = \pi(X_i)$$
where $\pi(X)$ is called the propensity score.
That is,
$$(Y^0, Y^1) \perp\!\!\!\perp D \mid X$$
implies
$$(Y^0, Y^1) \perp\!\!\!\perp D \mid \pi(X)$$
so that under strong ignorability the average causal effect can be estimated by conditioning on the propensity score $\pi(X)$ instead of $X$.
This is remarkable because the information in X, which may include many variables, can be reduced to just one dimension. This greatly simplifies the matching task.
Propensity Score Matching (PSM)
Instead of computing multivariate distances, we can thus simply match on the (one-dimensional) propensity score.
Procedure:
- Step 1: Estimate the propensity score, e.g. using a Logit model.
- Step 2: Apply a matching algorithm using differences in the propensity score, $|\hat\pi(X_i) - \hat\pi(X_j)|$, instead of multivariate distances.
PSM is tremendously popular:
- https://scholar.google.ch/scholar?q=propensity+score+AND+(matching+OR+matched+OR+match)
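The two-step procedure can be sketched on simulated data as follows. Everything in the sketch is an assumption made for illustration: a single covariate, a true effect of 2, a hand-written Newton-Raphson logit fit (`fit_logit`), and one-nearest-neighbor matching with replacement without special treatment of ties. It is not how kmatch or any other package implements PSM.

```python
import math
import random


def fit_logit(x, d, iterations=25):
    """Newton-Raphson fit of Pr(D = 1 | x) = 1 / (1 + exp(-(a + b * x)))."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, di in zip(x, d):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = p * (1.0 - p)
            g0 += di - p                 # score for the intercept
            g1 += (di - p) * xi          # score for the slope
            h00 += w                     # elements of the information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def att_psm(x, d, y):
    """Step 1: estimate the propensity score. Step 2: match on it (1-NN)."""
    a, b = fit_logit(x, d)
    ps = [1.0 / (1.0 + math.exp(-(a + b * xi))) for xi in x]
    controls = [(ps[j], y[j]) for j in range(len(x)) if d[j] == 0]
    diffs = []
    for i in range(len(x)):
        if d[i] == 1:
            _, y_match = min(controls, key=lambda c: abs(c[0] - ps[i]))
            diffs.append(y[i] - y_match)
    return sum(diffs) / len(diffs)


def simulate(n=1000, seed=3):
    """Simulated data in which x drives both treatment and outcome."""
    rng = random.Random(seed)
    x, d, y = [], [], []
    for _ in range(n):
        xi = rng.gauss(0.0, 1.0)
        di = 1 if rng.random() < 1.0 / (1.0 + math.exp(-xi)) else 0
        x.append(xi)
        d.append(di)
        y.append(2.0 * di + xi + rng.gauss(0.0, 0.5))   # true effect: 2
    return x, d, y


x, d, y = simulate()
treated = [yi for yi, di in zip(y, d) if di == 1]
untreated = [yi for yi, di in zip(y, d) if di == 0]
naive = sum(treated) / len(treated) - sum(untreated) / len(untreated)
att = att_psm(x, d, y)
print(naive)   # confounded by x
print(att)     # should be much closer to the true effect
```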
4 King and Nielsen's "Why Propensity Scores Should Not Be Used for Matching"
King and Nielsen
In 2015/2016 Gary King and Richard Nielsen circulated a paper that created quite some concern among applied researchers.
The basic message of the paper is that PSM is really, really bad and should be discarded.
The paper:
- http://j.mp/1sexgVw
Slides:
- https://gking.harvard.edu/presentations/why-propensity-scores-should-not-be-used-matching-6
Watch it:
- https://www.youtube.com/watch?v=rBv39pK1iEs
King and Nielsen
The story goes about as follows.
Argument 1
- Model dependence (i.e. dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions, even if they try not to be).
- Matching is good because it reduces model dependence.
Matching to Reduce Model Dependence (Ho, Imai, King, Stuart 2007, fig. 1, Political Analysis)
[Figure, shown in several build steps: scatter plot of Outcome (0 to 12) against Education (years, 12 to 28) for treated (T) and control (C) units. Slide 3/23 of the slides by King and Nielsen.]
King and Nielsen
Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

Matching: Finding Hidden Randomized Experiments (slide 6/23 of the slides by King and Nielsen)
Types of Experiments

Balance on covariates    Complete randomization    Fully blocked
Observed                 On average                Exact
Unobserved               On average                On average

Fully blocked dominates complete randomization for: imbalance, model dependence, power, efficiency, bias, research costs, robustness. E.g., Imai, King, Nall 2009: SEs 600% smaller.
Goal of Each Matching Method (in Observational Data)
• PSM: complete randomization
• Other methods: fully blocked
• Other matching methods dominate PSM (wait, it gets worse)
Best Case: Mahalanobis Distance Matching
[Figure, shown in two build steps: treated (T) and control (C) units plotted by Education (years, 12 to 28) and Age (20 to 80), before and after pruning unmatched controls. Slide 9/23 of the slides by King and Nielsen.]
Best Case: Propensity Score Matching
[Figure, shown in two build steps: treated (T) and control (C) units plotted by Education (years, 12 to 28) and Age (20 to 80), together with a Propensity Score axis running from 0 to 1. Slide 15/23 of the slides by King and Nielsen.]
Best Case: Propensity Score Matching is Suboptimal
[Figure: matched treated (T) and control (C) units plotted by Education (years, 12 to 28) and Age (20 to 80). Slide 15/23 of the slides by King and Nielsen.]
King and Nielsen
Argument 3
- Random pruning (deleting observations at random) increases imbalance. This is because the sample size decreases, so that variance increases (large differences become more likely).
- More imbalance/variance means more model dependence and researcher discretion.
- Because PSM approximates complete randomization, it engages in random pruning.
- PSM Paradox ("when you do 'better', you do worse")
  - When matching is made more strict (e.g. by decreasing the size of the caliper), PSM, like other matching methods, typically reduces imbalance. But soon the PSM Paradox kicks in, such that further pruning quickly increases imbalance.
  - If the data is such that there are no big differences between treated and untreated to begin with, the PSM Paradox kicks in almost immediately.
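The first step of this argument, that random pruning increases expected imbalance, can be illustrated with a small simulation. The setup is an assumption for illustration only: treated and control covariate values come from the same standard normal distribution, so any mean difference is pure sampling noise, and imbalance is measured as the absolute difference in means.

```python
import random


def mean(values):
    return sum(values) / len(values)


def mean_abs_imbalance(n_keep, n_full=200, reps=500, seed=42):
    """Average |mean(X | treated) - mean(X | control)| after keeping
    n_keep randomly chosen observations per group."""
    rng = random.Random(seed)
    results = []
    for _ in range(reps):
        treated = [rng.gauss(0.0, 1.0) for _ in range(n_full)]
        control = [rng.gauss(0.0, 1.0) for _ in range(n_full)]
        kept_t = rng.sample(treated, n_keep)    # random pruning
        kept_c = rng.sample(control, n_keep)
        results.append(abs(mean(kept_t) - mean(kept_c)))
    return mean(results)


full = mean_abs_imbalance(n_keep=200)   # nothing pruned
pruned = mean_abs_imbalance(n_keep=20)  # 90 percent pruned at random
print(full, pruned)
```

With these settings the expected imbalance grows roughly with the square root of the pruning factor, since only the sample size changes.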
PSM Increases Model Dependence & Bias
[Figure with two panels comparing MDM and PSM by Number of Units Pruned (0 to 160). Panel "Model Dependence": Variance. Panel "Bias": Maximum Coefficient across 512 Specifications, with a reference line at the true effect of 2. Simulated data: $Y_i = 2T_i + X_{1i} + X_{2i} + \epsilon_i$, $\epsilon_i \sim N(0, 1)$. Slide 20/23 of the slides by King and Nielsen.]
The Propensity Score Paradox in Real Data
[Figure with two panels, "Finkel et al. (JOP 2012)" and "Nielsen et al. (AJPS 2011)": Imbalance by Number of units pruned for CEM, MDM, PSM, and Random pruning, with markers for the raw data and for a 1/4 SD caliper. Slide 21/23 of the slides by King and Nielsen.]
Similar pattern for > 20 other real data sets we checked
5 Are King and Nielsen right?
Are King and Nielsen right?
Argument 1
- Model dependence (i.e. dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions, even if they try not to be).
- Matching is good because it reduces model dependence.
I fully agree.
My view, however, may be somewhat less pessimistic. I believe that research results can be credible if researchers are well educated, so that they know what they are doing, and if modeling decisions are made transparent and robustness of results is evaluated (and documented).
Are King and Nielsen right?
Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.
That fully blocked randomization is more efficient than complete randomization, given the sample size, is of course true (how large the efficiency gains are depends on the strength of the relation between X and Y).
However, if blocking reduces the sample size, it is not a priori clear whether estimates from the blocked sample are more efficient than estimates from the full sample (although often they will be).
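The efficiency claim for a given sample size can be illustrated with a small simulation sketch. All settings are assumptions for illustration: 40 units, one covariate with a strong effect on the outcome (coefficient 3), a true treatment effect of 2, and blocking implemented as randomization within pairs of units that are adjacent in x.

```python
import random
from statistics import mean, pstdev


def estimate(x, assignment, rng, effect=2.0):
    """Difference in mean outcomes for one realized assignment."""
    treated, control = [], []
    for xi, di in zip(x, assignment):
        yi = effect * di + 3.0 * xi + rng.gauss(0.0, 1.0)   # X strongly affects Y
        (treated if di else control).append(yi)
    return mean(treated) - mean(control)


def complete_randomization(x, rng):
    """Treat a random half of the units, ignoring x."""
    n = len(x)
    assignment = [1] * (n // 2) + [0] * (n - n // 2)
    rng.shuffle(assignment)
    return assignment


def blocked_randomization(x, rng):
    """Pair units with adjacent x values and treat one unit per pair."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    assignment = [0] * len(x)
    for k in range(0, len(order) - 1, 2):
        first, second = order[k], order[k + 1]
        assignment[first if rng.random() < 0.5 else second] = 1
    return assignment


def sampling_sd(design, reps=2000, n=40, seed=7):
    """Standard deviation of the estimate over repeated assignments."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]   # same x for both designs
    return pstdev([estimate(x, design(x, rng), rng) for _ in range(reps)])


sd_complete = sampling_sd(complete_randomization)
sd_blocked = sampling_sd(blocked_randomization)
print(sd_complete, sd_blocked)
```

Setting the coefficient on x to a small value shrinks the gap between the two designs, in line with the remark that the gain depends on the strength of the relation between X and Y.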
That PSM approximates complete randomization is only partially true. PSM approximates complete randomization within observations with the same propensity score. Hence, PSM is somewhere between complete randomization and fully blocked randomization.
- If the X variables have no relation to T (treatment), then all observations have the same propensity score. Hence, we end up with complete randomization.
- If the X variables have a strong effect on T, there is lots of blocking.
Are King and Nielsen right?
Argument 3
- Random pruning ⇒ imbalance ⇒ more model dependence
- PSM ⇒ complete randomization ⇒ lots of random pruning
- PSM Paradox: "when you do 'better', you do worse"
That random pruning makes things worse is of course true, because it unnecessarily reduces the sample size (without changing anything else).
As argued above, that PSM applies random pruning is only true for X variables unrelated to T (so that we are in a "local" complete randomization situation, although something similar can probably also happen if effects from several X's cancel each other out).
Furthermore, it is only true if you employ a matching algorithm that throws away good matches. King and Nielsen's results seem to be based on the worst possible algorithm: one-to-one matching without replacement.
If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or also nearest-neighbor matching as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
- Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
- Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.
It is true, however, that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more "free" variance in X that can be exploited by modeling decisions).
In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you'd need to apply PSM stratified by subgroups in this case).
6 Illustration using kmatch
The Command
kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).
Some key features:
- Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
- Optional exact matching
- Optional regression-adjustment/bias-correction
- Kernel matching, ridge matching, or nearest-neighbor matching
- Automatic bandwidth selection for kernel/ridge matching
- Flexible specification of scaling matrix for MDM
- Joint analysis of multiple subgroups and multiple outcome variables
- Various post-estimation commands for balancing and common-support diagnostics
- Computationally efficient implementation
Some Examples
Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.
. sysuse nlsw88, clear
(NLSW, 1988 extract)
. drop if industry==2
(4 observations deleted)
Mahalanobis-distance kernel matching
. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage), nate att
(computing bandwidth ... done)
Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south
Matching statistics
               Matched                 Controls           Band-
            Yes    No  Total      Used  Unused  Total     width
Treated     432    25    457      1105     291   1396    1.3394
Treatment-effects estimation
    wage        Coef.
     ATT     .6059013
    NATE     1.432913
Some Examples
Some balancing statistics
. kmatch summarize
(refitting the model using the generate() option)

                          Raw                        Matched (ATT)
Means          Treated  Untrea~d    StdDif    Treated  Untrea~d    StdDif
collgrad       .321663   .224212   .219912    .319444   .319444         0
ttl_exp        13.2685   12.7323   .117584    13.3205   13.1425   .039036
tenure         7.89205   6.17658    .29735    7.91744   7.58347   .057888
3.industry     .006565   .012178  -.058246     .00463    .00463         0
4.industry     .183807   .166905   .044425    .185185   .185185         0
5.industry     .105033   .027937   .312944    .085648   .085648         0
6.industry     .045952   .169771  -.407129    .048611   .048611         0
7.industry     .019694   .102436  -.350657    .020833   .020833         0
8.industry     .017505   .035817  -.113785    .009259   .009259         0
9.industry     .010941   .040115  -.185669    .011574   .011574         0
10.industry    .004376   .008596  -.052551    .002315   .002315         0
11.industry    .479212   .356734   .250073    .506944   .506944         0
12.industry    .122538    .07235   .169707     .12037    .12037         0
2.race         .330416   .244986   .189418      .3125     .3125         0
3.race         .017505   .011461   .050566    .006944   .006944         0
south          .297593   .466332  -.352408    .291667   .291667         0

                          Raw                        Matched (ATT)
Variances      Treated  Untrea~d     Ratio    Treated  Untrea~d     Ratio
collgrad       .218674   .174066   1.25628    .217904   .217904         1
ttl_exp        20.5898   21.0001   .980459    19.8177   18.2323   1.08696
tenure         37.2044   29.3629   1.26706    37.0399   34.9543   1.05966
3.industry     .006536   .012038   .542928    .004619   .004619         1
4.industry     .150351   .139148   1.08052    .151242   .151242         1
5.industry     .094207   .027176   3.46656    .078494   .078494         1
6.industry     .043936    .14105   .311496    .046355   .046355         1
7.industry     .019348   .092008   .210287    .020447   .020447         1
8.industry     .017237   .034559   .498769    .009195   .009195         1
9.industry     .010845   .038533   .281445    .011467   .011467         1
10.industry    .004367   .008528   .512039    .002315   .002315         1
11.industry    .250115   .229639   1.08917    .250532   .250532         1
12.industry    .107758   .067163   1.60443    .106127   .106127         1
2.race         .221726     .1851   1.19787    .215342   .215342         1
3.race         .017237   .011338   1.52025    .006912   .006912         1
south           .20949   .249045   .841173    .207077   .207077         1
Some Examples
Make a graph of the balancing stats
. mat M = r(M)
. mat V = r(V)
. coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || , ///
>     bylabels("Std. mean difference" "Variance ratio") ///
>     noci nolabels byopts(xrescale)
. addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))
. addplot 2: , xline(1) norescaling
[Figure: coefficient plot with two panels, "Std. mean difference" (x-axis from -.4 to .4) and "Variance ratio" (x-axis from 0 to 4), comparing Raw and Matched values for collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, and south.]
Some Examples
Propensity-score kernel matching
. kmatch ps union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage), nate att
(computing bandwidth ... done)
Propensity-score kernel matching           Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Covariates  : collgrad ttl_exp tenure i.industry i.race south
PS model    : logit (pr)
Matching statistics
               Matched                 Controls           Band-
            Yes    No  Total      Used  Unused  Total     width
Treated     431    26    457      1214     182   1396    .00188
Treatment-effects estimation
    wage        Coef.
     ATT     .3887224
    NATE     1.432913
Some Examples
Kernel density balancing plot
. kmatch density, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)
(applying 0-1 boundary correction to density estimation of propensity score)
(bandwidth for propensity score = .06803989)
[Figure with two panels, "Raw" and "Matched (ATT)": Density (0 to 3) of the Propensity score (0 to .8) for Untreated and Treated.]
Some Examples
Cumulative distribution balancing plot
. kmatch cumul, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)
[Figure with two panels, "Raw" and "Matched (ATT)": Cumulative probability (0 to 1) of the Propensity score (0 to .8) for Untreated and Treated.]
Some Examples
Balancing box plot
. kmatch box
(refitting the model using the generate() option)
[Figure with two panels, "Raw" and "Matched (ATT)": box plots of the Propensity score (0 to .8) for Untreated and Treated.]
Some Examples
Standard errors
. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)
(running kmatch on estimation sample)
Bootstrap replications (50)
Multivariate-distance kernel matching      Number of obs = 1853
                                           Replications  = 50
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south
Matching statistics
               Matched                 Controls           Band-
            Yes    No  Total      Used  Unused  Total     width
Treated     432    25    457      1105     291   1396    1.3394
Untreated  1386    10   1396       455       2    457    3.3975
Combined   1818    35   1853      1560     293   1853
Treatment-effects estimation
            Observed   Bootstrap                       Normal-based
    wage       Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
     ATE    .4095729    .1920853   2.13   0.033     .0330928   .7860531
     ATT    .6059013    .2472069   2.45   0.014     .1213846   1.090418
     ATC    .3483797    .1893653   1.84   0.066    -.0227695   .7195289
    NATE    1.432913    .2333282   6.14   0.000     .9755981   1.890228
Some Examples
Do some tests
. lincom ATT-NATE
 ( 1)  ATT - NATE = 0
    wage       Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
     (1)   -.8270117    .1810415  -4.57   0.000    -1.181847  -.4721768
. test ATT = ATC
 ( 1)  ATT - ATC = 0
           chi2(  1) =    2.42
         Prob > chi2 =  0.1200
Some Examples
Nearest-neighbor matching (1 neighbor)
. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn
Multivariate-distance nearest-neighbor matching
                                           Number of obs  = 1853
                                           Neighbors: min = 1
Treatment   : union = 1                               max = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south
Matching statistics
               Matched                 Controls           Band-
            Yes    No  Total      Used  Unused  Total     width
Treated     457     0    457       328    1068   1396
Treatment-effects estimation
    wage        Coef.
     ATT     .7246969
. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet
Treatment-effects estimation                 Number of obs      = 1853
Estimator      : nearest-neighbor matching   Matches: requested = 1
Outcome model  : matching                                   min = 1
Distance metric: Mahalanobis                                max = 1
                                      AI Robust
    wage                          Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
    ATET
     union (union vs nonunion)  .7246969   .2942952   2.46   0.014     .147889   1.301505
Some Examples
Nearest-neighbor matching (5 neighbors)
. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)
Multivariate-distance nearest-neighbor matching
                                           Number of obs  = 1853
                                           Neighbors: min = 5
Treatment   : union = 1                               max = 5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south
Matching statistics
               Matched                 Controls           Band-
            Yes    No  Total      Used  Unused  Total     width
Treated     457     0    457       870     526   1396
Treatment-effects estimation
    wage        Coef.
     ATT     .5590823
. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)
Treatment-effects estimation                 Number of obs      = 1853
Estimator      : nearest-neighbor matching   Matches: requested = 5
Outcome model  : matching                                   min = 5
Distance metric: Mahalanobis                                max = 6
                                      AI Robust
    wage                          Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
    ATET
     union (union vs nonunion)  .5590823   .2381752   2.35   0.019    .0922675   1.025897
Some Examples Bias-correction regression adjustment kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure iindustry irace south) att nn(5)
Multivariate-distance nearest-neighbor matching
Number of obs = 1853Neighbors min = 5
Treatment union = 1 max = 5Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south
Matching statistics
Matched Controls Band-Yes No Total Used Unused Total width
Treated 457 0 457 870 526 1396
Treatment-effects estimation
wage Coef
ATT 5288023
adjusted for collgrad ttl_exp tenure iindustry irace south
teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) gt (union) atet nn(5) biasadj(collgrad ttl_exp tenure iindustry irace south)
Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 5Outcome model matching min = 5Distance metric Mahalanobis max = 6
AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]
ATETunion
(union vs nonunion) 5288023 2420635 218 0029 0543666 1003238
Some Examples

Mahalanobis-distance and propensity-score matching combined

. kmatch md union collgrad ttl_exp tenure (wage), att psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching     Number of obs = 1853
                                          Kernel        = epan
Treatment:  union = 1
Metric:     mahalanobis (modified)
Covariates: collgrad ttl_exp tenure
PS model:   logit (pr)
PS covars:  i.industry i.race south

Matching statistics
          |       Matched       |       Controls        | Band-
          |  Yes    No   Total  |  Used  Unused  Total  | width
  Treated |  439    18     457  |  1258     138   1396  | .83886

Treatment-effects estimation
   wage |    Coef.
    ATT | .6408443
Some Examples

Exact matching

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching     Number of obs = 1853
                                          Kernel        = epan
Treatment:  union = 1
Metric:     mahalanobis
Covariates: collgrad ttl_exp tenure
Exact:      industry race south

Matching statistics
          |       Matched       |       Controls        | Band-
          |  Yes    No   Total  |  Used  Unused  Total  | width
  Treated |  432    25     457  |  1103     293   1396  | 1.3013

Treatment-effects estimation
   wage |    Coef.
    ATT | .6047374
Some Examples

Bandwidth selection: the default (based on the distribution of distances in one-nearest-neighbor matching)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching     Number of obs = 1853
                                          Kernel        = epan
Treatment:  union = 1
Metric:     mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics
          |       Matched       |       Controls        | Band-
          |  Yes    No   Total  |  Used  Unused  Total  | width
  Treated |  432    25     457  |  1105     291   1396  | 1.3394

Treatment-effects estimation
   wage |    Coef.
    ATT | .6059013
Some Examples

Bandwidth selection: cross validation with respect to X

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching     Number of obs = 1853
                                          Kernel        = epan
Treatment:  union = 1
Metric:     mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics
          |       Matched       |       Controls        | Band-
          |  Yes    No   Total  |  Used  Unused  Total  | width
  Treated |  448     9     457  |  1184     212   1396  | 1.8888

Treatment-effects estimation
   wage |    Coef.
    ATT | .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MSE against bandwidth; markers are labeled with the index of each evaluated bandwidth.]
Some Examples

Bandwidth selection: cross validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching     Number of obs = 1853
                                          Kernel        = epan
Treatment:  union = 1
Metric:     mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics
          |       Matched       |       Controls        | Band-
          |  Yes    No   Total  |  Used  Unused  Total  | width
  Treated |  453     4     457  |  1289     107   1396  |  2.433

Treatment-effects estimation
   wage |    Coef.
    ATT | .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MISE against bandwidth; markers are labeled with the index of each evaluated bandwidth.]
Some Examples

Bandwidth selection: weighted cross validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching     Number of obs = 1853
                                          Kernel        = epan
Treatment:  union = 1
Metric:     mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics
          |       Matched       |       Controls        | Band-
          |  Yes    No   Total  |  Used  Unused  Total  | width
  Treated |  455     2     457  |  1356      40   1396  | 2.7626

Treatment-effects estimation
   wage |    Coef.
    ATT | .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of weighted MISE against bandwidth; markers are labeled with the index of each evaluated bandwidth.]
Some Examples

Common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(0.5)

Multivariate-distance kernel matching     Number of obs = 1853
                                          Kernel        = epan
Treatment:  union = 1
Metric:     mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics
          |       Matched       |       Controls        | Band-
          |  Yes    No   Total  |  Used  Unused  Total  | width
  Treated |  366    91     457  |   701     695   1396  |     .5

Treatment-effects estimation
   wage |    Coef.
    ATT | .3303161

. kmatch csummarize
(refitting the model using the generate() option)

Common support (treated)
             |            Means              |    Standardized difference
             |  Matched  Unmatched    Total  |  (1)-(3)   (2)-(3)   (1)-(2)
    collgrad |  .322404    .318681  .321663  |  .001585  -.006376   .007962
     ttl_exp |  13.3929    12.7682  13.2685  |  .027413  -.110253   .137666
      tenure |  8.12614    6.95055  7.89205  |  .038378  -.154356   .192734
  3.industry |  .002732    .021978  .006565  | -.047404   .190657  -.238061
  4.industry |  .191257    .153846  .183807  |  .019212  -.077269   .096481
  5.industry |  .062842    .274725  .105033  | -.137462   .552867  -.690329
  6.industry |  .057377          0  .045952  |  .054507  -.219225   .273732
  7.industry |  .019126    .021978  .019694  | -.004083   .016423  -.020506
  8.industry |  .005464    .065934  .017505  | -.091714   .368871  -.460585
  9.industry |  .010929    .010989  .010941  | -.000115   .000462  -.000577
 10.industry |        0    .021978  .004376  | -.066227   .266363  -.332589
 11.industry |  .554645    .175824  .479212  |   .15083  -.606636   .757467
 12.industry |  .092896    .241758  .122538  | -.090299   .363181   -.45348
      2.race |  .243169    .681319  .330416  | -.185284   .745209  -.930494
      3.race |  .002732    .076923  .017505  | -.112525   .452572  -.565097
       south |   .29235    .318681  .297593  | -.011456   .046074   -.05753

Common support (treated)
             |           Variances           |             Ratio
             |  Matched  Unmatched    Total  |  (1)/(3)   (2)/(3)   (1)/(2)
    collgrad |  .219058    .219536  .218674  |  1.00176   1.00394   .997824
     ttl_exp |  19.4198    25.2474  20.5898  |  .943177   1.22621    .76918
      tenure |  38.3324    31.9242  37.2044  |  1.03032   .858076   1.20073
  3.industry |  .002732    .021734  .006536  |  .418045   3.32537   .125714
  4.industry |  .155101    .131624  .150351  |  1.03159   .875443   1.17837
  5.industry |  .059054    .201465  .094207  |  .626851   2.13854   .293122
  6.industry |  .054233          0  .043936  |  1.23435         0         .
  7.industry |  .018811    .021734  .019348  |  .972252    1.1233   .865531
  8.industry |   .00545    .062271  .017237  |  .316157   3.61269   .087513
  9.industry |  .010839    .010989  .010845  |  .999464   1.01328   .986361
 10.industry |        0    .021734  .004367  |        0   4.97709         0
 11.industry |  .247691     .14652  .250115  |  .990307   .585811   1.69049
 12.industry |  .084497    .185348  .107758  |  .784137   1.72003   .455885
      2.race |  .184542    .219536  .221726  |  .832297   .990121   .840601
      3.race |  .002732    .071795  .017237  |  .158513   4.16522   .038056
       south |  .207448    .219536   .20949  |  .990254   1.04796   .944939

(1) matched, (2) unmatched, (3) total
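As a plausibility check on the diagnostics table, the following Python sketch recomputes two entries for 11.industry. It assumes that the standardized difference is the difference in means divided by the standard deviation in the total treated group, and that the decimal points sit as written in the comments; neither is stated on the slide. Under these assumptions the reported values .15083 and .990307 are reproduced. This is only an illustration, not kmatch's own code.

```python
import math

# Values for 11.industry read from the csummarize output.
# The decimal placement is an assumption, checked by the arithmetic below.
mean_matched, mean_total = 0.554645, 0.479212
var_matched, var_total = 0.247691, 0.250115

# Assumed definition: raw mean difference scaled by the total-group SD.
std_diff = (mean_matched - mean_total) / math.sqrt(var_total)

# Variance ratio, matched versus total.
var_ratio = var_matched / var_total

print(round(std_diff, 5), round(var_ratio, 5))
```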
Some Examples

Make a graph of the common-support statistics:

. mat M = r(M)
. coefplot matrix(M[,4]), title(Std. difference) noci nolabels xline(0)

[Figure: coefficient plot titled "Std. difference", showing one marker per covariate (collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, south) with a reference line at 0.]
Some Examples

Multiple outcome variables

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching     Number of obs = 1852
                                          Kernel        = epan
Treatment:  union = 1
Metric:     mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics
          |       Matched       |       Controls        | Band-
          |  Yes    No   Total  |  Used  Unused  Total  | width
  Treated |  432    25     457  |  1104     291   1395  | 1.3392

Treatment-effects estimation
          |    Coef.
wage      |
      ATT | .6021049
     NATE | 1.430823
hours     |
      ATT | 1.263759
     NATE | 1.450303
Some Examples

Multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage = collgrad ttl_exp tenure) (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching     Number of obs = 1852
                                          Kernel        = epan
Treatment:  union = 1
Metric:     mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics
          |       Matched       |       Controls        | Band-
          |  Yes    No   Total  |  Used  Unused  Total  | width
  Treated |  432    25     457  |  1104     291   1395  | 1.3392

Treatment-effects estimation
          |    Coef.
wage      |
      ATT | .5152752
     NATE | 1.430823
hours     |
      ATT | 1.263759
     NATE | 1.450303
wage:  adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race
Some Examples

Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage), att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)

Multivariate-distance kernel matching     Number of obs = 1853
                                          Replications  = 50
                                          Kernel        = epan
Treatment:  union = 1
Metric:     mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics
          |       Matched       |       Controls        | Band-
          |  Yes    No   Total  |  Used  Unused  Total  | width
0 Treated |  306    15     321  |   625     120    745  | 1.3199
1 Treated |  126    10     136  |   473     178    651  | 1.3398

Treatment-effects estimation
          | Observed   Bootstrap                      Normal-based
     wage |    Coef.   Std. Err.     z    P>|z|   [95% Conf. Interval]
0     ATT | .4586332   .2763358   1.66   0.097   -.082975   1.000241
1     ATT | .9518705    .406903   2.34   0.019   .1543553   1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

           chi2(  1) =    1.23
         Prob > chi2 =  0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

     wage |    Coef.   Std. Err.     z    P>|z|   [95% Conf. Interval]
      (1) | .4932373   .4452343   1.11   0.268   -.379406   1.365881
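The test and lincom results are two views of the same Wald statistic: the chi-squared value is the square of z, and both give the same p-value. The short Python sketch below redoes this arithmetic from the reported coefficient (.4932373) and bootstrap standard error (.4452343), assuming a normal approximation. It is a consistency check on the printed numbers, not a reimplementation of Stata's commands.

```python
import math

# Difference [1]ATT - [0]ATT and its bootstrap standard error (from lincom).
coef, se = 0.4932373, 0.4452343

z = coef / se                               # Wald z statistic
chi2 = z * z                                # equals the chi2(1) from -test-
p = math.erfc(abs(z) / math.sqrt(2.0))      # two-sided normal p-value

crit = 1.959964                             # 97.5% quantile of N(0, 1)
ci_low, ci_high = coef - crit * se, coef + crit * se

print(round(z, 2), round(chi2, 2), round(p, 3))
```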
Simulation

- Population data from the Swiss census of 2000.
- Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using command iskotrei by Hendrickx 2002) (values from 6 to 78; mean 44).
- Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.
- Control variables: gender, age, and highest educational degree.
- Population restricted to working people between 24 and 60 years old.
- 2'308'006 individuals, of which 17.5% belong to the treatment group.
- Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.
Simulation

Substantial differences between resident aliens and Swiss nationals on all three covariates.

Propensity score in population (computed from fully stratified data):

[Figure: density of the propensity score (horizontal axis from 0 to 1) for the untreated and the treated.]

McFadden R2 = 0.121
Simulation

Raw mean difference in occupational prestige (NATE): -4.79
Population ATT (computed from fully stratified data): -3.96
There is some treatment effect heterogeneity (ATE = -3.51, ATC = -3.41).

[Figure, two panels: (1) outcome by propensity score for the untreated and the treated; (2) treatment effect by propensity score.]
Results: Variance

[Figure: variance of the estimators, panels N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]

In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that, across algorithms, PSM can also be much more efficient than MDM. For example, kernel matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.
Results: Bias reduction (in percent)

[Figure: bias reduction in percent, panels N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]

Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.
Results: Mean squared error

[Figure: mean squared error of the estimators, panels N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]

Results: Relative standard error

[Figure: relative standard error, panels N = 500 and N = 5000. Rows: nearest-neighbor matching (teffects) with 1 and 5 neighbors; nearest-neighbor matching (bootstrap) with 1 and 5 neighbors; kernel matching (bootstrap) with fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y. Series: MDM and PSM, each with and without bias correction.]

Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals, panels N = 500 and N = 5000. Rows and series as in the figure on relative standard errors.]
Conclusions

The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).

Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.

Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research, the choice may not be that clear.

Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.

Disadvantages of MDM:
- The choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching. (But don't use pair matching anyhow.)

Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly ok for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.
References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2):295-313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1):77-90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91:279-290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4):465-472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70:41-55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5):688-701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4):472-480.
Fundamental Problem of Causal Inference

The causal effect of D on Y for individual i is defined as the difference in potential outcomes:

\[ \delta_i = Y_i^1 - Y_i^0 \]

However, the observed outcome variable is

\[ Y_i = \begin{cases} Y_i^1 & \text{if } D_i = 1 \\ Y_i^0 & \text{if } D_i = 0 \end{cases} \]

That is, only one of the two potential outcomes will be realized, and hence only $Y_i^1$ or $Y_i^0$ can be observed, but never both.

Consequence: The individual treatment effect $\delta_i$ cannot be observed.
Average Treatment Effect

Although individual causal effects cannot be observed, the average causal effect in a population (the so-called "Average Treatment Effect") can be identified by comparing the expected values of $Y^1$ and $Y^0$:

\[ ATE = E[\delta] = E[Y^1 - Y^0] = E[Y^1] - E[Y^0] \]

Some other quantities of interest:

- Average Treatment Effect on the Treated (ATT)

\[ ATT = E[Y^1 - Y^0 \mid D = 1] = E[Y^1 \mid D = 1] - E[Y^0 \mid D = 1] \]

- Average Treatment Effect on the Untreated (ATC)

\[ ATC = E[Y^1 - Y^0 \mid D = 0] = E[Y^1 \mid D = 0] - E[Y^0 \mid D = 0] \]
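To make the three estimands concrete, here is a small Python sketch on a made-up population in which both potential outcomes are known for everyone, which is never the case in real data. All numbers are invented for illustration. It also computes the naive difference in observed group means, which generally differs from ATE, ATT, and ATC.

```python
# Toy population with BOTH potential outcomes known (hypothetical numbers).
# Each tuple is (D, Y0, Y1).
population = [
    (1, 2.0, 5.0),
    (1, 3.0, 4.0),
    (0, 1.0, 2.0),
    (0, 4.0, 4.0),
    (0, 2.0, 5.0),
]

def mean(values):
    values = list(values)
    return sum(values) / len(values)

# Averages of the individual effects delta_i = Y1_i - Y0_i.
ate = mean(y1 - y0 for d, y0, y1 in population)
att = mean(y1 - y0 for d, y0, y1 in population if d == 1)
atc = mean(y1 - y0 for d, y0, y1 in population if d == 0)

# Naive comparison of observed outcomes: Y1 for the treated, Y0 for controls.
naive = (mean(y1 for d, y0, y1 in population if d == 1)
         - mean(y0 for d, y0, y1 in population if d == 0))

print(ate, att, atc, naive)
```

In this toy population the naive difference exceeds all three estimands because the treated have higher outcomes without treatment than the controls.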
Average Treatment Effect

To determine the average effect, unbiased estimates of $E[Y^0]$ and $E[Y^1]$ are required.

If the independence assumption

\[ (Y^0, Y^1) \perp\!\!\!\perp D \]

applies, that is, if D is independent from $Y^0$ and $Y^1$, then

\[ E[Y^0] = E[Y^0 \mid D = 0] \]
\[ E[Y^1] = E[Y^1 \mid D = 1] \]

In this case the average causal effect can be measured by a simple group comparison (mean difference) between observations without treatment (D = 0) and observations with treatment (D = 1).

Randomized experiments solve the problem: if the assignment of D is randomized, D is independent from $Y^0$ and $Y^1$ by design.
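A short simulation can illustrate why randomization matters. The Python sketch below uses an invented data-generating process (a binary confounder x that raises both the outcome and, in the non-randomized case, the probability of treatment; a constant true effect of 1). The parameter values and the function name are assumptions made for this illustration only.

```python
import random

def simulate(randomized, n=20000, seed=1):
    """Return the naive mean difference between treated and controls."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        x = rng.random() < 0.5                # binary confounder
        y0 = 2.0 * x + rng.gauss(0.0, 1.0)    # outcome without treatment
        y1 = y0 + 1.0                         # true effect is 1 for everyone
        if randomized:
            p = 0.5                           # assignment ignores x
        else:
            p = 0.8 if x else 0.2             # assignment depends on x
        d = rng.random() < p
        # Only one potential outcome is observed per unit.
        (treated if d else control).append(y1 if d else y0)
    return sum(treated) / len(treated) - sum(control) / len(control)

naive_randomized = simulate(randomized=True)
naive_confounded = simulate(randomized=False)
print(naive_randomized, naive_confounded)
```

With randomized assignment the naive difference is close to the true effect of 1; with confounded assignment its expectation under this design is 1 + 2(0.8 - 0.2) = 2.2.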
Conditional Independence / Strong Ignorability

Can causal effects also be identified from "observational" (i.e., non-experimental) data?

Sometimes it can be argued that the independence assumption is valid conditionally (conditional independence, "unconfoundedness"):

\[ (Y^0, Y^1) \perp\!\!\!\perp D \mid X \]

If, in addition, the overlap assumption

\[ 0 < \Pr(D = 1 \mid X = x) < 1 \quad \text{for all } x \]

holds, then the ATE (or ATT or ATC) can be identified by conditioning on X. For example:

\[ ATE = \sum_x \Pr[X = x] \left\{ E[Y \mid D = 1, X = x] - E[Y \mid D = 0, X = x] \right\} \]
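The conditioning formula for the ATE can be sketched in a few lines of Python for a single discrete covariate. The data and the function name stratified_ate are hypothetical; the function raises an error when a stratum lacks treated or control observations, which corresponds to a violation of the overlap assumption in the sample.

```python
from collections import defaultdict

# (x, d, y): made-up observational data with one discrete covariate x.
data = [
    ("a", 1, 6.0), ("a", 1, 8.0), ("a", 0, 5.0),
    ("b", 1, 3.0), ("b", 0, 1.0), ("b", 0, 2.0), ("b", 0, 3.0),
]

def stratified_ate(rows):
    """Sum over x of Pr[X=x] * (E[Y|D=1,X=x] - E[Y|D=0,X=x])."""
    cells = defaultdict(lambda: {0: [], 1: []})
    for x, d, y in rows:
        cells[x][d].append(y)
    n = len(rows)
    ate = 0.0
    for x, groups in cells.items():
        if not groups[0] or not groups[1]:
            raise ValueError(f"no overlap in stratum {x!r}")
        share = (len(groups[0]) + len(groups[1])) / n      # Pr[X = x]
        diff = (sum(groups[1]) / len(groups[1])
                - sum(groups[0]) / len(groups[0]))         # within-stratum effect
        ate += share * diff
    return ate

print(stratified_ate(data))
```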
Matching

Matching is one approach to "condition on X" if strong ignorability holds.

Basic idea:
1. For each observation in the treatment group, find "statistical twins" in the control group with the same (or at least very similar) X values (and vice versa).
2. The Y values of these matching observations are then used to compute the counterfactual outcome for the observation at hand.
3. An estimate for the average causal effect can be obtained as the mean of the differences between the observed values and the "imputed" counterfactual values over all observations.
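Matching

Formally:

\[ \widehat{ATT} = \frac{1}{N_{D=1}} \sum_{i \mid D=1} \left[ Y_i - \hat{Y}_i^0 \right] = \frac{1}{N_{D=1}} \sum_{i \mid D=1} \left[ Y_i - \sum_{j \mid D=0} w_{ij} Y_j \right] \]

\[ \widehat{ATC} = \frac{1}{N_{D=0}} \sum_{i \mid D=0} \left[ \hat{Y}_i^1 - Y_i \right] = \frac{1}{N_{D=0}} \sum_{i \mid D=0} \left[ \sum_{j \mid D=1} w_{ij} Y_j - Y_i \right] \]

\[ \widehat{ATE} = \frac{N_{D=1}}{N} \cdot \widehat{ATT} + \frac{N_{D=0}}{N} \cdot \widehat{ATC} \]

Different matching algorithms use different definitions of $w_{ij}$.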
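The ATT formula translates directly into code once the weights are given. The following Python sketch takes a weight matrix as input and assumes, as an illustration, that each treated unit's weights sum to one; the function name and the example numbers are hypothetical.

```python
def att_estimate(y_treated, y_control, weights):
    """Mean of Y_i minus the weighted control outcome; weights[i][j] = w_ij."""
    diffs = []
    for y_i, w_i in zip(y_treated, weights):
        # Assumption of this sketch: weights are normalized per treated unit.
        if abs(sum(w_i) - 1.0) > 1e-9:
            raise ValueError("each row of weights must sum to 1")
        y0_hat = sum(w * y for w, y in zip(w_i, y_control))   # imputed Y^0
        diffs.append(y_i - y0_hat)
    return sum(diffs) / len(diffs)

y_treated = [10.0, 12.0]
y_control = [7.0, 9.0, 11.0]
# First treated unit uses control 0 only; second averages controls 1 and 2.
weights = [[1.0, 0.0, 0.0], [0.0, 0.5, 0.5]]

att = att_estimate(y_treated, y_control, weights)
print(att)
```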
Exact Matching

Exact matching:

\[ w_{ij} = \begin{cases} 1/k_i & \text{if } X_i = X_j \\ 0 & \text{else} \end{cases} \]

with $k_i$ as the number of observations for which $X_i = X_j$ applies.

The result is equivalent to "perfect stratification" or "subclassification" (see, e.g., Cochran 1968).

Problem: If X contains several variables, there is a large probability that no exact matches can be found for many observations (the "curse of dimensionality").
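A minimal Python sketch of the exact-matching weights follows. The covariate values are invented, and returning None for a treated unit without any exact match is a choice made for this illustration; it shows how quickly units can end up unmatched once X contains several variables.

```python
def exact_match_weights(x_treated, x_control):
    """w_ij = 1/k_i if X_i == X_j, else 0; None if no control matches."""
    weights = []
    for x_i in x_treated:
        hits = [1.0 if x_j == x_i else 0.0 for x_j in x_control]
        k_i = sum(hits)                       # number of exact matches
        weights.append([h / k_i for h in hits] if k_i else None)
    return weights

# Hypothetical covariates: (gender, age).
x_treated = [("f", 30), ("m", 45), ("m", 60)]
x_control = [("f", 30), ("f", 30), ("m", 45), ("f", 52)]

w = exact_match_weights(x_treated, x_control)
print(w)
```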
Multivariate Distance Matching (MDM)

An alternative is to match based on a distance metric that measures the proximity between observations in the multivariate space of X.

The idea then is to use observations that are "close", but not necessarily equal, as matches.

A common approach is to use

\[ MD(X_i, X_j) = \sqrt{(X_i - X_j)' \Sigma^{-1} (X_i - X_j)} \]

as distance metric, where $\Sigma$ is an appropriate scaling matrix.

- Mahalanobis matching: $\Sigma$ is the covariance matrix of X.
- Euclidean matching: $\Sigma$ is the identity matrix.
- Mahalanobis matching is equivalent to Euclidean matching based on standardized and orthogonalized X.
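The distance formula can be sketched for two covariates without any matrix library. In the Python example below the data points (education in years, age) are made up, and the helper names are hypothetical. It also illustrates one consequence of using the covariance matrix as scaling matrix: the Mahalanobis distance does not change when a covariate is rescaled, here from years to months.

```python
import math

def covariance_2d(points):
    """Sample covariance matrix of 2-D points as (sxx, sxy, syy)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return sxx, sxy, syy

def mahalanobis_2d(a, b, cov):
    """sqrt((a-b)' Sigma^-1 (a-b)) for a symmetric 2x2 Sigma."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = a[0] - b[0], a[1] - b[1]
    # Inverse of [[sxx, sxy], [sxy, syy]] is [[syy, -sxy], [-sxy, sxx]] / det.
    q = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(q)

points = [(12, 30), (16, 45), (14, 28), (18, 60)]   # (education, age in years)
d_years = mahalanobis_2d(points[0], points[1], covariance_2d(points))

months = [(e, 12 * a) for e, a in points]            # age in months instead
d_months = mahalanobis_2d(months[0], months[1], covariance_2d(months))

print(d_years, d_months)
```

With the identity matrix as scaling matrix, passed as (1.0, 0.0, 1.0), the same function returns the Euclidean distance.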
Matching Algorithms

Various matching algorithms can be employed to find potential matches based on MD and determine the matching weights $w_{ij}$.

Pair matching (one-to-one matching without replacement)
- For each observation i in the treatment group, find observation j in the control group for which $MD_{ij}$ is smallest. Once observation j is used as a match, do not use it again.

Nearest-neighbor matching
- For each observation i in the treatment group, find the k closest observations in the control group. A single control can be used multiple times as a match. In case of ties (multiple controls with identical MD), use all ties as matches. k is set by the researcher.

Caliper matching
- Like nearest-neighbor matching, but only use controls for which MD is smaller than some threshold c.
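The three algorithms described above can be sketched in Python on a precomputed distance matrix. The function names are hypothetical, and the pair-matching routine is a simple greedy variant that processes treated units in the given order; this is one possible reading of the description, not necessarily how any particular software implements it.

```python
def nn_match(dist_row, k=1, caliper=None):
    """Indices of the k nearest controls for one treated unit, keeping ties.

    With a caliper, only controls with distance below the threshold qualify.
    """
    order = sorted(range(len(dist_row)), key=lambda j: dist_row[j])
    if caliper is not None:
        order = [j for j in order if dist_row[j] < caliper]
    if len(order) <= k:
        return order
    cutoff = dist_row[order[k - 1]]           # distance of the k-th neighbor
    return [j for j in order if dist_row[j] <= cutoff]

def pair_match(dist):
    """Greedy one-to-one matching without replacement."""
    used, pairs = set(), {}
    for i, row in enumerate(dist):
        free = [j for j in range(len(row)) if j not in used]
        if not free:
            break                             # no controls left
        j = min(free, key=lambda c: row[c])
        used.add(j)
        pairs[i] = j
    return pairs

row = [0.4, 0.1, 0.4, 0.9]
print(nn_match(row, k=1))               # single nearest neighbor
print(nn_match(row, k=2))               # tie at the 2nd neighbor: both kept
print(nn_match(row, k=2, caliper=0.3))  # caliper drops distant controls
print(pair_match([[0.1, 0.5], [0.2, 0.3]]))
```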
Mahalanobis Matching

Radius matching
- Use all controls as matches for which MD is smaller than some threshold c.

Kernel matching
- Like radius matching, but give larger weight to controls for which MD is small (using some kernel function such as, e.g., the Epanechnikov kernel).

In addition, since matching is no longer exact, it may make sense to refine the estimates by applying regression adjustment to the matched data (also known as "bias adjustment" in the context of nearest-neighbor matching).
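Kernel matching weights can be sketched as follows in Python. The Epanechnikov kernel is written in its standard form, 0.75(1 - u^2) for |u| < 1; normalizing the weights to sum to one and returning None when no control lies within the bandwidth are assumptions of this sketch. Radius matching corresponds to replacing the kernel with a constant inside the bandwidth.

```python
def epanechnikov(u):
    """Epanechnikov kernel: positive only for |u| < 1."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0

def kernel_weights(dist_row, bandwidth, kernel=epanechnikov):
    """Normalized kernel weights for one treated unit, or None if unmatched."""
    raw = [kernel(d / bandwidth) for d in dist_row]
    total = sum(raw)
    if total == 0.0:
        return None            # no control within the bandwidth
    return [r / total for r in raw]

w = kernel_weights([0.0, 1.0, 2.0], bandwidth=2.0)
print(w)
print(kernel_weights([3.0], bandwidth=2.0))
```

The closest control receives the largest weight, and the control at distance 2.0 lies on the edge of the bandwidth and receives weight zero.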
The Propensity Score Theorem (Rosenbaum and Rubin 1983)

If the conditional independence assumption is true, then

\[ \Pr(D_i = 1 \mid Y_i^0, Y_i^1, X_i) = \Pr(D_i = 1 \mid X_i) = \pi(X_i) \]

where $\pi(X)$ is called the propensity score.

That is,

\[ (Y^0, Y^1) \perp\!\!\!\perp D \mid X \]

implies

\[ (Y^0, Y^1) \perp\!\!\!\perp D \mid \pi(X) \]

so that, under strong ignorability, the average causal effect can be estimated by conditioning on the propensity score $\pi(X)$ instead of X.

This is remarkable because the information in X, which may include many variables, can be reduced to just one dimension. This greatly simplifies the matching task.
Propensity Score Matching (PSM)

Instead of computing multivariate distances, we can thus simply match on the (one-dimensional) propensity score.

Procedure:
- Step 1: Estimate the propensity score, e.g., using a Logit model.
- Step 2: Apply a matching algorithm using differences in the propensity score, $|\pi(X_i) - \pi(X_j)|$, instead of multivariate distances.

PSM is tremendously popular:
- https://scholar.google.ch/scholar?q=propensity+score+AND+(matching+OR+matched+OR+match)
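The two-step procedure can be sketched in plain Python for a single covariate. The data are invented, the logit model is fitted with a hand-written Newton-Raphson loop (an implementation choice for this sketch, so that no external library is needed), and step 2 uses one-nearest-neighbor matching with replacement on the estimated score.

```python
import math

def fit_logit(x, d, iterations=25):
    """Newton-Raphson for Pr(D=1|x) = 1 / (1 + exp(-(b0 + b1*x)))."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        # Gradient of the log-likelihood.
        g0 = sum(di - pi for di, pi in zip(d, p))
        g1 = sum((di - pi) * xi for di, pi, xi in zip(d, p, x))
        # Negative Hessian (symmetric 2x2).
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: beta += H^-1 g.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Step 1: estimate the propensity score (hypothetical data).
x = [1, 2, 3, 4, 5, 6, 7, 8]
d = [0, 0, 1, 0, 1, 0, 1, 1]
b0, b1 = fit_logit(x, d)
score = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]

# Step 2: match each treated unit to the control with the closest score.
treated = [i for i, di in enumerate(d) if di == 1]
controls = [j for j, dj in enumerate(d) if dj == 0]
matches = {i: min(controls, key=lambda j: abs(score[i] - score[j]))
           for i in treated}
print(matches)
```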
King and Nielsen

In 2015/2016, Gary King and Richard Nielsen circulated a paper that created quite some concern among applied researchers.

The basic message of the paper is that PSM is really, really bad and should be discarded.

The paper:
- http://j.mp/1sexgVw

Slides:
- https://gking.harvard.edu/presentations/why-propensity-scores-should-not-be-used-matching-6

Watch it:
- https://www.youtube.com/watch?v=rBv39pK1iEs

King and Nielsen

The story goes about as follows.

Argument 1
- Model dependence (i.e., dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions even if they try not to be).
- Matching is good because it reduces model dependence.
Matching to Reduce Model Dependence (Ho, Imai, King, Stuart 2007, fig. 1, Political Analysis)

[Figure: scatter plot of the outcome against education (years) with treated (T) and control (C) units, shown in several animation steps. Slide 3/23 of the slides by King and Nielsen.]
King and Nielsen

Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

Matching: Finding Hidden Randomized Experiments (slide 6/23 of the slides by King and Nielsen)

Types of Experiments

Balance on covariates | Complete randomization | Fully blocked
Observed              | On average             | Exact
Unobserved            | On average             | On average

Fully blocked dominates complete randomization for: imbalance, model dependence, power, efficiency, bias, research costs, robustness. E.g., Imai, King, Nall 2009: SEs 600% smaller.

Goal of Each Matching Method (in Observational Data)
- PSM: complete randomization
- Other methods: fully blocked
- Other matching methods dominate PSM (wait, it gets worse)
Best Case: Mahalanobis Distance Matching

[Figure: treated (T) and control (C) units plotted by education (years) and age, shown before and after matching. Slide 9/23 of the slides by King and Nielsen.]
Best Case: Propensity Score Matching

[Figure: treated (T) and control (C) units by education (years, 12 to 28) and age (20 to 80), with an additional propensity score axis running from 0 to 1. Slide 15/23 by King and Nielsen.]
Best Case: Propensity Score Matching is Suboptimal

[Figure: treated (T) units and a reduced set of control (C) units by education (years, 12 to 28) and age (20 to 80). Slide 15/23 by King and Nielsen.]
King and Nielsen

Argument 3
- Random pruning (deleting observations at random) increases imbalance. This is because the sample size decreases, so that variance increases (large differences become more likely).
- More imbalance/variance means more model dependence and researcher discretion.
- Because PSM approximates complete randomization, it engages in random pruning.
- PSM Paradox ("when you do 'better', you do worse"):
  - When matching is made more strict (e.g., by decreasing the size of the caliper), PSM, like other matching methods, typically reduces imbalance. But soon the PSM Paradox kicks in, such that further pruning quickly increases imbalance.
  - If the data are such that there are no big differences between treated and untreated to begin with, the PSM Paradox kicks in almost immediately.
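The claim that random pruning increases imbalance can be illustrated with a small simulation. This is only a sketch under assumed conditions (one standard-normal covariate, identical distributions in both groups, 200 units per group); the function name and all parameter values are hypothetical and not taken from King and Nielsen.

```python
import random
import statistics

def mean_abs_imbalance(n_keep, n_total=200, reps=500, seed=1):
    """Average |mean(X | treated) - mean(X | control)| after randomly
    keeping n_keep of n_total units in each group."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        # both groups come from the same distribution: no systematic imbalance
        treated = [rng.gauss(0, 1) for _ in range(n_total)]
        control = [rng.gauss(0, 1) for _ in range(n_total)]
        # random pruning: drop units without looking at X
        t_kept = rng.sample(treated, n_keep)
        c_kept = rng.sample(control, n_keep)
        total += abs(statistics.fmean(t_kept) - statistics.fmean(c_kept))
    return total / reps

imbalance_by_size = {n: mean_abs_imbalance(n) for n in (200, 100, 50, 25)}
for n, imbalance in imbalance_by_size.items():
    print(f"kept {n:3d} per group: mean |difference| = {imbalance:.3f}")
```

The average absolute mean difference grows as fewer units are kept, simply because the means are computed from smaller samples.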
PSM Increases Model Dependence & Bias

[Figure, two panels comparing MDM and PSM by number of units pruned (0 to 160). Panel "Model Dependence": variance. Panel "Bias": maximum coefficient across 512 specifications; true effect = 2.]

Data-generating process: $Y_i = 2T_i + X_{1i} + X_{2i} + \epsilon_i$, $\epsilon_i \sim N(0, 1)$

(Slide 20/23 by King and Nielsen)
The Propensity Score Paradox in Real Data

[Figure, two panels: Finkel et al. (JOP, 2012) and Nielsen et al. (AJPS, 2011). Each panel plots imbalance against the number of units pruned for CEM, MDM, PSM, and random pruning, with markers for the raw data and for a 1/4 SD caliper.]

Similar pattern for > 20 other real data sets we checked.

(Slide 21/23 by King and Nielsen)
Are King and Nielsen right?

Argument 1
- Model dependence (i.e., dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions, even if they try not to be).
- Matching is good because it reduces model dependence.

I fully agree.

My view, however, may be somewhat less pessimistic. I believe that research results can be credible if researchers are well educated, so that they know what they are doing, and if modeling decisions are made transparent and the robustness of results is evaluated (and documented).
Are King and Nielsen right?

Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

That fully blocked randomization is more efficient than complete randomization (given the sample size) is of course true; how large the efficiency gains are depends on the strength of the relation between X and Y.

However, if blocking reduces the sample size, it is not a priori clear whether estimates from the blocked sample are more efficient than estimates from the full sample (although often they will be).
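To make the efficiency comparison concrete, the following sketch simulates both designs at a fixed sample size. Everything here is an assumption for illustration: one binary covariate X, an outcome Y = 2T + 3X + noise, 40 units, and half of them treated; none of these values come from the slides.

```python
import random
import statistics

def simulate(reps=2000, n=40, seed=7):
    """Variance of the difference-in-means estimator under complete and
    under fully blocked randomization (same sample size, true effect = 2)."""
    rng = random.Random(seed)
    est_complete, est_blocked = [], []
    x = [i % 2 for i in range(n)]  # binary covariate, two equal-sized blocks
    for _ in range(reps):
        noise = [rng.gauss(0, 1) for _ in range(n)]

        def estimate(treated_ids):
            treated_ids = set(treated_ids)
            # X has a strong effect on Y (coefficient 3)
            y = [2 * (i in treated_ids) + 3 * x[i] + noise[i] for i in range(n)]
            y1 = [y[i] for i in range(n) if i in treated_ids]
            y0 = [y[i] for i in range(n) if i not in treated_ids]
            return statistics.fmean(y1) - statistics.fmean(y0)

        # complete randomization: any half of the units may be treated
        est_complete.append(estimate(rng.sample(range(n), n // 2)))

        # fully blocked: exactly half of the units within each X block
        ids = []
        for block in (0, 1):
            members = [i for i in range(n) if x[i] == block]
            ids += rng.sample(members, len(members) // 2)
        est_blocked.append(estimate(ids))
    return statistics.variance(est_complete), statistics.variance(est_blocked)

var_complete, var_blocked = simulate()
print(f"complete: {var_complete:.3f}   blocked: {var_blocked:.3f}")
```

With a weaker relation between X and Y (a smaller coefficient than 3), the two variances move closer together, which is the dependence on the X-Y relation noted above.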
Are King and Nielsen right?

Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

That PSM approximates complete randomization is only partially true. PSM approximates complete randomization within observations with the same propensity score. Hence, PSM is somewhere between complete randomization and fully blocked randomization.
- If the X variables have no relation to T (treatment), then all observations have the same propensity score. Hence, we end up with complete randomization.
- If the X variables have a strong effect on T, there is lots of blocking.
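The two limiting cases can be made tangible with propensity scores computed from fully stratified toy data (the share of treated units within each covariate cell). The data, the cell labels, and the helper name below are invented for illustration.

```python
from collections import defaultdict

def propensity_by_cell(data):
    """data: list of (x, t) pairs with a discrete covariate x and t in {0, 1}.
    Returns {x: share of treated units in cell x}."""
    counts = defaultdict(lambda: [0, 0])  # x -> [n_units, n_treated]
    for x, t in data:
        counts[x][0] += 1
        counts[x][1] += t
    return {x: treated / n for x, (n, treated) in counts.items()}

# X unrelated to T: half of the units are treated in every cell
unrelated = [(x, t) for x in ("a", "b", "c") for t in (0, 1) for _ in range(10)]

# X strongly related to T: the treated share differs across cells
related = ([("a", 1)] * 2 + [("a", 0)] * 18 +
           [("b", 1)] * 10 + [("b", 0)] * 10 +
           [("c", 1)] * 18 + [("c", 0)] * 2)

ps_unrelated = propensity_by_cell(unrelated)
ps_related = propensity_by_cell(related)
print(ps_unrelated)  # a single score value: one block, complete randomization
print(ps_related)    # one score value per cell: matching on the score blocks on X
```

In the first data set, matching on the propensity score cannot distinguish the cells; in the second, each cell has its own score, so matching on the score amounts to blocking on X.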
Are King and Nielsen right?

Argument 3
- Random pruning ⇒ imbalance ⇒ more model dependence
- PSM ⇒ complete randomization ⇒ lots of random pruning
- PSM Paradox: "when you do 'better', you do worse"

That random pruning makes things worse is of course true, because it unnecessarily reduces the sample size (without changing anything else).

As argued above, that PSM applies random pruning is only true for X variables unrelated to T (so that we are in a "local" complete randomization situation, although something similar can probably also happen if effects from several X's cancel each other out).

Furthermore, it is only true if you employ a matching algorithm that throws away good matches. King and Nielsen's results seem to be based on the worst possible algorithm: one-to-one matching without replacement.
Are King and Nielsen right?

Argument 3
- Random pruning ⇒ imbalance ⇒ more model dependence
- PSM ⇒ complete randomization ⇒ lots of random pruning
- PSM Paradox: "when you do 'better', you do worse"

If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or also nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
- Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
- Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.
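A minimal sketch of kernel matching on a one-dimensional score shows the "average where possible, prune where necessary" behavior. It is not kmatch's implementation; the function names, the toy data, and the bandwidth of 0.15 are assumptions made for this example. The Epanechnikov kernel is used because it is the kernel reported in the kmatch output in this talk.

```python
def epanechnikov(u):
    """Epanechnikov kernel weight (up to a constant); zero outside |u| < 1."""
    return 1 - u * u if abs(u) < 1 else 0.0

def kernel_att(treated, controls, bandwidth):
    """treated, controls: lists of (score, outcome) pairs.
    Returns (ATT estimate, number of treated units without any match)."""
    effects, unmatched = [], 0
    for score_t, y_t in treated:
        weights = [epanechnikov((score_c - score_t) / bandwidth)
                   for score_c, _ in controls]
        total = sum(weights)
        if total == 0:  # no control within the bandwidth: off support, pruned
            unmatched += 1
            continue
        # all controls within the bandwidth contribute; none is thrown away
        y0_hat = sum(w * y_c for w, (_, y_c) in zip(weights, controls)) / total
        effects.append(y_t - y0_hat)
    return sum(effects) / len(effects), unmatched

# toy data: control outcome = 10 * score; treated outcome adds an effect of 2
controls = [(s, 10 * s) for s in (0.2, 0.3, 0.4, 0.5, 0.6)]
treated = [(0.3, 5.0), (0.5, 7.0), (0.95, 11.5)]

att, n_unmatched = kernel_att(treated, controls, bandwidth=0.15)
print(att, n_unmatched)
```

The treated unit at score 0.95 has no control within the bandwidth and is dropped, while the other two treated units are each compared with a weighted average of three nearby controls.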
Are King and Nielsen right?

Argument 3
- Random pruning ⇒ imbalance ⇒ more model dependence
- PSM ⇒ complete randomization ⇒ lots of random pruning
- PSM Paradox: "when you do 'better', you do worse"

It is true that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more "free" variance in X that can be exploited by modeling decisions).

In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you'd need to apply PSM stratified by subgroups in this case).
The Command

kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).

Some key features:
- Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
- Optional exact matching
- Optional regression-adjustment bias-correction
- Kernel matching, ridge matching, or nearest-neighbor matching
- Automatic bandwidth selection for kernel/ridge matching
- Flexible specification of scaling matrix for MDM
- Joint analysis of multiple subgroups and multiple outcome variables
- Various post-estimation commands for balancing and common-support diagnostics
- Computationally efficient implementation
Some Examples

Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.

. sysuse nlsw88, clear
(NLSW, 1988 extract)

. drop if industry==2
(4 observations deleted)

Mahalanobis-distance kernel matching

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           Matched                Controls               Band-
           Yes    No    Total     Used   Unused  Total   width
Treated    432    25    457       1105   291     1396    1.3394

Treatment-effects estimation

wage       Coef.
ATT        .6059013
NATE       1.432913
Some Examples

Some balancing statistics

. kmatch summarize
(refitting the model using the generate() option)

                         Raw                        Matched (ATT)
Means         Treated  Untrea~d    StdDif    Treated  Untrea~d    StdDif
collgrad      .321663   .224212   .219912    .319444   .319444         0
ttl_exp       13.2685   12.7323   .117584    13.3205   13.1425   .039036
tenure        7.89205   6.17658    .29735    7.91744   7.58347   .057888
3.industry    .006565   .012178  -.058246     .00463    .00463         0
4.industry    .183807   .166905   .044425    .185185   .185185         0
5.industry    .105033   .027937   .312944    .085648   .085648         0
6.industry    .045952   .169771  -.407129    .048611   .048611         0
7.industry    .019694   .102436  -.350657    .020833   .020833         0
8.industry    .017505   .035817  -.113785    .009259   .009259         0
9.industry    .010941   .040115  -.185669    .011574   .011574         0
10.industry   .004376   .008596  -.052551    .002315   .002315         0
11.industry   .479212   .356734   .250073    .506944   .506944         0
12.industry   .122538    .07235   .169707     .12037    .12037         0
2.race        .330416   .244986   .189418      .3125     .3125         0
3.race        .017505   .011461   .050566    .006944   .006944         0
south         .297593   .466332  -.352408    .291667   .291667         0

                         Raw                        Matched (ATT)
Variances     Treated  Untrea~d     Ratio    Treated  Untrea~d     Ratio
collgrad      .218674   .174066   1.25628    .217904   .217904         1
ttl_exp       20.5898   21.0001   .980459    19.8177   18.2323   1.08696
tenure        37.2044   29.3629   1.26706    37.0399   34.9543   1.05966
3.industry    .006536   .012038   .542928    .004619   .004619         1
4.industry    .150351   .139148   1.08052    .151242   .151242         1
5.industry    .094207   .027176   3.46656    .078494   .078494         1
6.industry    .043936    .14105   .311496    .046355   .046355         1
7.industry    .019348   .092008   .210287    .020447   .020447         1
8.industry    .017237   .034559   .498769    .009195   .009195         1
9.industry    .010845   .038533   .281445    .011467   .011467         1
10.industry   .004367   .008528   .512039    .002315   .002315         1
11.industry   .250115   .229639   1.08917    .250532   .250532         1
12.industry   .107758   .067163   1.60443    .106127   .106127         1
2.race        .221726     .1851   1.19787    .215342   .215342         1
3.race        .017237   .011338   1.52025    .006912   .006912         1
south          .20949   .249045   .841173    .207077   .207077         1
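The StdDif and Ratio columns for the raw data can be reproduced by hand. The sketch below assumes the common definition of the standardized difference (mean difference divided by the square root of the average of the two group variances) and reads the collgrad row with decimal points as in standard Stata output; the function names are hypothetical and this is not kmatch's code.

```python
from math import sqrt

def std_diff(mean_t, mean_c, var_t, var_c):
    """Standardized mean difference using the average of the group variances."""
    return (mean_t - mean_c) / sqrt((var_t + var_c) / 2)

def var_ratio(var_t, var_c):
    """Variance ratio, treated over untreated."""
    return var_t / var_c

# raw means and variances of collgrad (treated, untreated) from the table
sd_collgrad = std_diff(0.321663, 0.224212, 0.218674, 0.174066)
vr_collgrad = var_ratio(0.218674, 0.174066)
print(f"StdDif = {sd_collgrad:.4f}, Ratio = {vr_collgrad:.4f}")
```

Under this definition, the computed values agree with the reported .219912 and 1.25628 up to rounding of the inputs.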
Some Examples

Make a graph of the balancing stats

. mat M = r(M)

. mat V = r(V)

. coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || , ///
>     bylabels("Std. mean difference" "Variance ratio") ///
>     noci nolabels byopts(xrescale)

. addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))

. addplot 2: , xline(1) norescaling

[Figure: coefficient plot of the balancing statistics by covariate (collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, south) for raw and matched data. Panels: Std. mean difference (reference line at 0) and Variance ratio (reference line at 1).]
Some Examples

Propensity-score kernel matching

. kmatch ps union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage), nate att
(computing bandwidth ... done)

Propensity-score kernel matching           Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Covariates  : collgrad ttl_exp tenure i.industry i.race south
PS model    : logit (pr)

Matching statistics

           Matched                Controls               Band-
           Yes    No    Total     Used   Unused  Total   width
Treated    431    26    457       1214   182     1396    .00188

Treatment-effects estimation

wage       Coef.
ATT        .3887224
NATE       1.432913
Some Examples

Kernel density balancing plot

. kmatch density, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)
(applying 0-1 boundary correction to density estimation of propensity score)
(bandwidth for propensity score = .06803989)

[Figure: kernel densities of the propensity score (0 to .8) for untreated and treated units. Panels: Raw, Matched (ATT).]
Some Examples

Cumulative distribution balancing plot

. kmatch cumul, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)

[Figure: cumulative probability of the propensity score (0 to .8) for untreated and treated units. Panels: Raw, Matched (ATT).]
Some Examples

Balancing box plot

. kmatch box
(refitting the model using the generate() option)

[Figure: box plots of the propensity score (0 to .8) for untreated and treated units. Panels: Raw, Matched (ATT).]
Some Examples

Standard errors

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Replications  = 50
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           Matched                Controls               Band-
           Yes    No    Total     Used   Unused  Total   width
Treated    432    25    457       1105   291     1396    1.3394
Untreated  1386   10    1396      455    2       457     3.3975
Combined   1818   35    1853      1560   293     1853

Treatment-effects estimation

         Observed   Bootstrap                      Normal-based
wage        Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
ATE      .4095729   .1920853    2.13   0.033    .0330928   .7860531
ATT      .6059013   .2472069    2.45   0.014    .1213846   1.090418
ATC      .3483797   .1893653    1.84   0.066   -.0227695   .7195289
NATE     1.432913   .2333282    6.14   0.000    .9755981   1.890228
Some Examples

Do some tests

. lincom ATT-NATE

 ( 1)  ATT - NATE = 0

wage        Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
(1)     -.8270117   .1810415   -4.57   0.000   -1.181847  -.4721768

. test ATT = ATC

 ( 1)  ATT - ATC = 0

           chi2(  1) =    2.42
         Prob > chi2 =    0.1200
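The lincom line can be checked by hand. The sketch below assumes the estimates read as ATT = .6059013 and NATE = 1.432913 with a bootstrap standard error of .1810415 for their difference (decimal points as in standard Stata output), and it uses the normal approximation that the "Normal-based" interval label suggests.

```python
from statistics import NormalDist

att, nate = 0.6059013, 1.432913  # point estimates reported by kmatch
se_diff = 0.1810415              # standard error reported by lincom

diff = att - nate                # ATT - NATE
z = diff / se_diff               # z statistic
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
crit = NormalDist().inv_cdf(0.975)  # about 1.96 for a 95% interval
ci = (diff - crit * se_diff, diff + crit * se_diff)

print(f"diff = {diff:.7f}, z = {z:.2f}, p = {p_value:.3f}")
print(f"95% CI = [{ci[0]:.6f}, {ci[1]:.6f}]")
```

Note that the standard error of the difference cannot be derived from the two separate standard errors alone, because ATT and NATE are estimated from the same bootstrap samples and are therefore correlated; the code takes it as given.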
Some Examples

Nearest-neighbor matching (1 neighbor)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

Multivariate-distance nearest-neighbor matching

                                           Number of obs  = 1,853
                                           Neighbors: min = 1
Treatment   : union = 1                               max = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           Matched                Controls               Band-
           Yes    No    Total     Used   Unused  Total   width
Treated    457    0     457       328    1068    1396

Treatment-effects estimation

wage       Coef.
ATT        .7246969

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

Treatment-effects estimation               Number of obs      = 1,853
Estimator      : nearest-neighbor matching Matches: requested = 1
Outcome model  : matching                                 min = 1
Distance metric: Mahalanobis                              max = 1

                                     AI Robust
wage                       Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET
union
(union vs nonunion)     .7246969   .2942952    2.46   0.014     .147889   1.301505
Some Examples

Nearest-neighbor matching (5 neighbors)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching

                                           Number of obs  = 1,853
                                           Neighbors: min = 5
Treatment   : union = 1                               max = 5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           Matched                Controls               Band-
           Yes    No    Total     Used   Unused  Total   width
Treated    457    0     457       870    526     1396

Treatment-effects estimation

wage       Coef.
ATT        .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation               Number of obs      = 1,853
Estimator      : nearest-neighbor matching Matches: requested = 5
Outcome model  : matching                                 min = 5
Distance metric: Mahalanobis                              max = 6

                                     AI Robust
wage                       Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET
union
(union vs nonunion)     .5590823   .2381752    2.35   0.019    .0922675   1.025897
Some Examples

Bias-correction / regression adjustment

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching

                                           Number of obs  = 1,853
                                           Neighbors: min = 5
Treatment   : union = 1                               max = 5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           Matched                Controls               Band-
           Yes    No    Total     Used   Unused  Total   width
Treated    457    0     457       870    526     1396

Treatment-effects estimation

wage       Coef.
ATT        .5288023

adjusted for: collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) ///
>     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation               Number of obs      = 1,853
Estimator      : nearest-neighbor matching Matches: requested = 5
Outcome model  : matching                                 min = 5
Distance metric: Mahalanobis                              max = 6

                                     AI Robust
wage                       Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET
union
(union vs nonunion)     .5288023   .2420635    2.18   0.029    .0543666   1.003238
Some Examples

Mahalanobis-distance and propensity-score matching combined

. kmatch md union collgrad ttl_exp tenure (wage), att ///
>     psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis (modified)
Covariates  : collgrad ttl_exp tenure
PS model    : logit (pr)
PS covars   : i.industry i.race south

Matching statistics

           Matched                Controls               Band-
           Yes    No    Total     Used   Unused  Total   width
Treated    439    18    457       1258   138     1396    .83886

Treatment-effects estimation

wage       Coef.
ATT        .6408443
Some Examples

Exact matching

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure
Exact       : industry race south

Matching statistics

           Matched                Controls               Band-
           Yes    No    Total     Used   Unused  Total   width
Treated    432    25    457       1103   293     1396    1.3013

Treatment-effects estimation

wage       Coef.
ATT        .6047374
Some Examples

Bandwidth selection: the default (based on distribution of distances in one-nearest-neighbor matching)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           Matched                Controls               Band-
           Yes    No    Total     Used   Unused  Total   width
Treated    432    25    457       1105   291     1396    1.3394

Treatment-effects estimation

wage       Coef.
ATT        .6059013
Some Examples

Bandwidth selection: cross validation with respect to X

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           Matched                Controls               Band-
           Yes    No    Total     Used   Unused  Total   width
Treated    448    9     457       1184   212     1396    1.8888

Treatment-effects estimation

wage       Coef.
ATT        .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MSE against bandwidth (1.5 to 3); points are labeled with their evaluation index.]
Some Examples

Bandwidth selection: cross validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           Matched                Controls               Band-
           Yes    No    Total     Used   Unused  Total   width
Treated    453    4     457       1289   107     1396    2.433

Treatment-effects estimation

wage       Coef.
ATT        .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MISE against bandwidth (1.5 to 3); points are labeled with their evaluation index.]
Some Examples

Bandwidth selection: weighted cross validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           Matched                Controls               Band-
           Yes    No    Total     Used   Unused  Total   width
Treated    455    2     457       1356   40      1396    2.7626

Treatment-effects estimation

wage       Coef.
ATT        .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of weighted MISE against bandwidth (1 to 5); points are labeled with their evaluation index.]
Some Examples

Common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(0.5)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           Matched                Controls               Band-
           Yes    No    Total     Used   Unused  Total   width
Treated    366    91    457       701    695     1396    .5

Treatment-effects estimation

wage       Coef.
ATT        .3303161

. kmatch csummarize
(refitting the model using the generate() option)

               Common support (treated)       Standardized difference
Means         Matched  Unmatc~d     Total    (1)-(3)   (2)-(3)   (1)-(2)
collgrad      .322404   .318681   .321663    .001585  -.006376   .007962
ttl_exp       13.3929   12.7682   13.2685    .027413  -.110253   .137666
tenure        8.12614   6.95055   7.89205    .038378  -.154356   .192734
3.industry    .002732   .021978   .006565   -.047404   .190657  -.238061
4.industry    .191257   .153846   .183807    .019212  -.077269   .096481
5.industry    .062842   .274725   .105033   -.137462   .552867  -.690329
6.industry    .057377         0   .045952    .054507  -.219225   .273732
7.industry    .019126   .021978   .019694   -.004083   .016423  -.020506
8.industry    .005464   .065934   .017505   -.091714   .368871  -.460585
9.industry    .010929   .010989   .010941   -.000115   .000462  -.000577
10.industry         0   .021978   .004376   -.066227   .266363  -.332589
11.industry   .554645   .175824   .479212     .15083  -.606636   .757467
12.industry   .092896   .241758   .122538   -.090299   .363181   -.45348
2.race        .243169   .681319   .330416   -.185284   .745209  -.930494
3.race        .002732   .076923   .017505   -.112525   .452572  -.565097
south          .29235   .318681   .297593   -.011456   .046074   -.05753

               Common support (treated)                Ratio
Variances     Matched  Unmatc~d     Total    (1)/(3)   (2)/(3)   (1)/(2)
collgrad      .219058   .219536   .218674    1.00176   1.00394   .997824
ttl_exp       19.4198   25.2474   20.5898    .943177   1.22621    .76918
tenure        38.3324   31.9242   37.2044    1.03032   .858076   1.20073
3.industry    .002732   .021734   .006536    .418045   3.32537   .125714
4.industry    .155101   .131624   .150351    1.03159   .875443   1.17837
5.industry    .059054   .201465   .094207    .626851   2.13854   .293122
6.industry    .054233         0   .043936    1.23435         0         .
7.industry    .018811   .021734   .019348    .972252    1.1233   .865531
8.industry     .00545   .062271   .017237    .316157   3.61269   .087513
9.industry    .010839   .010989   .010845    .999464   1.01328   .986361
10.industry         0   .021734   .004367          0   4.97709         0
11.industry   .247691    .14652   .250115    .990307   .585811   1.69049
12.industry   .084497   .185348   .107758    .784137   1.72003   .455885
2.race        .184542   .219536   .221726    .832297   .990121   .840601
3.race        .002732   .071795   .017237    .158513   4.16522   .038056
south         .207448   .219536    .20949    .990254   1.04796   .944939

(1) matched, (2) unmatched, (3) total
Some Examples

Make a graph of the common-support stats

. mat M = r(M)

. coefplot matrix(M[,4]), title("Std. difference") noci nolabels xline(0)

[Figure: coefficient plot of the standardized difference (-.2 to .2) between matched and all treated units, by covariate (collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, south), with a reference line at 0.]
Some Examples

Multiple outcome variables

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,852
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           Matched                Controls               Band-
           Yes    No    Total     Used   Unused  Total   width
Treated    432    25    457       1104   291     1395    1.3392

Treatment-effects estimation

           Coef.
wage
  ATT      .6021049
  NATE     1.430823
hours
  ATT      1.263759
  NATE     1.450303
Some Examples

Multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage = collgrad ttl_exp tenure) ///
>     (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,852
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           Matched                Controls               Band-
           Yes    No    Total     Used   Unused  Total   width
Treated    432    25    457       1104   291     1395    1.3392

Treatment-effects estimation

           Coef.
wage
  ATT      .5152752
  NATE     1.430823
hours
  ATT      1.263759
  NATE     1.450303

wage: adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race
Some Examples

Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage), ///
>     att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Replications  = 50
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics

           Matched                Controls               Band-
           Yes    No    Total     Used   Unused  Total   width
0
  Treated  306    15    321       625    120     745     1.3199
1
  Treated  126    10    136       473    178     651     1.3398

Treatment-effects estimation

         Observed   Bootstrap                      Normal-based
wage        Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
0
  ATT    .4586332   .2763358    1.66   0.097    -.082975   1.000241
1
  ATT    .9518705    .406903    2.34   0.019    .1543553   1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

           chi2(  1) =    1.23
         Prob > chi2 =    0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

wage        Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
(1)      .4932373   .4452343    1.11   0.268    -.379406   1.365881
Simulation

- Population data from the Swiss census of 2000.
- Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using command iskotrei by Hendrickx 2002; values from 6 to 78, mean 44).
- Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.
- Control variables: gender, age, and highest educational degree.
- Population restricted to people between 24 and 60 years old who are working.
- 2'308'006 individuals, of which 17.5% belong to the treatment group.
- Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.
Simulation

Substantial differences between resident aliens and Swiss nationals on all three covariates.

Propensity score in population (computed from fully stratified data):

[Figure: densities of the propensity score (0 to 1) for untreated and treated. McFadden R2 = 0.121.]
Simulation

- Raw mean difference in occupational prestige (NATE): −4.79
- Population ATT (computed from fully stratified data): −3.96
- There is some treatment effect heterogeneity (ATE = −3.51, ATC = −3.41).

[Figure, two panels by propensity score (0 to .8): mean outcome (30 to 55) for untreated and treated; treatment effect (−6 to −1).]
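As a consistency check on the reported population values, the ATE should equal the share-weighted average of ATT and ATC. The calculation assumes a treated share of 17.5%, i.e. $P(D = 1) = 0.175$, and the values ATT = −3.96 and ATC = −3.41:

```latex
\mathrm{ATE} = P(D=1)\,\mathrm{ATT} + P(D=0)\,\mathrm{ATC}
             = 0.175 \times (-3.96) + 0.825 \times (-3.41)
             \approx -3.51
```

This agrees with the reported ATE of −3.51.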
Results: Variance

[Figure: variance of the estimators for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]
In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.
Results: Bias reduction (in percent)
[Figure: results by matching algorithm for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]
Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.
The bias also vanishes once post-matching regression adjustment is applied.
Results: Mean squared error
[Figure: results by matching algorithm for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]
Results: Relative standard error
[Figure: results by matching algorithm for N = 500 and N = 5000. Rows: nearest-neighbor matching with teffects (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap (1 neighbor, 5 neighbors), and kernel matching with bootstrap (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]
Results: Coverage of 95% CIs
[Figure: same rows, panels, and series as for the relative standard error.]
7 Conclusions
Conclusions
The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).
Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.
Conclusions
Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research, the choice may not be that clear.
Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.
Disadvantages of MDM:
- The choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.
One clear conclusion we can draw, however, is:
Do not use propensity scores for pair matching. (But don't use pair matching anyhow.)
Conclusions
Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly OK for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.
To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.
References
Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2):295–313.
Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1):77–90.
Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91:279–290.
Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.
King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw
Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.
Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4):465–472.
Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70:41–55.
Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5):688–701.
Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4):472–480.
Average Treatment Effect
Although individual causal effects cannot be observed, the average causal effect in a population (the so-called "Average Treatment Effect") can be identified by comparing the expected values of $Y^1$ and $Y^0$:
$$\mathrm{ATE} = E[\delta] = E[Y^1 - Y^0] = E[Y^1] - E[Y^0]$$
Some other quantities of interest:
- Average Treatment Effect on the Treated (ATT):
$$\mathrm{ATT} = E[Y^1 - Y^0 \mid D = 1] = E[Y^1 \mid D = 1] - E[Y^0 \mid D = 1]$$
- Average Treatment Effect on the Untreated (ATC):
$$\mathrm{ATC} = E[Y^1 - Y^0 \mid D = 0] = E[Y^1 \mid D = 0] - E[Y^0 \mid D = 0]$$
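The three estimands can be made concrete with a small numerical sketch. The toy population below is purely hypothetical and assumes that both potential outcomes are known for every unit, which is never the case with real data; it only illustrates how ATE, ATT, and ATC average the same individual effects over different groups.

```python
# Hypothetical toy population in which BOTH potential outcomes are known
# (never the case with real data). Each tuple is (D, Y0, Y1).
population = [
    (1, 10.0, 14.0),
    (1, 12.0, 15.0),
    (0, 8.0, 9.0),
    (0, 9.0, 11.0),
    (0, 7.0, 9.0),
]


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def average_effect(units, group=None):
    """Mean of Y1 - Y0 over all units (group=None) or over units with D == group."""
    return mean(y1 - y0 for d, y0, y1 in units if group is None or d == group)


ate = average_effect(population)      # E[Y1 - Y0]
att = average_effect(population, 1)   # E[Y1 - Y0 | D = 1]
atc = average_effect(population, 0)   # E[Y1 - Y0 | D = 0]
share_treated = mean(d for d, _, _ in population)
```

In this sketch the ATE equals the share-weighted average of ATT and ATC, that is, `share_treated * att + (1 - share_treated) * atc`.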
Average Treatment Effect
To determine the average effect, unbiased estimates of $E[Y^0]$ and $E[Y^1]$ are required.
If the independence assumption
$$(Y^0, Y^1) \perp\!\!\!\perp D$$
applies, that is, if $D$ is independent of $Y^0$ and $Y^1$, then
$$E[Y^0] = E[Y^0 \mid D = 0]$$
$$E[Y^1] = E[Y^1 \mid D = 1]$$
In this case, the average causal effect can be measured by a simple group comparison (mean difference) between observations without treatment ($D = 0$) and observations with treatment ($D = 1$).
Randomized experiments solve the problem: if the assignment of $D$ is randomized, $D$ is independent of $Y^0$ and $Y^1$ by design.
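A minimal sketch of why randomization helps, using hypothetical potential outcomes for five units: when the units with the highest $Y^0$ select into treatment, the simple group comparison is far from the true ATE, whereas averaging the group comparison over all equally likely random assignments recovers it. The enumeration of assignments is an illustrative device, not something available in practice.

```python
from itertools import combinations

# Hypothetical potential outcomes (Y0, Y1) for five units; the true ATE is 2.4.
units = [(10.0, 14.0), (12.0, 15.0), (8.0, 9.0), (9.0, 11.0), (7.0, 9.0)]


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def group_difference(treated):
    """Observed mean difference when the units indexed by `treated` get D = 1."""
    y_treated = [units[i][1] for i in treated]
    y_control = [units[i][0] for i in range(len(units)) if i not in treated]
    return mean(y_treated) - mean(y_control)


true_ate = mean(y1 - y0 for y0, y1 in units)

# Self-selection: the two units with the highest Y0 take the treatment.
selected = group_difference((0, 1))

# Randomization: average the estimate over all equally likely assignments
# of two treated units.
assignments = list(combinations(range(len(units)), 2))
randomized = mean(group_difference(a) for a in assignments)
```

Any single randomized assignment can still be off; the point of the sketch is that the comparison is right on average.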
2 Matching
Conditional Independence / Strong Ignorability
Can causal effects also be identified from "observational" (i.e., non-experimental) data?
Sometimes it can be argued that the independence assumption is valid conditionally (conditional independence, "unconfoundedness"):
$$(Y^0, Y^1) \perp\!\!\!\perp D \mid X$$
If, in addition, the overlap assumption
$$0 < \Pr(D = 1 \mid X = x) < 1 \quad \text{for all } x$$
is given, then the ATE (or ATT or ATC) can be identified by conditioning on $X$.
For example:
$$\mathrm{ATE} = \sum_x \Pr[X = x] \left\{ E[Y \mid D = 1, X = x] - E[Y \mid D = 0, X = x] \right\}$$
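The stratification formula can be sketched in a few lines. The data and the function name `stratified_ate` are hypothetical; the sketch assumes a single discrete covariate and raises an error when a stratum lacks treated or untreated observations, which is how a violation of the overlap assumption shows up in a sample.

```python
from collections import defaultdict

# Hypothetical observed data: (x, d, y) with one discrete covariate x.
data = [
    ("low", 1, 5.0), ("low", 0, 3.0), ("low", 0, 4.0),
    ("high", 1, 9.0), ("high", 1, 11.0), ("high", 0, 8.0),
]


def mean(values):
    return sum(values) / len(values)


def stratified_ate(rows):
    """Sum over x of Pr[X = x] * (E[Y | D=1, X=x] - E[Y | D=0, X=x])."""
    strata = defaultdict(lambda: {0: [], 1: []})
    for x, d, y in rows:
        strata[x][d].append(y)
    total = 0.0
    for x, groups in strata.items():
        if not groups[0] or not groups[1]:
            raise ValueError(f"overlap violated in stratum {x!r}")
        share = (len(groups[0]) + len(groups[1])) / len(rows)
        total += share * (mean(groups[1]) - mean(groups[0]))
    return total


ate_hat = stratified_ate(data)
```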
Matching
Matching is one approach to "condition on $X$" if strong ignorability holds.
Basic idea:
1. For each observation in the treatment group, find "statistical twins" in the control group with the same (or at least very similar) $X$ values (and vice versa).
2. The $Y$ values of these matching observations are then used to compute the counterfactual outcome for the observation at hand.
3. An estimate for the average causal effect can be obtained as the mean of the differences between the observed values and the "imputed" counterfactual values over all observations.
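Matching
Formally:
$$\widehat{\mathrm{ATT}} = \frac{1}{N_{D=1}} \sum_{i \mid D=1} \left[ Y_i - \hat{Y}_i^0 \right] = \frac{1}{N_{D=1}} \sum_{i \mid D=1} \Big[ Y_i - \sum_{j \mid D=0} w_{ij} Y_j \Big]$$
$$\widehat{\mathrm{ATC}} = \frac{1}{N_{D=0}} \sum_{i \mid D=0} \left[ \hat{Y}_i^1 - Y_i \right] = \frac{1}{N_{D=0}} \sum_{i \mid D=0} \Big[ \sum_{j \mid D=1} w_{ij} Y_j - Y_i \Big]$$
$$\widehat{\mathrm{ATE}} = \frac{N_{D=1}}{N} \cdot \widehat{\mathrm{ATT}} + \frac{N_{D=0}}{N} \cdot \widehat{\mathrm{ATC}}$$
Different matching algorithms use different definitions of $w_{ij}$.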
Exact Matching
Exact matching:
$$w_{ij} = \begin{cases} 1/k_i & \text{if } X_i = X_j \\ 0 & \text{else} \end{cases}$$
with $k_i$ as the number of observations for which $X_i = X_j$ applies.
The result is equivalent to "perfect stratification" or "subclassification" (see, e.g., Cochran 1968).
Problem: If $X$ contains several variables, there is a large probability that no exact matches can be found for many observations (the "curse of dimensionality").
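A minimal sketch of the matching estimator of the ATT with exact-matching weights, under assumptions: the data are hypothetical, the function names are made up for illustration, and treated units without any exact match are simply skipped (so the estimate refers to the matched treated units only).

```python
# Hypothetical data: (x, y) pairs, where x is a tuple of covariate values.
treated = [((1, 0), 10.0), ((0, 1), 7.0)]
controls = [((1, 0), 8.0), ((1, 0), 6.0), ((0, 1), 5.0), ((0, 0), 1.0)]


def exact_weights(x_i, pool):
    """w_ij = 1/k_i if X_j == X_i, else 0, with k_i the number of exact matches."""
    k_i = sum(1 for x_j, _ in pool if x_j == x_i)
    if k_i == 0:
        return None  # no exact match: the unit cannot be used
    return [1.0 / k_i if x_j == x_i else 0.0 for x_j, _ in pool]


def att_exact(treated, controls):
    """Mean over matched treated units of Y_i minus the weighted control mean."""
    differences = []
    for x_i, y_i in treated:
        weights = exact_weights(x_i, controls)
        if weights is None:
            continue
        y0_hat = sum(w * y_j for w, (_, y_j) in zip(weights, controls))
        differences.append(y_i - y0_hat)
    return sum(differences) / len(differences)


att_hat = att_exact(treated, controls)
```

Other matching algorithms would keep the same outer loop and only swap in a different weight function.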
Multivariate Distance Matching (MDM)
An alternative is to match based on a distance metric that measures the proximity between observations in the multivariate space of $X$.
The idea then is to use observations that are "close", but not necessarily equal, as matches.
A common approach is to use
$$MD(X_i, X_j) = \sqrt{(X_i - X_j)' \Sigma^{-1} (X_i - X_j)}$$
as distance metric, where $\Sigma$ is an appropriate scaling matrix.
- Mahalanobis matching: $\Sigma$ is the covariance matrix of $X$.
- Euclidean matching: $\Sigma$ is the identity matrix.
- Mahalanobis matching is equivalent to Euclidean matching based on standardized and orthogonalized $X$.
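The role of the scaling matrix can be illustrated with a sketch restricted to two covariates, so that the 2x2 inverse can be written out by hand. The data (years of education and age) and the helper names are hypothetical.

```python
import math


def covariance_2d(points):
    """Sample covariance matrix of 2-D points, returned as (a, b, d) for [[a, b], [b, d]]."""
    n = len(points)
    m0 = sum(p[0] for p in points) / n
    m1 = sum(p[1] for p in points) / n
    a = sum((p[0] - m0) ** 2 for p in points) / (n - 1)
    d = sum((p[1] - m1) ** 2 for p in points) / (n - 1)
    b = sum((p[0] - m0) * (p[1] - m1) for p in points) / (n - 1)
    return a, b, d


def distance(x_i, x_j, sigma):
    """sqrt((x_i - x_j)' Sigma^{-1} (x_i - x_j)) for a symmetric 2x2 Sigma."""
    a, b, d = sigma
    det = a * d - b * b
    u, v = x_i[0] - x_j[0], x_i[1] - x_j[1]
    # The inverse of [[a, b], [b, d]] is [[d, -b], [-b, a]] / det.
    return math.sqrt((d * u * u - 2 * b * u * v + a * v * v) / det)


# Hypothetical covariates: (years of education, age).
X = [(12, 30), (16, 50), (12, 50), (16, 30)]
mahalanobis = covariance_2d(X)   # Sigma = covariance matrix of X
euclidean = (1.0, 0.0, 1.0)      # Sigma = identity matrix

d_euc = distance(X[0], X[1], euclidean)
d_mah = distance(X[0], X[1], mahalanobis)
```

With the identity matrix the distance is dominated by age, which has the larger scale; with the covariance matrix both covariates contribute equally in this example.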
Matching Algorithms
Various matching algorithms can be employed to find potential matches based on $MD$ and determine the matching weights $w_{ij}$.
Pair matching (one-to-one matching without replacement)
- For each observation $i$ in the treatment group, find observation $j$ in the control group for which $MD_{ij}$ is smallest. Once observation $j$ is used as a match, do not use it again.
Nearest-neighbor matching
- For each observation $i$ in the treatment group, find the $k$ closest observations in the control group. A single control can be used multiple times as a match. In case of ties (multiple controls with identical $MD$), use all ties as matches. $k$ is set by the researcher.
Caliper matching
- Like nearest-neighbor matching, but only use controls for which $MD$ is smaller than some threshold $c$.
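The difference between pair matching and nearest-neighbor matching can be sketched as follows. This is a simplified illustration under stated assumptions: the distance is the absolute difference on a single hypothetical covariate rather than $MD$, pair matching is done greedily in the order of the treated units (optimal pair matching would differ), and the function names are made up.

```python
def pair_match(treated, controls):
    """Greedy one-to-one matching without replacement on a scalar covariate."""
    available = list(range(len(controls)))
    pairs = {}
    for i, x_i in enumerate(treated):
        if not available:
            break  # no controls left: remaining treated units stay unmatched
        j = min(available, key=lambda c: abs(controls[c] - x_i))
        pairs[i] = [j]
        available.remove(j)  # each control can be used only once
    return pairs


def nearest_neighbor_match(treated, controls, k=1, caliper=None):
    """k-nearest-neighbor matching with replacement; ties at the k-th distance are kept."""
    matches = {}
    for i, x_i in enumerate(treated):
        dist = [abs(x_j - x_i) for x_j in controls]
        if caliper is not None:
            candidates = [j for j, d in enumerate(dist) if d < caliper]
        else:
            candidates = list(range(len(controls)))
        if not candidates:
            continue  # no control within the caliper
        cutoff = sorted(dist[j] for j in candidates)[min(k, len(candidates)) - 1]
        matches[i] = [j for j in candidates if dist[j] <= cutoff]
    return matches


# Hypothetical covariate values.
treated = [1.0, 1.25]
controls = [1.0, 3.0, 1.5]

pairs = pair_match(treated, controls)
neighbors = nearest_neighbor_match(treated, controls, k=1)
within_caliper = nearest_neighbor_match(treated, controls, k=1, caliper=0.1)
```

In this example the second treated unit has two equally close controls. Pair matching keeps only one of them (the other was already used), while nearest-neighbor matching keeps both ties.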
Mahalanobis Matching
Radius matching
- Use all controls as matches for which $MD$ is smaller than some threshold $c$.
Kernel matching
- Like radius matching, but give larger weight to controls for which $MD$ is small (using some kernel function such as, e.g., the Epanechnikov kernel).
In addition, since matching is no longer exact, it may make sense to refine the estimates by applying regression adjustment to the matched data (also known as "bias adjustment" in the context of nearest-neighbor matching).
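A short sketch of how radius and kernel matching translate distances into weights $w_{ij}$ for one treated unit. The distances are hypothetical, the weights are normalized to sum to one, and the function names are illustrative only.

```python
def epanechnikov(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) for |u| < 1, else 0."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def radius_weights(distances, c):
    """Equal weights for all controls with distance below the threshold c."""
    inside = [1.0 if d < c else 0.0 for d in distances]
    total = sum(inside)
    return [w / total for w in inside] if total > 0 else None


def kernel_weights(distances, bandwidth):
    """Kernel weights, normalized to sum to one; None if no control is in range."""
    raw = [epanechnikov(d / bandwidth) for d in distances]
    total = sum(raw)
    return [w / total for w in raw] if total > 0 else None


# Hypothetical distances from one treated unit to four controls.
distances = [0.0, 0.5, 0.5, 2.0]
w_radius = radius_weights(distances, 1.0)
w_kernel = kernel_weights(distances, 1.0)
```

Both schemes ignore the control at distance 2.0; kernel matching additionally gives the exact match a larger weight than the two controls at distance 0.5.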
3 Propensity Score Matching
The Propensity Score Theorem (Rosenbaum and Rubin 1983)
If the conditional independence assumption is true, then
$$\Pr(D_i = 1 \mid Y_i^0, Y_i^1, X_i) = \Pr(D_i = 1 \mid X_i) = \pi(X_i)$$
where $\pi(X)$ is called the propensity score.
That is,
$$(Y^0, Y^1) \perp\!\!\!\perp D \mid X$$
implies
$$(Y^0, Y^1) \perp\!\!\!\perp D \mid \pi(X)$$
so that, under strong ignorability, the average causal effect can be estimated by conditioning on the propensity score $\pi(X)$ instead of $X$.
This is remarkable because the information in $X$, which may include many variables, can be reduced to just one dimension. This greatly simplifies the matching task.
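The implication can be seen from a short derivation, sketched here with the unit index suppressed. The second step uses the law of iterated expectations (possible because $\pi(X)$ is a function of $X$), and the third step uses the conditional independence assumption.

```latex
\begin{align*}
\Pr(D = 1 \mid Y^0, Y^1, \pi(X))
  &= E[D \mid Y^0, Y^1, \pi(X)] \\
  &= E\big[\, E[D \mid Y^0, Y^1, X] \;\big|\; Y^0, Y^1, \pi(X) \big] \\
  &= E\big[\, \pi(X) \;\big|\; Y^0, Y^1, \pi(X) \big] \\
  &= \pi(X)
\end{align*}
```

By the same argument, $\Pr(D = 1 \mid \pi(X)) = E[E[D \mid X] \mid \pi(X)] = \pi(X)$. Both conditional probabilities coincide, so $D$ carries no information about $(Y^0, Y^1)$ once $\pi(X)$ is held fixed.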
Propensity Score Matching (PSM)
Instead of computing multivariate distances, we can thus simply match on the (one-dimensional) propensity score.
Procedure:
- Step 1: Estimate the propensity score, e.g., using a logit model.
- Step 2: Apply a matching algorithm using differences in the propensity score, $|\pi(X_i) - \pi(X_j)|$, instead of multivariate distances.
PSM is tremendously popular:
- https://scholar.google.ch/scholar?q=propensity+score+AND+(matching+OR+matched+OR+match)
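The two-step procedure can be sketched in plain Python. Everything here is an illustrative assumption: the data are hypothetical, the logit model has a single covariate and is fitted by simple gradient ascent rather than by the algorithm a statistics package would use, and step 2 is 1-nearest-neighbor matching with replacement where ties are broken by taking the first control.

```python
import math


def fit_logit(xs, ds, steps=5000, rate=0.05):
    """Fit Pr(D=1|x) = 1 / (1 + exp(-(a + b*x))) by plain gradient ascent."""
    a = b = 0.0
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, d in zip(xs, ds):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            grad_a += d - p
            grad_b += (d - p) * x
        a += rate * grad_a
        b += rate * grad_b
    return a, b


def psm_att(xs, ds, ys):
    """Step 1: estimate the propensity score. Step 2: 1-nearest-neighbor match on it."""
    a, b = fit_logit(xs, ds)
    ps = [1.0 / (1.0 + math.exp(-(a + b * x))) for x in xs]
    controls = [j for j, d in enumerate(ds) if d == 0]
    diffs = []
    for i, d in enumerate(ds):
        if d == 1:
            # Closest control in terms of |pi(X_i) - pi(X_j)| (first one on ties).
            j = min(controls, key=lambda c: abs(ps[c] - ps[i]))
            diffs.append(ys[i] - ys[j])
    return sum(diffs) / len(diffs), ps


# Hypothetical data: one covariate x, treatment d, outcome y.
xs = [1, 2, 2, 3, 3, 4]
ds = [0, 0, 1, 0, 1, 1]
ys = [2.0, 3.0, 5.0, 4.0, 7.0, 9.0]
att_hat, scores = psm_att(xs, ds, ys)
```

With a single covariate the propensity score is just a monotone transformation of `x`; the dimension reduction only pays off when $X$ contains several variables.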
4 King and Nielsen's "Why Propensity Scores Should Not Be Used for Matching"
King and Nielsen
In 2015/2016, Gary King and Richard Nielsen circulated a paper that created quite some concern among applied researchers.
The basic message of the paper is that PSM is really, really bad and should be discarded.
The paper:
- http://j.mp/1sexgVw
Slides:
- https://gking.harvard.edu/presentations/why-propensity-scores-should-not-be-used-matching-6
Watch it:
- https://www.youtube.com/watch?v=rBv39pK1iEs
King and Nielsen
The story goes about as follows.
Argument 1
- Model dependence (i.e., dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions, even if they try not to be).
- Matching is good because it reduces model dependence.
Matching to Reduce Model Dependence (Ho, Imai, King, Stuart 2007: fig. 1, Political Analysis)
[Figure (slide 3/23 from the slides by King and Nielsen): scatter plot of Outcome against Education (years) for treated (T) and control (C) units, shown in several animation steps.]
King and Nielsen
Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

Matching: Finding Hidden Randomized Experiments (slide 6/23 from the slides by King and Nielsen)
Types of experiments:

Balance on covariates    Complete randomization    Fully blocked
Observed                 On average                Exact
Unobserved               On average                On average

Fully blocked dominates complete randomization for: imbalance, model dependence, power, efficiency, bias, research costs, robustness. E.g., Imai, King, Nall 2009: SEs 600% smaller.
Goal of each matching method (in observational data):
- PSM: complete randomization
- Other methods: fully blocked
- Other matching methods dominate PSM (wait, it gets worse)
Best Case: Mahalanobis Distance Matching
[Figure (slide 9/23 from the slides by King and Nielsen): scatter plot of Age against Education (years) for treated (T) and control (C) units, first with all controls and then with the matched controls only.]
Best Case: Propensity Score Matching
[Figure (slide 15/23 from the slides by King and Nielsen): scatter plot of Age against Education (years) for treated (T) and control (C) units, with an additional propensity score axis running from 0 to 1.]
Best Case: Propensity Score Matching is Suboptimal
[Figure (slide 15/23 from the slides by King and Nielsen): scatter plot of Age against Education (years) showing the treated units (T) and the remaining control units (C).]
King and Nielsen
Argument 3
- Random pruning (deleting observations at random) increases imbalance. This is because the sample size decreases, so that variance increases (large differences become more likely).
- More imbalance/variance means more model dependence and researcher discretion.
- Because PSM approximates complete randomization, it engages in random pruning.
- PSM Paradox ("when you do 'better', you do worse"):
  - When matching is made more strict (e.g., by decreasing the size of the caliper), PSM, like other matching methods, typically reduces imbalance. But soon the PSM Paradox kicks in, such that further pruning quickly increases imbalance.
  - If the data are such that there are no big differences between treated and untreated to begin with, the PSM Paradox kicks in almost immediately.
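The first step of the argument, that random pruning increases imbalance, can be illustrated with a small Monte Carlo sketch. The setup is hypothetical: a single covariate with the same distribution in both groups (as under complete randomization), imbalance measured as the absolute difference in means, and a fixed seed for reproducibility. It is not a replication of King and Nielsen's simulations.

```python
import random


def mean(values):
    return sum(values) / len(values)


def expected_imbalance(treated_x, control_x, keep, reps=2000, seed=1):
    """Average |mean(X_T) - mean(X_C)| after randomly keeping `keep` units per group."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        t = rng.sample(treated_x, keep)
        c = rng.sample(control_x, keep)
        total += abs(mean(t) - mean(c))
    return total / reps


# Hypothetical data: X has the same distribution in both groups, as under
# complete randomization, so any pruning is effectively random with respect to X.
data_rng = random.Random(0)
treated_x = [data_rng.gauss(0.0, 1.0) for _ in range(100)]
control_x = [data_rng.gauss(0.0, 1.0) for _ in range(100)]

imbalance_by_size = {keep: expected_imbalance(treated_x, control_x, keep)
                     for keep in (100, 50, 25, 10)}
```

The fewer units are kept, the larger the average imbalance tends to be, which is the mechanism behind the claim that random pruning makes matters worse.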
PSM Increases Model Dependence & Bias
[Figure (slide 20/23 from the slides by King and Nielsen): two panels comparing MDM and PSM by number of units pruned (0 to 160). Panel "Model Dependence": variance. Panel "Bias": maximum coefficient across 512 specifications; true effect = 2.]
Data-generating process: $Y_i = 2T_i + X_{1i} + X_{2i} + \epsilon_i$, $\epsilon_i \sim N(0, 1)$
The Propensity Score Paradox in Real Data
[Figure (slide 21/23 from the slides by King and Nielsen): two panels, Finkel et al. (JOP 2012) and Nielsen et al. (AJPS 2011), showing imbalance by number of units pruned for CEM, MDM, PSM, and random pruning, with the raw data and the 1/4 SD caliper marked.]
Similar pattern for > 20 other real data sets we checked.
5 Are King and Nielsen right?
Are King and Nielsen right?
Argument 1
- Model dependence (i.e., dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions, even if they try not to be).
- Matching is good because it reduces model dependence.
I fully agree.
My view, however, may be somewhat less pessimistic. I believe that research results can be credible if researchers are well educated, so that they know what they are doing, and if modeling decisions are made transparent and the robustness of results is evaluated (and documented).
Are King and Nielsen right?
Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.
That fully blocked randomization is more efficient than complete randomization (given the sample size) is, of course, true (how large the efficiency gains are depends on the strength of the relation between $X$ and $Y$).
However, if blocking reduces the sample size, it is not a priori clear whether estimates from the blocked sample are more efficient than estimates from the full sample (although often they will be).
That PSM approximates complete randomization is only partially true. PSM approximates complete randomization within observations with the same propensity score. Hence, PSM is somewhere between complete randomization and fully blocked randomization.
- If the $X$ variables have no relation to $T$ (treatment), then all observations have the same propensity score. Hence, we end up with complete randomization.
- If the $X$ variables have a strong effect on $T$, there is lots of blocking.
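The idea that PSM blocks on the propensity score rather than on $X$ itself can be sketched with two binary covariates. The logit coefficients below are assumed to be known, which is a purely hypothetical simplification; the sketch groups covariate patterns by their propensity score and compares the result with the case in which $X$ is unrelated to treatment.

```python
import math


def propensity(x, beta=(-1.0, 1.0, 1.0)):
    """Hypothetical logit propensity score with known coefficients (an assumption)."""
    b0, b1, b2 = beta
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x[0] + b2 * x[1])))


# All four patterns of two binary covariates.
units = [(0, 0), (0, 1), (1, 0), (1, 1)]

# Group covariate patterns by (rounded) propensity score.
blocks = {}
for x in units:
    blocks.setdefault(round(propensity(x), 12), []).append(x)

# With no relation between X and treatment, every unit shares one score,
# so there is a single block, as under complete randomization.
flat_blocks = {round(propensity(x, beta=(0.0, 0.0, 0.0)), 12) for x in units}
```

Here full blocking on $X$ would distinguish four cells, the propensity score distinguishes three (the patterns (0, 1) and (1, 0) share a score), and a propensity score unrelated to $X$ distinguishes none.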
Are King and Nielsen right?
Argument 3
- Random pruning ⇒ imbalance ⇒ more model dependence
- PSM ⇒ complete randomization ⇒ lots of random pruning
- PSM Paradox: "when you do 'better', you do worse"
That random pruning makes things worse is, of course, true because it unnecessarily reduces the sample size (without changing anything else).
As argued above, that PSM applies random pruning is only true for $X$ variables unrelated to $T$ (so that we are in a "local" complete randomization situation; although something similar can probably also happen if effects from several $X$'s cancel each other out).
Furthermore, it is only true if you employ a matching algorithm that throws away good matches. King and Nielsen's results seem to be based on the worst possible algorithm: one-to-one matching without replacement.
If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or also nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
- Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
- Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.
It is true that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more "free" variance in $X$ that can be exploited by modeling decisions).
In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you'd need to apply PSM stratified by subgroups in this case).
6 Illustration using kmatch
The Command
kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).
Some key features:
- Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
- Optional exact matching
- Optional regression adjustment/bias correction
- Kernel matching, ridge matching, or nearest-neighbor matching
- Automatic bandwidth selection for kernel/ridge matching
- Flexible specification of scaling matrix for MDM
- Joint analysis of multiple subgroups and multiple outcome variables
- Various post-estimation commands for balancing and common-support diagnostics
- Computationally efficient implementation
Some Examples
. // Use the NLSW data to estimate the effect of union membership on wages,
. // controlling for some covariates such as education, labor market experience, or industry
. sysuse nlsw88, clear
(NLSW, 1988 extract)
. drop if industry==2
(4 observations deleted)
. // Mahalanobis-distance kernel matching
. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1853
                                             Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                 Matched                  Controls            Band-
            Yes     No   Total     Used  Unused   Total       width
Treated     432     25     457     1105     291    1396      1.3394

Treatment-effects estimation
wage        Coef.
ATT      .6059013
NATE     1.432913
Some Examples: some balancing statistics

. kmatch summarize
(refitting the model using the generate() option)

              |            Raw              |        Matched (ATT)
        Means |  Treated  Untreated  StdDif |  Treated  Untreated  StdDif
     collgrad |  .321663   .224212  .219912 |  .319444   .319444       0
      ttl_exp |  13.2685   12.7323  .117584 |  13.3205   13.1425 .039036
       tenure |  7.89205   6.17658   .29735 |  7.91744   7.58347 .057888
   3.industry |  .006565   .012178 -.058246 |   .00463    .00463       0
   4.industry |  .183807   .166905  .044425 |  .185185   .185185       0
   5.industry |  .105033   .027937  .312944 |  .085648   .085648       0
   6.industry |  .045952   .169771 -.407129 |  .048611   .048611       0
   7.industry |  .019694   .102436 -.350657 |  .020833   .020833       0
   8.industry |  .017505   .035817 -.113785 |  .009259   .009259       0
   9.industry |  .010941   .040115 -.185669 |  .011574   .011574       0
  10.industry |  .004376   .008596 -.052551 |  .002315   .002315       0
  11.industry |  .479212   .356734  .250073 |  .506944   .506944       0
  12.industry |  .122538    .07235  .169707 |   .12037    .12037       0
       2.race |  .330416   .244986  .189418 |    .3125     .3125       0
       3.race |  .017505   .011461  .050566 |  .006944   .006944       0
        south |  .297593   .466332 -.352408 |  .291667   .291667       0

              |            Raw              |        Matched (ATT)
    Variances |  Treated  Untreated   Ratio |  Treated  Untreated   Ratio
     collgrad |  .218674   .174066  1.25628 |  .217904   .217904       1
      ttl_exp |  20.5898   21.0001  .980459 |  19.8177   18.2323 1.08696
       tenure |  37.2044   29.3629  1.26706 |  37.0399   34.9543 1.05966
   3.industry |  .006536   .012038  .542928 |  .004619   .004619       1
   4.industry |  .150351   .139148  1.08052 |  .151242   .151242       1
   5.industry |  .094207   .027176  3.46656 |  .078494   .078494       1
   6.industry |  .043936    .14105  .311496 |  .046355   .046355       1
   7.industry |  .019348   .092008  .210287 |  .020447   .020447       1
   8.industry |  .017237   .034559  .498769 |  .009195   .009195       1
   9.industry |  .010845   .038533  .281445 |  .011467   .011467       1
  10.industry |  .004367   .008528  .512039 |  .002315   .002315       1
  11.industry |  .250115   .229639  1.08917 |  .250532   .250532       1
  12.industry |  .107758   .067163  1.60443 |  .106127   .106127       1
       2.race |  .221726     .1851  1.19787 |  .215342   .215342       1
       3.race |  .017237   .011338  1.52025 |  .006912   .006912       1
        south |   .20949   .249045  .841173 |  .207077   .207077       1
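The StdDif and Ratio columns of the balancing tables can be checked by hand. The following is a minimal sketch in Python, assuming that the standardized difference divides the mean difference by the square root of the average of the two raw variances. That scaling is an assumption; it happens to reproduce the raw collgrad row, and the function names are illustrative and not part of kmatch.

```python
import math

def std_diff(mean_t, mean_c, var_t, var_c):
    """Mean difference scaled by the square root of the average raw variance."""
    return (mean_t - mean_c) / math.sqrt((var_t + var_c) / 2)

def var_ratio(var_t, var_c):
    """Variance of the treated divided by the variance of the untreated."""
    return var_t / var_c

# Raw (unmatched) means and variances of collgrad, taken from the tables.
raw_sd = std_diff(0.321663, 0.224212, 0.218674, 0.174066)
raw_vr = var_ratio(0.218674, 0.174066)
print(round(raw_sd, 4), round(raw_vr, 4))
```

A standardized difference near 0 and a variance ratio near 1 indicate good balance, which is what the matched columns show for the categorical covariates.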
Some Examples: make a graph of the balancing stats

. mat M = r(M)
. mat V = r(V)
. coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || ,
>     bylabels("Std. mean difference" "Variance ratio")
>     noci nolabels byopts(xrescale)
. addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))
. addplot 2: , xline(1) norescaling

[Figure: balancing statistics by covariate (collgrad, ttl_exp, tenure, industry, race, south), Raw vs. Matched; panels: Std. mean difference, Variance ratio]
Some Examples
Propensity-score kernel matching
. kmatch ps union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate att
(computing bandwidth ... done)

Propensity-score kernel matching           Number of obs = 1,853
                                           Kernel        = epan
Treatment  : union = 1
Covariates : collgrad ttl_exp tenure i.industry i.race south
PS model   : logit (pr)

Matching statistics

           |       Matched        |       Controls       |  Band-
           |   Yes     No  Total  |  Used Unused  Total  |  width
   Treated |   431     26    457  |  1214    182   1396  | .00188

Treatment-effects estimation

   wage |     Coef.
    ATT |  .3887224
   NATE |  1.432913
Some Examples: Kernel density balancing plot

. kmatch density, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)
(applying 0-1 boundary correction to density estimation of propensity score)
(bandwidth for propensity score = .06803989)

[Figure: kernel density of the propensity score for Untreated and Treated; panels: Raw, Matched (ATT); x-axis: Propensity score; y-axis: Density]
Some Examples: Cumulative distribution balancing plot

. kmatch cumul, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)

[Figure: cumulative distribution of the propensity score for Untreated and Treated; panels: Raw, Matched (ATT); x-axis: Propensity score; y-axis: Cumulative probability]
Some Examples: Balancing box plot

. kmatch box
(refitting the model using the generate() option)

[Figure: box plots of the propensity score for Untreated and Treated; panels: Raw, Matched (ATT); y-axis: Propensity score]
Some Examples: Standard errors

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Replications  = 50
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched        |       Controls       |  Band-
           |   Yes     No  Total  |  Used Unused  Total  |  width
   Treated |   432     25    457  |  1105    291   1396  | 1.3394
 Untreated |  1386     10   1396  |   455      2    457  | 3.3975
  Combined |  1818     35   1853  |  1560    293   1853  |

Treatment-effects estimation

        |  Observed  Bootstrap                     Normal-based
   wage |     Coef.  Std. Err.     z   P>|z|   [95% Conf. Interval]
    ATE |  .4095729   .1920853  2.13   0.033    .0330928   .7860531
    ATT |  .6059013   .2472069  2.45   0.014    .1213846   1.090418
    ATC |  .3483797   .1893653  1.84   0.066   -.0227695   .7195289
   NATE |  1.432913   .2333282  6.14   0.000    .9755981   1.890228
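The reported ATE is consistent with a weighted average of ATT and ATC. As a hedged check in Python: weighting by the numbers of matched treated (432) and matched untreated (1386) observations reproduces the ATE in the table. That these matched counts are the weights is an inference from the numbers, not something stated in the output, and `combine_ate` is an illustrative name.

```python
def combine_ate(att, atc, n_treated, n_untreated):
    """ATE as the group-size-weighted average of ATT and ATC."""
    n = n_treated + n_untreated
    return n_treated / n * att + n_untreated / n * atc

# Point estimates and matched counts copied from the kmatch output.
ate = combine_ate(att=0.6059013, atc=0.3483797, n_treated=432, n_untreated=1386)
print(round(ate, 5))
```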
Some Examples
Do some tests
. lincom ATT-NATE

 ( 1)  ATT - NATE = 0

   wage |     Coef.  Std. Err.      z   P>|z|   [95% Conf. Interval]
    (1) | -.8270117   .1810415  -4.57   0.000   -1.181847  -.4721768

. test ATT = ATC

 ( 1)  ATT - ATC = 0

           chi2(  1) =    2.42
         Prob > chi2 =  0.1200
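The lincom row follows from the coefficient and its bootstrap standard error under a normal approximation. A small sketch in Python of that arithmetic (the helper name `wald_summary` is illustrative):

```python
import math

def wald_summary(coef, se, z_crit=1.959964):
    """z statistic, two-sided normal p-value, and 95% confidence interval."""
    z = coef / se
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p, (coef - z_crit * se, coef + z_crit * se)

# ATT - NATE and its standard error, copied from the lincom output.
z, p, (lower, upper) = wald_summary(-0.8270117, 0.1810415)
print(round(z, 2), round(lower, 4), round(upper, 4))
```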
Some Examples: Nearest-neighbor matching (1 neighbor)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

Multivariate-distance nearest-neighbor matching

                                           Number of obs  = 1,853
                                           Neighbors: min = 1
Treatment  : union = 1                                max = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched        |       Controls       |  Band-
           |   Yes     No  Total  |  Used Unused  Total  |  width
   Treated |   457      0    457  |   328   1068   1396  |

Treatment-effects estimation

   wage |     Coef.
    ATT |  .7246969

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

Treatment-effects estimation               Number of obs      = 1,853
Estimator      : nearest-neighbor matching Matches: requested = 1
Outcome model  : matching                                 min = 1
Distance metric: Mahalanobis                              max = 1

                     |            AI Robust
                wage |     Coef.  Std. Err.     z   P>|z|   [95% Conf. Interval]
 ATET                |
 union               |
 (union vs nonunion) |  .7246969   .2942952  2.46   0.014     .147889   1.301505
Some Examples: Nearest-neighbor matching (5 neighbors)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching

                                           Number of obs  = 1,853
                                           Neighbors: min = 5
Treatment  : union = 1                                max = 5
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched        |       Controls       |  Band-
           |   Yes     No  Total  |  Used Unused  Total  |  width
   Treated |   457      0    457  |   870    526   1396  |

Treatment-effects estimation

   wage |     Coef.
    ATT |  .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation               Number of obs      = 1,853
Estimator      : nearest-neighbor matching Matches: requested = 5
Outcome model  : matching                                 min = 5
Distance metric: Mahalanobis                              max = 6

                     |            AI Robust
                wage |     Coef.  Std. Err.     z   P>|z|   [95% Conf. Interval]
 ATET                |
 union               |
 (union vs nonunion) |  .5590823   .2381752  2.35   0.019    .0922675   1.025897
Some Examples: Bias-correction / regression adjustment

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching

                                           Number of obs  = 1,853
                                           Neighbors: min = 5
Treatment  : union = 1                                max = 5
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched        |       Controls       |  Band-
           |   Yes     No  Total  |  Used Unused  Total  |  width
   Treated |   457      0    457  |   870    526   1396  |

Treatment-effects estimation

   wage |     Coef.
    ATT |  .5288023

adjusted for: collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south)
>     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation               Number of obs      = 1,853
Estimator      : nearest-neighbor matching Matches: requested = 5
Outcome model  : matching                                 min = 5
Distance metric: Mahalanobis                              max = 6

                     |            AI Robust
                wage |     Coef.  Std. Err.     z   P>|z|   [95% Conf. Interval]
 ATET                |
 union               |
 (union vs nonunion) |  .5288023   .2420635  2.18   0.029    .0543666   1.003238
Some Examples
Mahalanobis-distance and propensity-score matching combined
. kmatch md union collgrad ttl_exp tenure (wage), att
>     psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis (modified)
Covariates : collgrad ttl_exp tenure
PS model   : logit (pr)
PS covars  : i.industry i.race south

Matching statistics

           |       Matched        |       Controls       |  Band-
           |   Yes     No  Total  |  Used Unused  Total  |  width
   Treated |   439     18    457  |  1258    138   1396  | .83886

Treatment-effects estimation

   wage |     Coef.
    ATT |  .6408443
Some Examples
Exact matching
. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure
Exact      : industry race south

Matching statistics

           |       Matched        |       Controls       |  Band-
           |   Yes     No  Total  |  Used Unused  Total  |  width
   Treated |   432     25    457  |  1103    293   1396  | 1.3013

Treatment-effects estimation

   wage |     Coef.
    ATT |  .6047374
Some Examples
Bandwidth selection: the default (based on the distribution of distances in one-nearest-neighbor matching)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched        |       Controls       |  Band-
           |   Yes     No  Total  |  Used Unused  Total  |  width
   Treated |   432     25    457  |  1105    291   1396  | 1.3394

Treatment-effects estimation

   wage |     Coef.
    ATT |  .6059013
Some Examples: Bandwidth selection: cross validation with respect to X

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched        |       Controls       |  Band-
           |   Yes     No  Total  |  Used Unused  Total  |  width
   Treated |   448      9    457  |  1184    212   1396  | 1.8888

Treatment-effects estimation

   wage |     Coef.
    ATT |  .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot, points labeled by evaluation index; x-axis: Bandwidth; y-axis: MSE]
Some Examples: Bandwidth selection: cross validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched        |       Controls       |  Band-
           |   Yes     No  Total  |  Used Unused  Total  |  width
   Treated |   453      4    457  |  1289    107   1396  |  2.433

Treatment-effects estimation

   wage |     Coef.
    ATT |  .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot, points labeled by evaluation index; x-axis: Bandwidth; y-axis: MISE]
Some Examples: Bandwidth selection: weighted cross validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched        |       Controls       |  Band-
           |   Yes     No  Total  |  Used Unused  Total  |  width
   Treated |   455      2    457  |  1356     40   1396  | 2.7626

Treatment-effects estimation

   wage |     Coef.
    ATT |  .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot, points labeled by evaluation index; x-axis: Bandwidth; y-axis: Weighted MISE]
Some Examples: Common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(0.5)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched        |       Controls       |  Band-
           |   Yes     No  Total  |  Used Unused  Total  |  width
   Treated |   366     91    457  |   701    695   1396  |     .5

Treatment-effects estimation

   wage |     Coef.
    ATT |  .3303161

. kmatch csummarize
(refitting the model using the generate() option)

              |   Common support (treated)   |    Standardized difference
        Means |  Matched  Unmatched    Total |  (1)-(3)   (2)-(3)   (1)-(2)
     collgrad |  .322404   .318681   .321663 |  .001585  -.006376   .007962
      ttl_exp |  13.3929   12.7682   13.2685 |  .027413  -.110253   .137666
       tenure |  8.12614   6.95055   7.89205 |  .038378  -.154356   .192734
   3.industry |  .002732   .021978   .006565 | -.047404   .190657  -.238061
   4.industry |  .191257   .153846   .183807 |  .019212  -.077269   .096481
   5.industry |  .062842   .274725   .105033 | -.137462   .552867  -.690329
   6.industry |  .057377         0   .045952 |  .054507  -.219225   .273732
   7.industry |  .019126   .021978   .019694 | -.004083   .016423  -.020506
   8.industry |  .005464   .065934   .017505 | -.091714   .368871  -.460585
   9.industry |  .010929   .010989   .010941 | -.000115   .000462  -.000577
  10.industry |        0   .021978   .004376 | -.066227   .266363  -.332589
  11.industry |  .554645   .175824   .479212 |   .15083  -.606636   .757467
  12.industry |  .092896   .241758   .122538 | -.090299   .363181   -.45348
       2.race |  .243169   .681319   .330416 | -.185284   .745209  -.930494
       3.race |  .002732   .076923   .017505 | -.112525   .452572  -.565097
        south |   .29235   .318681   .297593 | -.011456   .046074   -.05753

              |   Common support (treated)   |            Ratio
    Variances |  Matched  Unmatched    Total |  (1)/(3)   (2)/(3)   (1)/(2)
     collgrad |  .219058   .219536   .218674 |  1.00176   1.00394   .997824
      ttl_exp |  19.4198   25.2474   20.5898 |  .943177   1.22621    .76918
       tenure |  38.3324   31.9242   37.2044 |  1.03032   .858076   1.20073
   3.industry |  .002732   .021734   .006536 |  .418045   3.32537   .125714
   4.industry |  .155101   .131624   .150351 |  1.03159   .875443   1.17837
   5.industry |  .059054   .201465   .094207 |  .626851   2.13854   .293122
   6.industry |  .054233         0   .043936 |  1.23435         0         .
   7.industry |  .018811   .021734   .019348 |  .972252    1.1233   .865531
   8.industry |   .00545   .062271   .017237 |  .316157   3.61269   .087513
   9.industry |  .010839   .010989   .010845 |  .999464   1.01328   .986361
  10.industry |        0   .021734   .004367 |        0   4.97709         0
  11.industry |  .247691    .14652   .250115 |  .990307   .585811   1.69049
  12.industry |  .084497   .185348   .107758 |  .784137   1.72003   .455885
       2.race |  .184542   .219536   .221726 |  .832297   .990121   .840601
       3.race |  .002732   .071795   .017237 |  .158513   4.16522   .038056
        south |  .207448   .219536    .20949 |  .990254   1.04796   .944939

(1) matched, (2) unmatched, (3) total
Some Examples: make a graph of the common-support stats

. mat M = r(M)
. coefplot matrix(M[,4]), title("Std. difference") noci nolabels xline(0)

[Figure: standardized difference between matched treated and all treated, by covariate (collgrad, ttl_exp, tenure, industry, race, south); x-axis: Std. difference]
Some Examples
Multiple outcome variables
. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,852
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched        |       Controls       |  Band-
           |   Yes     No  Total  |  Used Unused  Total  |  width
   Treated |   432     25    457  |  1104    291   1395  | 1.3392

Treatment-effects estimation

        |     Coef.
  wage  |
    ATT |  .6021049
   NATE |  1.430823
  hours |
    ATT |  1.263759
   NATE |  1.450303
Some Examples: Multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure)
>     (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,852
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched        |       Controls       |  Band-
           |   Yes     No  Total  |  Used Unused  Total  |  width
   Treated |   432     25    457  |  1104    291   1395  | 1.3392

Treatment-effects estimation

        |     Coef.
  wage  |
    ATT |  .5152752
   NATE |  1.430823
  hours |
    ATT |  1.263759
   NATE |  1.450303

wage: adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race
Some Examples: Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage),
>     att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Replications  = 50
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics

           |       Matched        |       Controls       |  Band-
           |   Yes     No  Total  |  Used Unused  Total  |  width
 0 Treated |   306     15    321  |   625    120    745  | 1.3199
 1 Treated |   126     10    136  |   473    178    651  | 1.3398

Treatment-effects estimation

        |  Observed  Bootstrap                     Normal-based
   wage |     Coef.  Std. Err.     z   P>|z|   [95% Conf. Interval]
  0 ATT |  .4586332   .2763358  1.66   0.097    -.082975   1.000241
  1 ATT |  .9518705    .406903  2.34   0.019    .1543553   1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

           chi2(  1) =    1.23
         Prob > chi2 =  0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

   wage |     Coef.  Std. Err.     z   P>|z|   [95% Conf. Interval]
    (1) |  .4932373   .4452343  1.11   0.268    -.379406   1.365881
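The test and lincom results for the subpopulation comparison are two views of the same Wald statistic: with one restriction, the chi-squared statistic is the square of the z statistic. A short check in Python using only the numbers reported in the output (variable names are illustrative):

```python
import math

att_south0, att_south1 = 0.4586332, 0.9518705
se_diff = 0.4452343        # bootstrap standard error reported by lincom

diff = att_south1 - att_south0
z = diff / se_diff
chi2 = z ** 2                                  # Wald statistic with 1 df
p = math.erfc(abs(z) / math.sqrt(2))           # two-sided normal p-value
print(round(diff, 7), round(z, 2), round(chi2, 2), round(p, 3))
```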
Simulation
- Population data from the Swiss census of 2000.
- Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using command iskotrei by Hendrickx 2002); values from 6 to 78, mean 44.
- Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.
- Control variables: gender, age, and highest educational degree.
- Population restricted to people between 24 and 60 years old who are working.
- 2'308'006 individuals, of which 17.5% belong to the treatment group.
- Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.
Simulation

- Substantial differences between resident aliens and Swiss nationals on all three covariates.
- Propensity score in population (computed from fully stratified data):

[Figure: density of the propensity score for Untreated and Treated; x-axis: Propensity score (0 to 1); y-axis: Density]

McFadden R2 = 0.121
Simulation
- Raw mean difference in occupational prestige (NATE): −4.79
- Population ATT (computed from fully stratified data): −3.96
- There is some treatment effect heterogeneity (ATE = −3.51, ATC = −3.41).

[Figure, left panel: mean outcome by propensity score for Untreated and Treated; x-axis: Propensity score; y-axis: Outcome]
[Figure, right panel: treatment effect by propensity score; x-axis: Propensity score; y-axis: Treatment effect]
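As a consistency check on these population values, the ATE should equal the average of ATT and ATC weighted by group shares. Assuming the treatment share of 17.5% stated for the population, the arithmetic works out as follows (rounded inputs, so the result is approximate):

$$
\text{ATE} = 0.175 \cdot (-3.96) + 0.825 \cdot (-3.41) = -0.693 - 2.813 \approx -3.51
$$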
Results: Variance

[Figure: variance of the estimators by matching algorithm. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors); kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM, MDM with bias correction, PSM, PSM with bias correction. Panels: N = 500, N = 5000]
In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.
Results: Bias reduction (in percent)

[Figure: bias reduction in percent by matching algorithm. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors); kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM, MDM with bias correction, PSM, PSM with bias correction. Panels: N = 500, N = 5000]
Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.
Results: Mean squared error

[Figure: mean squared error by matching algorithm. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors); kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM, MDM with bias correction, PSM, PSM with bias correction. Panels: N = 500, N = 5000]
Results: Relative standard error

[Figure: relative standard error by matching algorithm. Rows: nearest-neighbor matching with teffects (1 neighbor, 5 neighbors); nearest-neighbor matching with bootstrap (1 neighbor, 5 neighbors); kernel matching with bootstrap (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM, MDM with bias correction, PSM, PSM with bias correction. Panels: N = 500, N = 5000]
Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals by matching algorithm. Rows: nearest-neighbor matching with teffects (1 neighbor, 5 neighbors); nearest-neighbor matching with bootstrap (1 neighbor, 5 neighbors); kernel matching with bootstrap (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM, MDM with bias correction, PSM, PSM with bias correction. Panels: N = 500, N = 5000]
Conclusions
- The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).
- Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.
Conclusions
Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research, the choice may not be that clear.

Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.

Disadvantages of MDM:
- Choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching! (But don't use pair matching anyhow.)
Conclusions
Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly OK for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.
References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2):295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1):77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91:279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802. Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4):465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70:41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5):688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4):472–480.
Average Treatment Effect
- To determine the average effect, unbiased estimates of $E[Y^0]$ and $E[Y^1]$ are required.
- If the independence assumption

$$(Y^0, Y^1) \perp\!\!\!\perp D$$

  applies, that is, if $D$ is independent from $Y^0$ and $Y^1$, then

$$E[Y^0] = E[Y^0 \mid D = 0]$$

$$E[Y^1] = E[Y^1 \mid D = 1]$$

- In this case, the average causal effect can be measured by a simple group comparison (mean difference) of observations without treatment ($D = 0$) and observations with treatment ($D = 1$).
- Randomized experiments solve the problem: if the assignment of $D$ is randomized, $D$ is independent from $Y^0$ and $Y^1$ by design.
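The role of the independence assumption can be illustrated with a small simulation. The sketch below, in plain Python, uses a hypothetical data-generating process (standard normal $Y^0$, constant effect of 2, and a made-up self-selection rule); none of these numbers come from the talk. Under randomized assignment the simple mean difference recovers the effect, while under self-selection on $Y^0$ it does not.

```python
import random

def mean(values):
    return sum(values) / len(values)

def simulate(n=20000, effect=2.0, seed=1):
    """Mean differences under randomized and self-selected treatment."""
    rng = random.Random(seed)
    rand_t, rand_c, self_t, self_c = [], [], [], []
    for _ in range(n):
        y0 = rng.gauss(0.0, 1.0)      # potential outcome without treatment
        y1 = y0 + effect              # potential outcome with treatment
        # Randomized assignment: D is independent of (Y0, Y1) by design.
        if rng.random() < 0.5:
            rand_t.append(y1)
        else:
            rand_c.append(y0)
        # Self-selection: units with high Y0 are more likely to be treated.
        if rng.random() < (0.8 if y0 > 0 else 0.2):
            self_t.append(y1)
        else:
            self_c.append(y0)
    return mean(rand_t) - mean(rand_c), mean(self_t) - mean(self_c)

randomized_diff, selected_diff = simulate()
print(round(randomized_diff, 2), round(selected_diff, 2))
```

The first number should be close to the true effect of 2; the second overstates it because treated units would have had higher outcomes even without treatment.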
Conditional Independence Strong Ignorability
- Can causal effects also be identified from "observational" (i.e., non-experimental) data?
- Sometimes it can be argued that the independence assumption is valid conditionally (conditional independence, "unconfoundedness"):

$$(Y^0, Y^1) \perp\!\!\!\perp D \mid X$$

- If, in addition, the overlap assumption

$$0 < \Pr(D = 1 \mid X = x) < 1 \quad \text{for all } x$$

  is given, then the ATE (or ATT or ATC) can be identified by conditioning on $X$.
- For example:

$$\text{ATE} = \sum_x \Pr[X = x] \left\{ E[Y \mid D = 1, X = x] - E[Y \mid D = 0, X = x] \right\}$$
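The stratification formula for the ATE can be applied directly when $X$ is discrete. A minimal sketch in Python with a tiny made-up dataset (the data, the function name `stratified_ate`, and the error handling are illustrative assumptions):

```python
from collections import defaultdict

def stratified_ate(rows):
    """rows: iterable of (x, d, y) with discrete x and binary d."""
    cells = defaultdict(lambda: {0: [], 1: []})
    for x, d, y in rows:
        cells[x][d].append(y)
    n = sum(len(c[0]) + len(c[1]) for c in cells.values())
    ate = 0.0
    for x, c in cells.items():
        if not c[0] or not c[1]:
            # Overlap fails: no treated or no untreated units in this stratum.
            raise ValueError(f"overlap violated in stratum X={x}")
        share = (len(c[0]) + len(c[1])) / n                  # Pr[X = x]
        ate += share * (sum(c[1]) / len(c[1]) - sum(c[0]) / len(c[0]))
    return ate

data = [("a", 1, 5.0), ("a", 0, 3.0), ("a", 0, 4.0), ("a", 0, 2.0),
        ("b", 1, 9.0), ("b", 1, 11.0), ("b", 1, 10.0), ("b", 0, 9.0)]

ate = stratified_ate(data)
treated = [y for _, d, y in data if d == 1]
untreated = [y for _, d, y in data if d == 0]
naive = sum(treated) / len(treated) - sum(untreated) / len(untreated)
print(ate, naive)
```

In this toy example the within-stratum differences are 2 and 1, so the stratified ATE is 1.5, whereas the raw mean difference is much larger because treatment is concentrated in the stratum with high outcomes.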
Matching
- Matching is one approach to "condition on X" if strong ignorability holds.
- Basic idea:
  1. For each observation in the treatment group, find "statistical twins" in the control group with the same (or at least very similar) X values (and vice versa).
  2. The Y values of these matching observations are then used to compute the counterfactual outcome for the observation at hand.
  3. An estimate for the average causal effect can be obtained as the mean of the differences between the observed values and the "imputed" counterfactual values over all observations.
Matching
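Formally:

$$\widehat{\text{ATT}} = \frac{1}{N_{D=1}} \sum_{i \mid D=1} \left[ Y_i - \hat{Y}_i^0 \right] = \frac{1}{N_{D=1}} \sum_{i \mid D=1} \Big[ Y_i - \sum_{j \mid D=0} w_{ij} Y_j \Big]$$

$$\widehat{\text{ATC}} = \frac{1}{N_{D=0}} \sum_{i \mid D=0} \left[ \hat{Y}_i^1 - Y_i \right] = \frac{1}{N_{D=0}} \sum_{i \mid D=0} \Big[ \sum_{j \mid D=1} w_{ij} Y_j - Y_i \Big]$$

$$\widehat{\text{ATE}} = \frac{N_{D=1}}{N} \cdot \widehat{\text{ATT}} + \frac{N_{D=0}}{N} \cdot \widehat{\text{ATC}}$$

Different matching algorithms use different definitions of $w_{ij}$.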
Exact Matching
- Exact matching:

$$w_{ij} = \begin{cases} 1/k_i & \text{if } X_i = X_j \\ 0 & \text{else} \end{cases}$$

  with $k_i$ as the number of observations for which $X_i = X_j$ applies.
- The result is equivalent to "perfect stratification" or "subclassification" (see, e.g., Cochran 1968).
- Problem: If X contains several variables, there is a large probability that no exact matches can be found for many observations (the "curse of dimensionality").
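The exact-matching weights plug straight into the ATT formula. A minimal sketch in Python with hypothetical data, where each unit is a pair of covariate tuple and outcome; treated units without an exact twin are counted as unmatched, which illustrates the dimensionality problem:

```python
def exact_matching_att(treated, controls):
    """treated, controls: lists of (x, y). Returns (ATT over matched treated, n unmatched)."""
    diffs, unmatched = [], 0
    for x_i, y_i in treated:
        twins = [y_j for x_j, y_j in controls if x_j == x_i]
        if not twins:                 # no control with identical X
            unmatched += 1
            continue
        k_i = len(twins)
        y0_hat = sum(y_j / k_i for y_j in twins)   # weights w_ij = 1/k_i
        diffs.append(y_i - y0_hat)
    return sum(diffs) / len(diffs), unmatched

# Hypothetical data: X = (college degree, sex), Y = outcome.
treated = [((1, "f"), 10.0), ((0, "m"), 7.0), ((1, "m"), 8.0)]
controls = [((1, "f"), 8.0), ((1, "f"), 6.0), ((0, "m"), 6.0), ((0, "f"), 5.0)]

att, n_unmatched = exact_matching_att(treated, controls)
print(att, n_unmatched)
```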
Multivariate Distance Matching (MDM)
- An alternative is to match based on a distance metric that measures the proximity between observations in the multivariate space of X.
- The idea then is to use observations that are "close", but not necessarily equal, as matches.
- A common approach is to use

$$MD(X_i, X_j) = \sqrt{(X_i - X_j)' \, \Sigma^{-1} (X_i - X_j)}$$

  as distance metric, where $\Sigma$ is an appropriate scaling matrix.
  - Mahalanobis matching: $\Sigma$ is the covariance matrix of X.
  - Euclidean matching: $\Sigma$ is the identity matrix.
  - Mahalanobis matching is equivalent to Euclidean matching based on standardized and orthogonalized X.
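To make the distance metric concrete, here is a small sketch in Python for two covariates, with a hand-written 2x2 matrix inverse so that no external library is needed. The covariance matrix and the data points are hypothetical.

```python
import math

def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def md(xi, xj, sigma):
    """Distance sqrt((xi - xj)' sigma^-1 (xi - xj)) for two-dimensional X."""
    s = inv2(sigma)
    u = [xi[0] - xj[0], xi[1] - xj[1]]
    q = sum(u[r] * s[r][c] * u[c] for r in range(2) for c in range(2))
    return math.sqrt(q)

sigma = [[4.0, 0.0], [0.0, 0.25]]       # hypothetical covariance matrix of X
identity = [[1.0, 0.0], [0.0, 1.0]]
xi, xj = (3.0, 1.0), (1.0, 0.5)

d_mahalanobis = md(xi, xj, sigma)
d_euclidean = md(xi, xj, identity)
print(round(d_mahalanobis, 4), round(d_euclidean, 4))
```

With this diagonal scaling matrix, the Mahalanobis distance equals the Euclidean distance computed after dividing each covariate by its standard deviation (2 and 0.5), so both coordinates contribute equally.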
Matching Algorithms
- Various matching algorithms can be employed to find potential matches based on MD and determine the matching weights $w_{ij}$.
- Pair matching (one-to-one matching without replacement)
  - For each observation i in the treatment group, find observation j in the control group for which $MD_{ij}$ is smallest. Once observation j is used as a match, do not use it again.
- Nearest-neighbor matching
  - For each observation i in the treatment group, find the k closest observations in the control group. A single control can be used multiple times as a match. In case of ties (multiple controls with identical MD), use all ties as matches. k is set by the researcher.
- Caliper matching
  - Like nearest-neighbor matching, but only use controls for which MD is smaller than some threshold c.
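The three algorithms differ only in which controls they select. The following Python sketch works on a single scalar covariate with absolute difference as the distance, a simplification standing in for MD. The greedy processing order in `pair_matching` is one possible choice and is not prescribed by the description; all names and data are hypothetical.

```python
def pair_matching(treated, controls):
    """Greedy one-to-one matching without replacement."""
    free = dict(controls)                  # controls still available
    matches = {}
    for i, x_i in treated.items():
        if not free:
            break
        j = min(free, key=lambda c: abs(free[c] - x_i))
        matches[i] = [j]
        del free[j]                        # each control is used at most once
    return matches

def nn_matching(treated, controls, k=1, caliper=None):
    """k-nearest-neighbor matching with replacement; ties at the k-th distance are kept."""
    matches = {}
    for i, x_i in treated.items():
        dist = sorted((abs(x_j - x_i), j) for j, x_j in controls.items())
        if caliper is not None:            # caliper matching: drop distant controls
            dist = [(d, j) for d, j in dist if d < caliper]
        if not dist:
            matches[i] = []
            continue
        cutoff = dist[min(k, len(dist)) - 1][0]
        matches[i] = [j for d, j in dist if d <= cutoff]
    return matches

treated = {"t1": 1.0, "t2": 1.2}
controls = {"c1": 1.1, "c2": 2.0, "c3": 0.0}

pairs = pair_matching(treated, controls)
nn1 = nn_matching(treated, controls, k=1)
nn2 = nn_matching(treated, controls, k=2)
nn2_caliper = nn_matching(treated, controls, k=2, caliper=0.5)
print(pairs, nn1, nn2, nn2_caliper, sep="\n")
```

Note how pair matching forces t2 onto a distant control once c1 is taken, while nearest-neighbor matching reuses c1. With k = 2, t1 receives three matches because c2 and c3 are tied, which is the same mechanism that produces "min = 5, max = 6" in the teffects output for nn(5).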
Mahalanobis Matching
- Radius matching
  - Use all controls as matches for which MD is smaller than some threshold c.
- Kernel matching
  - Like radius matching, but give larger weight to controls for which MD is small (using some kernel function such as, e.g., the Epanechnikov kernel).
- In addition, since matching is no longer exact, it may make sense to refine the estimates by applying regression adjustment to the matched data (also known as "bias adjustment" in the context of nearest-neighbor matching).
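The difference between radius and kernel matching lies only in the weights. A brief Python sketch for one treated unit, assuming the weights are normalized to sum to one and that the kernel is evaluated at distance divided by the bandwidth h; the distances are made up and the function names are illustrative.

```python
def epanechnikov(u):
    """Epanechnikov kernel, zero outside |u| < 1."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0

def match_weights(distances, h, kernel=None):
    """Normalized weights w_ij for one treated unit, or None if no control is within h."""
    if kernel is None:                     # radius matching: equal weights within h
        raw = [1.0 if d < h else 0.0 for d in distances]
    else:                                  # kernel matching: closer controls weigh more
        raw = [kernel(d / h) for d in distances]
    total = sum(raw)
    return [r / total for r in raw] if total > 0 else None

distances = [0.2, 0.5, 0.9, 1.5]           # hypothetical distances to four controls
radius_w = match_weights(distances, h=1.0)
kernel_w = match_weights(distances, h=1.0, kernel=epanechnikov)
print([round(w, 3) for w in radius_w])
print([round(w, 3) for w in kernel_w])
```

A treated unit for which the function returns None has no control within the bandwidth; such units correspond to the "Matched: No" column in the matching statistics.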
The Propensity Score Theorem (Rosenbaum and Rubin 1983)
- If the conditional independence assumption is true, then

$$\Pr(D_i = 1 \mid Y_i^0, Y_i^1, X_i) = \Pr(D_i = 1 \mid X_i) = \pi(X_i)$$

  where $\pi(X)$ is called the propensity score.
- That is,

$$(Y^0, Y^1) \perp\!\!\!\perp D \mid X$$

  implies

$$(Y^0, Y^1) \perp\!\!\!\perp D \mid \pi(X)$$

  so that under strong ignorability the average causal effect can be estimated by conditioning on the propensity score $\pi(X)$ instead of $X$.
- This is remarkable, because the information in X, which may include many variables, can be reduced to just one dimension. This greatly simplifies the matching task.
Propensity Score Matching (PSM)
- Instead of computing multivariate distances, we can thus simply match on the (one-dimensional) propensity score.
- Procedure
  - Step 1: Estimate the propensity score, e.g., using a Logit model.
  - Step 2: Apply a matching algorithm using differences in the propensity score, $|\pi(X_i) - \pi(X_j)|$, instead of multivariate distances.
- PSM is tremendously popular:
  - https://scholar.google.ch/scholar?q=propensity+score+AND+(matching+OR+matched+OR+match)
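The two-step procedure can be sketched in a few lines of Python. To keep the example free of dependencies, step 1 estimates the propensity score as the share of treated units within each covariate cell (a fully stratified estimate, swapped in here for the Logit model), and step 2 uses one-nearest-neighbor matching with replacement on the score. Data and function names are hypothetical.

```python
from collections import defaultdict

def propensity_by_cell(rows):
    """Step 1: share of treated units within each value of x."""
    n, n1 = defaultdict(int), defaultdict(int)
    for x, d, _ in rows:
        n[x] += 1
        n1[x] += d
    return {x: n1[x] / n[x] for x in n}

def psm_att(rows):
    """Step 2: match each treated unit to the controls with the closest score."""
    ps = propensity_by_cell(rows)
    controls = [(ps[x], y) for x, d, y in rows if d == 0]
    diffs = []
    for x, d, y in rows:
        if d != 1:
            continue
        best = min(abs(p - ps[x]) for p, _ in controls)
        nearest = [y_j for p, y_j in controls if abs(p - ps[x]) == best]
        diffs.append(y - sum(nearest) / len(nearest))   # ties share the weight
    return sum(diffs) / len(diffs)

rows = [("a", 1, 5.0), ("a", 0, 3.0), ("a", 0, 4.0), ("a", 0, 2.0),
        ("b", 1, 9.0), ("b", 1, 11.0), ("b", 1, 10.0), ("b", 0, 9.0)]

att = psm_att(rows)
print(att)
```

Because the score is constant within each cell in this toy example, matching on the score is the same as exact matching on x; with a parametric model such as Logit, units with different X can share similar scores.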
1 Potential Outcomes and Causal Inference
2 Matching
3 Propensity Score Matching
4 King and Nielsenrsquos ldquoWhy Propensity Scores Should Not Be Used forMatchingrdquo
5 Are King and Nielsen right
6 Illustration using kmatch
7 Conclusions
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 19
King and Nielsen
In 2015/2016, Gary King and Richard Nielsen circulated a paper that created quite some concern among applied researchers.
The basic message of the paper is that PSM is really, really bad and should be discarded.
The paper
- http://j.mp/1sexgVw
Slides
- https://gking.harvard.edu/presentations/why-propensity-scores-should-not-be-used-matching-6
Watch it
- https://www.youtube.com/watch?v=rBv39pK1iEs
The story goes about as follows:
Argument 1
- Model dependence (i.e. dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions, even if they try not to be).
- Matching is good because it reduces model dependence.
Matching to Reduce Model Dependence (Ho, Imai, King, Stuart 2007, fig. 1, Political Analysis)
[Figure: scatter plot of Outcome (0 to 12) against Education (years, 12 to 28) for treated (T) and control (C) units, shown in several animation builds. Slide 3/23 of the slides by King and Nielsen.]
King and Nielsen
Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.
Matching: Finding Hidden Randomized Experiments (slide 6/23 of the slides by King and Nielsen)
Types of Experiments
Balance of covariates   Complete randomization   Fully blocked
Observed                On average               Exact
Unobserved              On average               On average
Fully blocked dominates complete randomization for: imbalance, model dependence, power, efficiency, bias, research costs, robustness. E.g., Imai, King, Nall 2009: SEs 600% smaller.
Goal of Each Matching Method (in Observational Data)
- PSM: complete randomization
- Other methods: fully blocked
- Other matching methods dominate PSM (wait, it gets worse)
Best Case: Mahalanobis Distance Matching
[Figure: Age (20 to 80) against Education (years, 12 to 28) for treated (T) and control (C) units, first with all controls and then with a reduced set of controls. Slide 9/23 of the slides by King and Nielsen.]
Best Case: Propensity Score Matching
[Figure: Age (20 to 80) against Education (years, 12 to 28) for treated (T) and control (C) units, with an additional propensity score axis running from 0 to 1. Slide 15/23 of the slides by King and Nielsen.]
Best Case: Propensity Score Matching is Suboptimal
[Figure: Age (20 to 80) against Education (years, 12 to 28) for treated (T) units and a reduced set of control (C) units. Slide 15/23 of the slides by King and Nielsen.]
King and Nielsen
Argument 3
- Random pruning (deleting observations at random) increases imbalance. This is because the sample size decreases, so that variance increases (large differences become more likely).
- More imbalance/variance means more model dependence and researcher discretion.
- Because PSM approximates complete randomization, it engages in random pruning.
- PSM Paradox (“when you do ‘better,’ you do worse”)
  - When matching is made more strict (e.g. by decreasing the size of the caliper), PSM, like other matching methods, typically reduces imbalance. But soon the PSM Paradox kicks in, such that further pruning quickly increases imbalance.
  - If the data are such that there are no big differences between treated and untreated to begin with, the PSM Paradox kicks in almost immediately.
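The first step of the argument (random pruning increases imbalance) can be checked with a small simulation. This is a sketch under assumed settings: a single standard-normal covariate, no true difference between groups, and arbitrary sample sizes; the function name is hypothetical.

```python
import random


def mean(values):
    return sum(values) / len(values)


def average_imbalance(n_full, n_kept, reps, seed):
    """Average absolute difference in covariate means between treated and
    controls, before and after deleting observations at random."""
    rng = random.Random(seed)
    full_total = 0.0
    pruned_total = 0.0
    for _ in range(reps):
        # Both groups come from the same population: no systematic imbalance.
        treated = [rng.gauss(0.0, 1.0) for _ in range(n_full)]
        control = [rng.gauss(0.0, 1.0) for _ in range(n_full)]
        full_total += abs(mean(treated) - mean(control))
        # Random pruning: keep a random subset of each group.
        kept_treated = rng.sample(treated, n_kept)
        kept_control = rng.sample(control, n_kept)
        pruned_total += abs(mean(kept_treated) - mean(kept_control))
    return full_total / reps, pruned_total / reps


full_imbalance, pruned_imbalance = average_imbalance(400, 25, 300, seed=2017)
print(full_imbalance, pruned_imbalance)
```

Because the expected absolute mean difference shrinks with the square root of the group size, cutting each group from 400 to 25 should raise the average imbalance roughly fourfold in this setup.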
PSM Increases Model Dependence & Bias
[Figure, two panels, slide 20/23 of the slides by King and Nielsen. Panel “Model Dependence”: Variance against Number of Units Pruned (0 to 160) for MDM and PSM. Panel “Bias”: Maximum Coefficient across 512 Specifications against Number of Units Pruned (0 to 160) for MDM and PSM; true effect = 2.]
Data-generating process: $Y_i = 2T_i + X_{1i} + X_{2i} + \epsilon_i$, $\epsilon_i \sim N(0, 1)$
The Propensity Score Paradox in Real Data
[Figure, two panels, slide 21/23 of the slides by King and Nielsen: Imbalance against Number of units pruned for Finkel et al. (JOP 2012) and Nielsen et al. (AJPS 2011), with lines for Raw, Random, CEM, MDM, and PSM, and a marker for the 1/4 SD caliper.]
Similar pattern for > 20 other real data sets we checked.
5 Are King and Nielsen right?
Are King and Nielsen right?
Argument 1
- Model dependence (i.e. dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions, even if they try not to be).
- Matching is good because it reduces model dependence.
I fully agree.
My view, however, may be somewhat less pessimistic. I believe that research results can be credible if researchers are well educated, so that they know what they are doing, and if modeling decisions are made transparent and robustness of results is evaluated (and documented).
Are King and Nielsen right?
Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.
That fully blocked randomization is more efficient than complete randomization, given the sample size, is of course true (how large the efficiency gains are depends on the strength of the relation between X and Y).
However, if blocking reduces the sample size, it is not a priori clear whether estimates from the blocked sample are more efficient than estimates from the full sample (although often they will be).
That PSM approximates complete randomization is only partially true. PSM approximates complete randomization within observations with the same propensity score. Hence, PSM is somewhere between complete randomization and fully blocked randomization.
- If the X variables have no relation to T (treatment), then all observations have the same propensity score. Hence, we end up with complete randomization.
- If the X variables have a strong effect on T, there is lots of blocking.
Are King and Nielsen right?
Argument 3
- Random pruning ⇒ imbalance ⇒ more model dependence
- PSM ⇒ complete randomization ⇒ lots of random pruning
- PSM Paradox: “when you do ‘better,’ you do worse”
That random pruning makes things worse is of course true, because it unnecessarily reduces the sample size (without changing anything else).
As argued above, that PSM applies random pruning is only true for X variables unrelated to T (so that we are in a “local” complete randomization situation, although something similar can probably also happen if effects from several X's cancel each other out).
Furthermore, it is only true if you employ a matching algorithm that throws away good matches. King and Nielsen's results seem to be based on the worst possible algorithm: one-to-one matching without replacement.
If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or also nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
- Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
- Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.
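The contrast between an algorithm that throws away good matches and one that keeps them can be made concrete with a toy example. The propensity scores, the caliper, and the function names below are assumptions chosen for illustration; the greedy pair matcher is one simple variant of one-to-one matching without replacement, not necessarily the one used by King and Nielsen.

```python
def one_to_one_without_replacement(treated_ps, control_ps):
    """Greedy pair matching: each control can be used at most once."""
    available = set(range(len(control_ps)))
    pairs = []
    for i, p in enumerate(treated_ps):
        if not available:
            break
        # Closest remaining control; the index breaks exact ties.
        j = min(available, key=lambda k: (abs(control_ps[k] - p), k))
        pairs.append((i, j))
        available.remove(j)
    return pairs


def radius_matching(treated_ps, control_ps, caliper):
    """Radius matching with replacement: keep every control within the caliper."""
    return {
        i: [j for j, q in enumerate(control_ps) if abs(q - p) <= caliper]
        for i, p in enumerate(treated_ps)
    }


# Hypothetical estimated propensity scores.
treated_ps = [0.50, 0.54]
control_ps = [0.48, 0.51, 0.55, 0.90]

pairs = one_to_one_without_replacement(treated_ps, control_ps)
radius = radius_matching(treated_ps, control_ps, caliper=0.045)

used_pairs = {j for _, j in pairs}
used_radius = {j for js in radius.values() for j in js}
print(sorted(used_pairs), sorted(used_radius))
```

Pair matching discards control 0 even though it lies well within the caliper of the first treated unit, while radius matching averages over it. Both approaches drop control 3, the only control that is a poor match.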
It is true that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more “free” variance in X that can be exploited by modeling decisions).
In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you'd need to apply PSM stratified by subgroups in this case).
6 Illustration using kmatch
The Command
kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch)
Some key features
- Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
- Optional exact matching
- Optional regression adjustment (bias correction)
- Kernel matching, ridge matching, or nearest-neighbor matching
- Automatic bandwidth selection for kernel/ridge matching
- Flexible specification of scaling matrix for MDM
- Joint analysis of multiple subgroups and multiple outcome variables
- Various post-estimation commands for balancing and common-support diagnostics
- Computationally efficient implementation
Some Examples
Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.
. sysuse nlsw88, clear
(NLSW, 1988 extract)
. drop if industry==2
(4 observations deleted)

Mahalanobis-distance kernel matching
. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching           Number of obs = 1,853
                                                Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
             Matched               Controls             Band-
             Yes    No  Total      Used  Unused  Total  width
Treated      432    25    457      1105     291   1396  1.3394

Treatment-effects estimation
wage        Coef.
ATT      .6059013
NATE     1.432913
Some Examples
Some balancing statistics
. kmatch summarize
(refitting the model using the generate() option)

                         Raw                        Matched (ATT)
Means         Treated  Untrea~d    StdDif    Treated  Untrea~d    StdDif
collgrad      .321663   .224212   .219912    .319444   .319444         0
ttl_exp       13.2685   12.7323   .117584    13.3205   13.1425   .039036
tenure        7.89205   6.17658    .29735    7.91744   7.58347   .057888
3.industry    .006565   .012178  -.058246     .00463    .00463         0
4.industry    .183807   .166905   .044425    .185185   .185185         0
5.industry    .105033   .027937   .312944    .085648   .085648         0
6.industry    .045952   .169771  -.407129    .048611   .048611         0
7.industry    .019694   .102436  -.350657    .020833   .020833         0
8.industry    .017505   .035817  -.113785    .009259   .009259         0
9.industry    .010941   .040115  -.185669    .011574   .011574         0
10.industry   .004376   .008596  -.052551    .002315   .002315         0
11.industry   .479212   .356734   .250073    .506944   .506944         0
12.industry   .122538    .07235   .169707     .12037    .12037         0
2.race        .330416   .244986   .189418      .3125     .3125         0
3.race        .017505   .011461   .050566    .006944   .006944         0
south         .297593   .466332  -.352408    .291667   .291667         0

                         Raw                        Matched (ATT)
Variances     Treated  Untrea~d     Ratio    Treated  Untrea~d     Ratio
collgrad      .218674   .174066   1.25628    .217904   .217904         1
ttl_exp       20.5898   21.0001   .980459    19.8177   18.2323   1.08696
tenure        37.2044   29.3629   1.26706    37.0399   34.9543   1.05966
3.industry    .006536   .012038   .542928    .004619   .004619         1
4.industry    .150351   .139148   1.08052    .151242   .151242         1
5.industry    .094207   .027176   3.46656    .078494   .078494         1
6.industry    .043936    .14105   .311496    .046355   .046355         1
7.industry    .019348   .092008   .210287    .020447   .020447         1
8.industry    .017237   .034559   .498769    .009195   .009195         1
9.industry    .010845   .038533   .281445    .011467   .011467         1
10.industry   .004367   .008528   .512039    .002315   .002315         1
11.industry   .250115   .229639   1.08917    .250532   .250532         1
12.industry   .107758   .067163   1.60443    .106127   .106127         1
2.race        .221726     .1851   1.19787    .215342   .215342         1
3.race        .017237   .011338   1.52025    .006912   .006912         1
south          .20949   .249045   .841173    .207077   .207077         1
Some Examples
Make a graph of the balancing stats
. mat M = r(M)
. mat V = r(V)
. coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || ,
>     bylabels("Std. mean difference" "Variance ratio")
>     noci nolabels byopts(xrescale)
. addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))
. addplot 2: , xline(1) norescaling
[Figure: coefficient plot with panels “Std. mean difference” (about -.4 to .4) and “Variance ratio” (0 to 4) for collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, and south; series Raw and Matched.]
Some Examples
Propensity-score kernel matching
. kmatch ps union collgrad ttl_exp tenure i.industry i.race south (wage), nate att
(computing bandwidth ... done)

Propensity-score kernel matching                Number of obs = 1,853
                                                Kernel        = epan
Treatment   : union = 1
Covariates  : collgrad ttl_exp tenure i.industry i.race south
PS model    : logit (pr)

Matching statistics
             Matched               Controls             Band-
             Yes    No  Total      Used  Unused  Total  width
Treated      431    26    457      1214     182   1396  .00188

Treatment-effects estimation
wage        Coef.
ATT      .3887224
NATE     1.432913
Some Examples
Kernel density balancing plot
. kmatch density, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)
(applying 0-1 boundary correction to density estimation of propensity score)
(bandwidth for propensity score = .06803989)
[Figure: kernel density (0 to 3) of the propensity score (0 to .8) for Untreated and Treated, in panels “Raw” and “Matched (ATT)”.]
Some Examples
Cumulative distribution balancing plot
. kmatch cumul, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)
[Figure: cumulative probability (0 to 1) of the propensity score (0 to .8) for Untreated and Treated, in panels “Raw” and “Matched (ATT)”.]
Some Examples
Balancing box plot
. kmatch box
(refitting the model using the generate() option)
[Figure: box plots of the propensity score (0 to .8) for Untreated and Treated, in panels “Raw” and “Matched (ATT)”.]
Some Examples
Standard errors
. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)
(running kmatch on estimation sample)
Bootstrap replications (50)

Multivariate-distance kernel matching           Number of obs = 1,853
                                                Replications  = 50
                                                Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
             Matched               Controls             Band-
             Yes    No  Total      Used  Unused  Total  width
Treated      432    25    457      1105     291   1396  1.3394
Untreated   1386    10   1396       455       2    457  3.3975
Combined    1818    35   1853      1560     293   1853

Treatment-effects estimation
         Observed   Bootstrap                    Normal-based
wage        Coef.   Std. Err.      z   P>|z|   [95% Conf. Interval]
ATE      .4095729    .1920853   2.13   0.033    .0330928   .7860531
ATT      .6059013    .2472069   2.45   0.014    .1213846   1.090418
ATC      .3483797    .1893653   1.84   0.066   -.0227695   .7195289
NATE     1.432913    .2333282   6.14   0.000    .9755981   1.890228
Some Examples
Do some tests
. lincom ATT-NATE
 ( 1)  ATT - NATE = 0

wage        Coef.   Std. Err.      z   P>|z|   [95% Conf. Interval]
(1)     -.8270117    .1810415  -4.57   0.000   -1.181847  -.4721768

. test ATT = ATC
 ( 1)  ATT - ATC = 0
           chi2(  1) =    2.42
         Prob > chi2 =  0.1200
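The lincom line can be verified by hand from the bootstrap results, reading the estimates as ATT = .6059013 and NATE = 1.432913 and the reported standard error of the difference as .1810415 (a quick arithmetic check, assuming the usual normal-based interval with critical value 1.96):

```latex
\begin{aligned}
\widehat{\text{ATT}} - \widehat{\text{NATE}} &= 0.6059013 - 1.432913 = -0.8270117 \\
z &= \frac{-0.8270117}{0.1810415} \approx -4.57 \\
\text{95\% CI} &\approx -0.8270117 \pm 1.96 \times 0.1810415 \approx [-1.182,\; -0.472]
\end{aligned}
```

Matching thus removes a bit more than half of the naive wage difference between union members and non-members.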
Some Examples
Nearest-neighbor matching (1 neighbor)
. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

Multivariate-distance nearest-neighbor matching
                                                Number of obs  = 1,853
                                                Neighbors: min = 1
Treatment   : union = 1                                    max = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
             Matched               Controls             Band-
             Yes    No  Total      Used  Unused  Total  width
Treated      457     0    457       328    1068   1396

Treatment-effects estimation
wage        Coef.
ATT      .7246969

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

Treatment-effects estimation                    Number of obs      = 1,853
Estimator      : nearest-neighbor matching      Matches: requested = 1
Outcome model  : matching                                      min = 1
Distance metric: Mahalanobis                                   max = 1

                                  AI Robust
wage                      Coef.   Std. Err.      z   P>|z|   [95% Conf. Interval]
ATET
union
(union vs nonunion)    .7246969    .2942952   2.46   0.014     .147889   1.301505
Some Examples
Nearest-neighbor matching (5 neighbors)
. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching
                                                Number of obs  = 1,853
                                                Neighbors: min = 5
Treatment   : union = 1                                    max = 5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
             Matched               Controls             Band-
             Yes    No  Total      Used  Unused  Total  width
Treated      457     0    457       870     526   1396

Treatment-effects estimation
wage        Coef.
ATT      .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation                    Number of obs      = 1,853
Estimator      : nearest-neighbor matching      Matches: requested = 5
Outcome model  : matching                                      min = 5
Distance metric: Mahalanobis                                   max = 6

                                  AI Robust
wage                      Coef.   Std. Err.      z   P>|z|   [95% Conf. Interval]
ATET
union
(union vs nonunion)    .5590823    .2381752   2.35   0.019    .0922675   1.025897
Some Examples
Bias correction (regression adjustment)
. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching
                                                Number of obs  = 1,853
                                                Neighbors: min = 5
Treatment   : union = 1                                    max = 5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
             Matched               Controls             Band-
             Yes    No  Total      Used  Unused  Total  width
Treated      457     0    457       870     526   1396

Treatment-effects estimation
wage        Coef.
ATT      .5288023
adjusted for: collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south)
>     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation                    Number of obs      = 1,853
Estimator      : nearest-neighbor matching      Matches: requested = 5
Outcome model  : matching                                      min = 5
Distance metric: Mahalanobis                                   max = 6

                                  AI Robust
wage                      Coef.   Std. Err.      z   P>|z|   [95% Conf. Interval]
ATET
union
(union vs nonunion)    .5288023    .2420635   2.18   0.029    .0543666   1.003238
Some Examples
Mahalanobis-distance and propensity-score matching combined
. kmatch md union collgrad ttl_exp tenure (wage), att psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching           Number of obs = 1,853
                                                Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis (modified)
Covariates  : collgrad ttl_exp tenure
PS model    : logit (pr)
PS covars   : i.industry i.race south

Matching statistics
             Matched               Controls             Band-
             Yes    No  Total      Used  Unused  Total  width
Treated      439    18    457      1258     138   1396  .83886

Treatment-effects estimation
wage        Coef.
ATT      .6408443
Some Examples
Exact matching
. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching           Number of obs = 1,853
                                                Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure
Exact       : industry race south

Matching statistics
             Matched               Controls             Band-
             Yes    No  Total      Used  Unused  Total  width
Treated      432    25    457      1103     293   1396  1.3013

Treatment-effects estimation
wage        Coef.
ATT      .6047374
Some Examples
Bandwidth selection: the default (based on the distribution of distances in one-nearest-neighbor matching)
. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching           Number of obs = 1,853
                                                Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
             Matched               Controls             Band-
             Yes    No  Total      Used  Unused  Total  width
Treated      432    25    457      1105     291   1396  1.3394

Treatment-effects estimation
wage        Coef.
ATT      .6059013
Some Examples
Bandwidth selection: cross validation with respect to X
. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching           Number of obs = 1,853
                                                Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
             Matched               Controls             Band-
             Yes    No  Total      Used  Unused  Total  width
Treated      448     9    457      1184     212   1396  1.8888

Treatment-effects estimation
wage        Coef.
ATT      .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort
[Figure: cross-validation plot of MSE against Bandwidth (about 1.5 to 3), with the evaluated bandwidths labeled by their index (1 to 15).]
Some Examples
Bandwidth selection: cross validation with respect to Y
. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching           Number of obs = 1,853
                                                Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
             Matched               Controls             Band-
             Yes    No  Total      Used  Unused  Total  width
Treated      453     4    457      1289     107   1396   2.433

Treatment-effects estimation
wage        Coef.
ATT      .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort
[Figure: cross-validation plot of MISE against Bandwidth (about 1.5 to 3), with the evaluated bandwidths labeled by their index (1 to 15).]
Some Examples
Bandwidth selection: weighted cross validation with respect to Y
. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching           Number of obs = 1,853
                                                Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
             Matched               Controls             Band-
             Yes    No  Total      Used  Unused  Total  width
Treated      455     2    457      1356      40   1396  2.7626

Treatment-effects estimation
wage        Coef.
ATT      .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort
[Figure: cross-validation plot of Weighted MISE against Bandwidth (1 to 5), with the evaluated bandwidths labeled by their index (1 to 15).]
Some Examples
Common-support diagnostics
. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(0.5)

Multivariate-distance kernel matching           Number of obs = 1,853
                                                Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
             Matched               Controls             Band-
             Yes    No  Total      Used  Unused  Total  width
Treated      366    91    457       701     695   1396      .5

Treatment-effects estimation
wage        Coef.
ATT      .3303161

. kmatch csummarize
(refitting the model using the generate() option)

                Common support (treated)        Standardized difference
Means         Matched  Unmatc~d     Total    (1)-(3)   (2)-(3)   (1)-(2)
collgrad      .322404   .318681   .321663    .001585  -.006376   .007962
ttl_exp       13.3929   12.7682   13.2685    .027413  -.110253   .137666
tenure        8.12614   6.95055   7.89205    .038378  -.154356   .192734
3.industry    .002732   .021978   .006565   -.047404   .190657  -.238061
4.industry    .191257   .153846   .183807    .019212  -.077269   .096481
5.industry    .062842   .274725   .105033   -.137462   .552867  -.690329
6.industry    .057377         0   .045952    .054507  -.219225   .273732
7.industry    .019126   .021978   .019694   -.004083   .016423  -.020506
8.industry    .005464   .065934   .017505   -.091714   .368871  -.460585
9.industry    .010929   .010989   .010941   -.000115   .000462  -.000577
10.industry         0   .021978   .004376   -.066227   .266363  -.332589
11.industry   .554645   .175824   .479212     .15083  -.606636   .757467
12.industry   .092896   .241758   .122538   -.090299   .363181   -.45348
2.race        .243169   .681319   .330416   -.185284   .745209  -.930494
3.race        .002732   .076923   .017505   -.112525   .452572  -.565097
south          .29235   .318681   .297593   -.011456   .046074   -.05753

                Common support (treated)                 Ratio
Variances     Matched  Unmatc~d     Total    (1)/(3)   (2)/(3)   (1)/(2)
collgrad      .219058   .219536   .218674    1.00176   1.00394   .997824
ttl_exp       19.4198   25.2474   20.5898    .943177   1.22621    .76918
tenure        38.3324   31.9242   37.2044    1.03032   .858076   1.20073
3.industry    .002732   .021734   .006536    .418045   3.32537   .125714
4.industry    .155101   .131624   .150351    1.03159   .875443   1.17837
5.industry    .059054   .201465   .094207    .626851   2.13854   .293122
6.industry    .054233         0   .043936    1.23435         0         .
7.industry    .018811   .021734   .019348    .972252    1.1233   .865531
8.industry     .00545   .062271   .017237    .316157   3.61269   .087513
9.industry    .010839   .010989   .010845    .999464   1.01328   .986361
10.industry         0   .021734   .004367          0   4.97709         0
11.industry   .247691    .14652   .250115    .990307   .585811   1.69049
12.industry   .084497   .185348   .107758    .784137   1.72003   .455885
2.race        .184542   .219536   .221726    .832297   .990121   .840601
3.race        .002732   .071795   .017237    .158513   4.16522   .038056
south         .207448   .219536    .20949    .990254   1.04796   .944939
(1) matched, (2) unmatched, (3) total
Some Examples
Make a graph of the common-support stats
. mat M = r(M)
. coefplot matrix(M[,4]), title("Std. difference") noci nolabels xline(0)
[Figure: coefficient plot “Std. difference” (about -.2 to .2) for collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, and south.]
Some Examples
Multiple outcome variables
. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching           Number of obs = 1,852
                                                Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
             Matched               Controls             Band-
             Yes    No  Total      Used  Unused  Total  width
Treated      432    25    457      1104     291   1395  1.3392

Treatment-effects estimation
            Coef.
wage
ATT      .6021049
NATE     1.430823
hours
ATT      1.263759
NATE     1.450303
Some Examples

Multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage = collgrad ttl_exp tenure) ///
>     (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1852
                                             Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |        Matched         |        Controls        |  Band-
           |   Yes      No    Total |  Used  Unused    Total |  width
   Treated |   432      25      457 |  1104     291     1395 | 1.3392

Treatment-effects estimation

             |     Coef.
  wage       |
         ATT |  .5152752
        NATE |  1.430823
  hours      |
         ATT |  1.263759
        NATE |  1.450303

wage: adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race
Some Examples

Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage), ///
>     att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching        Number of obs = 1853
                                             Replications  = 50
                                             Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics

           |        Matched         |        Controls        |  Band-
           |   Yes      No    Total |  Used  Unused    Total |  width
0  Treated |   306      15      321 |   625     120      745 | 1.3199
1  Treated |   126      10      136 |   473     178      651 | 1.3398

Treatment-effects estimation

         |  Observed  Bootstrap                       Normal-based
    wage |     Coef.  Std. Err.      z    P>|z|   [95% Conf. Interval]
0    ATT |  .4586332   .2763358   1.66    0.097    -.082975   1.000241
1    ATT |  .9518705    .406903   2.34    0.019    .1543553   1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

           chi2(  1) =    1.23
         Prob > chi2 =    0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

    wage |     Coef.  Std. Err.      z    P>|z|   [95% Conf. Interval]
     (1) |  .4932373   .4452343   1.11    0.268    -.379406   1.365881
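The test and lincom results on this slide are two views of the same Wald comparison of the subgroup ATTs. The following Python sketch recomputes them under stated assumptions: the inputs .4932373 and .4452343 are read from the lincom line (the decimal places are inferred, not confirmed), and `wald_summary` is a hypothetical helper, not part of kmatch or Stata.

```python
from statistics import NormalDist

def wald_summary(coef, se, level=0.95):
    """Normal-based z, two-sided p-value, chi2(1) and confidence interval.

    Hypothetical helper for illustration; not part of kmatch or Stata.
    """
    z = coef / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    return {"z": z, "p": p, "chi2": z * z,
            "lo": coef - crit * se, "hi": coef + crit * se}

# Assumed inputs: the difference [1]ATT - [0]ATT and its bootstrap standard error.
res = wald_summary(0.4932373, 0.4452343)
print({name: round(value, 4) for name, value in res.items()})
```

The squared z statistic is the chi2(1) value that test reports, which is why both commands lead to the same p-value.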
Simulation

Population data from the Swiss census of 2000.

Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using command iskotrei by Hendrickx 2002) (values from 6 to 78, mean 44).

Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.

Control variables: gender, age, and highest educational degree.

Population restricted to people between 24 and 60 years old who are working.

2'308'006 individuals, of which 17.5% belong to the treatment group.

Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.
Simulation

Substantial differences between resident aliens and Swiss nationals on all three covariates.

Propensity score in population (computed from fully stratified data):

[Figure: density of the propensity score for untreated and treated; x-axis: Propensity score (0 to 1), y-axis: Density; McFadden R2 = 0.121]
Simulation

Raw mean difference in occupational prestige (NATE): −4.79
Population ATT (computed from fully stratified data): −3.96
There is some treatment effect heterogeneity (ATE = −3.51, ATC = −3.41).
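As a consistency check, the population quantities can be tied together through the decomposition of the ATE into ATT and ATC used elsewhere in the talk. The worked arithmetic below assumes a treatment share of 17.5% and the values −3.96 and −3.41; the decimal places are inferred from the slide, so treat this as a sketch rather than an exact replication.

```latex
\begin{align*}
ATE &= \frac{N_{D=1}}{N} \cdot ATT + \frac{N_{D=0}}{N} \cdot ATC \\
    &\approx 0.175 \cdot (-3.96) + 0.825 \cdot (-3.41) \\
    &= -0.693 - 2.813 \approx -3.51
\end{align*}
```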
[Figure, left panel: mean outcome by propensity score for untreated and treated; x-axis: Propensity score, y-axis: Outcome (30 to 55). Right panel: treatment effect by propensity score; x-axis: Propensity score, y-axis: Treatment effect (−6 to −1)]
Results: Variance

[Figure: variance of the estimators, panels N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Markers: MDM and PSM, each also with bias correction]
In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.
Results: Bias reduction (in percent)

[Figure: bias reduction in percent, panels N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Markers: MDM and PSM, each also with bias correction]
Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.
Results: Mean squared error

[Figure: mean squared error, panels N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Markers: MDM and PSM, each also with bias correction]
Results: Relative standard error

[Figure: relative standard error, panels N = 500 and N = 5000. Rows: nearest-neighbor matching (teffects; 1 neighbor, 5 neighbors), nearest-neighbor matching (bootstrap; 1 neighbor, 5 neighbors), and kernel matching (bootstrap; fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Markers: MDM and PSM, each also with bias correction]
Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals, panels N = 500 and N = 5000. Rows: nearest-neighbor matching (teffects; 1 neighbor, 5 neighbors), nearest-neighbor matching (bootstrap; 1 neighbor, 5 neighbors), and kernel matching (bootstrap; fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Markers: MDM and PSM, each also with bias correction]
Conclusions

The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).

Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.
Conclusions

Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research, the choice may not be that clear.

Advantages of MDM:
- MDM leaves less scope for post-matching modeling decisions/biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.

Disadvantages of MDM:
- The choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching! (But don’t use pair matching anyhow.)
Conclusions

Some conclusions from the simulation:
- For PSM, application of regression-adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression-adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly ok for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.
References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2): 295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1): 77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91: 279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4): 465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70: 41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5): 688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4): 472–480.
Conditional Independence / Strong Ignorability

Can causal effects also be identified from “observational” (i.e., non-experimental) data?

Sometimes it can be argued that the independence assumption is valid conditionally (conditional independence, “unconfoundedness”):

$$(Y^0, Y^1) \perp\!\!\!\perp D \mid X$$

If, in addition, the overlap assumption

$$0 < \Pr(D = 1 \mid X = x) < 1 \quad \text{for all } x$$

is given, then the ATE (or ATT or ATC) can be identified by conditioning on X. For example:

$$ATE = \sum_x \Pr[X = x] \, \bigl\{ E[Y \mid D = 1, X = x] - E[Y \mid D = 0, X = x] \bigr\}$$
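The stratification formula for the ATE can be sketched in a few lines of Python. This is an illustration under assumptions, not code from the talk: X is taken to be discrete, the helper name `stratified_ate` and the toy data are hypothetical, and a stratum without both treated and untreated observations is treated as a violation of the overlap assumption.

```python
from collections import defaultdict

def stratified_ate(rows):
    """ATE = sum over x of Pr[X=x] * (E[Y|D=1,X=x] - E[Y|D=0,X=x]).

    rows: iterable of (x, d, y) with discrete x and binary d.
    """
    cells = defaultdict(lambda: {0: [], 1: []})
    for x, d, y in rows:
        cells[x][d].append(y)
    n = sum(len(c[0]) + len(c[1]) for c in cells.values())
    ate = 0.0
    for x, c in cells.items():
        if not c[0] or not c[1]:
            raise ValueError(f"overlap violated in stratum {x!r}")
        share = (len(c[0]) + len(c[1])) / n          # Pr[X = x]
        diff = sum(c[1]) / len(c[1]) - sum(c[0]) / len(c[0])
        ate += share * diff
    return ate

# Toy data (made up): treatment is more common in stratum "b",
# where outcomes are higher for everyone.
data = [("a", 1, 5.0), ("a", 0, 3.0), ("a", 0, 3.0),
        ("b", 1, 10.0), ("b", 1, 12.0), ("b", 0, 8.0)]
ate = stratified_ate(data)
naive = (5.0 + 10.0 + 12.0) / 3 - (3.0 + 3.0 + 8.0) / 3
print(ate, naive)
```

In the toy data the naive difference in means is larger than the stratified estimate because treated observations are concentrated in the stratum with higher outcomes.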
Matching

Matching is one approach to “condition on X” if strong ignorability holds.

Basic idea:
1. For each observation in the treatment group, find “statistical twins” in the control group with the same (or at least very similar) X values (and vice versa).
2. The Y values of these matching observations are then used to compute the counterfactual outcome for the observation at hand.
3. An estimate for the average causal effect can be obtained as the mean of the differences between the observed values and the “imputed” counterfactual values over all observations.
Matching

Formally:

$$\widehat{ATT} = \frac{1}{N_{D=1}} \sum_{i \mid D=1} \left[ Y_i - \hat{Y}_i^0 \right] = \frac{1}{N_{D=1}} \sum_{i \mid D=1} \Bigl[ Y_i - \sum_{j \mid D=0} w_{ij} Y_j \Bigr]$$

$$\widehat{ATC} = \frac{1}{N_{D=0}} \sum_{i \mid D=0} \left[ \hat{Y}_i^1 - Y_i \right] = \frac{1}{N_{D=0}} \sum_{i \mid D=0} \Bigl[ \sum_{j \mid D=1} w_{ij} Y_j - Y_i \Bigr]$$

$$\widehat{ATE} = \frac{N_{D=1}}{N} \cdot \widehat{ATT} + \frac{N_{D=0}}{N} \cdot \widehat{ATC}$$

Different matching algorithms use different definitions of $w_{ij}$.
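To make the role of the matching weights concrete, the sketch below evaluates the ATT formula for a given weight matrix and combines ATT and ATC into the ATE. The function names and the numbers are hypothetical; the only assumption about the weights is that each row sums to one.

```python
def att_from_weights(y_treated, y_control, w):
    """Mean of Y_i minus the weighted control outcomes over treated units.

    w[i][j] is the weight of control j in the imputed counterfactual
    of treated unit i; each row is assumed to sum to 1.
    """
    diffs = []
    for yi, wi in zip(y_treated, w):
        y0_hat = sum(wij * yj for wij, yj in zip(wi, y_control))
        diffs.append(yi - y0_hat)
    return sum(diffs) / len(diffs)

def ate_from_att_atc(att, atc, n1, n0):
    """Weighted average of ATT and ATC with group sizes as weights."""
    return (n1 * att + n0 * atc) / (n1 + n0)

# Made-up example: 2 treated and 3 control observations.
y_t = [10.0, 8.0]
y_c = [7.0, 5.0, 6.0]
w = [[1.0, 0.0, 0.0],      # treated 0 is matched to control 0 only
     [0.0, 0.5, 0.5]]      # treated 1 averages controls 1 and 2
att = att_from_weights(y_t, y_c, w)
ate = ate_from_att_atc(att, atc=1.75, n1=2, n0=3)   # atc value is hypothetical
print(att, ate)
```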
Exact Matching

Exact matching:

$$w_{ij} = \begin{cases} 1/k_i & \text{if } X_i = X_j \\ 0 & \text{else} \end{cases}$$

with $k_i$ as the number of observations for which $X_i = X_j$ applies.

The result is equivalent to “perfect stratification” or “subclassification” (see, e.g., Cochran 1968).

Problem: If X contains several variables, there is a large probability that no exact matches can be found for many observations (the “curse of dimensionality”).
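A minimal sketch of exact-matching weights, assuming covariate profiles are stored as tuples so that equality can be checked directly. The helper `exact_match_weights` is hypothetical; a treated observation without any exact match gets `None`, which is where the curse of dimensionality shows up in practice.

```python
def exact_match_weights(x_treated, x_control):
    """Row i holds w_ij = 1/k_i for controls with X_j == X_i, else 0.

    Returns None for a treated unit that has no exact match.
    """
    weights = []
    for xi in x_treated:
        hits = [1.0 if xj == xi else 0.0 for xj in x_control]
        k = sum(hits)
        weights.append([h / k for h in hits] if k else None)
    return weights

# Made-up covariate profiles (sex, age).
x_t = [("m", 30), ("f", 40), ("f", 55)]
x_c = [("m", 30), ("m", 30), ("f", 40)]
w = exact_match_weights(x_t, x_c)
print(w)
```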
Multivariate Distance Matching (MDM)

An alternative is to match based on a distance metric that measures the proximity between observations in the multivariate space of X.

The idea, then, is to use observations that are “close”, but not necessarily equal, as matches.

A common approach is to use

$$MD(X_i, X_j) = \sqrt{(X_i - X_j)' \, \Sigma^{-1} (X_i - X_j)}$$

as distance metric, where $\Sigma$ is an appropriate scaling matrix.

- Mahalanobis matching: $\Sigma$ is the covariance matrix of X.
- Euclidean matching: $\Sigma$ is the identity matrix.
- Mahalanobis matching is equivalent to Euclidean matching based on standardized and orthogonalized X.
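The distance metric can be illustrated for two covariates with a short pure-Python sketch. It assumes the scaling matrix is the sample covariance matrix (Mahalanobis) or the identity (Euclidean); `covariance_2d` and `md` are hypothetical helper names, and the closed-form 2x2 inverse is used to avoid external libraries.

```python
import math

def covariance_2d(points):
    """Sample covariance of 2-D points as (s00, s01, s11)."""
    n = len(points)
    m0 = sum(p[0] for p in points) / n
    m1 = sum(p[1] for p in points) / n
    s00 = sum((p[0] - m0) ** 2 for p in points) / (n - 1)
    s11 = sum((p[1] - m1) ** 2 for p in points) / (n - 1)
    s01 = sum((p[0] - m0) * (p[1] - m1) for p in points) / (n - 1)
    return s00, s01, s11

def md(a, b, scaling):
    """sqrt((a-b)' S^{-1} (a-b)) for a symmetric 2x2 scaling matrix S."""
    s00, s01, s11 = scaling
    det = s00 * s11 - s01 * s01
    d0, d1 = a[0] - b[0], a[1] - b[1]
    # Inverse of [[s00, s01], [s01, s11]] is [[s11, -s01], [-s01, s00]] / det.
    q = (s11 * d0 * d0 - 2 * s01 * d0 * d1 + s00 * d1 * d1) / det
    return math.sqrt(q)

identity = (1.0, 0.0, 1.0)
pts = [(1.0, 2.0), (2.0, 1.0), (3.0, 5.0), (4.0, 3.0)]   # made-up data
d_maha = md(pts[0], pts[2], covariance_2d(pts))

# Rescaling the first covariate (e.g. years to months) leaves the
# Mahalanobis distance unchanged, unlike the Euclidean distance.
scaled = [(10 * x, y) for x, y in pts]
d_maha_scaled = md(scaled[0], scaled[2], covariance_2d(scaled))
print(md((0, 0), (3, 4), identity), d_maha, d_maha_scaled)
```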
Matching Algorithms

Various matching algorithms can be employed to find potential matches based on MD and determine the matching weights $w_{ij}$.

Pair matching (one-to-one matching without replacement)
- For each observation i in the treatment group, find observation j in the control group for which $MD_{ij}$ is smallest. Once observation j is used as a match, do not use it again.

Nearest-neighbor matching
- For each observation i in the treatment group, find the k closest observations in the control group. A single control can be used multiple times as a match. In case of ties (multiple controls with identical MD), use all ties as matches. k is set by the researcher.

Caliper matching
- Like nearest-neighbor matching, but only use controls for which MD is smaller than some threshold c.
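The difference between these algorithms can be sketched on a small distance matrix (rows are treated observations, columns are controls). The code is an illustration under assumptions: the function names are hypothetical, and pair matching is implemented greedily in row order, which is only one of several possible orderings.

```python
def nn_match(dist_row, k=1, caliper=None):
    """Indices of the k nearest controls, with replacement, keeping ties.

    If a caliper is given, only controls closer than it are kept.
    """
    order = sorted(range(len(dist_row)), key=lambda j: dist_row[j])
    if not order:
        return []
    cutoff = dist_row[order[min(k, len(order)) - 1]]
    matches = [j for j in order if dist_row[j] <= cutoff]
    if caliper is not None:
        matches = [j for j in matches if dist_row[j] < caliper]
    return matches

def pair_match(dist):
    """Greedy one-to-one matching without replacement, in row order."""
    used, pairs = set(), {}
    for i, row in enumerate(dist):
        free = [j for j in range(len(row)) if j not in used]
        if not free:
            break
        j = min(free, key=lambda col: row[col])
        used.add(j)
        pairs[i] = j
    return pairs

# Made-up distances for 2 treated and 3 control observations.
dist = [[0.1, 0.5, 0.9],
        [0.2, 0.6, 0.6]]
print([nn_match(row) for row in dist])   # both treated share control 0
print(nn_match(dist[1], k=2))            # the tie at 0.6 is kept
print(pair_match(dist))                  # control 0 can be used only once
```

With replacement, both treated observations use their best control; without replacement, the second treated observation is forced onto a clearly worse match.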
Mahalanobis Matching

Radius matching
- Use all controls as matches for which MD is smaller than some threshold c.

Kernel matching
- Like radius matching, but give larger weight to controls for which MD is small (using some kernel function such as, e.g., the Epanechnikov kernel).

In addition, since matching is no longer exact, it may make sense to refine the estimates by applying regression-adjustment to the matched data (also known as “bias-adjustment” in the context of nearest-neighbor matching).
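Radius and kernel matching differ only in how controls inside the threshold are weighted. The sketch below assumes the threshold c plays the role of a bandwidth and that weights are normalized to sum to one per treated observation; how kmatch handles these details internally is not shown on the slides, and the function names are hypothetical.

```python
def epanechnikov(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) inside (-1, 1), else 0."""
    return 0.75 * (1 - u * u) if abs(u) < 1 else 0.0

def radius_weights(dist_row, bandwidth):
    """Equal weights for all controls closer than the bandwidth."""
    raw = [1.0 if d < bandwidth else 0.0 for d in dist_row]
    total = sum(raw)
    return [r / total for r in raw] if total > 0 else None

def kernel_weights(dist_row, bandwidth):
    """Kernel weights: closer controls count more; None if no match."""
    raw = [epanechnikov(d / bandwidth) for d in dist_row]
    total = sum(raw)
    return [r / total for r in raw] if total > 0 else None

# Made-up distances from one treated observation to three controls.
row = [0.0, 0.5, 2.0]
print(radius_weights(row, 1.0))
print(kernel_weights(row, 1.0))
```

A treated observation for which no control lies within the bandwidth gets `None`, that is, it remains unmatched (compare the "Matched: No" column in the kmatch output).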
The Propensity Score Theorem (Rosenbaum and Rubin 1983)

If the conditional independence assumption is true, then

$$\Pr(D_i = 1 \mid Y_i^0, Y_i^1, X_i) = \Pr(D_i = 1 \mid X_i) = \pi(X_i)$$

where $\pi(X)$ is called the propensity score.

That is,

$$(Y^0, Y^1) \perp\!\!\!\perp D \mid X$$

implies

$$(Y^0, Y^1) \perp\!\!\!\perp D \mid \pi(X)$$

so that, under strong ignorability, the average causal effect can be estimated by conditioning on the propensity score $\pi(X)$ instead of X.

This is remarkable because the information in X, which may include many variables, can be reduced to just one dimension. This greatly simplifies the matching task.
Propensity Score Matching (PSM)

Instead of computing multivariate distances, we can thus simply match on the (one-dimensional) propensity score.

Procedure
- Step 1: Estimate the propensity score, e.g., using a Logit model.
- Step 2: Apply a matching algorithm using differences in the propensity score, $|\pi(X_i) - \pi(X_j)|$, instead of multivariate distances.

PSM is tremendously popular.
- https://scholar.google.ch/scholar?q=propensity+score+AND+(matching+OR+matched+OR+match)
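The two-step procedure can be sketched without any statistics library. The example below is a rough illustration under assumptions: a single covariate, a logit fitted by plain gradient ascent (a stand-in for a proper maximum-likelihood routine), nearest-neighbor matching with replacement on the estimated score, and made-up data. All function names are hypothetical.

```python
import math

def fit_logit(x, d, steps=5000, lr=0.1):
    """One-covariate logit (intercept a, slope b) by gradient ascent."""
    a = b = 0.0
    n = len(x)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for xi, di in zip(x, d):
            p = 1 / (1 + math.exp(-(a + b * xi)))
            grad_a += di - p
            grad_b += (di - p) * xi
        a += lr * grad_a / n
        b += lr * grad_b / n
    return a, b

def propensity(x, a, b):
    """Step 1: estimated propensity score for each observation."""
    return [1 / (1 + math.exp(-(a + b * xi))) for xi in x]

def nearest_control(ps, d):
    """Step 2: for each treated unit, the control with the closest score."""
    controls = [j for j, dj in enumerate(d) if dj == 0]
    return {i: min(controls, key=lambda j: abs(ps[i] - ps[j]))
            for i, di in enumerate(d) if di == 1}

# Made-up data: treatment becomes more likely as x increases.
x = [0, 1, 2, 3, 4, 5, 6, 7]
d = [0, 0, 1, 0, 1, 0, 1, 1]
a, b = fit_logit(x, d)
ps = propensity(x, a, b)
matches = nearest_control(ps, d)
print(round(b, 3), matches)
```

Replacing `nearest_control` by a kernel-weighting step would give the kernel-matching variant of PSM.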
King and Nielsen

In 2015/2016, Gary King and Richard Nielsen circulated a paper that created quite some concern among applied researchers.

The basic message of the paper is that PSM is really, really bad and should be discarded.

The paper:
- http://j.mp/1sexgVw

Slides:
- https://gking.harvard.edu/presentations/why-propensity-scores-should-not-be-used-matching-6

Watch it:
- https://www.youtube.com/watch?v=rBv39pK1iEs
King and Nielsen

The story goes about as follows.

Argument 1
- Model dependence (i.e., dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions, even if they try not to be).
- Matching is good because it reduces model dependence.
Matching to Reduce Model Dependence
(Ho, Imai, King, Stuart 2007: fig. 1, Political Analysis)

[Figure, shown in several build-up steps: scatter plot of treated (T) and control (C) observations before and after matching; x-axis: Education (years), y-axis: Outcome. Slide 3/23 of the slides by King and Nielsen]
King and Nielsen

Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

Matching: Finding Hidden Randomized Experiments (slide 6/23 of the slides by King and Nielsen)

Types of Experiments

Balance on covariates    Complete randomization    Fully blocked
Observed                 On average                Exact
Unobserved               On average                On average

Fully blocked dominates complete randomization for: imbalance, model dependence, power, efficiency, bias, research costs, robustness. E.g., Imai, King, Nall 2009: SEs 600% smaller.

Goal of Each Matching Method (in Observational Data)
• PSM: complete randomization
• Other methods: fully blocked
• Other matching methods dominate PSM (wait, it gets worse)
Best Case: Mahalanobis Distance Matching

[Figure, shown before and after matching: treated (T) and control (C) observations; x-axis: Education (years), y-axis: Age. Slide 9/23 of the slides by King and Nielsen]

Best Case: Propensity Score Matching

[Figure, shown in two build-up steps: treated (T) and control (C) observations projected onto the propensity score (0 to 1); x-axis: Education (years), y-axis: Age. Slide 15/23 of the slides by King and Nielsen]

Best Case: Propensity Score Matching is Suboptimal

[Figure: matched treated (T) and control (C) observations after propensity score matching; x-axis: Education (years), y-axis: Age. Slide 15/23 of the slides by King and Nielsen]
King and Nielsen

Argument 3
- Random pruning (deleting observations at random) increases imbalance. This is because the sample size decreases, so that variance increases (large differences become more likely).
- More imbalance/variance means more model dependence and researcher discretion.
- Because PSM approximates complete randomization, it engages in random pruning.
- PSM Paradox (“when you do ‘better’, you do worse”):
  - When matching is made more strict (e.g., by decreasing the size of the caliper), PSM, like other matching methods, typically reduces imbalance. But soon the PSM Paradox kicks in, such that further pruning quickly increases imbalance.
  - If the data is such that there are no big differences between treated and untreated to begin with, the PSM Paradox kicks in almost immediately.
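The first step of Argument 3, that random pruning increases imbalance, can be checked with a small simulation. The sketch assumes a single standard-normal covariate that is unrelated to treatment, measures imbalance as the absolute difference in group means, and uses arbitrary sample sizes; it illustrates the mechanism and does not replicate King and Nielsen's simulations.

```python
import random
from statistics import mean

def simulate(n_full=100, n_kept=25, reps=2000, seed=1):
    """Average |mean(X|treated) - mean(X|control)| before and after
    randomly discarding observations in both groups."""
    rng = random.Random(seed)
    full = pruned = 0.0
    for _ in range(reps):
        treated = [rng.gauss(0, 1) for _ in range(n_full)]
        control = [rng.gauss(0, 1) for _ in range(n_full)]
        full += abs(mean(treated) - mean(control))
        # Random pruning: keep a random subset of each group.
        kept_t = rng.sample(treated, n_kept)
        kept_c = rng.sample(control, n_kept)
        pruned += abs(mean(kept_t) - mean(kept_c))
    return full / reps, pruned / reps

imbalance_full, imbalance_pruned = simulate()
print(imbalance_full, imbalance_pruned)
```

Keeping a quarter of the sample roughly doubles the expected imbalance, in line with the square-root relation between sample size and sampling variability.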
PSM Increases Model Dependence & Bias

[Figure, left panel "Model Dependence": variance by number of units pruned, for MDM and PSM. Right panel "Bias": maximum coefficient across 512 specifications by number of units pruned, for MDM and PSM, with a reference line at the true effect of 2. Slide 20/23 of the slides by King and Nielsen]

$$Y_i = 2 T_i + X_{1i} + X_{2i} + \epsilon_i, \qquad \epsilon_i \sim N(0, 1)$$
The Propensity Score Paradox in Real Data

[Figure, two panels: Finkel et al. (JOP 2012) and Nielsen et al. (AJPS 2011). Imbalance by number of units pruned for CEM, MDM, PSM, and random pruning; the raw data and the 1/4 SD caliper are marked. Slide 21/23 of the slides by King and Nielsen]

Similar pattern for > 20 other real data sets we checked.
Are King and Nielsen right?

Argument 1
- Model dependence (i.e., dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions, even if they try not to be).
- Matching is good because it reduces model dependence.

I fully agree.

My view, however, may be somewhat less pessimistic. I believe that research results can be credible if researchers are well educated, so that they know what they are doing, and if modeling decisions are made transparent and robustness of results is evaluated (and documented).
Are King and Nielsen right?

Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

That fully blocked randomization is more efficient than complete randomization, given the sample size, is of course true (how large the efficiency gains are depends on the strength of the relation between X and Y).

However, if blocking reduces the sample size, it is not a priori clear whether estimates from the blocked sample are more efficient than estimates from the full sample (although often they will be).
That PSM approximates complete randomization is only partially true. PSM approximates complete randomization within observations with the same propensity score. Hence, PSM is somewhere between complete randomization and fully blocked randomization.
- If the X variables have no relation to T (treatment), then all observations have the same propensity score. Hence, we end up with complete randomization.
- If the X variables have a strong effect on T, there is lots of blocking.
Are King and Nielsen right?

Argument 3
- Random pruning ⇒ imbalance ⇒ more model dependence
- PSM ⇒ complete randomization ⇒ lots of random pruning
- PSM Paradox: “when you do ‘better’, you do worse”

That random pruning makes things worse is of course true, because it unnecessarily reduces the sample size (without changing anything else).

As argued above, that PSM applies random pruning is only true for X variables unrelated to T (so that we are in a “local” complete randomization situation, although something similar can probably also happen if effects from several X’s cancel each other out).

Furthermore, it is only true if you employ a matching algorithm that throws away good matches. King and Nielsen’s results seem to be based on the worst possible algorithm: one-to-one matching without replacement.
Are King and Nielsen right
Argument 3I Random pruning rArr imbalance rArr more model dependenceI PSM rArr complete randomization rArr lots of random pruningI PSM Paradox ldquowhen you do lsquobetterrsquo you do worserdquo
If you use a matching algorithm that does not throw away goodmatches such as radius or kernel matching (or also nearest-neighbormatching as long as all ties are kept and observations are matchedwith replacement) random pruning can be avoidedI Such algorithms block (and hence prune) where it is necessary toprevent bias but they average where such pruning is not necessary
I Hence efficiency differences between PSM and multivariate matchingshould only be minor for such algorithms
It is true that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more "free" variance in X that can be exploited by modeling decisions).
In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you'd need to apply PSM stratified by subgroups in this case).
The Command
kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).
Some key features:
- Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
- Optional exact matching
- Optional regression-adjustment bias-correction
- Kernel matching, ridge matching, or nearest-neighbor matching
- Automatic bandwidth selection for kernel/ridge matching
- Flexible specification of scaling matrix for MDM
- Joint analysis of multiple subgroups and multiple outcome variables
- Various post-estimation commands for balancing and common-support diagnostics
- Computationally efficient implementation
Some Examples
Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.
. sysuse nlsw88, clear
(NLSW, 1988 extract)
. drop if industry==2
(4 observations deleted)
Mahalanobis-distance kernel matching
. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage), nate att
(computing bandwidth ... done)
Multivariate-distance kernel matching       Number of obs = 1,853
                                            Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south
Matching statistics
              Matched               Controls          Band-
            Yes    No  Total    Used  Unused  Total   width
Treated     432    25    457    1105     291   1396  1.3394
Treatment-effects estimation
    wage       Coef.
     ATT    .6059013
    NATE    1.432913
Some Examples: some balancing statistics
. kmatch summarize
(refitting the model using the generate() option)
                         Raw                          Matched (ATT)
Means          Treated  Untrea~d    StdDif    Treated  Untrea~d    StdDif
collgrad       .321663   .224212   .219912    .319444   .319444         0
ttl_exp        13.2685   12.7323   .117584    13.3205   13.1425   .039036
tenure         7.89205   6.17658    .29735    7.91744   7.58347   .057888
3.industry     .006565   .012178  -.058246     .00463    .00463         0
4.industry     .183807   .166905   .044425    .185185   .185185         0
5.industry     .105033   .027937   .312944    .085648   .085648         0
6.industry     .045952   .169771  -.407129    .048611   .048611         0
7.industry     .019694   .102436  -.350657    .020833   .020833         0
8.industry     .017505   .035817  -.113785    .009259   .009259         0
9.industry     .010941   .040115  -.185669    .011574   .011574         0
10.industry    .004376   .008596  -.052551    .002315   .002315         0
11.industry    .479212   .356734   .250073    .506944   .506944         0
12.industry    .122538    .07235   .169707     .12037    .12037         0
2.race         .330416   .244986   .189418      .3125     .3125         0
3.race         .017505   .011461   .050566    .006944   .006944         0
south          .297593   .466332  -.352408    .291667   .291667         0
                         Raw                          Matched (ATT)
Variances      Treated  Untrea~d     Ratio    Treated  Untrea~d     Ratio
collgrad       .218674   .174066   1.25628    .217904   .217904         1
ttl_exp        20.5898   21.0001   .980459    19.8177   18.2323   1.08696
tenure         37.2044   29.3629   1.26706    37.0399   34.9543   1.05966
3.industry     .006536   .012038   .542928    .004619   .004619         1
4.industry     .150351   .139148   1.08052    .151242   .151242         1
5.industry     .094207   .027176   3.46656    .078494   .078494         1
6.industry     .043936    .14105   .311496    .046355   .046355         1
7.industry     .019348   .092008   .210287    .020447   .020447         1
8.industry     .017237   .034559   .498769    .009195   .009195         1
9.industry     .010845   .038533   .281445    .011467   .011467         1
10.industry    .004367   .008528   .512039    .002315   .002315         1
11.industry    .250115   .229639   1.08917    .250532   .250532         1
12.industry    .107758   .067163   1.60443    .106127   .106127         1
2.race         .221726     .1851   1.19787    .215342   .215342         1
3.race         .017237   .011338   1.52025    .006912   .006912         1
south           .20949   .249045   .841173    .207077   .207077         1
Some Examples: make a graph of the balancing stats
. mat M = r(M)
. mat V = r(V)
. coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || , ///
>     bylabels("Std. mean difference" "Variance ratio") ///
>     noci nolabels byopts(xrescale)
. addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))
. addplot 2: , xline(1) norescaling
[Figure: coefficient plot of the balancing statistics for all covariates (collgrad through south), Raw vs. Matched; left panel: Std. mean difference, right panel: Variance ratio.]
Some Examples
Propensity-score kernel matching
. kmatch ps union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage), nate att
(computing bandwidth ... done)
Propensity-score kernel matching            Number of obs = 1,853
                                            Kernel        = epan
Treatment   : union = 1
Covariates  : collgrad ttl_exp tenure i.industry i.race south
PS model    : logit (pr)
Matching statistics
              Matched               Controls          Band-
            Yes    No  Total    Used  Unused  Total   width
Treated     431    26    457    1214     182   1396  .00188
Treatment-effects estimation
    wage       Coef.
     ATT    .3887224
    NATE    1.432913
Some Examples: Kernel density balancing plot
. kmatch density, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)
(applying 0-1 boundary correction to density estimation of propensity score)
(bandwidth for propensity score = .06803989)
[Figure: kernel density of the propensity score, Untreated vs. Treated; panels: Raw and Matched (ATT); x-axis: Propensity score, y-axis: Density.]
Some Examples: Cumulative distribution balancing plot
. kmatch cumul, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)
[Figure: cumulative distribution of the propensity score, Untreated vs. Treated; panels: Raw and Matched (ATT); x-axis: Propensity score, y-axis: Cumulative probability.]
Some Examples: Balancing box plot
. kmatch box
(refitting the model using the generate() option)
[Figure: box plots of the propensity score, Untreated vs. Treated; panels: Raw and Matched (ATT); y-axis: Propensity score.]
Some Examples: Standard errors
. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)
(running kmatch on estimation sample)
Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50
Multivariate-distance kernel matching       Number of obs = 1,853
                                            Replications  = 50
                                            Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south
Matching statistics
              Matched               Controls          Band-
            Yes    No  Total    Used  Unused  Total   width
Treated     432    25    457    1105     291   1396  1.3394
Untreated  1386    10   1396     455       2    457  3.3975
Combined   1818    35   1853    1560     293   1853       .
Treatment-effects estimation
          Observed   Bootstrap                    Normal-based
    wage     Coef.   Std. Err.      z   P>|z|   [95% Conf. Interval]
     ATE  .4095729    .1920853   2.13   0.033    .0330928   .7860531
     ATT  .6059013    .2472069   2.45   0.014    .1213846   1.090418
     ATC  .3483797    .1893653   1.84   0.066   -.0227695   .7195289
    NATE  1.432913    .2333282   6.14   0.000    .9755981   1.890228
Some Examples
Do some tests
. lincom ATT-NATE
 ( 1)  ATT - NATE = 0
    wage      Coef.   Std. Err.      z   P>|z|   [95% Conf. Interval]
     (1)  -.8270117    .1810415  -4.57   0.000   -1.181847  -.4721768
. test ATT = ATC
 ( 1)  ATT - ATC = 0
           chi2(  1) =    2.42
         Prob > chi2 =    0.1200
Some Examples: Nearest-neighbor matching (1 neighbor)
. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn
Multivariate-distance nearest-neighbor matching
                                            Number of obs = 1,853
                                            Neighbors: min = 1
                                                       max = 1
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south
Matching statistics
              Matched               Controls          Band-
            Yes    No  Total    Used  Unused  Total   width
Treated     457     0    457     328    1068   1396       .
Treatment-effects estimation
    wage       Coef.
     ATT    .7246969
. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet
Treatment-effects estimation                Number of obs      = 1,853
Estimator      : nearest-neighbor matching  Matches: requested = 1
Outcome model  : matching                                  min = 1
Distance metric: Mahalanobis                               max = 1
                                     AI Robust
    wage                   Coef.   Std. Err.      z   P>|z|   [95% Conf. Interval]
ATET
union
(union vs nonunion)     .7246969    .2942952   2.46   0.014     .147889   1.301505
Some Examples: Nearest-neighbor matching (5 neighbors)
. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)
Multivariate-distance nearest-neighbor matching
                                            Number of obs = 1,853
                                            Neighbors: min = 5
                                                       max = 5
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south
Matching statistics
              Matched               Controls          Band-
            Yes    No  Total    Used  Unused  Total   width
Treated     457     0    457     870     526   1396       .
Treatment-effects estimation
    wage       Coef.
     ATT    .5590823
. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)
Treatment-effects estimation                Number of obs      = 1,853
Estimator      : nearest-neighbor matching  Matches: requested = 5
Outcome model  : matching                                  min = 5
Distance metric: Mahalanobis                               max = 6
                                     AI Robust
    wage                   Coef.   Std. Err.      z   P>|z|   [95% Conf. Interval]
ATET
union
(union vs nonunion)     .5590823    .2381752   2.35   0.019    .0922675   1.025897
Some Examples: Bias-correction / regression adjustment
. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)
Multivariate-distance nearest-neighbor matching
                                            Number of obs = 1,853
                                            Neighbors: min = 5
                                                       max = 5
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south
Matching statistics
              Matched               Controls          Band-
            Yes    No  Total    Used  Unused  Total   width
Treated     457     0    457     870     526   1396       .
Treatment-effects estimation
    wage       Coef.
     ATT    .5288023
adjusted for: collgrad ttl_exp tenure i.industry i.race south
. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) ///
>     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)
Treatment-effects estimation                Number of obs      = 1,853
Estimator      : nearest-neighbor matching  Matches: requested = 5
Outcome model  : matching                                  min = 5
Distance metric: Mahalanobis                               max = 6
                                     AI Robust
    wage                   Coef.   Std. Err.      z   P>|z|   [95% Conf. Interval]
ATET
union
(union vs nonunion)     .5288023    .2420635   2.18   0.029    .0543666   1.003238
Some Examples
Mahalanobis-distance and propensity-score matching combined
. kmatch md union collgrad ttl_exp tenure (wage), att ///
>     psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)
Multivariate-distance kernel matching       Number of obs = 1,853
                                            Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis (modified)
Covariates  : collgrad ttl_exp tenure
PS model    : logit (pr)
PS covars   : i.industry i.race south
Matching statistics
              Matched               Controls          Band-
            Yes    No  Total    Used  Unused  Total   width
Treated     439    18    457    1258     138   1396  .83886
Treatment-effects estimation
    wage       Coef.
     ATT    .6408443
Some Examples
Exact matching
. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)
Multivariate-distance kernel matching       Number of obs = 1,853
                                            Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure
Exact       : industry race south
Matching statistics
              Matched               Controls          Band-
            Yes    No  Total    Used  Unused  Total   width
Treated     432    25    457    1103     293   1396  1.3013
Treatment-effects estimation
    wage       Coef.
     ATT    .6047374
Some Examples
Bandwidth selection: the default (based on the distribution of distances in one-nearest-neighbor matching)
. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)
Multivariate-distance kernel matching       Number of obs = 1,853
                                            Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south
Matching statistics
              Matched               Controls          Band-
            Yes    No  Total    Used  Unused  Total   width
Treated     432    25    457    1105     291   1396  1.3394
Treatment-effects estimation
    wage       Coef.
     ATT    .6059013
Some Examples: Bandwidth selection: cross validation with respect to X
. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(cv)
(computing bandwidth ... done)
Multivariate-distance kernel matching       Number of obs = 1,853
                                            Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south
Matching statistics
              Matched               Controls          Band-
            Yes    No  Total    Used  Unused  Total   width
Treated     448     9    457    1184     212   1396  1.8888
Treatment-effects estimation
    wage       Coef.
     ATT    .6651578
. kmatch cvplot, ms(o) index mlabposition(1) sort
[Figure: cross-validation plot; points labeled by search-step index; x-axis: Bandwidth, y-axis: MSE.]
Some Examples: Bandwidth selection: cross validation with respect to Y
. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(cv wage)
(computing bandwidth ... done)
Multivariate-distance kernel matching       Number of obs = 1,853
                                            Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south
Matching statistics
              Matched               Controls          Band-
            Yes    No  Total    Used  Unused  Total   width
Treated     453     4    457    1289     107   1396   2.433
Treatment-effects estimation
    wage       Coef.
     ATT    .6928956
. kmatch cvplot, ms(o) index mlabposition(1) sort
[Figure: cross-validation plot; points labeled by search-step index; x-axis: Bandwidth, y-axis: MISE.]
Some Examples: Bandwidth selection: weighted cross validation with respect to Y
. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(cv wage, weighted)
(computing bandwidth ... done)
Multivariate-distance kernel matching       Number of obs = 1,853
                                            Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south
Matching statistics
              Matched               Controls          Band-
            Yes    No  Total    Used  Unused  Total   width
Treated     455     2    457    1356      40   1396  2.7626
Treatment-effects estimation
    wage       Coef.
     ATT    .7308166
. kmatch cvplot, ms(o) index mlabposition(1) sort
[Figure: cross-validation plot; points labeled by search-step index; x-axis: Bandwidth, y-axis: Weighted MISE.]
Some Examples: Common-support diagnostics
. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(0.5)
Multivariate-distance kernel matching       Number of obs = 1,853
                                            Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south
Matching statistics
              Matched               Controls          Band-
            Yes    No  Total    Used  Unused  Total   width
Treated     366    91    457     701     695   1396      .5
Treatment-effects estimation
    wage       Coef.
     ATT    .3303161
. kmatch csummarize
(refitting the model using the generate() option)
                Common support (treated)        Standardized difference
Means          Matched  Unmatc~d     Total    (1)-(3)   (2)-(3)   (1)-(2)
collgrad       .322404   .318681   .321663    .001585  -.006376   .007962
ttl_exp        13.3929   12.7682   13.2685    .027413  -.110253   .137666
tenure         8.12614   6.95055   7.89205    .038378  -.154356   .192734
3.industry     .002732   .021978   .006565   -.047404   .190657  -.238061
4.industry     .191257   .153846   .183807    .019212  -.077269   .096481
5.industry     .062842   .274725   .105033   -.137462   .552867  -.690329
6.industry     .057377         0   .045952    .054507  -.219225   .273732
7.industry     .019126   .021978   .019694   -.004083   .016423  -.020506
8.industry     .005464   .065934   .017505   -.091714   .368871  -.460585
9.industry     .010929   .010989   .010941   -.000115   .000462  -.000577
10.industry          0   .021978   .004376   -.066227   .266363  -.332589
11.industry    .554645   .175824   .479212     .15083  -.606636   .757467
12.industry    .092896   .241758   .122538   -.090299   .363181   -.45348
2.race         .243169   .681319   .330416   -.185284   .745209  -.930494
3.race         .002732   .076923   .017505   -.112525   .452572  -.565097
south           .29235   .318681   .297593   -.011456   .046074   -.05753
                Common support (treated)                 Ratio
Variances      Matched  Unmatc~d     Total    (1)/(3)   (2)/(3)   (1)/(2)
collgrad       .219058   .219536   .218674    1.00176   1.00394   .997824
ttl_exp        19.4198   25.2474   20.5898    .943177   1.22621    .76918
tenure         38.3324   31.9242   37.2044    1.03032   .858076   1.20073
3.industry     .002732   .021734   .006536    .418045   3.32537   .125714
4.industry     .155101   .131624   .150351    1.03159   .875443   1.17837
5.industry     .059054   .201465   .094207    .626851   2.13854   .293122
6.industry     .054233         0   .043936    1.23435         0         .
7.industry     .018811   .021734   .019348    .972252    1.1233   .865531
8.industry      .00545   .062271   .017237    .316157   3.61269   .087513
9.industry     .010839   .010989   .010845    .999464   1.01328   .986361
10.industry          0   .021734   .004367          0   4.97709         0
11.industry    .247691    .14652   .250115    .990307   .585811   1.69049
12.industry    .084497   .185348   .107758    .784137   1.72003   .455885
2.race         .184542   .219536   .221726    .832297   .990121   .840601
3.race         .002732   .071795   .017237    .158513   4.16522   .038056
south          .207448   .219536    .20949    .990254   1.04796   .944939
(1) matched, (2) unmatched, (3) total
Some Examples: make a graph of the common-support stats
. mat M = r(M)
. coefplot matrix(M[,4]), title("Std. difference") noci nolabels xline(0)
[Figure: coefficient plot of the standardized difference (matched vs. total) for all covariates (collgrad through south); x-axis: Std. difference.]
Some Examples
Multiple outcome variables
. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage hours), nate att
(computing bandwidth ... done)
Multivariate-distance kernel matching       Number of obs = 1,852
                                            Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south
Matching statistics
              Matched               Controls          Band-
            Yes    No  Total    Used  Unused  Total   width
Treated     432    25    457    1104     291   1395  1.3392
Treatment-effects estimation
               Coef.
wage
     ATT    .6021049
    NATE    1.430823
hours
     ATT    1.263759
    NATE    1.450303
Some Examples: Multiple outcome variables with different regression-adjustment equations
. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage = collgrad ttl_exp tenure) ///
>     (hours = i.industry i.race), nate att
(computing bandwidth ... done)
Multivariate-distance kernel matching       Number of obs = 1,852
                                            Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south
Matching statistics
              Matched               Controls          Band-
            Yes    No  Total    Used  Unused  Total   width
Treated     432    25    457    1104     291   1395  1.3392
Treatment-effects estimation
               Coef.
wage
     ATT    .5152752
    NATE    1.430823
hours
     ATT    1.263759
    NATE    1.450303
wage: adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race
Some Examples: Treatment effects by subpopulation
. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage), ///
>     att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)
Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50
Multivariate-distance kernel matching       Number of obs = 1,853
                                            Replications  = 50
                                            Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race
0: south = 0
1: south = 1
Matching statistics
              Matched               Controls          Band-
            Yes    No  Total    Used  Unused  Total   width
0
Treated     306    15    321     625     120    745  1.3199
1
Treated     126    10    136     473     178    651  1.3398
Treatment-effects estimation
          Observed   Bootstrap                    Normal-based
    wage     Coef.   Std. Err.      z   P>|z|   [95% Conf. Interval]
0
     ATT  .4586332    .2763358   1.66   0.097    -.082975   1.000241
1
     ATT  .9518705     .406903   2.34   0.019    .1543553   1.749386
. test [0]ATT = [1]ATT
 ( 1)  [0]ATT - [1]ATT = 0
           chi2(  1) =    1.23
         Prob > chi2 =    0.2679
. lincom [1]ATT - [0]ATT
 ( 1)  - [0]ATT + [1]ATT = 0
    wage     Coef.   Std. Err.      z   P>|z|   [95% Conf. Interval]
     (1)  .4932373    .4452343   1.11   0.268    -.379406   1.365881
Simulation
Population data from the Swiss census of 2000.
Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using command iskotrei by Hendrickx 2002) (values from 6 to 78; mean 44).
Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.
Control variables: gender, age, and highest educational degree.
Population restricted to working people between 24 and 60 years old.
2'308'006 individuals, of which 17.5% belong to the treatment group.
Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.
Simulation
Substantial differences between resident aliens and Swiss nationals on all three covariates.
Propensity score in population (computed from fully stratified data):
[Figure: density of the propensity score, Untreated vs. Treated; x-axis: Propensity score, y-axis: Density.]
McFadden R2 = 0.121
Simulation
Raw mean difference in occupational prestige (NATE): -4.79
Population ATT (computed from fully stratified data): -3.96
There is some treatment effect heterogeneity (ATE = -3.51, ATC = -3.41).
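As a plausibility check, the reported effects can be combined with the decomposition ATE = (N_D=1 / N) * ATT + (N_D=0 / N) * ATC. The short sketch below assumes a treatment-group share of 17.5% and uses the rounded values from the slide, so agreement is only expected up to rounding; the variable names are illustrative.

```python
# Rounded population values as reported on the slide.
att = -3.96
atc = -3.41
share_treated = 0.175  # assumption: 17.5% of the population is in the treatment group

# ATE as the share-weighted average of ATT and ATC.
ate = share_treated * att + (1 - share_treated) * atc
print(round(ate, 2))  # close to the reported ATE of -3.51
```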
[Figure: left panel: mean outcome by propensity score, Untreated vs. Treated (x-axis: Propensity score, y-axis: Outcome); right panel: treatment effect by propensity score (x-axis: Propensity score, y-axis: Treatment effect).]
Results: Variance
[Figure: variance of the estimators; panels: N = 500 and N = 5000; rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y); series: MDM, MDM with bias correction, PSM, PSM with bias correction.]
In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that, across algorithms, PSM can also be much more efficient than MDM. For example, kernel matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.
For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.
Results: Bias reduction (in percent)
[Figure: bias reduction in percent; panels: N = 500 and N = 5000; rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y); series: MDM, MDM with bias correction, PSM, PSM with bias correction.]
Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions) and, due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.
The bias also vanishes once post-matching regression adjustment is applied.
Results: Mean squared error
[Figure: mean squared error of the estimators; panels: N = 500 and N = 5000; rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y); series: MDM, MDM with bias correction, PSM, PSM with bias correction.]
Results: Relative standard error
[Figure: relative standard error; panels: N = 500 and N = 5000; rows: nearest-neighbor matching (teffects; 1 neighbor, 5 neighbors), nearest-neighbor matching (bootstrap; 1 neighbor, 5 neighbors), and kernel matching (bootstrap; fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y); series: MDM, MDM with bias correction, PSM, PSM with bias correction.]
Results: Coverage of 95% CIs
[Figure: coverage of 95% confidence intervals; panels: N = 500 and N = 5000; rows: nearest-neighbor matching (teffects; 1 neighbor, 5 neighbors), nearest-neighbor matching (bootstrap; 1 neighbor, 5 neighbors), and kernel matching (bootstrap; fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y); series: MDM, MDM with bias correction, PSM, PSM with bias correction.]
Conclusions
The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).
Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.
Conclusions
Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research the choice may not be that clear.
Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.
Disadvantages of MDM:
- Choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.
One clear conclusion we can draw, however, is:
Do not use propensity scores for pair matching! (But don't use pair matching anyhow.)
Conclusions
Some conclusions from the simulation:
- For PSM, application of regression-adjustment seems like a great idea (reduction of bias and variance); for MDM the advantages of regression-adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly OK for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.
To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.
References
Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2): 295-313.
Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1): 77-90.
Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91: 279-290.
Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.
King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.
Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.
Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4): 465-472.
Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70: 41-55.
Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5): 688-701.
Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4): 472-480.
Conditional Independence / Strong Ignorability
Can causal effects also be identified from "observational" (i.e. non-experimental) data?
Sometimes it can be argued that the independence assumption is valid conditionally (conditional independence, "unconfoundedness"):
\[ (Y^0, Y^1) \perp\!\!\!\perp D \mid X \]
If, in addition, the overlap assumption
\[ 0 < \Pr(D = 1 \mid X = x) < 1 \quad \text{for all } x \]
is given, then the ATE (or ATT or ATC) can be identified by conditioning on X.
For example:
\[ ATE = \sum_x \Pr[X = x] \, \bigl\{ E[Y \mid D = 1, X = x] - E[Y \mid D = 0, X = x] \bigr\} \]
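The conditioning formula for the ATE can be illustrated with a small stratification sketch. The data and the function name `ate_by_stratification` are made up for illustration; the function simply averages the within-stratum mean differences, weighted by the stratum shares, and raises an error where the overlap assumption fails.

```python
from collections import defaultdict

# Hypothetical toy data: (x, d, y) = covariate stratum, treatment, outcome.
data = [
    ("low", 1, 5.0), ("low", 0, 3.0), ("low", 0, 4.0),
    ("high", 1, 9.0), ("high", 1, 8.0), ("high", 0, 6.0),
]


def ate_by_stratification(data):
    """ATE = sum_x Pr[X=x] * (E[Y|D=1,X=x] - E[Y|D=0,X=x])."""
    groups = defaultdict(lambda: {0: [], 1: []})
    for x, d, y in data:
        groups[x][d].append(y)
    n = len(data)
    ate = 0.0
    for x, by_d in groups.items():
        if not by_d[0] or not by_d[1]:
            # a stratum without treated or without controls violates overlap
            raise ValueError(f"overlap violated in stratum {x!r}")
        n_x = len(by_d[0]) + len(by_d[1])
        diff = sum(by_d[1]) / len(by_d[1]) - sum(by_d[0]) / len(by_d[0])
        ate += n_x / n * diff
    return ate


ate = ate_by_stratification(data)
print(ate)  # 0.5 * 1.5 + 0.5 * 2.5 = 2.0
```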
Matching
Matching is one approach to "condition on X" if strong ignorability holds.
Basic idea:
1. For each observation in the treatment group, find "statistical twins" in the control group with the same (or at least very similar) X values (and vice versa).
2. The Y values of these matching observations are then used to compute the counterfactual outcome for the observation at hand.
3. An estimate for the average causal effect can be obtained as the mean of the differences between the observed values and the "imputed" counterfactual values over all observations.
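Matching
Formally:
\[ \widehat{ATT} = \frac{1}{N_{D=1}} \sum_{i \mid D=1} \left[ Y_i - \hat{Y}^0_i \right] = \frac{1}{N_{D=1}} \sum_{i \mid D=1} \Bigl[ Y_i - \sum_{j \mid D=0} w_{ij} Y_j \Bigr] \]
\[ \widehat{ATC} = \frac{1}{N_{D=0}} \sum_{i \mid D=0} \left[ \hat{Y}^1_i - Y_i \right] = \frac{1}{N_{D=0}} \sum_{i \mid D=0} \Bigl[ \sum_{j \mid D=1} w_{ij} Y_j - Y_i \Bigr] \]
\[ \widehat{ATE} = \frac{N_{D=1}}{N} \cdot \widehat{ATT} + \frac{N_{D=0}}{N} \cdot \widehat{ATC} \]
Different matching algorithms use different definitions of $w_{ij}$.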
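The ATT formula translates directly into code once the weights are given. The following is a minimal sketch with made-up outcomes and weights; the function name `att_from_weights` is hypothetical, and each row of weights is assumed to sum to one.

```python
def att_from_weights(y_treated, y_control, weights):
    """ATT = mean over treated i of (Y_i - sum_j w_ij * Y_j).

    weights[i][j] is w_ij for treated unit i and control unit j;
    each row is assumed to sum to 1.
    """
    effects = []
    for y_i, w_i in zip(y_treated, weights):
        # imputed counterfactual outcome for treated unit i
        y0_hat = sum(w * y_j for w, y_j in zip(w_i, y_control))
        effects.append(y_i - y0_hat)
    return sum(effects) / len(effects)


# Hypothetical example: two treated units, three controls.
y_treated = [10.0, 12.0]
y_control = [8.0, 9.0, 11.0]
weights = [
    [0.5, 0.5, 0.0],  # first treated unit: average of controls 1 and 2
    [0.0, 0.0, 1.0],  # second treated unit: control 3 only
]
att = att_from_weights(y_treated, y_control, weights)
print(att)  # ((10 - 8.5) + (12 - 11)) / 2 = 1.25
```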
Exact Matching
Exact matching:
\[ w_{ij} = \begin{cases} 1/k_i & \text{if } X_i = X_j \\ 0 & \text{else} \end{cases} \]
with $k_i$ as the number of observations for which $X_i = X_j$ applies.
The result is equivalent to "perfect stratification" or "subclassification" (see, e.g., Cochran 1968).
Problem: If X contains several variables, there is a large probability that no exact matches can be found for many observations (the "curse of dimensionality").
Multivariate Distance Matching (MDM)
An alternative is to match based on a distance metric that measures the proximity between observations in the multivariate space of X.
The idea then is to use observations that are "close", but not necessarily equal, as matches.
A common approach is to use
\[ MD(X_i, X_j) = \sqrt{(X_i - X_j)' \Sigma^{-1} (X_i - X_j)} \]
as distance metric, where $\Sigma$ is an appropriate scaling matrix.
- Mahalanobis matching: $\Sigma$ is the covariance matrix of X.
- Euclidean matching: $\Sigma$ is the identity matrix.
- Mahalanobis matching is equivalent to Euclidean matching based on standardized and orthogonalized X.
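The role of the scaling matrix can be seen in a small two-covariate sketch. The data points and the helper names `covariance_2d` and `distance_2d` are assumptions made for this illustration; the 2x2 inverse is written out by hand to keep the example free of external libraries.

```python
import math


def covariance_2d(points):
    """Sample covariance matrix of two covariates."""
    n = len(points)
    m0 = sum(p[0] for p in points) / n
    m1 = sum(p[1] for p in points) / n
    s00 = sum((p[0] - m0) ** 2 for p in points) / (n - 1)
    s11 = sum((p[1] - m1) ** 2 for p in points) / (n - 1)
    s01 = sum((p[0] - m0) * (p[1] - m1) for p in points) / (n - 1)
    return ((s00, s01), (s01, s11))


def distance_2d(xi, xj, sigma):
    """MD(Xi, Xj) = sqrt((Xi - Xj)' Sigma^{-1} (Xi - Xj)) for two covariates."""
    (a, b), (c, d) = sigma
    det = a * d - b * c
    inv = ((d / det, -b / det), (-c / det, a / det))
    u = (xi[0] - xj[0], xi[1] - xj[1])
    q = (u[0] * (inv[0][0] * u[0] + inv[0][1] * u[1])
         + u[1] * (inv[1][0] * u[0] + inv[1][1] * u[1]))
    return math.sqrt(q)


# Hypothetical data: the second covariate is measured on a 10 times larger scale.
points = [(0.0, 0.0), (2.0, 0.0), (0.0, 20.0), (2.0, 20.0)]
identity = ((1.0, 0.0), (0.0, 1.0))
sigma = covariance_2d(points)

euclid_a = distance_2d(points[0], points[1], identity)  # 2.0
euclid_b = distance_2d(points[0], points[2], identity)  # 20.0
maha_a = distance_2d(points[0], points[1], sigma)       # sqrt(3)
maha_b = distance_2d(points[0], points[2], sigma)       # sqrt(3)
print(euclid_a, euclid_b, maha_a, maha_b)
```

In this toy data the Euclidean distance is dominated by the covariate with the larger scale, while the Mahalanobis distance treats both steps as equally far.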
Matching Algorithms
Various matching algorithms can be employed to find potential matches based on MD and determine the matching weights $w_{ij}$.

Pair matching (one-to-one matching without replacement)
- For each observation i in the treatment group, find observation j in the control group for which $MD_{ij}$ is smallest. Once observation j is used as a match, do not use it again.

Nearest-neighbor matching
- For each observation i in the treatment group, find the k closest observations in the control group. A single control can be used multiple times as a match. In case of ties (multiple controls with identical MD), use all ties as matches. k is set by the researcher.

Caliper matching
- Like nearest-neighbor matching, but only use controls for which MD is smaller than some threshold c.
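As a rough sketch of nearest-neighbor matching with ties and an optional caliper, the following Python function computes the weights for a single treated observation. The function name `nn_weights` is hypothetical, and giving all selected controls equal weight is an assumption of this sketch rather than something stated on the slide.

```python
def nn_weights(dist_row, k=1, caliper=None):
    """Matching weights w_ij for one treated observation.

    dist_row: distances from the treated observation to every control.
    Controls are used with replacement (nothing is marked as used),
    ties at the k-th smallest distance are all kept, and controls at
    or beyond `caliper` (if given) are excluded.
    Returns a dict {control index: weight}; empty if unmatched.
    """
    eligible = [(dist, j) for j, dist in enumerate(dist_row)
                if caliper is None or dist < caliper]
    if not eligible:
        return {}
    eligible.sort()
    # Distance of the k-th closest eligible control (or the last one
    # if fewer than k controls are eligible).
    cutoff = eligible[min(k, len(eligible)) - 1][0]
    chosen = [j for dist, j in eligible if dist <= cutoff]
    # Assumption: equal weights among the selected controls.
    return {j: 1.0 / len(chosen) for j in chosen}

distances = [0.5, 0.2, 0.2, 0.9]
print(nn_weights(distances, k=1))               # two tied nearest neighbors
print(nn_weights(distances, k=3))               # three closest controls
print(nn_weights(distances, k=1, caliper=0.1))  # nobody within the caliper
```

Because ties are kept, the number of matches per treated observation can exceed k.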
Mahalanobis Matching
Radius matching
- Use all controls as matches for which MD is smaller than some threshold c.

Kernel matching
- Like radius matching, but give larger weight to controls for which MD is small (using some kernel function such as, e.g., the Epanechnikov kernel).

In addition, since matching is no longer exact, it may make sense to refine the estimates by applying regression-adjustment to the matched data (also known as “bias-adjustment” in the context of nearest-neighbor matching).
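The difference between radius and kernel weights can be sketched as follows. The function names and the normalization of the weights to sum to one are assumptions of this illustration; the Epanechnikov kernel is used in its standard form 0.75(1 − u²) for |u| < 1.

```python
def epanechnikov(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) on (-1, 1), zero elsewhere."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0

def radius_weights(dist_row, radius):
    """Equal weights for all controls closer than `radius`."""
    inside = [1.0 if d < radius else 0.0 for d in dist_row]
    total = sum(inside)
    return [w / total for w in inside] if total else inside

def kernel_weights(dist_row, bandwidth):
    """Kernel weights: closer controls receive more weight."""
    raw = [epanechnikov(d / bandwidth) for d in dist_row]
    total = sum(raw)
    return [w / total for w in raw] if total else raw

distances = [0.0, 0.5, 2.0]   # distances to three controls
outcomes = [4.0, 6.0, 20.0]   # their observed outcomes

w_radius = radius_weights(distances, 1.0)
w_kernel = kernel_weights(distances, 1.0)

# Imputed counterfactual outcome: weighted mean of control outcomes.
y0_radius = sum(w * y for w, y in zip(w_radius, outcomes))
y0_kernel = sum(w * y for w, y in zip(w_kernel, outcomes))
print(w_radius, y0_radius)
print(w_kernel, y0_kernel)
```

In this toy case the third control lies outside the bandwidth and receives zero weight under both schemes; the kernel version additionally shifts weight toward the closest control.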
3 Propensity Score Matching
The Propensity Score Theorem (Rosenbaum and Rubin 1983)
If the conditional independence assumption is true, then

$$\Pr(D_i = 1 \mid Y_i^0, Y_i^1, X_i) = \Pr(D_i = 1 \mid X_i) = \pi(X_i)$$

where $\pi(X)$ is called the propensity score.

That is, $(Y^0, Y^1) \perp\!\!\!\perp D \mid X$ implies $(Y^0, Y^1) \perp\!\!\!\perp D \mid \pi(X)$, so that under strong ignorability the average causal effect can be estimated by conditioning on the propensity score $\pi(X)$ instead of X.

This is remarkable because the information in X, which may include many variables, can be reduced to just one dimension. This greatly simplifies the matching task.
Propensity Score Matching (PSM)
Instead of computing multivariate distances, we can thus simply match on the (one-dimensional) propensity score.

Procedure
- Step 1: Estimate the propensity score, e.g. using a Logit model.
- Step 2: Apply a matching algorithm using differences in the propensity score, $|\pi(X_i) - \pi(X_j)|$, instead of multivariate distances.
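The two steps can be sketched in plain Python. This is a minimal illustration under stated assumptions: a single covariate, a hand-written Newton-Raphson logit fit (`fit_logit`), one nearest neighbor with replacement, and made-up data. It is not the estimator of any particular software package.

```python
import math

def fit_logit(x, d, iterations=25):
    """Newton-Raphson fit of Pr(D=1|x) = 1 / (1 + exp(-(a + b*x)))."""
    a = b = 0.0
    for _ in range(iterations):
        p = [1.0 / (1.0 + math.exp(-(a + b * xi))) for xi in x]
        # Gradient of the log-likelihood
        g0 = sum(di - pi for di, pi in zip(d, p))
        g1 = sum((di - pi) * xi for di, pi, xi in zip(d, p, x))
        # Information matrix X'WX with W = diag(p * (1 - p))
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

def psm_att(x, d, y):
    """Step 1: estimate the propensity score; step 2: match on it."""
    a, b = fit_logit(x, d)
    ps = [1.0 / (1.0 + math.exp(-(a + b * xi))) for xi in x]
    controls = [j for j, dj in enumerate(d) if dj == 0]
    diffs = []
    for i, di in enumerate(d):
        if di == 1:
            # Nearest control on |pi(X_i) - pi(X_j)|, with replacement;
            # ties are broken by taking the first control found.
            j = min(controls, key=lambda c: abs(ps[i] - ps[c]))
            diffs.append(y[i] - y[j])
    return sum(diffs) / len(diffs), ps

# Hypothetical toy data
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
d = [0, 0, 1, 0, 1, 0, 1, 1]
y = [2.0, 3.0, 6.0, 4.0, 7.0, 5.0, 9.0, 10.0]
att, ps = psm_att(x, d, y)
print(att)
print([round(p, 3) for p in ps])
```

In practice the propensity model would include many covariates; the matching step stays one-dimensional regardless.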
PSM is tremendously popular
- https://scholar.google.ch/scholar?q=propensity+score+AND+(matching+OR+matched+OR+match)
4 King and Nielsen’s “Why Propensity Scores Should Not Be Used for Matching”
King and Nielsen
In 2015/2016 Gary King and Richard Nielsen circulated a paper that created quite some concern among applied researchers.

The basic message of the paper is that PSM is really, really bad and should be discarded.

The paper
- http://j.mp/1sexgVw

Slides
- https://gking.harvard.edu/presentations/why-propensity-scores-should-not-be-used-matching-6

Watch it
- https://www.youtube.com/watch?v=rBv39pK1iEs
King and Nielsen
The story goes about as follows:

Argument 1
- Model dependence (i.e. dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions, even if they try not to be).
- Matching is good because it reduces model dependence.
Matching to Reduce Model Dependence (Ho, Imai, King, Stuart 2007: fig. 1, Political Analysis)
[Figure, shown in several build steps: scatter plot of Outcome (0 to 12) against Education (years, 12 to 28), first with the treated observations (T) only, then with the control observations (C) added. Slide 3/23 of the slides by King and Nielsen.]
King and Nielsen

Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

Matching: Finding Hidden Randomized Experiments (slide 6/23 of the slides by King and Nielsen)

Types of Experiments

    Balance of covariates   Complete Randomization   Fully Blocked
    Observed                On average               Exact
    Unobserved              On average               On average

Fully blocked dominates complete randomization for: imbalance, model dependence, power, efficiency, bias, research costs, robustness. E.g., Imai, King, Nall 2009: SEs 600% smaller.

Goal of Each Matching Method (in Observational Data)
- PSM: complete randomization
- Other methods: fully blocked
- Other matching methods dominate PSM (wait, it gets worse)
Best Case: Mahalanobis Distance Matching
[Figure, shown in two build steps: scatter plot of Age (20 to 80) against Education (years, 12 to 28) with treated (T) and control (C) observations, before and after pruning of controls. Slide 9/23 of the slides by King and Nielsen.]

Best Case: Propensity Score Matching
[Figure, shown in two build steps: scatter plot of Age (20 to 80) against Education (years, 12 to 28) with treated (T) and control (C) observations, together with a propensity score axis running from 0 to 1. Slide 15/23 of the slides by King and Nielsen.]

Best Case: Propensity Score Matching is Suboptimal
[Figure: scatter plot of Age (20 to 80) against Education (years, 12 to 28) showing the treated (T) observations and the retained control (C) observations. Slide 15/23 of the slides by King and Nielsen.]
King and Nielsen
Argument 3
- Random pruning (deleting observations at random) increases imbalance. This is because the sample size decreases, so that variance increases (large differences become more likely).
- More imbalance/variance means more model dependence and researcher discretion.
- Because PSM approximates complete randomization, it engages in random pruning.
- PSM Paradox (“when you do ‘better,’ you do worse”)
  - When matching is made more strict (e.g. by decreasing the size of the caliper), PSM, like other matching methods, typically reduces imbalance. But soon the PSM Paradox kicks in, such that further pruning quickly increases imbalance.
  - If the data is such that there are no big differences between treated and untreated to begin with, the PSM Paradox kicks in almost immediately.
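The first point of the argument can be made concrete with a short derivation. It is a sketch under simplifying assumptions that are not part of the slides: a single covariate X with variance $\sigma^2$ in both groups, independent observations, and treatment assigned independently of X (complete randomization), with $n_T$ treated and $n_C$ control observations retained.

```latex
\begin{align*}
E\left[\bar{X}_T - \bar{X}_C\right] &= 0, \\
\operatorname{Var}\left[\bar{X}_T - \bar{X}_C\right]
  &= \sigma^2 \left( \frac{1}{n_T} + \frac{1}{n_C} \right).
\end{align*}
% Example with sigma^2 = 1:
%   n_T = n_C = 100  =>  Var = 0.02,  SD approx. 0.141
%   n_T = n_C = 25   =>  Var = 0.08,  SD approx. 0.283
```

Random pruning leaves the expected mean difference at zero but lowers $n_T$ and $n_C$, so the variance of the realized imbalance grows. In the example, pruning each group from 100 to 25 observations doubles the standard deviation of the mean difference.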
PSM Increases Model Dependence & Bias
[Figure with two panels, each comparing MDM and PSM against the Number of Units Pruned (0 to 160). Panel “Model Dependence”: Variance. Panel “Bias”: Maximum Coefficient across 512 Specifications, with the true effect = 2. Simulated data: $Y_i = 2T_i + X_{1i} + X_{2i} + \epsilon_i$, $\epsilon_i \sim N(0, 1)$. Slide 20/23 of the slides by King and Nielsen.]

The Propensity Score Paradox in Real Data
[Figure with two panels, Finkel et al. (JOP 2012) and Nielsen et al. (AJPS 2011): Imbalance against Number of units pruned for CEM, MDM, PSM, and random pruning, with markers for the raw data and the 1/4 SD caliper. Slide 21/23 of the slides by King and Nielsen.]

Similar pattern for > 20 other real data sets we checked.
5 Are King and Nielsen right?
Are King and Nielsen right?

Argument 1
- Model dependence (i.e. dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions, even if they try not to be).
- Matching is good because it reduces model dependence.

I fully agree.

My view, however, may be somewhat less pessimistic. I believe that research results can be credible if researchers are well educated, so that they know what they are doing, and if modeling decisions are made transparent and robustness of results is evaluated (and documented).
Are King and Nielsen right?

Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

That fully blocked randomization is more efficient than complete randomization, given the sample size, is of course true (how large the efficiency gains are depends on the strength of the relation between X and Y).

However, if blocking reduces the sample size, it is not a priori clear whether estimates from the blocked sample are more efficient than estimates from the full sample (although often they will be).

That PSM approximates complete randomization is only partially true. PSM approximates complete randomization within observations with the same propensity score. Hence, PSM is somewhere between complete randomization and fully blocked randomization.
- If the X variables have no relation to T (treatment), then all observations have the same propensity score. Hence, we end up with complete randomization.
- If the X variables have a strong effect on T, there is lots of blocking.
Are King and Nielsen right?

Argument 3
- Random pruning ⇒ imbalance ⇒ more model dependence
- PSM ⇒ complete randomization ⇒ lots of random pruning
- PSM Paradox: “when you do ‘better,’ you do worse”

That random pruning makes things worse is of course true, because it unnecessarily reduces the sample size (without changing anything else).

As argued above, that PSM applies random pruning is only true for X variables unrelated to T (so that we are in a “local” complete randomization situation, although something similar can probably also happen if effects from several X’s cancel each other out).

Furthermore, it is only true if you employ a matching algorithm that throws away good matches. King and Nielsen’s results seem to be based on the worst possible algorithm: one-to-one matching without replacement.

If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or also nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
- Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
- Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.
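The contrast between an algorithm that throws away good matches and one that keeps them can be sketched in Python. The greedy order of the pair matching, the function names, and the propensity score values are assumptions made for illustration.

```python
def pair_match(ps_treated, ps_control):
    """Greedy one-to-one matching without replacement.

    Returns {treated index: control index}. Each control is used at
    most once, so every other control is pruned, however close it is.
    """
    available = set(range(len(ps_control)))
    pairs = {}
    for i, pi in enumerate(ps_treated):
        if not available:
            break
        j = min(available, key=lambda c: (abs(pi - ps_control[c]), c))
        pairs[i] = j
        available.remove(j)
    return pairs

def radius_match(ps_treated, ps_control, radius):
    """Radius matching: keep every control closer than `radius`."""
    return {i: [j for j, pj in enumerate(ps_control) if abs(pi - pj) < radius]
            for i, pi in enumerate(ps_treated)}

# Hypothetical propensity scores
ps_treated = [0.30, 0.31]
ps_control = [0.29, 0.30, 0.31, 0.32, 0.80]

pairs = pair_match(ps_treated, ps_control)
radius = radius_match(ps_treated, ps_control, 0.05)

used_pair = set(pairs.values())
used_radius = {j for matches in radius.values() for j in matches}
print(len(used_pair), "of", len(ps_control), "controls used by pair matching")
print(len(used_radius), "of", len(ps_control), "controls used by radius matching")
```

Pair matching keeps two of the four nearly equivalent controls and discards the other two more or less arbitrarily; radius matching averages over all four and only drops the control that is far away.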
It is true that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more “free” variance in X that can be exploited by modeling decisions).

In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you’d need to apply PSM stratified by subgroups in this case).
6 Illustration using kmatch
The Command
kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).

Some key features:
- Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
- Optional exact matching
- Optional regression-adjustment bias-correction
- Kernel matching, ridge matching, or nearest-neighbor matching
- Automatic bandwidth selection for kernel/ridge matching
- Flexible specification of scaling matrix for MDM
- Joint analysis of multiple subgroups and multiple outcome variables
- Various post-estimation commands for balancing and common-support diagnostics
- Computationally efficient implementation
Some Examples

Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.

    . sysuse nlsw88, clear
    (NLSW, 1988 extract)

    . drop if industry==2
    (4 observations deleted)

Mahalanobis-distance kernel matching:

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
    >     (wage), nate att
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1853
                                               Kernel        = epan
    Treatment  : union = 1
    Metric     : mahalanobis
    Covariates : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
                    Matched              Controls          Band-
                 Yes    No  Total    Used  Unused  Total   width
      Treated    432    25    457    1105     291   1396  1.3394

    Treatment-effects estimation
      wage       Coef.
      ATT     .6059013
      NATE    1.432913
Some Examples: some balancing statistics

    . kmatch summarize
    (refitting the model using the generate() option)

                            Raw                        Matched (ATT)
    Means          Treated  Untrea~d    StdDif    Treated  Untrea~d    StdDif
    collgrad       .321663   .224212   .219912    .319444   .319444         0
    ttl_exp        13.2685   12.7323   .117584    13.3205   13.1425   .039036
    tenure         7.89205   6.17658    .29735    7.91744   7.58347   .057888
    3.industry     .006565   .012178  -.058246     .00463    .00463         0
    4.industry     .183807   .166905   .044425    .185185   .185185         0
    5.industry     .105033   .027937   .312944    .085648   .085648         0
    6.industry     .045952   .169771  -.407129    .048611   .048611         0
    7.industry     .019694   .102436  -.350657    .020833   .020833         0
    8.industry     .017505   .035817  -.113785    .009259   .009259         0
    9.industry     .010941   .040115  -.185669    .011574   .011574         0
    10.industry    .004376   .008596  -.052551    .002315   .002315         0
    11.industry    .479212   .356734   .250073    .506944   .506944         0
    12.industry    .122538    .07235   .169707     .12037    .12037         0
    2.race         .330416   .244986   .189418      .3125     .3125         0
    3.race         .017505   .011461   .050566    .006944   .006944         0
    south          .297593   .466332  -.352408    .291667   .291667         0

                            Raw                        Matched (ATT)
    Variances      Treated  Untrea~d     Ratio    Treated  Untrea~d     Ratio
    collgrad       .218674   .174066   1.25628    .217904   .217904         1
    ttl_exp        20.5898   21.0001   .980459    19.8177   18.2323   1.08696
    tenure         37.2044   29.3629   1.26706    37.0399   34.9543   1.05966
    3.industry     .006536   .012038   .542928    .004619   .004619         1
    4.industry     .150351   .139148   1.08052    .151242   .151242         1
    5.industry     .094207   .027176   3.46656    .078494   .078494         1
    6.industry     .043936    .14105   .311496    .046355   .046355         1
    7.industry     .019348   .092008   .210287    .020447   .020447         1
    8.industry     .017237   .034559   .498769    .009195   .009195         1
    9.industry     .010845   .038533   .281445    .011467   .011467         1
    10.industry    .004367   .008528   .512039    .002315   .002315         1
    11.industry    .250115   .229639   1.08917    .250532   .250532         1
    12.industry    .107758   .067163   1.60443    .106127   .106127         1
    2.race         .221726     .1851   1.19787    .215342   .215342         1
    3.race         .017237   .011338   1.52025    .006912   .006912         1
    south           .20949   .249045   .841173    .207077   .207077         1
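The StdDif and Ratio columns of the balancing table can be reproduced from the reported means and variances. The formula used below for the standardized difference (mean difference divided by the square root of the average of the two group variances) is an assumption of this sketch; it matches the collgrad row of the raw data to the reported precision, but the kmatch documentation remains the authoritative source.

```python
import math

def std_diff(mean_t, mean_c, var_t, var_c):
    """Standardized mean difference (assumed definition, see lead-in)."""
    return (mean_t - mean_c) / math.sqrt((var_t + var_c) / 2.0)

def variance_ratio(var_t, var_c):
    """Ratio of the treated to the untreated variance."""
    return var_t / var_c

# collgrad, raw data: means and variances as reported by kmatch summarize
sd_collgrad = std_diff(0.321663, 0.224212, 0.218674, 0.174066)
vr_collgrad = variance_ratio(0.218674, 0.174066)
print(round(sd_collgrad, 6), round(vr_collgrad, 5))
```

Perfect balance corresponds to a standardized difference of 0 and a variance ratio of 1, which is what the matched columns show for all dummy variables.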
Some Examples: make a graph of the balancing stats

    . mat M = r(M)
    . mat V = r(V)
    . coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || , ///
    >     bylabels("Std. mean difference" "Variance ratio") ///
    >     noci nolabels byopts(xrescale)
    . addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))
    . addplot 2: , xline(1) norescaling

[Figure: coefficient plot for all covariates (collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, south) with two panels, “Std. mean difference” (x axis from -.4 to .4) and “Variance ratio” (x axis from 0 to 4), each showing the Raw and the Matched values.]
Some Examples: propensity-score kernel matching

    . kmatch ps union collgrad ttl_exp tenure i.industry i.race south ///
    >     (wage), nate att
    (computing bandwidth ... done)

    Propensity-score kernel matching           Number of obs = 1853
                                               Kernel        = epan
    Treatment  : union = 1
    Covariates : collgrad ttl_exp tenure i.industry i.race south
    PS model   : logit (pr)

    Matching statistics
                    Matched              Controls          Band-
                 Yes    No  Total    Used  Unused  Total   width
      Treated    431    26    457    1214     182   1396  .00188

    Treatment-effects estimation
      wage       Coef.
      ATT     .3887224
      NATE    1.432913
Some Examples: kernel density balancing plot

    . kmatch density, lw(*6 *2) lc(*.5 *1)
    (refitting the model using the generate() option)
    (applying 0-1 boundary correction to density estimation of propensity score)
    (bandwidth for propensity score = .06803989)

[Figure: kernel density of the propensity score for Untreated and Treated, panels “Raw” and “Matched (ATT)”; x axis: Propensity score; y axis: Density.]

Some Examples: cumulative distribution balancing plot

    . kmatch cumul, lw(*6 *2) lc(*.5 *1)
    (refitting the model using the generate() option)

[Figure: cumulative distribution of the propensity score for Untreated and Treated, panels “Raw” and “Matched (ATT)”; x axis: Propensity score; y axis: Cumulative probability.]

Some Examples: balancing box plot

    . kmatch box
    (refitting the model using the generate() option)

[Figure: box plots of the propensity score for Untreated and Treated, panels “Raw” and “Matched (ATT)”; y axis: Propensity score.]
Some Examples: standard errors

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
    >     (wage), nate ate att atc vce(bootstrap)
    (computing bandwidth for treated ... done)
    (computing bandwidth for untreated ... done)
    (running kmatch on estimation sample)

    Bootstrap replications (50)

    Multivariate-distance kernel matching      Number of obs = 1853
                                               Replications  = 50
                                               Kernel        = epan
    Treatment  : union = 1
    Metric     : mahalanobis
    Covariates : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
                    Matched              Controls          Band-
                 Yes    No  Total    Used  Unused  Total   width
      Treated    432    25    457    1105     291   1396  1.3394
    Untreated   1386    10   1396     455       2    457  3.3975
     Combined   1818    35   1853    1560     293   1853

    Treatment-effects estimation
             Observed   Bootstrap                        Normal-based
      wage      Coef.   Std. Err.      z   P>|z|    [95% Conf. Interval]
      ATE    .4095729    .1920853   2.13   0.033    .0330928    .7860531
      ATT    .6059013    .2472069   2.45   0.014    .1213846    1.090418
      ATC    .3483797    .1893653   1.84   0.066   -.0227695    .7195289
      NATE   1.432913    .2333282   6.14   0.000    .9755981    1.890228
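The reported ATE can be related to the ATT and ATC through the weighted-average formula from the matching section. The sketch below assumes that the weights are the numbers of matched treated (432) and matched untreated (1386) observations from the “Matched: Yes” column; this is an inference from the output, which it reproduces to the reported precision, not a documented property of kmatch.

```python
def combine_ate(att, atc, n_treated, n_untreated):
    """ATE = N1/N * ATT + N0/N * ATC, with N = N1 + N0."""
    n = n_treated + n_untreated
    return n_treated / n * att + n_untreated / n * atc

# Assumption: weights are the matched counts (432 treated, 1386 untreated).
ate = combine_ate(0.6059013, 0.3483797, 432, 1386)
print(round(ate, 7))
```

Using the full group sizes (457 and 1396) instead would give a slightly different value, because unmatched observations do not contribute to the ATT and ATC estimates.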
Some Examples: do some tests

    . lincom ATT-NATE

     ( 1)  ATT - NATE = 0

      wage       Coef.   Std. Err.      z   P>|z|    [95% Conf. Interval]
      (1)    -.8270117    .1810415  -4.57   0.000   -1.181847   -.4721768

    . test ATT = ATC

     ( 1)  ATT - ATC = 0

           chi2(  1) =    2.42
         Prob > chi2 =  0.1200
Some Examples: nearest-neighbor matching (1 neighbor)

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

    Multivariate-distance nearest-neighbor matching
                                               Number of obs = 1853
                                               Neighbors: min = 1
    Treatment  : union = 1                                max = 1
    Metric     : mahalanobis
    Covariates : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
                    Matched              Controls          Band-
                 Yes    No  Total    Used  Unused  Total   width
      Treated    457     0    457     328    1068   1396

    Treatment-effects estimation
      wage       Coef.
      ATT     .7246969

    . teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

    Treatment-effects estimation               Number of obs = 1853
    Estimator      : nearest-neighbor matching Matches: requested = 1
    Outcome model  : matching                                min = 1
    Distance metric: Mahalanobis                             max = 1

                                        AI Robust
      wage                     Coef.   Std. Err.      z   P>|z|    [95% Conf. Interval]
      ATET
      union
      (union vs nonunion)   .7246969    .2942952   2.46   0.014     .147889    1.301505
Some Examples: nearest-neighbor matching (5 neighbors)

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

    Multivariate-distance nearest-neighbor matching
                                               Number of obs = 1853
                                               Neighbors: min = 5
    Treatment  : union = 1                                max = 5
    Metric     : mahalanobis
    Covariates : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
                    Matched              Controls          Band-
                 Yes    No  Total    Used  Unused  Total   width
      Treated    457     0    457     870     526   1396

    Treatment-effects estimation
      wage       Coef.
      ATT     .5590823

    . teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

    Treatment-effects estimation               Number of obs = 1853
    Estimator      : nearest-neighbor matching Matches: requested = 5
    Outcome model  : matching                                min = 5
    Distance metric: Mahalanobis                             max = 6

                                        AI Robust
      wage                     Coef.   Std. Err.      z   P>|z|    [95% Conf. Interval]
      ATET
      union
      (union vs nonunion)   .5590823    .2381752   2.35   0.019    .0922675    1.025897
Some Examples: bias-correction / regression adjustment

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
    >     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

    Multivariate-distance nearest-neighbor matching
                                               Number of obs = 1853
                                               Neighbors: min = 5
    Treatment  : union = 1                                max = 5
    Metric     : mahalanobis
    Covariates : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
                    Matched              Controls          Band-
                 Yes    No  Total    Used  Unused  Total   width
      Treated    457     0    457     870     526   1396

    Treatment-effects estimation
      wage       Coef.
      ATT     .5288023
    wage: adjusted for collgrad ttl_exp tenure i.industry i.race south

    . teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) ///
    >     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

    Treatment-effects estimation               Number of obs = 1853
    Estimator      : nearest-neighbor matching Matches: requested = 5
    Outcome model  : matching                                min = 5
    Distance metric: Mahalanobis                             max = 6

                                        AI Robust
      wage                     Coef.   Std. Err.      z   P>|z|    [95% Conf. Interval]
      ATET
      union
      (union vs nonunion)   .5288023    .2420635   2.18   0.029    .0543666    1.003238
Some Examples: Mahalanobis-distance and propensity-score matching combined

    . kmatch md union collgrad ttl_exp tenure (wage), att ///
    >     psvars(i.industry i.race south) psweight(3)
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1853
                                               Kernel        = epan
    Treatment  : union = 1
    Metric     : mahalanobis (modified)
    Covariates : collgrad ttl_exp tenure
    PS model   : logit (pr)
    PS covars  : i.industry i.race south

    Matching statistics
                    Matched              Controls          Band-
                 Yes    No  Total    Used  Unused  Total   width
      Treated    439    18    457    1258     138   1396  .83886

    Treatment-effects estimation
      wage       Coef.
      ATT     .6408443
Some Examples: exact matching

    . kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1853
                                               Kernel        = epan
    Treatment  : union = 1
    Metric     : mahalanobis
    Covariates : collgrad ttl_exp tenure
    Exact      : industry race south

    Matching statistics
                    Matched              Controls          Band-
                 Yes    No  Total    Used  Unused  Total   width
      Treated    432    25    457    1103     293   1396  1.3013

    Treatment-effects estimation
      wage       Coef.
      ATT     .6047374
Some Examples: bandwidth selection, the default (based on distribution of distances in one-nearest-neighbor matching)

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1853
                                               Kernel        = epan
    Treatment  : union = 1
    Metric     : mahalanobis
    Covariates : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
                    Matched              Controls          Band-
                 Yes    No  Total    Used  Unused  Total   width
      Treated    432    25    457    1105     291   1396  1.3394

    Treatment-effects estimation
      wage       Coef.
      ATT     .6059013
Some Examples: bandwidth selection, cross validation with respect to X

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
    >     att bwidth(cv)
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1853
                                               Kernel        = epan
    Treatment  : union = 1
    Metric     : mahalanobis
    Covariates : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
                    Matched              Controls          Band-
                 Yes    No  Total    Used  Unused  Total   width
      Treated    448     9    457    1184     212   1396  1.8888

    Treatment-effects estimation
      wage       Coef.
      ATT     .6651578

    . kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MSE against Bandwidth (1.5 to 3); points are labeled with the index of the search step.]
Some Examples: bandwidth selection, cross validation with respect to Y

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
    >     att bwidth(cv wage)
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1853
                                               Kernel        = epan
    Treatment  : union = 1
    Metric     : mahalanobis
    Covariates : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
                    Matched              Controls          Band-
                 Yes    No  Total    Used  Unused  Total   width
      Treated    453     4    457    1289     107   1396   2.433

    Treatment-effects estimation
      wage       Coef.
      ATT     .6928956

    . kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MISE against Bandwidth (1.5 to 3); points are labeled with the index of the search step.]
Some Examples: bandwidth selection, weighted cross validation with respect to Y

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
    >     att bwidth(cv wage, weighted)
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1853
                                               Kernel        = epan
    Treatment  : union = 1
    Metric     : mahalanobis
    Covariates : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
                    Matched              Controls          Band-
                 Yes    No  Total    Used  Unused  Total   width
      Treated    455     2    457    1356      40   1396  2.7626

    Treatment-effects estimation
      wage       Coef.
      ATT     .7308166

    . kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of Weighted MISE against Bandwidth (1 to 5); points are labeled with the index of the search step.]
Some Examples

Common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(0.5)

Multivariate-distance kernel matching       Number of obs = 1,853
                                            Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                 Matched              Controls           Band-
            Yes    No  Total    Used  Unused  Total      width
Treated     366    91    457     701     695   1396         .5

Treatment-effects estimation
wage           Coef.
ATT         .3303161

. kmatch csummarize
(refitting the model using the generate() option)

Common support (treated)
                          Means                 Standardized difference
Means          Matched  Unmatc~d     Total   (1)-(3)   (2)-(3)   (1)-(2)
collgrad       .322404   .318681   .321663   .001585  -.006376   .007962
ttl_exp        13.3929   12.7682   13.2685   .027413  -.110253   .137666
tenure         8.12614   6.95055   7.89205   .038378  -.154356   .192734
3.industry     .002732   .021978   .006565  -.047404   .190657  -.238061
4.industry     .191257   .153846   .183807   .019212  -.077269   .096481
5.industry     .062842   .274725   .105033  -.137462   .552867  -.690329
6.industry     .057377         0   .045952   .054507  -.219225   .273732
7.industry     .019126   .021978   .019694  -.004083   .016423  -.020506
8.industry     .005464   .065934   .017505  -.091714   .368871  -.460585
9.industry     .010929   .010989   .010941  -.000115   .000462  -.000577
10.industry          0   .021978   .004376  -.066227   .266363  -.332589
11.industry    .554645   .175824   .479212    .15083  -.606636   .757467
12.industry    .092896   .241758   .122538  -.090299   .363181   -.45348
2.race         .243169   .681319   .330416  -.185284   .745209  -.930494
3.race         .002732   .076923   .017505  -.112525   .452572  -.565097
south           .29235   .318681   .297593  -.011456   .046074   -.05753

Common support (treated)
                        Variances                        Ratio
Variances      Matched  Unmatc~d     Total   (1)/(3)   (2)/(3)   (1)/(2)
collgrad       .219058   .219536   .218674   1.00176   1.00394   .997824
ttl_exp        19.4198   25.2474   20.5898   .943177   1.22621    .76918
tenure         38.3324   31.9242   37.2044   1.03032   .858076   1.20073
3.industry     .002732   .021734   .006536   .418045   3.32537   .125714
4.industry     .155101   .131624   .150351   1.03159   .875443   1.17837
5.industry     .059054   .201465   .094207   .626851   2.13854   .293122
6.industry     .054233         0   .043936   1.23435         0         .
7.industry     .018811   .021734   .019348   .972252    1.1233   .865531
8.industry      .00545   .062271   .017237   .316157   3.61269   .087513
9.industry     .010839   .010989   .010845   .999464   1.01328   .986361
10.industry          0   .021734   .004367         0   4.97709         0
11.industry    .247691    .14652   .250115   .990307   .585811   1.69049
12.industry    .084497   .185348   .107758   .784137   1.72003   .455885
2.race         .184542   .219536   .221726   .832297   .990121   .840601
3.race         .002732   .071795   .017237   .158513   4.16522   .038056
south          .207448   .219536    .20949   .990254   1.04796   .944939

(1) matched, (2) unmatched, (3) total
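The diagnostic columns can be checked by hand. The printed values are consistent with a standardized difference equal to the difference in means divided by the standard deviation among all treated, and with ratios that are plain quotients of variances. This reading is inferred from the numbers on the slide, not taken from the kmatch documentation. A minimal Python sketch of the arithmetic for ttl_exp (Python is used for illustration only; it is not part of the original slides):

```python
import math

# Values for ttl_exp among the treated, read off the csummarize output.
mean_matched, mean_total = 13.3929, 13.2685
var_matched, var_total = 19.4198, 20.5898

# Assumed definition: mean difference scaled by the SD in the total group.
std_diff = (mean_matched - mean_total) / math.sqrt(var_total)

# Variance ratio (1)/(3): matched variance over total variance.
var_ratio = var_matched / var_total

print(round(std_diff, 4), round(var_ratio, 4))
```

Both results agree with the table entries for ttl_exp up to rounding of the printed inputs.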
Some Examples

. // make a graph of the common-support stats
. mat M = r(M)
. coefplot matrix(M[,4]), title(Std. difference) noci nolabels xline(0)
[Figure: coefficient plot of the standardized differences (x-axis from -.2 to .2) for collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, and south]
Some Examples

Multiple outcome variables

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching       Number of obs = 1,852
                                            Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                 Matched              Controls           Band-
            Yes    No  Total    Used  Unused  Total      width
Treated     432    25    457    1104     291   1395     1.3392

Treatment-effects estimation
                  Coef.
wage    ATT    .6021049
        NATE   1.430823
hours   ATT    1.263759
        NATE   1.450303
Some Examples

Multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage = collgrad ttl_exp tenure) (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching       Number of obs = 1,852
                                            Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                 Matched              Controls           Band-
            Yes    No  Total    Used  Unused  Total      width
Treated     432    25    457    1104     291   1395     1.3392

Treatment-effects estimation
                  Coef.
wage    ATT    .5152752
        NATE   1.430823
hours   ATT    1.263759
        NATE   1.450303

wage: adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race
Some Examples

Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage), att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching       Number of obs = 1,853
                                            Replications  =    50
                                            Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics
                    Matched              Controls           Band-
               Yes    No  Total    Used  Unused  Total      width
0  Treated     306    15    321     625     120    745     1.3199
1  Treated     126    10    136     473     178    651     1.3398

Treatment-effects estimation
             Observed   Bootstrap                      Normal-based
wage            Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
0  ATT       .4586332    .2763358   1.66    0.097   -.082975    1.000241
1  ATT       .9518705     .406903   2.34    0.019   .1543553    1.749386

. test [0]ATT = [1]ATT
 ( 1)  [0]ATT - [1]ATT = 0
           chi2(  1) =    1.23
         Prob > chi2 =    0.2679

. lincom [1]ATT - [0]ATT
 ( 1)  - [0]ATT + [1]ATT = 0

wage            Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
(1)          .4932373    .4452343   1.11    0.268   -.379406    1.365881
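The test and lincom results above carry the same information: with a single linear restriction, the Wald chi-squared statistic is the square of the z statistic. A short Python check using the numbers from the lincom output (Python is used for illustration only):

```python
# Difference in ATTs and its bootstrap standard error, from the lincom output.
diff, se = 0.4932373, 0.4452343

z = diff / se      # z statistic reported by lincom
chi2 = z * z       # Wald chi-squared with 1 degree of freedom, as reported by test

print(round(z, 2), round(chi2, 2))
```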
Simulation

Population data from the Swiss census of 2000.

Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using command iskotrei by Hendrickx 2002); values from 6 to 78, mean 44.

Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.

Control variables: gender, age, and highest educational degree.

Population restricted to people between 24 and 60 years old who are working.

2'308'006 individuals, of which 17.5% belong to the treatment group.

Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.
Simulation

Substantial differences between resident aliens and Swiss nationals on all three covariates.

Propensity score in population (computed from fully stratified data):
[Figure: density of the propensity score (0 to 1) for untreated and treated; McFadden R2 = 0.121]
Simulation

Raw mean difference in occupational prestige (NATE): -4.79
Population ATT (computed from fully stratified data): -3.96
There is some treatment effect heterogeneity (ATE = -3.51, ATC = -3.41).

[Figure, left panel: outcome (30 to 55) against propensity score (0 to .8) for untreated and treated]
[Figure, right panel: treatment effect (-6 to -1) against propensity score (0 to .8)]
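The three population effects are linked by the identity ATE = P(D=1) · ATT + P(D=0) · ATC. A quick Python check with the values reported for this simulation (a treatment-group share of 17.5%, ATT = -3.96, ATC = -3.41); Python is used for illustration only:

```python
p_treated = 0.175          # share of resident aliens in the population
att, atc = -3.96, -3.41    # population ATT and ATC from the slide

# Weighted average of ATT and ATC with the group shares as weights.
ate = p_treated * att + (1 - p_treated) * atc

print(round(ate, 2))
```

The result reproduces the reported ATE after rounding to two decimals.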
Results: Variance
[Figure: variance of the estimates for N = 500 and N = 5000; series: MDM, MDM with bias correction, PSM, PSM with bias correction. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y).]

Notes: In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.
Results: Bias reduction (in percent)
[Figure: bias reduction in percent for N = 500 and N = 5000; series: MDM, MDM with bias correction, PSM, PSM with bias correction. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y).]

Notes: Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions) and, due to the specific pattern of the data (in particular the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.
Results: Mean squared error
[Figure: mean squared error for N = 500 and N = 5000; series: MDM, MDM with bias correction, PSM, PSM with bias correction. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y).]

Results: Relative standard error
[Figure: relative standard error (about .9 to 1.25) for N = 500 and N = 5000; series: MDM, MDM with bias correction, PSM, PSM with bias correction. Rows: nearest-neighbor matching with teffects (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap (1 neighbor, 5 neighbors), and kernel matching with bootstrap (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y).]

Results: Coverage of 95% CIs
[Figure: coverage of 95% confidence intervals (about 90 to 98 percent) for N = 500 and N = 5000; same series and rows as in the relative standard error figure.]
Conclusions

The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).

Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.
Conclusions

Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research the choice may not be that clear.

Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.

Disadvantages of MDM:
- Choice of scaling matrix largely arbitrary; various suggestions in the literature (somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching! (But don't use pair matching anyhow.)
Conclusions

Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly OK for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.
References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2):295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1):77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91:279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4):465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70:41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5):688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4):472–480.
Matching

Matching is one approach to "condition on X" if strong ignorability holds.

Basic idea:
1. For each observation in the treatment group, find "statistical twins" in the control group with the same (or at least very similar) X values (and vice versa).
2. The Y values of these matching observations are then used to compute the counterfactual outcome for the observation at hand.
3. An estimate for the average causal effect can be obtained as the mean of the differences between the observed values and the "imputed" counterfactual values over all observations.
Matching

Formally:

\[
\widehat{ATT} = \frac{1}{N_{D=1}} \sum_{i|D=1} \left[ Y_i - \hat{Y}_i^0 \right]
              = \frac{1}{N_{D=1}} \sum_{i|D=1} \Bigl[ Y_i - \sum_{j|D=0} w_{ij} Y_j \Bigr]
\]

\[
\widehat{ATC} = \frac{1}{N_{D=0}} \sum_{i|D=0} \left[ \hat{Y}_i^1 - Y_i \right]
              = \frac{1}{N_{D=0}} \sum_{i|D=0} \Bigl[ \sum_{j|D=1} w_{ij} Y_j - Y_i \Bigr]
\]

\[
\widehat{ATE} = \frac{N_{D=1}}{N} \cdot \widehat{ATT} + \frac{N_{D=0}}{N} \cdot \widehat{ATC}
\]

Different matching algorithms use different definitions of $w_{ij}$.
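To make the role of the weights concrete, here is a minimal Python sketch of the ATT formula. The toy data, the function name att_from_weights, and the assumption that each row of weights sums to one are illustrative choices and not taken from the slides or from kmatch:

```python
def att_from_weights(y_treated, y_control, weights):
    """Matching estimate of the ATT.

    weights[i][j] is w_ij, the weight of control j for treated unit i.
    Each row is assumed to sum to 1 (hypothetical convention for this sketch).
    """
    diffs = []
    for y_i, w_i in zip(y_treated, weights):
        # Imputed counterfactual outcome without treatment for unit i.
        y0_hat = sum(w * y_j for w, y_j in zip(w_i, y_control))
        diffs.append(y_i - y0_hat)
    return sum(diffs) / len(diffs)


# Hypothetical toy data: two treated units, three controls.
y_treated = [10.0, 12.0]
y_control = [8.0, 9.0, 11.0]
weights = [
    [0.5, 0.5, 0.0],  # treated unit 1 is matched to controls 1 and 2
    [0.0, 0.0, 1.0],  # treated unit 2 is matched to control 3
]
att = att_from_weights(y_treated, y_control, weights)
print(att)
```

The matching algorithms discussed next differ only in how they fill the weights matrix.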
Exact Matching

Exact matching:

\[
w_{ij} =
\begin{cases}
1/k_i & \text{if } X_i = X_j \\
0 & \text{else}
\end{cases}
\]

with $k_i$ as the number of observations for which $X_i = X_j$ applies.

The result is equivalent to "perfect stratification" or "subclassification" (see, e.g., Cochran 1968).

Problem: If X contains several variables, there is a large probability that no exact matches can be found for many observations (the "curse of dimensionality").
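A minimal Python sketch of the exact-matching weights, assuming covariate profiles are stored as tuples. The function name and the toy profiles are hypothetical; returning None for a unit without exact matches is a design choice of this sketch:

```python
def exact_match_weights(x_i, x_controls):
    """Return w_ij = 1/k_i for controls with X_j == X_i, and 0 otherwise.

    Returns None if no control shares the covariate profile of unit i.
    """
    hits = [x_j == x_i for x_j in x_controls]
    k_i = sum(hits)  # number of controls with identical X
    if k_i == 0:
        return None
    return [1.0 / k_i if hit else 0.0 for hit in hits]


# Hypothetical covariate profiles (gender, age) of four controls.
x_controls = [("f", 30), ("m", 30), ("f", 30), ("f", 45)]

w = exact_match_weights(("f", 30), x_controls)
no_match = exact_match_weights(("m", 45), x_controls)
print(w, no_match)
```

With more covariates, profiles such as ("m", 45) without any exact match become the rule rather than the exception, which is the curse of dimensionality mentioned above.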
Multivariate Distance Matching (MDM)

An alternative is to match based on a distance metric that measures the proximity between observations in the multivariate space of X.

The idea then is to use observations that are "close", but not necessarily equal, as matches.

A common approach is to use

\[
MD(X_i, X_j) = \sqrt{(X_i - X_j)' \Sigma^{-1} (X_i - X_j)}
\]

as distance metric, where $\Sigma$ is an appropriate scaling matrix.

- Mahalanobis matching: $\Sigma$ is the covariance matrix of X.
- Euclidean matching: $\Sigma$ is the identity matrix.
- Mahalanobis matching is equivalent to Euclidean matching based on standardized and orthogonalized X.
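The distance formula can be sketched in a few lines of Python. To stay dependency-free, this illustrative version handles exactly two covariates and inverts the 2x2 scaling matrix by hand; the function name md and the example matrices are assumptions of the sketch:

```python
import math


def md(x_i, x_j, sigma):
    """sqrt((x_i - x_j)' Sigma^{-1} (x_i - x_j)) for two covariates."""
    (a, b), (c, d) = sigma
    det = a * d - b * c
    # Inverse of a 2x2 matrix.
    inv = ((d / det, -b / det), (-c / det, a / det))
    u = (x_i[0] - x_j[0], x_i[1] - x_j[1])
    quad = (u[0] * (inv[0][0] * u[0] + inv[0][1] * u[1])
            + u[1] * (inv[1][0] * u[0] + inv[1][1] * u[1]))
    return math.sqrt(quad)


# Euclidean matching: Sigma is the identity matrix.
identity = ((1.0, 0.0), (0.0, 1.0))
euclid = md((3.0, 4.0), (0.0, 0.0), identity)

# Hypothetical covariance matrix with variances 4 and 25 and no covariance:
# each coordinate difference is measured in standard deviations.
sigma = ((4.0, 0.0), (0.0, 25.0))
mahal = md((2.0, 5.0), (0.0, 0.0), sigma)
print(euclid, mahal)
```

In the second call both covariates differ by exactly one standard deviation, so they contribute equally to the distance even though their raw scales differ.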
Matching Algorithms

Various matching algorithms can be employed to find potential matches based on MD and to determine the matching weights $w_{ij}$.

Pair matching (one-to-one matching without replacement)
- For each observation i in the treatment group, find observation j in the control group for which $MD_{ij}$ is smallest. Once observation j is used as a match, do not use it again.

Nearest-neighbor matching
- For each observation i in the treatment group, find the k closest observations in the control group. A single control can be used multiple times as a match. In case of ties (multiple controls with identical MD), use all ties as matches. k is set by the researcher.

Caliper matching
- Like nearest-neighbor matching, but only use controls for which MD is smaller than some threshold c.
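The difference between these algorithms is easiest to see on a small distance matrix. The following Python sketch implements nearest-neighbor matching with ties and an optional caliper, and a greedy version of pair matching that processes treated units in the given order. Function names and the toy distances are hypothetical; real implementations may order or break ties differently:

```python
def nearest_neighbors(dist_row, k=1, caliper=None):
    """Indices of the k closest controls, keeping all ties at the k-th distance.

    dist_row[j] is the distance between one treated unit and control j.
    With a caliper, only controls closer than the caliper are eligible.
    """
    order = sorted(range(len(dist_row)), key=lambda j: dist_row[j])
    if caliper is not None:
        order = [j for j in order if dist_row[j] < caliper]
    if len(order) <= k:
        return order
    cutoff = dist_row[order[k - 1]]
    return [j for j in order if dist_row[j] <= cutoff]


def pair_match(dist):
    """Greedy one-to-one matching without replacement."""
    used, pairs = set(), {}
    for i, row in enumerate(dist):
        free = [j for j in range(len(row)) if j not in used]
        if not free:
            break  # no controls left
        j = min(free, key=lambda c: row[c])
        used.add(j)
        pairs[i] = j
    return pairs


# Hypothetical distances: 2 treated units (rows) by 3 controls (columns).
dist = [[0.1, 0.5, 0.1],
        [0.1, 0.2, 0.9]]

nn = [nearest_neighbors(row, k=1) for row in dist]
pairs = pair_match(dist)
capped = nearest_neighbors(dist[1], k=2, caliper=0.15)
print(nn, pairs, capped)
```

In this example nearest-neighbor matching keeps both tied controls for the first treated unit and reuses control 0 for the second, whereas pair matching discards the tie and forces the second treated unit onto a more distant control.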
Mahalanobis Matching

Radius matching
- Use all controls as matches for which MD is smaller than some threshold c.

Kernel matching
- Like radius matching, but give larger weight to controls for which MD is small (using some kernel function such as, e.g., the Epanechnikov kernel).

In addition, since matching is no longer exact, it may make sense to refine the estimates by applying regression adjustment to the matched data (also known as "bias adjustment" in the context of nearest-neighbor matching).
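A minimal Python sketch of kernel weights for one treated unit, using the Epanechnikov kernel K(u) = 0.75 (1 - u^2) for |u| < 1 and a bandwidth h. Normalizing the weights to sum to one is an assumption of this sketch; the slides do not spell out how kmatch scales the weights:

```python
def epan(u):
    """Epanechnikov kernel: positive inside (-1, 1), zero outside."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def kernel_weights(dist_row, h):
    """Normalized kernel weights of all controls for one treated unit.

    Returns None if no control lies within the bandwidth h.
    """
    raw = [epan(d / h) for d in dist_row]
    total = sum(raw)
    if total == 0.0:
        return None
    return [r / total for r in raw]


# Hypothetical distances of three controls to one treated unit.
w = kernel_weights([0.0, 0.5, 2.0], h=1.0)
off_support = kernel_weights([1.5, 2.0], h=1.0)
print(w, off_support)
```

Radius matching corresponds to replacing epan by a function that returns a constant inside the bandwidth, so that all controls within the radius receive equal weight.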
The Propensity Score Theorem (Rosenbaum and Rubin 1983)

If the conditional independence assumption is true, then

\[
\Pr(D_i = 1 \mid Y_i^0, Y_i^1, X_i) = \Pr(D_i = 1 \mid X_i) = \pi(X_i)
\]

where $\pi(X)$ is called the propensity score.

That is,

\[
(Y^0, Y^1) \perp\!\!\!\perp D \mid X
\]

implies

\[
(Y^0, Y^1) \perp\!\!\!\perp D \mid \pi(X)
\]

so that under strong ignorability the average causal effect can be estimated by conditioning on the propensity score $\pi(X)$ instead of X.

This is remarkable because the information in X, which may include many variables, can be reduced to just one dimension. This greatly simplifies the matching task.
Propensity Score Matching (PSM)

Instead of computing multivariate distances, we can thus simply match on the (one-dimensional) propensity score.

Procedure
- Step 1: Estimate the propensity score, e.g. using a Logit model.
- Step 2: Apply a matching algorithm using differences in the propensity score, $|\pi(X_i) - \pi(X_j)|$, instead of multivariate distances.

PSM is tremendously popular
- https://scholar.google.ch/scholar?q=propensity+score+AND+(matching+OR+matched+OR+match)
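The two steps can be sketched in Python as follows. To stay dependency-free, the sketch does not fit a Logit model; it swaps in the share of treated units within each covariate cell as a nonparametric propensity score (similar in spirit to a score computed from fully stratified data). All names and data are hypothetical:

```python
from collections import defaultdict


def stratified_pscore(x, d):
    """Share of treated units in each covariate cell, returned per observation."""
    n, n1 = defaultdict(int), defaultdict(int)
    for x_i, d_i in zip(x, d):
        n[x_i] += 1
        n1[x_i] += d_i
    return [n1[x_i] / n[x_i] for x_i in x]


# Hypothetical data: one categorical covariate x and treatment indicator d.
x = ["a", "a", "a", "a", "b", "b", "b", "b", "c", "c"]
d = [1, 0, 0, 0, 1, 1, 0, 0, 1, 0]

# Step 1: estimate the propensity score.
ps = stratified_pscore(x, d)

# Step 2: nearest-neighbor matching (with ties) on |ps_i - ps_j|.
treated = [i for i, d_i in enumerate(d) if d_i == 1]
controls = [i for i, d_i in enumerate(d) if d_i == 0]
closest = {i: min(abs(ps[i] - ps[j]) for j in controls) for i in treated}
matches = {
    i: [j for j in controls if abs(ps[i] - ps[j]) == closest[i]]
    for i in treated
}
print(ps)
print(matches)
```

Cells "b" and "c" have the same propensity score, so treated units from either cell are matched to controls from both cells. Observations with equal scores are treated as interchangeable regardless of their X values.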
King and Nielsen

In 2015/2016, Gary King and Richard Nielsen circulated a paper that created quite some concern among applied researchers.

The basic message of the paper is that PSM is really, really bad and should be discarded.

The paper
- http://j.mp/1sexgVw

Slides
- https://gking.harvard.edu/presentations/why-propensity-scores-should-not-be-used-matching-6

Watch it
- https://www.youtube.com/watch?v=rBv39pK1iEs
King and Nielsen

The story goes about as follows.

Argument 1
- Model dependence (i.e., dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions even if they try not to be).
- Matching is good because it reduces model dependence.
Matching to Reduce Model Dependence (Ho, Imai, King, Stuart 2007: fig. 1, Political Analysis)
[Figure (slide 3/23, slides by King and Nielsen): Outcome (0 to 12) against Education (years, 12 to 28) for treated (T) and control (C) units, shown in several animation steps]
King and Nielsen

Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

Matching: Finding Hidden Randomized Experiments (slide 6/23, slides by King and Nielsen)

Types of Experiments

Balance           Complete          Fully
Covariates:       Randomization     Blocked
Observed          On average        Exact
Unobserved        On average        On average

Fully blocked dominates complete randomization for: imbalance, model dependence, power, efficiency, bias, research costs, robustness. E.g., Imai, King, Nall 2009: SEs 600% smaller.

Goal of Each Matching Method (in Observational Data)
- PSM: complete randomization
- Other methods: fully blocked
- Other matching methods dominate PSM (wait, it gets worse)
Best Case: Mahalanobis Distance Matching
[Figure (slide 9/23, slides by King and Nielsen): Age (20 to 80) against Education (years, 12 to 28) for treated (T) and control (C) units, shown in two animation steps]

Best Case: Propensity Score Matching
[Figure (slide 15/23, slides by King and Nielsen): Age against Education (years) for treated (T) and control (C) units, with an additional propensity-score axis from 0 to 1]

Best Case: Propensity Score Matching is Suboptimal
[Figure (slide 15/23, slides by King and Nielsen): Age against Education (years) for the treated (T) units and the retained control (C) units]
King and Nielsen

Argument 3
- Random pruning (deleting observations at random) increases imbalance. This is because the sample size decreases, so that variance increases (large differences become more likely).
- More imbalance/variance means more model dependence and researcher discretion.
- Because PSM approximates complete randomization, it engages in random pruning.
- PSM Paradox ("when you do 'better', you do worse")
  - When matching is made more strict (e.g., by decreasing the size of the caliper), PSM, like other matching methods, typically reduces imbalance. But soon the PSM Paradox kicks in, such that further pruning quickly increases imbalance.
  - If the data is such that there are no big differences between treated and untreated to begin with, the PSM Paradox kicks in almost immediately.
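The first step of this argument, that random pruning increases expected imbalance, can be illustrated with a small simulation. The Python sketch below draws a covariate that is unrelated to treatment and compares the average absolute mean difference between groups at two sample sizes; drawing a smaller sample from the same distribution stands in for randomly deleting units from a larger one. Sample sizes, the number of replications, and the seed are arbitrary choices of this sketch:

```python
import random


def mean_abs_imbalance(n_per_group, reps, rng):
    """Average |mean(X | treated) - mean(X | control)| with X unrelated to treatment."""
    total = 0.0
    for _ in range(reps):
        xt = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        xc = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        total += abs(sum(xt) / n_per_group - sum(xc) / n_per_group)
    return total / reps


rng = random.Random(12345)
full = mean_abs_imbalance(200, 1000, rng)    # 200 units per group
pruned = mean_abs_imbalance(50, 1000, rng)   # 75% of each group deleted at random
print(full, pruned, pruned / full)
```

Because the standard error of a mean difference scales with one over the square root of the group size, cutting each group to a quarter of its size should roughly double the expected imbalance.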
PSM Increases Model Dependence & Bias
[Figure (slide 20/23, slides by King and Nielsen), panel "Model Dependence": variance against number of units pruned (0 to 160) for MDM and PSM]
[Figure (slide 20/23, slides by King and Nielsen), panel "Bias": maximum coefficient across 512 specifications against number of units pruned (0 to 160) for MDM and PSM; true effect = 2]

\[
Y_i = 2 T_i + X_{1i} + X_{2i} + \epsilon_i, \qquad \epsilon_i \sim N(0, 1)
\]

The Propensity Score Paradox in Real Data
[Figure (slide 21/23, slides by King and Nielsen), panel "Finkel et al. (JOP 2012)": imbalance against number of units pruned (0 to 3000) for CEM, MDM, and PSM, with markers for 1/4 SD caliper, raw data, and random pruning]
[Figure (slide 21/23, slides by King and Nielsen), panel "Nielsen et al. (AJPS 2011)": imbalance against number of units pruned (0 to 2500) for CEM, MDM, and PSM, with markers for 1/4 SD caliper, raw data, and random pruning]

Similar pattern for > 20 other real data sets we checked.
Are King and Nielsen right?

Argument 1
- Model dependence (i.e., dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions even if they try not to be).
- Matching is good because it reduces model dependence.

I fully agree.

My view, however, may be somewhat less pessimistic. I believe that research results can be credible if researchers are well educated, so that they know what they are doing, and if modeling decisions are made transparent and robustness of results is evaluated (and documented).
Are King and Nielsen right?

Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

That fully blocked randomization is more efficient than complete randomization is, given the sample size, of course true (how large the efficiency gains are depends on the strength of the relation between X and Y).

However, if blocking reduces the sample size, it is not a priori clear whether estimates from the blocked sample are more efficient than estimates from the full sample (although often they will be).

That PSM approximates complete randomization is only partially true. PSM approximates complete randomization within observations with the same propensity score. Hence, PSM is somewhere between complete randomization and fully blocked randomization.
- If the X variables have no relation to T (treatment), then all observations have the same propensity score. Hence, we end up with complete randomization.
- If the X variables have a strong effect on T, there is lots of blocking.
Are King and Nielsen right?

Argument 3
- Random pruning ⇒ imbalance ⇒ more model dependence.
- PSM ⇒ complete randomization ⇒ lots of random pruning.
- PSM paradox: "when you do 'better', you do worse".

That random pruning makes things worse is, of course, true, because it unnecessarily reduces the sample size (without changing anything else).

As argued above, that PSM applies random pruning is only true for X variables unrelated to T (so that we are in a "local" complete randomization situation, although something similar can probably also happen if effects from several X's cancel each other out).

Furthermore, it is only true if you employ a matching algorithm that throws away good matches. King and Nielsen's results seem to be based on the worst possible algorithm: one-to-one matching without replacement.
If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or also nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
- Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
- Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.
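The point about averaging instead of pruning can be made concrete with a small sketch of kernel-matching weights on the propensity score. This is an illustration only, not kmatch's implementation: the function names, the toy propensity scores, and the bandwidth are assumptions, and the Epanechnikov kernel is used because the Stata output in this talk reports `Kernel = epan`.

```python
def epan(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) inside |u| < 1, zero outside."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def kernel_weights(ps_treated, ps_controls, bandwidth):
    """Normalized weights w_ij for one treated unit (hypothetical helper).

    Every control within the bandwidth contributes, so equally good
    matches are averaged rather than discarded.
    """
    raw = [epan((ps_treated - p) / bandwidth) for p in ps_controls]
    total = sum(raw)
    if total == 0.0:
        return None  # no control within the bandwidth: unit is off support
    return [r / total for r in raw]


# Toy data (assumed): three controls are close to the treated unit at 0.30.
controls = [0.10, 0.29, 0.30, 0.31, 0.80]
w = kernel_weights(0.30, controls, bandwidth=0.05)
print(w)
```

Pair matching would keep only one of the three close controls here; the kernel weights keep all three and drop only the two distant ones.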
It is true that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more "free" variance in X that can be exploited by modeling decisions).

In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you'd need to apply PSM stratified by subgroups in this case).
6 Illustration using kmatch
The Command

kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).

Some key features:
- Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
- Optional exact matching
- Optional regression-adjustment bias-correction
- Kernel matching, ridge matching, or nearest-neighbor matching
- Automatic bandwidth selection for kernel/ridge matching
- Flexible specification of scaling matrix for MDM
- Joint analysis of multiple subgroups and multiple outcome variables
- Various post-estimation commands for balancing and common-support diagnostics
- Computationally efficient implementation
Some Examples

Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.

. sysuse nlsw88, clear
(NLSW, 1988 extract)

. drop if industry==2
(4 observations deleted)

Mahalanobis-distance kernel matching:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched        |       Controls       |  Band-
           |   Yes     No   Total |  Used  Unused  Total |  width
   Treated |   432     25     457 |  1105     291   1396 | 1.3394

Treatment-effects estimation

        wage |     Coef.
         ATT |  .6059013
        NATE |  1.432913
Some Examples: some balancing statistics

. kmatch summarize
(refitting the model using the generate() option)

             |              Raw              |         Matched (ATT)
       Means |   Treated  Untrea~d    StdDif |   Treated  Untrea~d    StdDif
    collgrad |   .321663   .224212   .219912 |   .319444   .319444         0
     ttl_exp |   13.2685   12.7323   .117584 |   13.3205   13.1425   .039036
      tenure |   7.89205   6.17658    .29735 |   7.91744   7.58347   .057888
  3.industry |   .006565   .012178  -.058246 |    .00463    .00463         0
  4.industry |   .183807   .166905   .044425 |   .185185   .185185         0
  5.industry |   .105033   .027937   .312944 |   .085648   .085648         0
  6.industry |   .045952   .169771  -.407129 |   .048611   .048611         0
  7.industry |   .019694   .102436  -.350657 |   .020833   .020833         0
  8.industry |   .017505   .035817  -.113785 |   .009259   .009259         0
  9.industry |   .010941   .040115  -.185669 |   .011574   .011574         0
 10.industry |   .004376   .008596  -.052551 |   .002315   .002315         0
 11.industry |   .479212   .356734   .250073 |   .506944   .506944         0
 12.industry |   .122538    .07235   .169707 |    .12037    .12037         0
      2.race |   .330416   .244986   .189418 |     .3125     .3125         0
      3.race |   .017505   .011461   .050566 |   .006944   .006944         0
       south |   .297593   .466332  -.352408 |   .291667   .291667         0

             |              Raw              |         Matched (ATT)
   Variances |   Treated  Untrea~d     Ratio |   Treated  Untrea~d     Ratio
    collgrad |   .218674   .174066   1.25628 |   .217904   .217904         1
     ttl_exp |   20.5898   21.0001   .980459 |   19.8177   18.2323   1.08696
      tenure |   37.2044   29.3629   1.26706 |   37.0399   34.9543   1.05966
  3.industry |   .006536   .012038   .542928 |   .004619   .004619         1
  4.industry |   .150351   .139148   1.08052 |   .151242   .151242         1
  5.industry |   .094207   .027176   3.46656 |   .078494   .078494         1
  6.industry |   .043936    .14105   .311496 |   .046355   .046355         1
  7.industry |   .019348   .092008   .210287 |   .020447   .020447         1
  8.industry |   .017237   .034559   .498769 |   .009195   .009195         1
  9.industry |   .010845   .038533   .281445 |   .011467   .011467         1
 10.industry |   .004367   .008528   .512039 |   .002315   .002315         1
 11.industry |   .250115   .229639   1.08917 |   .250532   .250532         1
 12.industry |   .107758   .067163   1.60443 |   .106127   .106127         1
      2.race |   .221726     .1851   1.19787 |   .215342   .215342         1
      3.race |   .017237   .011338   1.52025 |   .006912   .006912         1
       south |    .20949   .249045   .841173 |   .207077   .207077         1
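The balancing statistics in the tenure row can be reproduced by hand, which helps in reading the table. The sketch below assumes that the standardized difference divides the mean difference by the square root of the average of the two raw-sample variances, and that the same raw-sample denominator is reused for the matched column; this is inferred from the reported numbers, not taken from the kmatch documentation. The decimal placement of the inputs (e.g. 7.89205 for the treated mean of tenure) and the function names are likewise assumptions.

```python
import math


def std_diff(mean_t, mean_c, var_t, var_c):
    """Standardized mean difference with an averaged-variance denominator."""
    return (mean_t - mean_c) / math.sqrt((var_t + var_c) / 2.0)


def var_ratio(var_t, var_c):
    """Variance ratio, treated over untreated."""
    return var_t / var_c


# tenure, raw sample: means 7.89205 vs 6.17658, variances 37.2044 vs 29.3629
raw = std_diff(7.89205, 6.17658, 37.2044, 29.3629)
# tenure, matched sample: matched means, but raw-sample variances in the denominator
matched = std_diff(7.91744, 7.58347, 37.2044, 29.3629)
ratio = var_ratio(37.2044, 29.3629)
print(round(raw, 5), round(matched, 6), round(ratio, 5))
```

Under these assumptions the three values agree with the StdDif and Ratio entries reported for tenure up to rounding.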
Some Examples: make a graph of the balancing stats

. mat M = r(M)
. mat V = r(V)
. coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || ,
>     bylabels("Std. mean difference" "Variance ratio")
>     noci nolabels byopts(xrescale)
. addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))
. addplot 2: , xline(1) norescaling

[Figure: coefficient plot with two panels, "Std. mean difference" (reference line at 0) and "Variance ratio" (reference line at 1), showing raw and matched values for each covariate]
Some Examples: propensity-score kernel matching

. kmatch ps union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate att
(computing bandwidth ... done)

Propensity-score kernel matching           Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Covariates  : collgrad ttl_exp tenure i.industry i.race south
PS model    : logit (pr)

Matching statistics

           |       Matched        |       Controls       |  Band-
           |   Yes     No   Total |  Used  Unused  Total |  width
   Treated |   431     26     457 |  1214     182   1396 | .00188

Treatment-effects estimation

        wage |     Coef.
         ATT |  .3887224
        NATE |  1.432913
Some Examples: kernel density balancing plot

. kmatch density, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)
(applying 0-1 boundary correction to density estimation of propensity score)
(bandwidth for propensity score = .06803989)

[Figure: kernel density of the propensity score for untreated and treated observations; panels "Raw" and "Matched (ATT)"]

Some Examples: cumulative distribution balancing plot

. kmatch cumul, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)

[Figure: cumulative probability of the propensity score for untreated and treated observations; panels "Raw" and "Matched (ATT)"]

Some Examples: balancing box plot

. kmatch box
(refitting the model using the generate() option)

[Figure: box plots of the propensity score for untreated and treated observations; panels "Raw" and "Matched (ATT)"]
Some Examples: standard errors

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Replications  = 50
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched        |       Controls       |  Band-
           |   Yes     No   Total |  Used  Unused  Total |  width
   Treated |   432     25     457 |  1105     291   1396 | 1.3394
 Untreated |  1386     10    1396 |   455       2    457 | 3.3975
  Combined |  1818     35    1853 |  1560     293   1853 |

Treatment-effects estimation

             |   Observed   Bootstrap                         Normal-based
        wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
         ATE |   .4095729    .1920853    2.13   0.033     .0330928    .7860531
         ATT |   .6059013    .2472069    2.45   0.014     .1213846    1.090418
         ATC |   .3483797    .1893653    1.84   0.066    -.0227695    .7195289
        NATE |   1.432913    .2333282    6.14   0.000     .9755981    1.890228
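Two pieces of arithmetic behind this output can be checked by hand. The sketch below assumes that the reported ATE is the average of ATT and ATC weighted by the numbers of matched treated (432) and matched untreated (1386) observations, and that the interval is the usual normal-based one; both assumptions are inferred from the reported numbers, and the function names are hypothetical.

```python
def combine_ate(att, n_treated, atc, n_untreated):
    """Weighted average of ATT and ATC using group sizes as weights."""
    return (n_treated * att + n_untreated * atc) / (n_treated + n_untreated)


def normal_ci(coef, se, z_crit=1.959964):
    """Normal-based 95% confidence interval."""
    return coef - z_crit * se, coef + z_crit * se


# Inputs read from the output above (decimal placement assumed).
ate = combine_ate(0.6059013, 432, 0.3483797, 1386)
lower, upper = normal_ci(0.6059013, 0.2472069)
print(round(ate, 7), round(lower, 6), round(upper, 6))
```

With matched counts as weights, the result agrees with the reported ATE up to rounding; using the full group sizes (457 and 1396) instead would give a slightly different value, which suggests that unmatched observations do not enter the combined estimate.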
Some Examples: do some tests

. lincom ATT-NATE

 ( 1)  ATT - NATE = 0

        wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
         (1) |  -.8270117    .1810415   -4.57   0.000    -1.181847   -.4721768

. test ATT = ATC

 ( 1)  ATT - ATC = 0

           chi2(  1) =    2.42
         Prob > chi2 =    0.1200
Some Examples: nearest-neighbor matching (1 neighbor)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

Multivariate-distance nearest-neighbor matching
                                           Number of obs  = 1,853
                                           Neighbors: min = 1
Treatment   : union = 1                               max = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched        |       Controls       |  Band-
           |   Yes     No   Total |  Used  Unused  Total |  width
   Treated |   457      0     457 |   328    1068   1396 |

Treatment-effects estimation

        wage |     Coef.
         ATT |  .7246969

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

Treatment-effects estimation               Number of obs      = 1,853
Estimator      : nearest-neighbor matching Matches: requested = 1
Outcome model  : matching                                 min = 1
Distance metric: Mahalanobis                              max = 1

                     |             AI Robust
                wage |     Coef.   Std. Err.     z    P>|z|    [95% Conf. Interval]
ATET                 |
  union              |
 (union vs nonunion) |  .7246969   .2942952    2.46   0.014     .147889    1.301505
Some Examples: nearest-neighbor matching (5 neighbors)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching
                                           Number of obs  = 1,853
                                           Neighbors: min = 5
Treatment   : union = 1                               max = 5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched        |       Controls       |  Band-
           |   Yes     No   Total |  Used  Unused  Total |  width
   Treated |   457      0     457 |   870     526   1396 |

Treatment-effects estimation

        wage |     Coef.
         ATT |  .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation               Number of obs      = 1,853
Estimator      : nearest-neighbor matching Matches: requested = 5
Outcome model  : matching                                 min = 5
Distance metric: Mahalanobis                              max = 6

                     |             AI Robust
                wage |     Coef.   Std. Err.     z    P>|z|    [95% Conf. Interval]
ATET                 |
  union              |
 (union vs nonunion) |  .5590823   .2381752    2.35   0.019    .0922675    1.025897
Some Examples: bias-correction / regression adjustment

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching
                                           Number of obs  = 1,853
                                           Neighbors: min = 5
Treatment   : union = 1                               max = 5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched        |       Controls       |  Band-
           |   Yes     No   Total |  Used  Unused  Total |  width
   Treated |   457      0     457 |   870     526   1396 |

Treatment-effects estimation

        wage |     Coef.
         ATT |  .5288023

adjusted for: collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south)
>     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation               Number of obs      = 1,853
Estimator      : nearest-neighbor matching Matches: requested = 5
Outcome model  : matching                                 min = 5
Distance metric: Mahalanobis                              max = 6

                     |             AI Robust
                wage |     Coef.   Std. Err.     z    P>|z|    [95% Conf. Interval]
ATET                 |
  union              |
 (union vs nonunion) |  .5288023   .2420635    2.18   0.029    .0543666    1.003238
Some Examples: Mahalanobis-distance and propensity-score matching combined

. kmatch md union collgrad ttl_exp tenure (wage), att
>     psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis (modified)
Covariates  : collgrad ttl_exp tenure
PS model    : logit (pr)
PS covars   : i.industry i.race south

Matching statistics

           |       Matched        |       Controls       |  Band-
           |   Yes     No   Total |  Used  Unused  Total |  width
   Treated |   439     18     457 |  1258     138   1396 | .83886

Treatment-effects estimation

        wage |     Coef.
         ATT |  .6408443
Some Examples: exact matching

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure
Exact       : industry race south

Matching statistics

           |       Matched        |       Controls       |  Band-
           |   Yes     No   Total |  Used  Unused  Total |  width
   Treated |   432     25     457 |  1103     293   1396 | 1.3013

Treatment-effects estimation

        wage |     Coef.
         ATT |  .6047374
Some Examples: bandwidth selection, the default (based on the distribution of distances in one-nearest-neighbor matching)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched        |       Controls       |  Band-
           |   Yes     No   Total |  Used  Unused  Total |  width
   Treated |   432     25     457 |  1105     291   1396 | 1.3394

Treatment-effects estimation

        wage |     Coef.
         ATT |  .6059013
Some Examples: bandwidth selection, cross-validation with respect to X

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched        |       Controls       |  Band-
           |   Yes     No   Total |  Used  Unused  Total |  width
   Treated |   448      9     457 |  1184     212   1396 | 1.8888

Treatment-effects estimation

        wage |     Coef.
         ATT |  .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MSE against bandwidth, with evaluation points labeled by search index]
Some Examples: bandwidth selection, cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched        |       Controls       |  Band-
           |   Yes     No   Total |  Used  Unused  Total |  width
   Treated |   453      4     457 |  1289     107   1396 |  2.433

Treatment-effects estimation

        wage |     Coef.
         ATT |  .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MISE against bandwidth, with evaluation points labeled by search index]
Some Examples: bandwidth selection, weighted cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched        |       Controls       |  Band-
           |   Yes     No   Total |  Used  Unused  Total |  width
   Treated |   455      2     457 |  1356      40   1396 | 2.7626

Treatment-effects estimation

        wage |     Coef.
         ATT |  .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of weighted MISE against bandwidth, with evaluation points labeled by search index]
Some Examples: common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(0.5)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched        |       Controls       |  Band-
           |   Yes     No   Total |  Used  Unused  Total |  width
   Treated |   366     91     457 |   701     695   1396 |     .5

Treatment-effects estimation

        wage |     Coef.
         ATT |  .3303161

. kmatch csummarize
(refitting the model using the generate() option)

             |   Common support (treated)    |    Standardized difference
       Means |   Matched  Unmatc~d     Total |   (1)-(3)   (2)-(3)   (1)-(2)
    collgrad |   .322404   .318681   .321663 |   .001585  -.006376   .007962
     ttl_exp |   13.3929   12.7682   13.2685 |   .027413  -.110253   .137666
      tenure |   8.12614   6.95055   7.89205 |   .038378  -.154356   .192734
  3.industry |   .002732   .021978   .006565 |  -.047404   .190657  -.238061
  4.industry |   .191257   .153846   .183807 |   .019212  -.077269   .096481
  5.industry |   .062842   .274725   .105033 |  -.137462   .552867  -.690329
  6.industry |   .057377         0   .045952 |   .054507  -.219225   .273732
  7.industry |   .019126   .021978   .019694 |  -.004083   .016423  -.020506
  8.industry |   .005464   .065934   .017505 |  -.091714   .368871  -.460585
  9.industry |   .010929   .010989   .010941 |  -.000115   .000462  -.000577
 10.industry |         0   .021978   .004376 |  -.066227   .266363  -.332589
 11.industry |   .554645   .175824   .479212 |    .15083  -.606636   .757467
 12.industry |   .092896   .241758   .122538 |  -.090299   .363181   -.45348
      2.race |   .243169   .681319   .330416 |  -.185284   .745209  -.930494
      3.race |   .002732   .076923   .017505 |  -.112525   .452572  -.565097
       south |    .29235   .318681   .297593 |  -.011456   .046074   -.05753

             |   Common support (treated)    |             Ratio
   Variances |   Matched  Unmatc~d     Total |   (1)/(3)   (2)/(3)   (1)/(2)
    collgrad |   .219058   .219536   .218674 |   1.00176   1.00394   .997824
     ttl_exp |   19.4198   25.2474   20.5898 |   .943177   1.22621    .76918
      tenure |   38.3324   31.9242   37.2044 |   1.03032   .858076   1.20073
  3.industry |   .002732   .021734   .006536 |   .418045   3.32537   .125714
  4.industry |   .155101   .131624   .150351 |   1.03159   .875443   1.17837
  5.industry |   .059054   .201465   .094207 |   .626851   2.13854   .293122
  6.industry |   .054233         0   .043936 |   1.23435         0         .
  7.industry |   .018811   .021734   .019348 |   .972252    1.1233   .865531
  8.industry |    .00545   .062271   .017237 |   .316157   3.61269   .087513
  9.industry |   .010839   .010989   .010845 |   .999464   1.01328   .986361
 10.industry |         0   .021734   .004367 |         0   4.97709         0
 11.industry |   .247691    .14652   .250115 |   .990307   .585811   1.69049
 12.industry |   .084497   .185348   .107758 |   .784137   1.72003   .455885
      2.race |   .184542   .219536   .221726 |   .832297   .990121   .840601
      3.race |   .002732   .071795   .017237 |   .158513   4.16522   .038056
       south |   .207448   .219536    .20949 |   .990254   1.04796   .944939

(1) matched, (2) unmatched, (3) total
Some Examples: make a graph of the common-support stats

. mat M = r(M)
. coefplot matrix(M[,4]), title("Std. difference") noci nolabels xline(0)

[Figure: coefficient plot of the standardized difference between matched and all treated observations, by covariate, with reference line at 0]
Some Examples: multiple outcome variables

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,852
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched        |       Controls       |  Band-
           |   Yes     No   Total |  Used  Unused  Total |  width
   Treated |   432     25     457 |  1104     291   1395 | 1.3392

Treatment-effects estimation

             |     Coef.
wage         |
         ATT |  .6021049
        NATE |  1.430823
hours        |
         ATT |  1.263759
        NATE |  1.450303
Some Examples: multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure)
>     (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,852
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched        |       Controls       |  Band-
           |   Yes     No   Total |  Used  Unused  Total |  width
   Treated |   432     25     457 |  1104     291   1395 | 1.3392

Treatment-effects estimation

             |     Coef.
wage         |
         ATT |  .5152752
        NATE |  1.430823
hours        |
         ATT |  1.263759
        NATE |  1.450303

wage: adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race
Some Examples: treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage),
>     att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Replications  = 50
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics

           |       Matched        |       Controls       |  Band-
           |   Yes     No   Total |  Used  Unused  Total |  width
0          |
   Treated |   306     15     321 |   625     120    745 | 1.3199
1          |
   Treated |   126     10     136 |   473     178    651 | 1.3398

Treatment-effects estimation

             |   Observed   Bootstrap                         Normal-based
        wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
0            |
         ATT |   .4586332    .2763358    1.66   0.097     -.082975    1.000241
1            |
         ATT |   .9518705     .406903    2.34   0.019     .1543553    1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

           chi2(  1) =    1.23
         Prob > chi2 =    0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

        wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
         (1) |   .4932373    .4452343    1.11   0.268     -.379406    1.365881
Simulation

- Population data from the Swiss census of 2000.
- Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using command iskotrei by Hendrickx 2002; values from 6 to 78, mean 44).
- Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.
- Control variables: gender, age, and highest educational degree.
- Population restricted to people between 24 and 60 years old who are working.
- 2'308'006 individuals, of which 17.5% belong to the treatment group.
- Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.
Simulation

Substantial differences between resident aliens and Swiss nationals on all three covariates.

Propensity score in population (computed from fully stratified data):

[Figure: density of the propensity score for untreated and treated observations; McFadden R2 = 0.121]
Simulation

- Raw mean difference in occupational prestige (NATE): −4.79
- Population ATT (computed from fully stratified data): −3.96
- There is some treatment effect heterogeneity (ATE = −3.51, ATC = −3.41).

[Figure: left panel, mean outcome by propensity score for untreated and treated observations; right panel, treatment effect by propensity score]
Results: Variance

[Figure: variance of the estimates for N = 500 and N = 5000, by matching algorithm (nearest-neighbor matching with 1 or 5 neighbors; kernel matching with fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, and weighted CV with respect to Y), for MDM and PSM, each with and without bias correction]

On this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.
Results: Bias reduction (in percent)

[Figure: bias reduction in percent for N = 500 and N = 5000, by matching algorithm (nearest-neighbor matching with 1 or 5 neighbors; kernel matching with fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, and weighted CV with respect to Y), for MDM and PSM, each with and without bias correction]

Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.
Results: Mean squared error

[Figure: mean squared error for N = 500 and N = 5000, by matching algorithm (nearest-neighbor matching with 1 or 5 neighbors; kernel matching with fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, and weighted CV with respect to Y), for MDM and PSM, each with and without bias correction]
Results: Relative standard error

[Figure: relative standard error for N = 500 and N = 5000, for nearest-neighbor matching with teffects standard errors (1 or 5 neighbors), nearest-neighbor matching with bootstrap standard errors (1 or 5 neighbors), and kernel matching with bootstrap standard errors (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y), for MDM and PSM, each with and without bias correction]
Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals for N = 500 and N = 5000, for nearest-neighbor matching with teffects standard errors (1 or 5 neighbors), nearest-neighbor matching with bootstrap standard errors (1 or 5 neighbors), and kernel matching with bootstrap standard errors (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y), for MDM and PSM, each with and without bias correction]
Conclusions

The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).

Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.
Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research the choice may not be that clear.

Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see e.g. Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.

Disadvantages of MDM:
- The choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching. (But don't use pair matching anyhow.)
Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error / confidence interval estimation seems to be mostly ok for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.
References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2):295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1):77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91:279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4):465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70:41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5):688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4):472–480.
Matching

Formally:

$$\widehat{ATT} = \frac{1}{N_{D=1}} \sum_{i|D=1} \left[ Y_i - \hat{Y}_i^0 \right] = \frac{1}{N_{D=1}} \sum_{i|D=1} \left[ Y_i - \sum_{j|D=0} w_{ij} Y_j \right]$$

$$\widehat{ATC} = \frac{1}{N_{D=0}} \sum_{i|D=0} \left[ \hat{Y}_i^1 - Y_i \right] = \frac{1}{N_{D=0}} \sum_{i|D=0} \left[ \sum_{j|D=1} w_{ij} Y_j - Y_i \right]$$

$$\widehat{ATE} = \frac{N_{D=1}}{N} \cdot \widehat{ATT} + \frac{N_{D=0}}{N} \cdot \widehat{ATC}$$

Different matching algorithms use different definitions of $w_{ij}$.
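The ATT formula translates almost literally into code. The following is a minimal sketch with made-up toy data; the function name and the weight matrix are hypothetical, and the weights are assumed to be already normalized so that each treated unit's row sums to one.

```python
def att_from_weights(y_treated, y_controls, weights):
    """Matching estimate of the ATT.

    weights[i][j] is w_ij, the weight of control j in the counterfactual
    for treated unit i; each row is assumed to sum to 1.
    """
    effects = []
    for y_i, w_i in zip(y_treated, weights):
        # Estimated Y^0 for treated unit i: weighted mean of control outcomes.
        counterfactual = sum(w * y for w, y in zip(w_i, y_controls))
        effects.append(y_i - counterfactual)
    return sum(effects) / len(effects)


# Toy data (assumed): two treated units, three controls.
y_treated = [10.0, 12.0]
y_controls = [8.0, 9.0, 11.0]
weights = [
    [0.5, 0.5, 0.0],  # unit 0 is matched to controls 0 and 1
    [0.0, 0.0, 1.0],  # unit 1 is matched to control 2 only
]
att = att_from_weights(y_treated, y_controls, weights)
print(att)
```

The ATC follows the same pattern with the roles of the groups swapped and the sign reversed. Which matching algorithm is used only changes how the rows of `weights` are filled.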
Exact Matching
Exact matching:

\[ w_{ij} = \begin{cases} 1/k_i & \text{if } X_i = X_j \\ 0 & \text{else} \end{cases} \]

with k_i as the number of observations for which X_i = X_j applies.
The result is equivalent to “perfect stratification” or “subclassification” (see, e.g., Cochran 1968).
Problem: If X contains several variables, there is a large probability that no exact matches can be found for many observations (the “curse of dimensionality”).
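As a minimal sketch of the exact-matching weights defined above (the function name and the toy covariate tuples are hypothetical), each treated unit receives weight 1/k_i on every control with identical X. The third treated unit in the example has no exact match, which is the problem that grows with the number of covariates.

```python
def exact_match_weights(x_treated, x_control):
    """Return one weight row per treated unit, or None if no exact match exists."""
    weights = []
    for x_i in x_treated:
        same = [1.0 if x_j == x_i else 0.0 for x_j in x_control]
        k_i = sum(same)  # number of controls with X_j equal to X_i
        weights.append([s / k_i for s in same] if k_i > 0 else None)
    return weights


# Hypothetical covariate profiles: (sex, college graduate).
x_treated = [("f", 1), ("m", 0), ("m", 1)]
x_control = [("f", 1), ("f", 1), ("m", 0)]
w = exact_match_weights(x_treated, x_control)
print(w)
```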
Multivariate Distance Matching (MDM)
An alternative is to match based on a distance metric that measures the proximity between observations in the multivariate space of X.
The idea then is to use observations that are “close”, but not necessarily equal, as matches.
A common approach is to use

\[ MD(X_i, X_j) = \sqrt{(X_i - X_j)' \Sigma^{-1} (X_i - X_j)} \]

as distance metric, where Σ is an appropriate scaling matrix.
  - Mahalanobis matching: Σ is the covariance matrix of X.
  - Euclidean matching: Σ is the identity matrix.
  - Mahalanobis matching is equivalent to Euclidean matching based on standardized and orthogonalized X.
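The distance metric above can be illustrated for the special case of two covariates, where the inverse of the scaling matrix can be written out by hand. This is a sketch under that assumption; the helper names are hypothetical, and a real application would use a linear-algebra library for general dimensions.

```python
import math


def inverse_2x2(m):
    """Invert a 2x2 matrix given as nested lists."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    if det == 0:
        raise ValueError("scaling matrix is singular")
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


def md(x_i, x_j, sigma):
    """MD(X_i, X_j) = sqrt((X_i - X_j)' Sigma^{-1} (X_i - X_j)) for two covariates."""
    inv = inverse_2x2(sigma)
    d0, d1 = x_i[0] - x_j[0], x_i[1] - x_j[1]
    q = d0 * (inv[0][0] * d0 + inv[0][1] * d1) + d1 * (inv[1][0] * d0 + inv[1][1] * d1)
    return math.sqrt(max(q, 0.0))


identity = [[1.0, 0.0], [0.0, 1.0]]   # Euclidean matching
scaled = [[9.0, 0.0], [0.0, 16.0]]    # e.g. uncorrelated covariates with variances 9 and 16
d_euclid = md((0.0, 0.0), (3.0, 4.0), identity)
d_scaled = md((0.0, 0.0), (3.0, 4.0), scaled)
print(d_euclid, d_scaled)
```

With the identity matrix the result is the ordinary Euclidean distance; with the diagonal scaling matrix each coordinate difference is measured in standard deviations.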
Matching Algorithms
Various matching algorithms can be employed to find potential matches based on MD and to determine the matching weights w_ij.
Pair matching (one-to-one matching without replacement)
  - For each observation i in the treatment group, find observation j in the control group for which MD_ij is smallest. Once observation j is used as a match, do not use it again.
Nearest-neighbor matching
  - For each observation i in the treatment group, find the k closest observations in the control group. A single control can be used multiple times as a match. In case of ties (multiple controls with identical MD), use all ties as matches. k is set by the researcher.
Caliper matching
  - Like nearest-neighbor matching, but only use controls for which MD is smaller than some threshold c.
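Nearest-neighbor matching with replacement, with ties kept and an optional caliper, might be sketched as follows for a single treated unit. The function name `nn_match` and the toy distances are hypothetical; the distances MD_ij are assumed to be precomputed.

```python
def nn_match(dist_row, k=1, caliper=None):
    """Indices of the controls matched to one treated unit.

    dist_row[j] is the distance to control j. All controls tied with the
    k-th closest one are kept; with a caliper, only controls closer than
    the threshold are eligible.
    """
    candidates = [(d, j) for j, d in enumerate(dist_row)
                  if caliper is None or d < caliper]
    if not candidates:
        return []
    candidates.sort()
    cutoff = candidates[min(k, len(candidates)) - 1][0]
    return [j for d, j in candidates if d <= cutoff]


row = [0.4, 0.1, 0.1, 0.9]          # hypothetical distances to four controls
print(nn_match(row, k=1))           # both tied nearest neighbors are used
print(nn_match(row, k=3))
print(nn_match(row, k=1, caliper=0.05))
```

Each matched control would then receive weight 1/(number of matches) in w_ij. Because the function is applied to each treated unit separately, the same control can serve as a match several times.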
Mahalanobis Matching
Radius matching
  - Use all controls as matches for which MD is smaller than some threshold c.
Kernel matching
  - Like radius matching, but give larger weight to controls for which MD is small (using some kernel function such as, e.g., the Epanechnikov kernel).
In addition, since matching is no longer exact, it may make sense to refine the estimates by applying regression adjustment to the matched data (also known as “bias adjustment” in the context of nearest-neighbor matching).
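The difference between radius and kernel matching described above can be made concrete with a short sketch. It assumes precomputed distances and uses the Epanechnikov kernel; the function names and toy numbers are hypothetical and do not reproduce any particular software's implementation.

```python
def epanechnikov(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) for |u| < 1, else 0."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def radius_weights(dist_row, c):
    """Equal weights for all controls with distance smaller than c."""
    inside = [1.0 if d < c else 0.0 for d in dist_row]
    total = sum(inside)
    return [v / total for v in inside] if total > 0 else None


def kernel_weights(dist_row, bandwidth):
    """Weights proportional to the kernel evaluated at distance / bandwidth."""
    raw = [epanechnikov(d / bandwidth) for d in dist_row]
    total = sum(raw)
    return [v / total for v in raw] if total > 0 else None


row = [0.0, 0.5, 2.0]               # hypothetical distances to three controls
w_radius = radius_weights(row, 1.0)
w_kernel = kernel_weights(row, 1.0)
print(w_radius, w_kernel)
```

Both methods use the same two controls here, but kernel matching gives the closer control the larger weight (4/7 versus 3/7). A treated unit with no control inside the threshold gets `None`, that is, it remains unmatched.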
3 Propensity Score Matching
The Propensity Score Theorem (Rosenbaum and Rubin 1983)
If the conditional independence assumption is true, then

\[ \Pr(D_i = 1 \mid Y_i^0, Y_i^1, X_i) = \Pr(D_i = 1 \mid X_i) = \pi(X_i) \]

where π(X) is called the propensity score.
That is,

\[ (Y^0, Y^1) \perp\!\!\!\perp D \mid X \]

implies

\[ (Y^0, Y^1) \perp\!\!\!\perp D \mid \pi(X) \]

so that, under “strong ignorability”, the average causal effect can be estimated by conditioning on the propensity score π(X) instead of X.
This is remarkable because the information in X, which may include many variables, can be reduced to just one dimension. This greatly simplifies the matching task.
Propensity Score Matching (PSM)
Instead of computing multivariate distances, we can thus simply match on the (one-dimensional) propensity score.
Procedure
  - Step 1: Estimate the propensity score, e.g. using a Logit model.
  - Step 2: Apply a matching algorithm using differences in the propensity score, |π(X_i) − π(X_j)|, instead of multivariate distances.
PSM is tremendously popular.
  - https://scholar.google.ch/scholar?q=propensity+score+AND+(matching+OR+matched+OR+match)
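The two-step procedure can be sketched with one covariate. Step 1 below fits a logit model by plain gradient ascent, which is a simplification chosen to keep the example dependency-free (statistical software would typically use Newton-type maximum likelihood). Step 2 matches each treated unit to the control with the closest estimated propensity score. All names and data are hypothetical.

```python
import math


def fit_logit(x, d, steps=5000, rate=0.1):
    """Step 1: estimate Pr(D = 1 | x) = 1 / (1 + exp(-(b0 + b1 * x)))."""
    b0, b1 = 0.0, 0.0
    n = len(x)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x_i, d_i in zip(x, d):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x_i)))
            g0 += d_i - p              # gradient of the log-likelihood wrt b0
            g1 += (d_i - p) * x_i      # gradient of the log-likelihood wrt b1
        b0 += rate * g0 / n
        b1 += rate * g1 / n
    return b0, b1


# Hypothetical data: controls have lower x on average, with some overlap.
x = [0, 1, 2, 3, 2, 3, 4, 5]
d = [0, 0, 0, 0, 1, 1, 1, 1]

b0, b1 = fit_logit(x, d)
ps = [1.0 / (1.0 + math.exp(-(b0 + b1 * x_i))) for x_i in x]

# Step 2: one-nearest-neighbor matching (with replacement) on |ps_i - ps_j|.
controls = [j for j, d_j in enumerate(d) if d_j == 0]
matches = {i: min(controls, key=lambda j: abs(ps[i] - ps[j]))
           for i, d_i in enumerate(d) if d_i == 1}
print(matches)
```

The treated units with x = 4 and x = 5 are both matched to the control with x = 3. This illustrates why a caliper or a common-support restriction is often added in practice.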
4 King and Nielsen’s “Why Propensity Scores Should Not Be Used for Matching”
King and Nielsen
In 2015/2016, Gary King and Richard Nielsen circulated a paper that created quite some concern among applied researchers.
The basic message of the paper is that PSM is really, really bad and should be discarded.
The paper:
  - http://j.mp/1sexgVw
Slides:
  - https://gking.harvard.edu/presentations/why-propensity-scores-should-not-be-used-matching-6
Watch it:
  - https://www.youtube.com/watch?v=rBv39pK1iEs
King and Nielsen
The story goes about as follows:
Argument 1
  - Model dependence (i.e. dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions, even if they try not to be).
  - Matching is good because it reduces model dependence.
[Figure, shown in six animation steps: Matching to Reduce Model Dependence (Ho, Imai, King, Stuart 2007, fig. 1, Political Analysis). Scatter plot of Outcome (0 to 12) against Education (years, 12 to 28) with treated (T) and control (C) units. Slide 3/23 of the slides by King and Nielsen.]
King and Nielsen
Argument 2
  - PSM approximates complete randomization.
  - Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

Matching: Finding Hidden Randomized Experiments (slide 6/23 of the slides by King and Nielsen)

Types of Experiments

  Balance on covariates    Complete Randomization    Fully Blocked
  Observed                 On average                Exact
  Unobserved               On average                On average

Fully blocked dominates complete randomization for: imbalance, model dependence, power, efficiency, bias, research costs, robustness. E.g., Imai, King, Nall 2009: SEs 600% smaller.

Goal of Each Matching Method (in Observational Data)
  - PSM: complete randomization
  - Other methods: fully blocked
  - Other matching methods dominate PSM (wait, it gets worse)
[Figure, shown in two steps: Best Case: Mahalanobis Distance Matching. Scatter plot of Age (20 to 80) against Education (years, 12 to 28) with treated (T) and control (C) units; first with all units, then with the treated units and their matched controls only. Slide 9/23 of the slides by King and Nielsen.]
[Figure, shown in two steps: Best Case: Propensity Score Matching. Same axes, with treated (T) and control (C) units and an additional Propensity Score axis running from 0 to 1. Slide 15/23 of the slides by King and Nielsen.]
[Figure: Best Case: Propensity Score Matching is Suboptimal. Same axes, showing the treated (T) units and the controls (C) retained by propensity score matching. Slide 15/23 of the slides by King and Nielsen.]
King and Nielsen
Argument 3
  - Random pruning (deleting observations at random) increases imbalance. This is because the sample size decreases, so that variance increases (large differences become more likely).
  - More imbalance/variance means more model dependence and researcher discretion.
  - Because PSM approximates complete randomization, it engages in random pruning.
  - PSM Paradox (“when you do ‘better’, you do worse”)
      - When matching is made more strict (e.g. by decreasing the size of the caliper), PSM, like other matching methods, typically reduces imbalance. But soon the PSM Paradox kicks in, such that further pruning quickly increases imbalance.
      - If the data are such that there are no big differences between treated and untreated to begin with, the PSM Paradox kicks in almost immediately.
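The first step of this argument, that random pruning increases imbalance, can be checked with a small simulation. The sketch below is an illustration under simple assumptions (one standard-normal covariate, no true difference between groups, hypothetical sample sizes); it is not a reproduction of King and Nielsen's simulations.

```python
import random
import statistics


def imbalance(treated_x, control_x):
    """Absolute difference in covariate means between the two groups."""
    return abs(statistics.fmean(treated_x) - statistics.fmean(control_x))


def simulate(n_full=200, n_kept=50, reps=500, seed=2017):
    """Average imbalance before and after deleting observations at random."""
    rng = random.Random(seed)
    full, pruned = [], []
    for _ in range(reps):
        t = [rng.gauss(0, 1) for _ in range(n_full)]
        c = [rng.gauss(0, 1) for _ in range(n_full)]
        full.append(imbalance(t, c))
        # Random pruning: keep a random subset of each group.
        pruned.append(imbalance(rng.sample(t, n_kept), rng.sample(c, n_kept)))
    return statistics.fmean(full), statistics.fmean(pruned)


mean_full, mean_pruned = simulate()
print(mean_full, mean_pruned)
```

Because the expected absolute mean difference shrinks with the square root of the group size, cutting each group to a quarter of its size should roughly double the average imbalance.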
[Figure: PSM Increases Model Dependence & Bias. Left panel “Model Dependence”: Variance against Number of Units Pruned (0 to 160), for MDM and PSM. Right panel “Bias”: Maximum Coefficient across 512 Specifications against Number of Units Pruned (0 to 160), for MDM and PSM, with the true effect = 2 marked. Data-generating process: Y_i = 2T_i + X1_i + X2_i + ε_i, ε_i ~ N(0, 1). Slide 20/23 of the slides by King and Nielsen.]
[Figure: The Propensity Score Paradox in Real Data. Two panels, Finkel et al. (JOP 2012) and Nielsen et al. (AJPS 2011): Imbalance against Number of units pruned, for CEM, MDM, PSM, Random, and Raw, with the 1/4 SD caliper marked. Note on the slide: “Similar pattern for > 20 other real data sets we checked”. Slide 21/23 of the slides by King and Nielsen.]
5 Are King and Nielsen right?
Are King and Nielsen right?
Argument 1
  - Model dependence (i.e. dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions, even if they try not to be).
  - Matching is good because it reduces model dependence.
I fully agree.
My view, however, may be somewhat less pessimistic. I believe that research results can be credible if researchers are well educated, so that they know what they are doing, and if modeling decisions are made transparent and robustness of results is evaluated (and documented).
Are King and Nielsen right?
Argument 2
  - PSM approximates complete randomization.
  - Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.
That fully blocked randomization is more efficient than complete randomization, given the sample size, is of course true (how large the efficiency gains are depends on the strength of the relation between X and Y).
However, if blocking reduces the sample size, it is not a priori clear whether estimates from the blocked sample are more efficient than estimates from the full sample (although often they will be).
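The efficiency claim for a given sample size can be illustrated with a small simulation. The sketch below assumes one covariate X that strongly predicts Y, a true treatment effect of zero, and pair blocking on the sorted X values as a stand-in for fully blocked randomization. The function names and parameter values are hypothetical.

```python
import random
import statistics


def diff_in_means(y, treated):
    t = [y_i for y_i, is_t in zip(y, treated) if is_t]
    c = [y_i for y_i, is_t in zip(y, treated) if not is_t]
    return statistics.fmean(t) - statistics.fmean(c)


def complete_randomization(n, rng):
    """Treat a random half of the sample, ignoring X."""
    idx = list(range(n))
    rng.shuffle(idx)
    chosen = set(idx[: n // 2])
    return [i in chosen for i in range(n)]


def blocked_randomization(x, rng):
    """Pair units with adjacent X values and treat one unit per pair."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    treated = [False] * len(x)
    for a, b in zip(order[0::2], order[1::2]):
        treated[rng.choice((a, b))] = True
    return treated


rng = random.Random(2017)
n = 100
x = [rng.gauss(0, 1) for _ in range(n)]
y = [x_i + rng.gauss(0, 0.5) for x_i in x]   # true treatment effect is zero

est_complete = [diff_in_means(y, complete_randomization(n, rng)) for _ in range(500)]
est_blocked = [diff_in_means(y, blocked_randomization(x, rng)) for _ in range(500)]
var_complete = statistics.pvariance(est_complete)
var_blocked = statistics.pvariance(est_blocked)
print(var_complete, var_blocked)
```

The gap between the two variances depends on how strongly X is related to Y. With a weaker relation (for example, a larger noise standard deviation), the advantage of blocking shrinks.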
That PSM approximates complete randomization is only partially true. PSM approximates complete randomization within observations with the same propensity score. Hence, PSM is somewhere between complete randomization and fully blocked randomization.
  - If the X variables have no relation to T (treatment), then all observations have the same propensity score. Hence, we end up with complete randomization.
  - If the X variables have a strong effect on T, there is lots of blocking.
Are King and Nielsen right?
Argument 3
  - Random pruning ⇒ imbalance ⇒ more model dependence
  - PSM ⇒ complete randomization ⇒ lots of random pruning
  - PSM Paradox: “when you do ‘better’, you do worse”
That random pruning makes things worse is of course true, because it unnecessarily reduces the sample size (without changing anything else).
As argued above, that PSM applies random pruning is only true for X variables unrelated to T (so that we are in a “local” complete randomization situation, although something similar can probably also happen if effects from several X’s cancel each other out).
Furthermore, it is only true if you employ a matching algorithm that throws away good matches. King and Nielsen’s results seem to be based on the worst possible algorithm: one-to-one matching without replacement.
If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or also nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
  - Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
  - Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.
It is true that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more “free” variance in X that can be exploited by modeling decisions).
In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you’d need to apply PSM stratified by subgroups in this case).
6 Illustration using kmatch
The Command
kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).
Some key features:
  - Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
  - Optional exact matching
  - Optional regression-adjustment/bias-correction
  - Kernel matching, ridge matching, or nearest-neighbor matching
  - Automatic bandwidth selection for kernel/ridge matching
  - Flexible specification of scaling matrix for MDM
  - Joint analysis of multiple subgroups and multiple outcome variables
  - Various post-estimation commands for balancing and common-support diagnostics
  - Computationally efficient implementation
Some Examples
Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.

. sysuse nlsw88, clear
(NLSW, 1988 extract)
. drop if industry==2
(4 observations deleted)

Mahalanobis-distance kernel matching

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                 Matched                  Controls           Band-
              Yes     No  Total     Used  Unused  Total      width
Treated       432     25    457     1105     291   1396     1.3394

Treatment-effects estimation
wage         Coef.
ATT       .6059013
NATE      1.432913
Some Examples
Some balancing statistics

. kmatch summarize
(refitting the model using the generate() option)

                        Raw                         Matched (ATT)
Means          Treated  Untrea~d    StdDif    Treated  Untrea~d    StdDif
collgrad       .321663   .224212   .219912    .319444   .319444         0
ttl_exp        13.2685   12.7323   .117584    13.3205   13.1425   .039036
tenure         7.89205   6.17658    .29735    7.91744   7.58347   .057888
3.industry     .006565   .012178  -.058246     .00463    .00463         0
4.industry     .183807   .166905   .044425    .185185   .185185         0
5.industry     .105033   .027937   .312944    .085648   .085648         0
6.industry     .045952   .169771  -.407129    .048611   .048611         0
7.industry     .019694   .102436  -.350657    .020833   .020833         0
8.industry     .017505   .035817  -.113785    .009259   .009259         0
9.industry     .010941   .040115  -.185669    .011574   .011574         0
10.industry    .004376   .008596  -.052551    .002315   .002315         0
11.industry    .479212   .356734   .250073    .506944   .506944         0
12.industry    .122538    .07235   .169707     .12037    .12037         0
2.race         .330416   .244986   .189418      .3125     .3125         0
3.race         .017505   .011461   .050566    .006944   .006944         0
south          .297593   .466332  -.352408    .291667   .291667         0

                        Raw                         Matched (ATT)
Variances      Treated  Untrea~d     Ratio    Treated  Untrea~d     Ratio
collgrad       .218674   .174066   1.25628    .217904   .217904         1
ttl_exp        20.5898   21.0001   .980459    19.8177   18.2323   1.08696
tenure         37.2044   29.3629   1.26706    37.0399   34.9543   1.05966
3.industry     .006536   .012038   .542928    .004619   .004619         1
4.industry     .150351   .139148   1.08052    .151242   .151242         1
5.industry     .094207   .027176   3.46656    .078494   .078494         1
6.industry     .043936    .14105   .311496    .046355   .046355         1
7.industry     .019348   .092008   .210287    .020447   .020447         1
8.industry     .017237   .034559   .498769    .009195   .009195         1
9.industry     .010845   .038533   .281445    .011467   .011467         1
10.industry    .004367   .008528   .512039    .002315   .002315         1
11.industry    .250115   .229639   1.08917    .250532   .250532         1
12.industry    .107758   .067163   1.60443    .106127   .106127         1
2.race         .221726     .1851   1.19787    .215342   .215342         1
3.race         .017237   .011338   1.52025    .006912   .006912         1
south           .20949   .249045   .841173    .207077   .207077         1
Some Examples
Make a graph of the balancing stats

. mat M = r(M)
. mat V = r(V)
. coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || , bylabels("Std. mean difference" "Variance ratio") noci nolabels byopts(xrescale)
. addplot 1: xline(0), norescaling legend(order(1 "Raw" 2 "Matched"))
. addplot 2: xline(1), norescaling

[Figure: coefficient plot of the balancing statistics for each covariate (collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, south), Raw vs. Matched. Left panel: Std. mean difference (−.4 to .4). Right panel: Variance ratio (0 to 4).]
Some Examples
Propensity-score kernel matching

. kmatch ps union collgrad ttl_exp tenure i.industry i.race south (wage), nate att
(computing bandwidth ... done)

Propensity-score kernel matching           Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Covariates  : collgrad ttl_exp tenure i.industry i.race south
PS model    : logit (pr)

Matching statistics
                 Matched                  Controls           Band-
              Yes     No  Total     Used  Unused  Total      width
Treated       431     26    457     1214     182   1396     .00188

Treatment-effects estimation
wage         Coef.
ATT       .3887224
NATE      1.432913
Some Examples
Kernel density balancing plot

. kmatch density, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)
(applying 0-1 boundary correction to density estimation of propensity score)
(bandwidth for propensity score = .06803989)

[Figure: Density against Propensity score (0 to .8) for Untreated and Treated; panels Raw and Matched (ATT).]

Some Examples
Cumulative distribution balancing plot

. kmatch cumul, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)

[Figure: Cumulative probability (0 to 1) against Propensity score (0 to .8) for Untreated and Treated; panels Raw and Matched (ATT).]

Some Examples
Balancing box plot

. kmatch box
(refitting the model using the generate() option)

[Figure: box plots of the Propensity score (0 to .8) for Untreated and Treated; panels Raw and Matched (ATT).]
Some Examples
Standard errors

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching      Number of obs = 1853
                                           Replications  = 50
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                 Matched                  Controls           Band-
              Yes     No  Total     Used  Unused  Total      width
Treated       432     25    457     1105     291   1396     1.3394
Untreated    1386     10   1396      455       2    457     3.3975
Combined     1818     35   1853     1560     293   1853

Treatment-effects estimation
           Observed  Bootstrap                      Normal-based
wage          Coef.  Std. Err.      z   P>|z|   [95% Conf. Interval]
ATE        .4095729   .1920853   2.13   0.033    .0330928   .7860531
ATT        .6059013   .2472069   2.45   0.014    .1213846   1.090418
ATC        .3483797   .1893653   1.84   0.066   -.0227695   .7195289
NATE       1.432913   .2333282   6.14   0.000    .9755981   1.890228
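As a plausibility check on the table above, the reported ATE can be recomputed as the weighted average of ATT and ATC, and the z statistic and normal-based confidence interval of the ATT can be recomputed from its coefficient and bootstrap standard error. The sketch rests on two assumptions that are not stated in the source: the decimal points lost in the extracted output are read as ATT = .6059013, ATC = .3483797, ATE = .4095729, and the group sizes entering the ATE are the numbers of matched observations (432 treated, 1386 untreated).

```python
# Values read from the kmatch output (decimal placement is an assumption).
att, se_att = 0.6059013, 0.2472069
atc = 0.3483797
n_matched_treated = 432       # matched treated observations (assumed weights)
n_matched_untreated = 1386    # matched untreated observations (assumed weights)

# ATE = N_{D=1}/N * ATT + N_{D=0}/N * ATC, using matched counts as N's.
n = n_matched_treated + n_matched_untreated
ate = (n_matched_treated * att + n_matched_untreated * atc) / n

# Normal-based inference for the ATT.
z_att = att / se_att
z_crit = 1.959964             # 97.5% quantile of the standard normal
ci_att = (att - z_crit * se_att, att + z_crit * se_att)

print(round(ate, 7), round(z_att, 2), ci_att)
```

Under these assumptions the recomputed values agree with the reported ATE, z statistic, and confidence interval up to rounding. This agreement is also what supports the reading of the decimal points.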
Some Examples
Do some tests

. lincom ATT-NATE

 ( 1)  ATT - NATE = 0

wage          Coef.  Std. Err.      z   P>|z|   [95% Conf. Interval]
(1)       -.8270117   .1810415  -4.57   0.000   -1.181847  -.4721768

. test ATT = ATC

 ( 1)  ATT - ATC = 0

           chi2(  1) =    2.42
         Prob > chi2 =    0.1200
Some Examples
Nearest-neighbor matching (1 neighbor)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

Multivariate-distance nearest-neighbor matching
                                           Number of obs  = 1853
                                           Neighbors: min = 1
Treatment   : union = 1                               max = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                 Matched                  Controls           Band-
              Yes     No  Total     Used  Unused  Total      width
Treated       457      0    457      328    1068   1396          .

Treatment-effects estimation
wage         Coef.
ATT       .7246969

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

Treatment-effects estimation               Number of obs      = 1853
Estimator      : nearest-neighbor matching Matches: requested = 1
Outcome model  : matching                                 min = 1
Distance metric: Mahalanobis                              max = 1

                                    AI Robust
wage                      Coef.  Std. Err.      z   P>|z|   [95% Conf. Interval]
ATET
  union
  (union vs nonunion)  .7246969   .2942952   2.46   0.014     .147889   1.301505
Some Examples
Nearest-neighbor matching (5 neighbors)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching
                                           Number of obs  = 1853
                                           Neighbors: min = 5
Treatment   : union = 1                               max = 5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                 Matched                  Controls           Band-
              Yes     No  Total     Used  Unused  Total      width
Treated       457      0    457      870     526   1396          .

Treatment-effects estimation
wage         Coef.
ATT       .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation               Number of obs      = 1853
Estimator      : nearest-neighbor matching Matches: requested = 5
Outcome model  : matching                                 min = 5
Distance metric: Mahalanobis                              max = 6

                                    AI Robust
wage                      Coef.  Std. Err.      z   P>|z|   [95% Conf. Interval]
ATET
  union
  (union vs nonunion)  .5590823   .2381752   2.35   0.019    .0922675   1.025897
Some Examples
Bias-correction / regression adjustment

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching
                                           Number of obs  = 1853
                                           Neighbors: min = 5
Treatment   : union = 1                               max = 5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                 Matched                  Controls           Band-
              Yes     No  Total     Used  Unused  Total      width
Treated       457      0    457      870     526   1396          .

Treatment-effects estimation
wage         Coef.
ATT       .5288023
wage: adjusted for collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation               Number of obs      = 1853
Estimator      : nearest-neighbor matching Matches: requested = 5
Outcome model  : matching                                 min = 5
Distance metric: Mahalanobis                              max = 6

                                    AI Robust
wage                      Coef.  Std. Err.      z   P>|z|   [95% Conf. Interval]
ATET
  union
  (union vs nonunion)  .5288023   .2420635   2.18   0.029    .0543666   1.003238
Some Examples
Mahalanobis-distance and propensity-score matching combined

. kmatch md union collgrad ttl_exp tenure (wage), att psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis (modified)
Covariates  : collgrad ttl_exp tenure
PS model    : logit (pr)
PS covars   : i.industry i.race south

Matching statistics
                 Matched                  Controls           Band-
              Yes     No  Total     Used  Unused  Total      width
Treated       439     18    457     1258     138   1396     .83886

Treatment-effects estimation
wage         Coef.
ATT       .6408443
Some Examples
Exact matching

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure
Exact       : industry race south

Matching statistics
                 Matched                  Controls           Band-
              Yes     No  Total     Used  Unused  Total      width
Treated       432     25    457     1103     293   1396     1.3013

Treatment-effects estimation
wage         Coef.
ATT       .6047374
Some Examples
Bandwidth selection: the default (based on distribution of distances in one-nearest-neighbor matching)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                 Matched                  Controls           Band-
              Yes     No  Total     Used  Unused  Total      width
Treated       432     25    457     1105     291   1396     1.3394

Treatment-effects estimation
wage         Coef.
ATT       .6059013
Some Examples
Bandwidth selection: cross validation with respect to X

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                 Matched                  Controls           Band-
              Yes     No  Total     Used  Unused  Total      width
Treated       448      9    457     1184     212   1396     1.8888

Treatment-effects estimation
wage         Coef.
ATT       .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MSE against Bandwidth (1.5 to 3); points are labeled with their evaluation index.]
Some Examples
Bandwidth selection: cross validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                 Matched                  Controls           Band-
              Yes     No  Total     Used  Unused  Total      width
Treated       453      4    457     1289     107   1396      2.433

Treatment-effects estimation
wage         Coef.
ATT       .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MISE against Bandwidth (1.5 to 3); points are labeled with their evaluation index.]
Some Examples
Bandwidth selection: weighted cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           Matched                Controls              Band-
           Yes    No   Total      Used  Unused   Total  width
Treated    455     2     457      1356      40    1396  2.7626

Treatment-effects estimation
wage       Coef.
ATT        .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort
[Figure: cross-validation plot, weighted MISE against bandwidth, with points labeled by evaluation index]
Some Examples
Common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(0.5)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           Matched                Controls              Band-
           Yes    No   Total      Used  Unused   Total  width
Treated    366    91     457       701     695    1396  .5

Treatment-effects estimation
wage       Coef.
ATT        .3303161

. kmatch csummarize
(refitting the model using the generate() option)

Common support (treated)
                     Means                     Standardized difference
              Matched  Unmatched    Total    (1)-(3)   (2)-(3)   (1)-(2)
collgrad      .322404   .318681   .321663    .001585  -.006376   .007962
ttl_exp       13.3929   12.7682   13.2685    .027413  -.110253   .137666
tenure        8.12614   6.95055   7.89205    .038378  -.154356   .192734
3.industry    .002732   .021978   .006565   -.047404   .190657  -.238061
4.industry    .191257   .153846   .183807    .019212  -.077269   .096481
5.industry    .062842   .274725   .105033   -.137462   .552867  -.690329
6.industry    .057377         0   .045952    .054507  -.219225   .273732
7.industry    .019126   .021978   .019694   -.004083   .016423  -.020506
8.industry    .005464   .065934   .017505   -.091714   .368871  -.460585
9.industry    .010929   .010989   .010941   -.000115   .000462  -.000577
10.industry         0   .021978   .004376   -.066227   .266363  -.332589
11.industry   .554645   .175824   .479212     .15083  -.606636   .757467
12.industry   .092896   .241758   .122538   -.090299   .363181   -.45348
2.race        .243169   .681319   .330416   -.185284   .745209  -.930494
3.race        .002732   .076923   .017505   -.112525   .452572  -.565097
south          .29235   .318681   .297593   -.011456   .046074   -.05753

Common support (treated)
                    Variances                        Ratio
              Matched  Unmatched    Total    (1)/(3)   (2)/(3)   (1)/(2)
collgrad      .219058   .219536   .218674    1.00176   1.00394   .997824
ttl_exp       19.4198   25.2474   20.5898    .943177   1.22621    .76918
tenure        38.3324   31.9242   37.2044    1.03032   .858076   1.20073
3.industry    .002732   .021734   .006536    .418045   3.32537   .125714
4.industry    .155101   .131624   .150351    1.03159   .875443   1.17837
5.industry    .059054   .201465   .094207    .626851   2.13854   .293122
6.industry    .054233         0   .043936    1.23435         0         .
7.industry    .018811   .021734   .019348    .972252    1.1233   .865531
8.industry     .00545   .062271   .017237    .316157   3.61269   .087513
9.industry    .010839   .010989   .010845    .999464   1.01328   .986361
10.industry         0   .021734   .004367          0   4.97709         0
11.industry   .247691    .14652   .250115    .990307   .585811   1.69049
12.industry   .084497   .185348   .107758    .784137   1.72003   .455885
2.race        .184542   .219536   .221726    .832297   .990121   .840601
3.race        .002732   .071795   .017237    .158513   4.16522   .038056
south         .207448   .219536    .20949    .990254   1.04796   .944939

(1) matched, (2) unmatched, (3) total
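The standardized differences in the common-support table appear to be differences in means divided by the standard deviation of the total treated group. The sketch below checks this reading for the collgrad row. It is an illustration only: the decimal points are restored from the extracted digits, and the choice of reference variance is an inference from the numbers, not a statement about how kmatch is implemented.

```python
import math

# collgrad row, as read from the tables above (decimal points restored; assumptions).
mean_matched, mean_unmatched, mean_total = 0.322404, 0.318681, 0.321663
var_total = 0.218674  # variance among all treated, column (3) of the variance table


def std_diff(mean_a, mean_b, variance):
    """Difference in means divided by a reference standard deviation."""
    return (mean_a - mean_b) / math.sqrt(variance)


d13 = std_diff(mean_matched, mean_total, var_total)      # (1)-(3)
d23 = std_diff(mean_unmatched, mean_total, var_total)    # (2)-(3)
d12 = std_diff(mean_matched, mean_unmatched, var_total)  # (1)-(2)

print(f"(1)-(3) = {d13:.6f}, (2)-(3) = {d23:.6f}, (1)-(2) = {d12:.6f}")
```

Up to rounding of the displayed means, the three values agree with the reported .001585, -.006376 and .007962.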
Some Examples
Make a graph of the common-support statistics:

. mat M = r(M)
. coefplot matrix(M[,4]), title(Std. difference) noci nolabels xline(0)
[Figure: coefficient plot of the standardized differences (x axis from -.2 to .2, reference line at 0) for collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, and south]
Some Examples
Multiple outcome variables

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,852
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           Matched                Controls              Band-
           Yes    No   Total      Used  Unused   Total  width
Treated    432    25     457      1104     291    1395  1.3392

Treatment-effects estimation
           Coef.
wage
  ATT      .6021049
  NATE     1.430823
hours
  ATT      1.263759
  NATE     1.450303
Some Examples
Multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage = collgrad ttl_exp tenure) (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,852
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           Matched                Controls              Band-
           Yes    No   Total      Used  Unused   Total  width
Treated    432    25     457      1104     291    1395  1.3392

Treatment-effects estimation
           Coef.
wage
  ATT      .5152752
  NATE     1.430823
hours
  ATT      1.263759
  NATE     1.450303

wage: adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race
Some Examples
Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage), att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)
Bootstrap replications (50)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Replications  = 50
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics
             Matched                Controls              Band-
             Yes    No   Total      Used  Unused   Total  width
0  Treated   306    15     321       625     120     745  1.3199
1  Treated   126    10     136       473     178     651  1.3398

Treatment-effects estimation
           Observed   Bootstrap                     Normal-based
wage       Coef.      Std. Err.     z    P>|z|   [95% Conf. Interval]
0  ATT     .4586332   .2763358   1.66   0.097   -.082975    1.000241
1  ATT     .9518705   .406903    2.34   0.019    .1543553   1.749386

. test [0]ATT = [1]ATT
 ( 1)  [0]ATT - [1]ATT = 0
           chi2(  1) =    1.23
         Prob > chi2 =    0.2679

. lincom [1]ATT - [0]ATT
 ( 1)  - [0]ATT + [1]ATT = 0
wage       Coef.      Std. Err.     z    P>|z|   [95% Conf. Interval]
(1)        .4932373   .4452343   1.11   0.268   -.379406    1.365881
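The test and lincom results above report the same comparison in two forms: with one restriction, the Wald chi-squared statistic is the square of the z statistic of the difference. A minimal sketch of that arithmetic follows. The numbers are read from the output with decimal points restored, so treat them as assumptions, and the p-value uses a plain normal approximation.

```python
import math

# Values read from the output above (decimal points restored; assumptions).
att_south0, att_south1 = 0.4586332, 0.9518705
se_diff = 0.4452343  # bootstrap standard error of the difference, from lincom

diff = att_south1 - att_south0
z = diff / se_diff
chi2 = z ** 2                               # Wald statistic, 1 degree of freedom
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value

print(f"diff = {diff:.7f}, z = {z:.2f}, chi2 = {chi2:.2f}, p = {p_value:.4f}")
```

The point estimates differ by about 0.49, but given the bootstrap standard error the difference between the two subpopulations is not statistically significant.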
Simulation
- Population data from the Swiss census of 2000.
- Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using the command iskotrei by Hendrickx 2002; values from 6 to 78, mean 44).
- Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.
- Control variables: gender, age, and highest educational degree.
- Population restricted to working people aged 24 to 60.
- 2'308'006 individuals, of which 17.5% belong to the treatment group.
- Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.
Simulation
Substantial differences between resident aliens and Swiss nationals on all three covariates.
[Figure: density of the propensity score in the population (computed from fully stratified data), shown separately for untreated and treated; McFadden R² = 0.121]
Simulation
- Raw mean difference in occupational prestige (NATE): −4.79
- Population ATT (computed from fully stratified data): −3.96
- There is some treatment effect heterogeneity (ATE = −3.51, ATC = −3.41).
[Figure, left panel: outcome against propensity score, for untreated and treated. Right panel: treatment effect against propensity score.]
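The three population effects reported above are tied together by the identity ATE = p · ATT + (1 − p) · ATC, where p is the share of treated units. The sketch below checks that the figures are mutually consistent. It assumes the treated share is 17.5 percent and that the effects read −3.96 and −3.41 once decimal points are restored.

```python
share_treated = 0.175    # assumed share of resident aliens in the population
att, atc = -3.96, -3.41  # assumed readings of the reported ATT and ATC

# The ATE is the population-share-weighted average of ATT and ATC.
ate = share_treated * att + (1 - share_treated) * atc

print(round(ate, 2))  # -3.51
```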
Results: Variance
[Figure: variance of the estimates for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]
Notes: For the same algorithm, PSM is typically somewhat less efficient than MDM. Across algorithms, however, PSM can be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.
For kernel matching, the efficiency differences between PSM and MDM are small, and additional post-matching regression adjustment reduces them further.
Results: Bias reduction (in percent)
[Figure: bias reduction in percent for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]
Notes: PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions) and, due to the specific pattern of the data (in particular the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce the bias.
The bias also vanishes once post-matching regression adjustment is applied.
Results: Mean squared error
[Figure: mean squared error for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]

Results: Relative standard error
[Figure: relative standard error for N = 500 and N = 5000. Rows: nearest-neighbor matching (teffects) with 1 and 5 neighbors, nearest-neighbor matching (bootstrap) with 1 and 5 neighbors, and kernel matching (bootstrap) with fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, and weighted CV with respect to Y. Series: MDM and PSM, each with and without bias correction.]

Results: Coverage of 95% CIs
[Figure: coverage of 95% confidence intervals for N = 500 and N = 5000, with the same rows and series as in the relative standard error figure.]
Conclusions
- The arguments brought forward by King and Nielsen against propensity score matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).
- Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.
Conclusions
- Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research the choice may not be that clear.
  Advantages of MDM:
  - MDM leaves less scope for post-matching modeling decision biases.
  - Theoretical results (see e.g. Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
  - Fewer restrictions in terms of possible post-matching analyses.
  Disadvantages of MDM:
  - The choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
  - Computational complexity.
- One clear conclusion we can draw, however, is: do not use propensity scores for pair matching. (But don't use pair matching anyhow.)
Conclusions
- Some conclusions from the simulation:
  - For PSM, applying regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
  - Bootstrap standard error and confidence interval estimation seems to be mostly OK for kernel/ridge matching. This is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.
- To do:
  - Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.
References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2): 295–313.
Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1): 77–90.
Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91: 279–290.
Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.
King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.
Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.
Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4): 465–472.
Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70: 41–55.
Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5): 688–701.
Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4): 472–480.
Exact Matching
- Exact matching:
  $$ w_{ij} = \begin{cases} 1/k_i & \text{if } X_i = X_j \\ 0 & \text{else} \end{cases} $$
  with $k_i$ as the number of observations for which $X_i = X_j$ applies.
- The result is equivalent to "perfect stratification" or "subclassification" (see e.g. Cochran 1968).
- Problem: if X contains several variables, there is a large probability that no exact matches can be found for many observations (the "curse of dimensionality").
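The exact-matching weights can be sketched in a few lines. The example below is illustrative only: the function name, the data layout (covariate profile, outcome) and the toy values are assumptions, not part of the original slides. Each treated unit is compared with the average outcome of the controls that share its covariate profile, which is what weights of 1/k_i amount to.

```python
from collections import defaultdict


def exact_matching_att(treated, controls):
    """Return (ATT, number of unmatched treated units).

    treated, controls: lists of (x, y), where x is a hashable covariate profile.
    """
    outcomes_by_x = defaultdict(list)
    for x, y in controls:
        outcomes_by_x[x].append(y)

    effects, unmatched = [], 0
    for x, y in treated:
        matches = outcomes_by_x.get(x)
        if not matches:
            unmatched += 1  # no control with an identical profile
            continue
        # w_ij = 1/k_i for each of the k_i controls with X_j = X_i
        counterfactual = sum(matches) / len(matches)
        effects.append(y - counterfactual)

    att = sum(effects) / len(effects) if effects else float("nan")
    return att, unmatched


# Hypothetical toy data: profile = (sex, age).
treated = [(("f", 30), 10.0), (("m", 40), 12.0), (("m", 55), 9.0)]
controls = [(("f", 30), 8.0), (("f", 30), 9.0), (("m", 40), 11.0)]

att, unmatched = exact_matching_att(treated, controls)
print(att, unmatched)
```

The third treated unit has no control with the same profile and drops out. With more covariates such empty cells become the norm, which is the curse of dimensionality mentioned above.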
Multivariate Distance Matching (MDM)
- An alternative is to match based on a distance metric that measures the proximity between observations in the multivariate space of X.
- The idea then is to use observations that are "close", but not necessarily equal, as matches.
- A common approach is to use
  $$ MD(X_i, X_j) = \sqrt{(X_i - X_j)' \Sigma^{-1} (X_i - X_j)} $$
  as distance metric, where $\Sigma$ is an appropriate scaling matrix.
  - Mahalanobis matching: $\Sigma$ is the covariance matrix of X.
  - Euclidean matching: $\Sigma$ is the identity matrix.
  - Mahalanobis matching is equivalent to Euclidean matching based on standardized and orthogonalized X.
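To make the distance metric concrete, here is a small pure-Python sketch for two covariates. The helper names and the toy data are hypothetical. It illustrates two properties implied by the definition: with the identity matrix as scaling matrix the metric reduces to the Euclidean distance, and with the covariance matrix it is unaffected by rescaling a covariate.

```python
import math


def covariance_2d(points):
    """Sample covariance matrix of a list of (x0, x1) pairs."""
    n = len(points)
    m0 = sum(p[0] for p in points) / n
    m1 = sum(p[1] for p in points) / n
    s00 = sum((p[0] - m0) ** 2 for p in points) / (n - 1)
    s11 = sum((p[1] - m1) ** 2 for p in points) / (n - 1)
    s01 = sum((p[0] - m0) * (p[1] - m1) for p in points) / (n - 1)
    return ((s00, s01), (s01, s11))


def md(xi, xj, sigma):
    """sqrt((xi - xj)' Sigma^{-1} (xi - xj)) for a 2x2 scaling matrix Sigma."""
    (a, b), (c, d) = sigma
    det = a * d - b * c
    u0, u1 = xi[0] - xj[0], xi[1] - xj[1]
    # Quadratic form with the explicit inverse of a 2x2 matrix.
    q = (d * u0 * u0 - (b + c) * u0 * u1 + a * u1 * u1) / det
    return math.sqrt(q)


# Hypothetical data: (education in years, age).
X = [(12, 25), (16, 40), (14, 31), (18, 52), (20, 47)]
X_months = [(12 * e, a) for e, a in X]  # education rescaled to months

identity = ((1.0, 0.0), (0.0, 1.0))
euclid = md((0, 0), (3, 4), identity)
md_years = md(X[0], X[1], covariance_2d(X))
md_months = md(X_months[0], X_months[1], covariance_2d(X_months))

print(euclid, md_years, md_months)
```

The scale invariance is one reason to prefer the covariance matrix over the identity matrix when covariates are measured in different units.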
Matching Algorithms
- Various matching algorithms can be employed to find potential matches based on MD and to determine the matching weights $w_{ij}$.
- Pair matching (one-to-one matching without replacement)
  - For each observation i in the treatment group, find the observation j in the control group for which $MD_{ij}$ is smallest. Once observation j is used as a match, do not use it again.
- Nearest-neighbor matching
  - For each observation i in the treatment group, find the k closest observations in the control group. A single control can be used multiple times as a match. In case of ties (multiple controls with identical MD), use all ties as matches. k is set by the researcher.
- Caliper matching
  - Like nearest-neighbor matching, but only use controls for which MD is smaller than some threshold c.

Mahalanobis Matching
- Radius matching
  - Use all controls as matches for which MD is smaller than some threshold c.
- Kernel matching
  - Like radius matching, but give larger weight to controls for which MD is small (using some kernel function such as, e.g., the Epanechnikov kernel).
- In addition, since matching is no longer exact, it may make sense to refine the estimates by applying regression adjustment to the matched data (also known as "bias adjustment" in the context of nearest-neighbor matching).
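The difference between radius and kernel matching lies only in the weights. The following sketch shows kernel weights for one treated unit, assuming an Epanechnikov kernel and a bandwidth h that plays the role of the threshold c. Function names, distances and outcomes are hypothetical; kmatch itself offers more options than shown here.

```python
def epanechnikov(u):
    """Epanechnikov kernel: positive inside |u| < 1, zero outside."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def kernel_weights(distances, bandwidth):
    """Normalized matching weights for one treated unit.

    Returns None if no control lies within the bandwidth, that is, if the
    treated unit is outside common support.
    """
    raw = [epanechnikov(d / bandwidth) for d in distances]
    total = sum(raw)
    if total == 0.0:
        return None
    return [r / total for r in raw]


# Hypothetical distances from one treated unit to three controls.
distances = [0.0, 0.5, 2.0]
outcomes = [1.0, 3.0, 100.0]

weights = kernel_weights(distances, bandwidth=1.0)
counterfactual = sum(w * y for w, y in zip(weights, outcomes))

print(weights, counterfactual)
```

The distant control gets weight zero, so its extreme outcome does not affect the counterfactual, while the closest control counts most. Radius matching would instead give equal weight to the first two controls.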
The Propensity Score Theorem (Rosenbaum and Rubin 1983)
- If the conditional independence assumption is true, then
  $$ \Pr(D_i = 1 \mid Y_i^0, Y_i^1, X_i) = \Pr(D_i = 1 \mid X_i) = \pi(X_i) $$
  where $\pi(X)$ is called the propensity score.
- That is,
  $$ (Y^0, Y^1) \perp\!\!\!\perp D \mid X $$
  implies
  $$ (Y^0, Y^1) \perp\!\!\!\perp D \mid \pi(X) $$
  so that, under "strong ignorability", the average causal effect can be estimated by conditioning on the propensity score $\pi(X)$ instead of X.
- This is remarkable because the information in X, which may include many variables, can be reduced to just one dimension. This greatly simplifies the matching task.
Propensity Score Matching (PSM)
- Instead of computing multivariate distances, we can thus simply match on the (one-dimensional) propensity score.
- Procedure
  - Step 1: Estimate the propensity score, e.g. using a Logit model.
  - Step 2: Apply a matching algorithm using differences in the propensity score, $|\pi(X_i) - \pi(X_j)|$, instead of multivariate distances.
- PSM is tremendously popular:
  - https://scholar.google.ch/scholar?q=propensity+score+AND+(matching+OR+matched+OR+match)
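The two-step procedure can be sketched as follows. Note the substitution: instead of a Logit model, the sketch estimates the propensity score as the share of treated units within each covariate cell, which is only feasible for discrete X. The function names and toy data are hypothetical. Step 2 uses nearest-neighbor matching with replacement and keeps all ties.

```python
from collections import defaultdict


def cell_propensity(data):
    """Step 1 (stand-in for a Logit model): share treated within each X cell."""
    counts = defaultdict(lambda: [0, 0])  # x -> [n, n_treated]
    for x, d, _ in data:
        counts[x][0] += 1
        counts[x][1] += d
    return {x: n_treated / n for x, (n, n_treated) in counts.items()}


def psm_nearest(data):
    """Step 2: match each treated unit to the controls with the closest score."""
    ps = cell_propensity(data)
    controls = [(x, y) for x, d, y in data if d == 0]
    results = []
    for x, d, y in data:
        if d != 1:
            continue
        dist = [abs(ps[x] - ps[xc]) for xc, _ in controls]
        best = min(dist)
        matched = [c for c, dd in zip(controls, dist) if dd == best]  # keep ties
        y0 = sum(yc for _, yc in matched) / len(matched)
        results.append((x, sorted({xc for xc, _ in matched}), y - y0))
    return results


# Hypothetical data as (cell, treated, outcome). Cells A and B differ in X
# but have the same treated share (0.25); cell C has a share of 0.75.
data = (
    [("A", 1, 10.0)] + [("A", 0, 8.0)] * 3
    + [("B", 1, 6.0)] + [("B", 0, 4.0)] * 3
    + [("C", 1, 7.0)] * 3 + [("C", 0, 5.0)]
)

matches = psm_nearest(data)
print(matches[0])
```

The treated unit in cell A is matched to controls from both A and B because the two cells share the same score. Matching on the propensity score balances X only on average within score levels, not exactly.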
King and Nielsen
- In 2015/2016, Gary King and Richard Nielsen circulated a paper that created quite some concern among applied researchers.
- The basic message of the paper is that PSM is really, really bad and should be discarded.
- The paper:
  - http://j.mp/1sexgVw
- Slides:
  - https://gking.harvard.edu/presentations/why-propensity-scores-should-not-be-used-matching-6
- Watch it:
  - https://www.youtube.com/watch?v=rBv39pK1iEs
King and Nielsen
- The story goes about as follows.
- Argument 1
  - Model dependence (i.e. dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions, even if they try not to be).
  - Matching is good because it reduces model dependence.
Matching to Reduce Model Dependence
(Ho, Imai, King, Stuart 2007, fig. 1, Political Analysis)
[Figure, shown in several animation steps: scatter plot of outcome (0 to 12) against education in years (12 to 28) for treated (T) and control (C) units; slide 3/23 of the slides by King and Nielsen]
King and Nielsen
- Argument 2
  - PSM approximates complete randomization.
  - Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

Matching: Finding Hidden Randomized Experiments (slide 6/23 of the slides by King and Nielsen)

Types of Experiments

Balance of covariates   Complete randomization   Fully blocked
Observed                On average               Exact
Unobserved              On average               On average

Fully blocked dominates complete randomization for: imbalance, model dependence, power, efficiency, bias, research costs, robustness. E.g., Imai, King, Nall 2009: SEs 600% smaller.

Goal of Each Matching Method (in Observational Data)
- PSM: complete randomization
- Other methods: fully blocked
- Other matching methods dominate PSM (wait, it gets worse)
Best Case: Mahalanobis Distance Matching
[Figure, shown in two animation steps: scatter plot of age (20 to 80) against education in years (12 to 28) for treated (T) and control (C) units; slide 9/23 of the slides by King and Nielsen]

Best Case: Propensity Score Matching
[Figure, shown in two animation steps: scatter plot of age (20 to 80) against education in years (12 to 28) for treated (T) and control (C) units, with a propensity score axis running from 0 to 1; slide 15/23 of the slides by King and Nielsen]

Best Case: Propensity Score Matching is Suboptimal
[Figure: scatter plot of age (20 to 80) against education in years (12 to 28) for treated (T) and control (C) units; slide 15/23 of the slides by King and Nielsen]
King and Nielsen
- Argument 3
  - Random pruning (deleting observations at random) increases imbalance. This is because the sample size decreases, so that variance increases (large differences become more likely).
  - More imbalance/variance means more model dependence and researcher discretion.
  - Because PSM approximates complete randomization, it engages in random pruning.
  - PSM Paradox ("when you do 'better', you do worse")
    - When matching is made more strict (e.g. by decreasing the size of the caliper), PSM, like other matching methods, typically reduces imbalance. But soon the PSM Paradox kicks in, such that further pruning quickly increases imbalance.
    - If the data is such that there are no big differences between treated and untreated to begin with, the PSM Paradox kicks in almost immediately.
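The first step of Argument 3, that random pruning increases imbalance, can be illustrated with a small simulation. This is a sketch under simple assumptions that are not taken from King and Nielsen: one standard-normal covariate, treated and control groups drawn from the same distribution, and pruning implemented as keeping a random subset of each group. The function names and the default sizes are hypothetical.

```python
import random


def abs_mean_diff(t, c):
    """Imbalance measure: absolute difference in covariate means."""
    return abs(sum(t) / len(t) - sum(c) / len(c))


def simulate(n_full=200, n_kept=20, reps=500, seed=2017):
    """Average imbalance before and after random pruning."""
    rng = random.Random(seed)
    full_total = pruned_total = 0.0
    for _ in range(reps):
        t = [rng.gauss(0.0, 1.0) for _ in range(n_full)]
        c = [rng.gauss(0.0, 1.0) for _ in range(n_full)]
        full_total += abs_mean_diff(t, c)
        # Random pruning: keep a random subset of each group.
        pruned_total += abs_mean_diff(rng.sample(t, n_kept), rng.sample(c, n_kept))
    return full_total / reps, pruned_total / reps


imbalance_full, imbalance_pruned = simulate()
print(imbalance_full, imbalance_pruned)
```

Because the groups are balanced in expectation, pruning cannot remove any systematic difference here. It only shrinks the sample, so chance differences in means become larger on average.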
PSM Increases Model Dependence & Bias
[Figure, left panel "Model Dependence": variance against number of units pruned (0 to 160), for MDM and PSM. Right panel "Bias": maximum coefficient across 512 specifications against number of units pruned (0 to 160), for MDM and PSM, with the true effect = 2 marked. Data-generating process: $Y_i = 2T_i + X_{1i} + X_{2i} + \epsilon_i$, $\epsilon_i \sim N(0, 1)$. Slide 20/23 of the slides by King and Nielsen]
The Propensity Score Paradox in Real Data
[Figure, left panel: Finkel et al. (JOP 2012), imbalance against number of units pruned (0 to 3000). Right panel: Nielsen et al. (AJPS 2011), imbalance against number of units pruned (0 to 2500). Both panels show CEM, MDM, PSM, and random pruning, with the raw data and the "1/4 SD caliper" point marked. Slide 21/23 of the slides by King and Nielsen]
Similar pattern for > 20 other real data sets we checked.
Are King and Nielsen right?
- Argument 1
  - Model dependence (i.e. dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions, even if they try not to be).
  - Matching is good because it reduces model dependence.
- I fully agree.
- My view, however, may be somewhat less pessimistic. I believe that research results can be credible if researchers are well educated so that they know what they are doing, if modeling decisions are made transparent, and if the robustness of results is evaluated (and documented).
Are King and Nielsen right
Argument 2I PSM approximates complete randomizationI Better are matching approaches that approximate fully blockedrandomization such as Mahalanobis matching because completerandomization is less efficient than fully blocked randomization
That fully blocked randomization is more efficient than completerandomization ndash given the sample size ndash is of course true (how largethe efficiency gains are depends on the strength of the relationbetween X and Y )
However if blocking reduces the sample size it is not a priori clearwhether estimates from the blocked sample are more efficient thanestimates from the full sample (although often they will be)
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 29
Are King and Nielsen right
Argument 2I PSM approximates complete randomizationI Better are matching approaches that approximate fully blockedrandomization such as Mahalanobis matching because completerandomization is less efficient than fully blocked randomization
That PSM approximates complete randomization is only partially true. PSM approximates complete randomization within observations with the same propensity score. Hence, PSM is somewhere between complete randomization and fully blocked randomization.
- If the X variables have no relation to T (treatment), then all observations have the same propensity score. Hence, we end up with complete randomization.
- If the X variables have a strong effect on T, there is lots of blocking.
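The limiting case in which X is unrelated to T can be checked with a small simulation. The following is a minimal sketch on simulated data; the variable names (x, t, ps), the seed, and the sample size are illustrative assumptions and not part of the talk. Treatment is assigned independently of x, so the estimated propensity scores are nearly constant and matching on them cannot block on x.

```stata
* Sketch: if X is unrelated to treatment, the propensity score is (nearly) constant
clear
set seed 12345
set obs 1000
generate x = rnormal()              // covariate
generate t = runiform() < 0.5       // treatment assigned independently of x
logit t x                           // propensity-score model
predict ps, pr                      // estimated propensity scores
summarize ps                        // standard deviation should be close to zero
```

Making t depend strongly on x (for example via invlogit(2*x)) would spread the scores out, which corresponds to the "lots of blocking" case.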
Argument 3
- Random pruning ⇒ imbalance ⇒ more model dependence
- PSM ⇒ complete randomization ⇒ lots of random pruning
- PSM Paradox: "when you do 'better,' you do worse"

That random pruning makes things worse is of course true, because it unnecessarily reduces the sample size (without changing anything else).

As argued above, that PSM applies random pruning is only true for X variables unrelated to T (so that we are in a "local" complete randomization situation, although something similar can probably also happen if effects from several X's cancel each other out).

Furthermore, it is only true if you employ a matching algorithm that throws away good matches. King and Nielsen's results seem to be based on the worst possible algorithm: one-to-one matching without replacement.
If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or also nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
- Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
- Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.
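How many controls an algorithm actually uses can be read off the "Matching statistics" table that kmatch prints. The following sketch reuses the NLSW example and command syntax from the illustration section of this talk and simply runs propensity-score matching twice, once with one nearest neighbor and once with kernel matching; it assumes kmatch is installed (ssc install kmatch) and makes no claim about the resulting numbers.

```stata
* Sketch: compare the number of used controls across PSM algorithms
sysuse nlsw88, clear
drop if industry==2
* one-nearest-neighbor propensity-score matching
kmatch ps union collgrad ttl_exp tenure i.industry i.race south (wage), att nn
* propensity-score kernel matching with automatic bandwidth selection
kmatch ps union collgrad ttl_exp tenure i.industry i.race south (wage), att
```

Comparing the "Used" and "Unused" columns under "Controls" in the two outputs shows how much of the control group each algorithm discards.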
It is true that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more "free" variance in X that can be exploited by modeling decisions).

In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you'd need to apply PSM stratified by subgroups in this case).
6 Illustration using kmatch
The Command

kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).

Some key features
- Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
- Optional exact matching
- Optional regression-adjustment bias-correction
- Kernel matching, ridge matching, or nearest-neighbor matching
- Automatic bandwidth selection for kernel/ridge matching
- Flexible specification of scaling matrix for MDM
- Joint analysis of multiple subgroups and multiple outcome variables
- Various post-estimation commands for balancing and common-support diagnostics
- Computationally efficient implementation
Some Examples

Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.

    . sysuse nlsw88, clear
    (NLSW, 1988 extract)

    . drop if industry==2
    (4 observations deleted)

Mahalanobis-distance kernel matching

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south
    >     (wage), nate att
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1853
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
               |       Matched       |      Controls       |  Band-
               |  Yes    No   Total  |  Used Unused  Total |  width
       Treated |  432    25     457  |  1105    291   1396 | 1.3394

    Treatment-effects estimation
        wage |     Coef.
         ATT |  .6059013
        NATE |  1.432913
Some balancing statistics

    . kmatch summarize
    (refitting the model using the generate() option)

                 |             Raw              |        Matched (ATT)
           Means |  Treated  Untrea~d    StdDif |  Treated  Untrea~d    StdDif
        collgrad |  .321663   .224212   .219912 |  .319444   .319444         0
         ttl_exp |  13.2685   12.7323   .117584 |  13.3205   13.1425   .039036
          tenure |  7.89205   6.17658    .29735 |  7.91744   7.58347   .057888
      3.industry |  .006565   .012178  -.058246 |   .00463    .00463         0
      4.industry |  .183807   .166905   .044425 |  .185185   .185185         0
      5.industry |  .105033   .027937   .312944 |  .085648   .085648         0
      6.industry |  .045952   .169771  -.407129 |  .048611   .048611         0
      7.industry |  .019694   .102436  -.350657 |  .020833   .020833         0
      8.industry |  .017505   .035817  -.113785 |  .009259   .009259         0
      9.industry |  .010941   .040115  -.185669 |  .011574   .011574         0
     10.industry |  .004376   .008596  -.052551 |  .002315   .002315         0
     11.industry |  .479212   .356734   .250073 |  .506944   .506944         0
     12.industry |  .122538    .07235   .169707 |   .12037    .12037         0
          2.race |  .330416   .244986   .189418 |    .3125     .3125         0
          3.race |  .017505   .011461   .050566 |  .006944   .006944         0
           south |  .297593   .466332  -.352408 |  .291667   .291667         0

                 |             Raw              |        Matched (ATT)
       Variances |  Treated  Untrea~d     Ratio |  Treated  Untrea~d     Ratio
        collgrad |  .218674   .174066   1.25628 |  .217904   .217904         1
         ttl_exp |  20.5898   21.0001   .980459 |  19.8177   18.2323   1.08696
          tenure |  37.2044   29.3629   1.26706 |  37.0399   34.9543   1.05966
      3.industry |  .006536   .012038   .542928 |  .004619   .004619         1
      4.industry |  .150351   .139148   1.08052 |  .151242   .151242         1
      5.industry |  .094207   .027176   3.46656 |  .078494   .078494         1
      6.industry |  .043936    .14105   .311496 |  .046355   .046355         1
      7.industry |  .019348   .092008   .210287 |  .020447   .020447         1
      8.industry |  .017237   .034559   .498769 |  .009195   .009195         1
      9.industry |  .010845   .038533   .281445 |  .011467   .011467         1
     10.industry |  .004367   .008528   .512039 |  .002315   .002315         1
     11.industry |  .250115   .229639   1.08917 |  .250532   .250532         1
     12.industry |  .107758   .067163   1.60443 |  .106127   .106127         1
          2.race |  .221726     .1851   1.19787 |  .215342   .215342         1
          3.race |  .017237   .011338   1.52025 |  .006912   .006912         1
           south |   .20949   .249045   .841173 |  .207077   .207077         1
Make a graph of the balancing stats

    . mat M = r(M)
    . mat V = r(V)
    . coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || ,
    >     bylabels("Std. mean difference" "Variance ratio")
    >     noci nolabels byopts(xrescale)
    . addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))
    . addplot 2: , xline(1) norescaling

[Figure: balancing statistics by covariate (collgrad, ttl_exp, tenure, industry dummies, race dummies, south); panels: Std. mean difference and Variance ratio; series: Raw and Matched.]
Propensity-score kernel matching

    . kmatch ps union collgrad ttl_exp tenure i.industry i.race south
    >     (wage), nate att
    (computing bandwidth ... done)

    Propensity-score kernel matching           Number of obs = 1853
                                               Kernel        = epan
    Treatment   : union = 1
    Covariates  : collgrad ttl_exp tenure i.industry i.race south
    PS model    : logit (pr)

    Matching statistics
               |       Matched       |      Controls       |  Band-
               |  Yes    No   Total  |  Used Unused  Total |  width
       Treated |  431    26     457  |  1214    182   1396 | .00188

    Treatment-effects estimation
        wage |     Coef.
         ATT |  .3887224
        NATE |  1.432913
Kernel density balancing plot

    . kmatch density, lw(*6 *2) lc(*.5 *1)
    (refitting the model using the generate() option)
    (applying 0-1 boundary correction to density estimation of propensity score)
    (bandwidth for propensity score = .06803989)

[Figure: kernel density of the propensity score for Untreated and Treated; panels: Raw and Matched (ATT); x-axis: Propensity score; y-axis: Density.]
Cumulative distribution balancing plot

    . kmatch cumul, lw(*6 *2) lc(*.5 *1)
    (refitting the model using the generate() option)

[Figure: cumulative distribution of the propensity score for Untreated and Treated; panels: Raw and Matched (ATT); x-axis: Propensity score; y-axis: Cumulative probability.]
Balancing box plot

    . kmatch box
    (refitting the model using the generate() option)

[Figure: box plots of the propensity score for Untreated and Treated; panels: Raw and Matched (ATT); y-axis: Propensity score.]
Standard errors

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south
    >     (wage), nate ate att atc vce(bootstrap)
    (computing bandwidth for treated ... done)
    (computing bandwidth for untreated ... done)
    (running kmatch on estimation sample)

    Bootstrap replications (50)
    ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
    ..................................................    50

    Multivariate-distance kernel matching      Number of obs = 1853
                                               Replications  = 50
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
               |       Matched       |      Controls       |  Band-
               |  Yes    No   Total  |  Used Unused  Total |  width
       Treated |  432    25     457  |  1105    291   1396 | 1.3394
     Untreated | 1386    10    1396  |   455      2    457 | 3.3975
      Combined | 1818    35    1853  |  1560    293   1853 |

    Treatment-effects estimation
         |  Observed  Bootstrap                       Normal-based
    wage |     Coef.  Std. Err.      z   P>|z|   [95% Conf. Interval]
     ATE |  .4095729  .1920853    2.13   0.033    .0330928   .7860531
     ATT |  .6059013  .2472069    2.45   0.014    .1213846   1.090418
     ATC |  .3483797  .1893653    1.84   0.066   -.0227695   .7195289
    NATE |  1.432913  .2333282    6.14   0.000    .9755981   1.890228
Do some tests

    . lincom ATT-NATE

     ( 1)  ATT - NATE = 0

    wage |     Coef.  Std. Err.      z   P>|z|   [95% Conf. Interval]
     (1) | -.8270117  .1810415   -4.57   0.000   -1.181847  -.4721768

    . test ATT = ATC

     ( 1)  ATT - ATC = 0

               chi2(  1) =    2.42
             Prob > chi2 =  0.1200
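For a single linear restriction, the Wald chi-squared statistic reported by test is the square of the corresponding z statistic, so the p-value can be reproduced from either distribution. A minimal sketch using the rounded chi2 value of 2.42 from the output above (because of the rounding, the result only approximately matches the reported 0.1200):

```stata
* p-value of a chi2(1) statistic, computed two equivalent ways
display chi2tail(1, 2.42)            // upper tail of chi-squared with 1 df
display 2*normal(-sqrt(2.42))        // two-sided normal p-value for z = sqrt(2.42)
```

Running lincom ATT - ATC instead of test would report the same hypothesis as a z test together with a confidence interval for the difference.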
Nearest-neighbor matching (1 neighbor)

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

    Multivariate-distance nearest-neighbor matching
                                               Number of obs  = 1853
                                               Neighbors: min = 1
    Treatment   : union = 1                               max = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
               |       Matched       |      Controls       |  Band-
               |  Yes    No   Total  |  Used Unused  Total |  width
       Treated |  457     0     457  |   328   1068   1396 |

    Treatment-effects estimation
        wage |     Coef.
         ATT |  .7246969

    . teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

    Treatment-effects estimation                   Number of obs      = 1853
    Estimator      : nearest-neighbor matching     Matches: requested = 1
    Outcome model  : matching                                     min = 1
    Distance metric: Mahalanobis                                  max = 1

                        |             AI Robust
                   wage |     Coef.  Std. Err.      z   P>|z|   [95% Conf. Interval]
    ATET                |
    union               |
    (union vs nonunion) |  .7246969  .2942952    2.46   0.014     .147889   1.301505
Nearest-neighbor matching (5 neighbors)

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

    Multivariate-distance nearest-neighbor matching
                                               Number of obs  = 1853
                                               Neighbors: min = 5
    Treatment   : union = 1                               max = 5
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
               |       Matched       |      Controls       |  Band-
               |  Yes    No   Total  |  Used Unused  Total |  width
       Treated |  457     0     457  |   870    526   1396 |

    Treatment-effects estimation
        wage |     Coef.
         ATT |  .5590823

    . teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

    Treatment-effects estimation                   Number of obs      = 1853
    Estimator      : nearest-neighbor matching     Matches: requested = 5
    Outcome model  : matching                                     min = 5
    Distance metric: Mahalanobis                                  max = 6

                        |             AI Robust
                   wage |     Coef.  Std. Err.      z   P>|z|   [95% Conf. Interval]
    ATET                |
    union               |
    (union vs nonunion) |  .5590823  .2381752    2.35   0.019    .0922675   1.025897
Bias-correction / regression adjustment

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south
    >     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

    Multivariate-distance nearest-neighbor matching
                                               Number of obs  = 1853
                                               Neighbors: min = 5
    Treatment   : union = 1                               max = 5
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
               |       Matched       |      Controls       |  Band-
               |  Yes    No   Total  |  Used Unused  Total |  width
       Treated |  457     0     457  |   870    526   1396 |

    Treatment-effects estimation
        wage |     Coef.
         ATT |  .5288023
    adjusted for: collgrad ttl_exp tenure i.industry i.race south

    . teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south)
    >     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

    Treatment-effects estimation                   Number of obs      = 1853
    Estimator      : nearest-neighbor matching     Matches: requested = 5
    Outcome model  : matching                                     min = 5
    Distance metric: Mahalanobis                                  max = 6

                        |             AI Robust
                   wage |     Coef.  Std. Err.      z   P>|z|   [95% Conf. Interval]
    ATET                |
    union               |
    (union vs nonunion) |  .5288023  .2420635    2.18   0.029    .0543666   1.003238
Mahalanobis-distance and propensity-score matching combined

    . kmatch md union collgrad ttl_exp tenure (wage), att
    >     psvars(i.industry i.race south) psweight(3)
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1853
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis (modified)
    Covariates  : collgrad ttl_exp tenure
    PS model    : logit (pr)
    PS covars   : i.industry i.race south

    Matching statistics
               |       Matched       |      Controls       |  Band-
               |  Yes    No   Total  |  Used Unused  Total |  width
       Treated |  439    18     457  |  1258    138   1396 | .83886

    Treatment-effects estimation
        wage |     Coef.
         ATT |  .6408443
Exact matching

    . kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1853
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure
    Exact       : industry race south

    Matching statistics
               |       Matched       |      Controls       |  Band-
               |  Yes    No   Total  |  Used Unused  Total |  width
       Treated |  432    25     457  |  1103    293   1396 | 1.3013

    Treatment-effects estimation
        wage |     Coef.
         ATT |  .6047374
Bandwidth selection: the default (based on the distribution of distances in one-nearest-neighbor matching)

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1853
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
               |       Matched       |      Controls       |  Band-
               |  Yes    No   Total  |  Used Unused  Total |  width
       Treated |  432    25     457  |  1105    291   1396 | 1.3394

    Treatment-effects estimation
        wage |     Coef.
         ATT |  .6059013
Bandwidth selection: cross-validation with respect to X

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
    >     att bwidth(cv)
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1853
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
               |       Matched       |      Controls       |  Band-
               |  Yes    No   Total  |  Used Unused  Total |  width
       Treated |  448     9     457  |  1184    212   1396 | 1.8888

    Treatment-effects estimation
        wage |     Coef.
         ATT |  .6651578

    . kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation search path, points labeled by search index; x-axis: Bandwidth; y-axis: MSE.]
Bandwidth selection: cross-validation with respect to Y

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
    >     att bwidth(cv wage)
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1853
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
               |       Matched       |      Controls       |  Band-
               |  Yes    No   Total  |  Used Unused  Total |  width
       Treated |  453     4     457  |  1289    107   1396 |  2.433

    Treatment-effects estimation
        wage |     Coef.
         ATT |  .6928956

    . kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation search path, points labeled by search index; x-axis: Bandwidth; y-axis: MISE.]
Bandwidth selection: weighted cross-validation with respect to Y

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
    >     att bwidth(cv wage, weighted)
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1853
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
               |       Matched       |      Controls       |  Band-
               |  Yes    No   Total  |  Used Unused  Total |  width
       Treated |  455     2     457  |  1356     40   1396 | 2.7626

    Treatment-effects estimation
        wage |     Coef.
         ATT |  .7308166

    . kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation search path, points labeled by search index; x-axis: Bandwidth; y-axis: Weighted MISE.]
Common-support diagnostics

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
    >     att bwidth(0.5)

    Multivariate-distance kernel matching      Number of obs = 1853
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
               |       Matched       |      Controls       |  Band-
               |  Yes    No   Total  |  Used Unused  Total |  width
       Treated |  366    91     457  |   701    695   1396 |     .5

    Treatment-effects estimation
        wage |     Coef.
         ATT |  .3303161

    . kmatch csummarize
    (refitting the model using the generate() option)

                 |   Common support (treated)   |    Standardized difference
           Means |  Matched  Unmatc~d     Total |  (1)-(3)   (2)-(3)   (1)-(2)
        collgrad |  .322404   .318681   .321663 |  .001585  -.006376   .007962
         ttl_exp |  13.3929   12.7682   13.2685 |  .027413  -.110253   .137666
          tenure |  8.12614   6.95055   7.89205 |  .038378  -.154356   .192734
      3.industry |  .002732   .021978   .006565 | -.047404   .190657  -.238061
      4.industry |  .191257   .153846   .183807 |  .019212  -.077269   .096481
      5.industry |  .062842   .274725   .105033 | -.137462   .552867  -.690329
      6.industry |  .057377         0   .045952 |  .054507  -.219225   .273732
      7.industry |  .019126   .021978   .019694 | -.004083   .016423  -.020506
      8.industry |  .005464   .065934   .017505 | -.091714   .368871  -.460585
      9.industry |  .010929   .010989   .010941 | -.000115   .000462  -.000577
     10.industry |        0   .021978   .004376 | -.066227   .266363  -.332589
     11.industry |  .554645   .175824   .479212 |   .15083  -.606636   .757467
     12.industry |  .092896   .241758   .122538 | -.090299   .363181   -.45348
          2.race |  .243169   .681319   .330416 | -.185284   .745209  -.930494
          3.race |  .002732   .076923   .017505 | -.112525   .452572  -.565097
           south |   .29235   .318681   .297593 | -.011456   .046074   -.05753

                 |   Common support (treated)   |            Ratio
       Variances |  Matched  Unmatc~d     Total |  (1)/(3)   (2)/(3)   (1)/(2)
        collgrad |  .219058   .219536   .218674 |  1.00176   1.00394   .997824
         ttl_exp |  19.4198   25.2474   20.5898 |  .943177   1.22621    .76918
          tenure |  38.3324   31.9242   37.2044 |  1.03032   .858076   1.20073
      3.industry |  .002732   .021734   .006536 |  .418045   3.32537   .125714
      4.industry |  .155101   .131624   .150351 |  1.03159   .875443   1.17837
      5.industry |  .059054   .201465   .094207 |  .626851   2.13854   .293122
      6.industry |  .054233         0   .043936 |  1.23435         0         .
      7.industry |  .018811   .021734   .019348 |  .972252    1.1233   .865531
      8.industry |   .00545   .062271   .017237 |  .316157   3.61269   .087513
      9.industry |  .010839   .010989   .010845 |  .999464   1.01328   .986361
     10.industry |        0   .021734   .004367 |        0   4.97709         0
     11.industry |  .247691    .14652   .250115 |  .990307   .585811   1.69049
     12.industry |  .084497   .185348   .107758 |  .784137   1.72003   .455885
          2.race |  .184542   .219536   .221726 |  .832297   .990121   .840601
          3.race |  .002732   .071795   .017237 |  .158513   4.16522   .038056
           south |  .207448   .219536    .20949 |  .990254   1.04796   .944939

    (1) matched, (2) unmatched, (3) total
Make a graph of the common-support stats

    . mat M = r(M)
    . coefplot matrix(M[,4]), title("Std. difference") noci nolabels xline(0)

[Figure: standardized difference between matched treated and all treated, by covariate (collgrad, ttl_exp, tenure, industry dummies, race dummies, south); x-axis: Std. difference.]
Multiple outcome variables

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south
    >     (wage hours), nate att
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1852
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
               |       Matched       |      Controls       |  Band-
               |  Yes    No   Total  |  Used Unused  Total |  width
       Treated |  432    25     457  |  1104    291   1395 | 1.3392

    Treatment-effects estimation
             |     Coef.
    wage     |
         ATT |  .6021049
        NATE |  1.430823
    hours    |
         ATT |  1.263759
        NATE |  1.450303
Multiple outcome variables with different regression-adjustment equations

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south
    >     (wage = collgrad ttl_exp tenure)
    >     (hours = i.industry i.race), nate att
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1852
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
               |       Matched       |      Controls       |  Band-
               |  Yes    No   Total  |  Used Unused  Total |  width
       Treated |  432    25     457  |  1104    291   1395 | 1.3392

    Treatment-effects estimation
             |     Coef.
    wage     |
         ATT |  .5152752
        NATE |  1.430823
    hours    |
         ATT |  1.263759
        NATE |  1.450303
    wage:  adjusted for collgrad ttl_exp tenure
    hours: adjusted for i.industry i.race
Treatment effects by subpopulation

    . kmatch md union collgrad ttl_exp tenure i.industry i.race (wage),
    >     att vce(boot) over(south)
    (south=0: computing bandwidth ... done)
    (south=1: computing bandwidth ... done)
    (running kmatch on estimation sample)

    Bootstrap replications (50)
    ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
    ..................................................    50

    Multivariate-distance kernel matching      Number of obs = 1853
                                               Replications  = 50
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race

    0: south = 0
    1: south = 1

    Matching statistics
               |       Matched       |      Controls       |  Band-
               |  Yes    No   Total  |  Used Unused  Total |  width
    0  Treated |  306    15     321  |   625    120    745 | 1.3199
    1  Treated |  126    10     136  |   473    178    651 | 1.3398

    Treatment-effects estimation
           |  Observed  Bootstrap                       Normal-based
      wage |     Coef.  Std. Err.      z   P>|z|   [95% Conf. Interval]
    0  ATT |  .4586332  .2763358    1.66   0.097    -.082975   1.000241
    1  ATT |  .9518705   .406903    2.34   0.019    .1543553   1.749386

    . test [0]ATT = [1]ATT

     ( 1)  [0]ATT - [1]ATT = 0

               chi2(  1) =    1.23
             Prob > chi2 =  0.2679

    . lincom [1]ATT - [0]ATT

     ( 1)  - [0]ATT + [1]ATT = 0

    wage |     Coef.  Std. Err.      z   P>|z|   [95% Conf. Interval]
     (1) |  .4932373  .4452343    1.11   0.268    -.379406   1.365881
Simulation

Population data from the Swiss census of 2000.

Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using the command iskotrei by Hendrickx 2002; values from 6 to 78, mean 44).

Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.

Control variables: gender, age, and highest educational degree.

Population restricted to people between 24 and 60 years old who are working.

2'308'006 individuals, of which 17.5% belong to the treatment group.

Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.
Substantial differences between resident aliens and Swiss nationals on all three covariates.

Propensity score in population (computed from fully stratified data)

[Figure: density of the propensity score for Untreated and Treated; x-axis: Propensity score; y-axis: Density. McFadden R2 = 0.121]
Raw mean difference in occupational prestige (NATE): -4.79
Population ATT (computed from fully stratified data): -3.96
There is some treatment effect heterogeneity (ATE = -3.51, ATC = -3.41).

[Figure, two panels. Left: Outcome by propensity score for Untreated and Treated. Right: Treatment effect by propensity score.]
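As a consistency check on the population values reported above, the ATE is the average of ATT and ATC weighted by the group shares. A worked calculation, assuming the treatment-group share of 17.5% from the population description:

```latex
\begin{aligned}
ATE &= P(D=1)\cdot ATT + P(D=0)\cdot ATC \\
    &= 0.175 \times (-3.96) + 0.825 \times (-3.41) \\
    &\approx -0.69 - 2.81 \approx -3.51
\end{aligned}
```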
Results: Variance

[Figure: variance of the estimators; panels: N = 500 and N = 5000; rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y); series: MDM and PSM, each with and without bias correction.]
In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that, across algorithms, PSM can also be much more efficient than MDM. For example, kernel matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.
Results: Bias reduction (in percent)

[Figure: bias reduction in percent; panels: N = 500 and N = 5000; rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y); series: MDM and PSM, each with and without bias correction.]
Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.
Results: Mean squared error

[Figure: mean squared error of the estimators; panels: N = 500 and N = 5000; rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y); series: MDM and PSM, each with and without bias correction.]
Results: Relative standard error

[Figure: relative standard error; panels: N = 500 and N = 5000; rows: nearest-neighbor matching with teffects (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap (1 neighbor, 5 neighbors), and kernel matching with bootstrap (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y); series: MDM and PSM, each with and without bias correction.]
Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals; panels: N = 500 and N = 5000; rows: nearest-neighbor matching with teffects (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap (1 neighbor, 5 neighbors), and kernel matching with bootstrap (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y); series: MDM and PSM, each with and without bias correction.]
Conclusions

The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).

Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.
Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research, the choice may not be that clear.

Advantages of MDM
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.

Disadvantages of MDM
- Choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching. (But don't use pair matching anyhow.)
Some conclusions from the simulation
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly ok for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.
References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2): 295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1): 77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91: 279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4): 465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70: 41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5): 688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4): 472–480.
Multivariate Distance Matching (MDM)
An alternative is to match based on a distance metric that measures the proximity between observations in the multivariate space of X.
The idea then is to use observations that are "close", but not necessarily equal, as matches.
A common approach is to use
$$MD(X_i, X_j) = \sqrt{(X_i - X_j)' \Sigma^{-1} (X_i - X_j)}$$
as distance metric, where Σ is an appropriate scaling matrix.
- Mahalanobis matching: Σ is the covariance matrix of X.
- Euclidean matching: Σ is the identity matrix.
- Mahalanobis matching is equivalent to Euclidean matching based on standardized and orthogonalized X.
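The distance metric can be illustrated with a minimal Python sketch for two covariates. The data values and the helper names `covariance_2d` and `distance` are made up for illustration and are not part of the presentation. The sketch also checks one consequence of using the covariance matrix as Σ: the Mahalanobis distance does not change when a covariate is rescaled.

```python
import math

def covariance_2d(points):
    """Sample covariance matrix (2x2) of a list of (x1, x2) pairs."""
    n = len(points)
    m1 = sum(p[0] for p in points) / n
    m2 = sum(p[1] for p in points) / n
    s11 = sum((p[0] - m1) ** 2 for p in points) / (n - 1)
    s22 = sum((p[1] - m2) ** 2 for p in points) / (n - 1)
    s12 = sum((p[0] - m1) * (p[1] - m2) for p in points) / (n - 1)
    return [[s11, s12], [s12, s22]]

def distance(xi, xj, sigma):
    """sqrt((xi - xj)' Sigma^{-1} (xi - xj)) for two covariates."""
    (a, b), (c, d) = sigma
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    u, v = xi[0] - xj[0], xi[1] - xj[1]
    q = u * (inv[0][0] * u + inv[0][1] * v) + v * (inv[1][0] * u + inv[1][1] * v)
    return math.sqrt(q)

# Hypothetical data: (education in years, age in years)
X = [(12, 25), (16, 40), (14, 33), (20, 58), (18, 47)]
identity = [[1.0, 0.0], [0.0, 1.0]]

euclid = distance(X[0], X[1], identity)          # Euclidean matching
mahal = distance(X[0], X[1], covariance_2d(X))   # Mahalanobis matching

# Same data with age measured in months instead of years
X_months = [(e, 12 * a) for e, a in X]
mahal_months = distance(X_months[0], X_months[1], covariance_2d(X_months))
```

The Euclidean distance would change if age were recorded in months; the Mahalanobis distance stays the same, which is one reason it is a common default scaling.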
Matching Algorithms
Various matching algorithms can be employed to find potential matches based on MD and determine the matching weights w_ij.
Pair matching (one-to-one matching without replacement)
- For each observation i in the treatment group, find observation j in the control group for which MD_ij is smallest. Once observation j is used as a match, do not use it again.
Nearest-neighbor matching
- For each observation i in the treatment group, find the k closest observations in the control group. A single control can be used multiple times as a match. In case of ties (multiple controls with identical MD), use all ties as matches. k is set by the researcher.
Caliper matching
- Like nearest-neighbor matching, but only use controls for which MD is smaller than some threshold c.
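Nearest-neighbor matching with replacement, tie handling and an optional caliper can be sketched in a few lines of Python. This is an illustrative sketch on a single matching variable, not the algorithm of any particular software; the function name `nn_match` and the integer scores are hypothetical.

```python
def nn_match(treated, controls, k=1, caliper=None):
    """For each treated value, return the indices of the k nearest controls.

    Controls are matched with replacement, all ties at the k-th distance
    are kept, and controls at distance >= caliper are dropped if a
    caliper is given.
    """
    matches = {}
    for i, xt in enumerate(treated):
        dists = sorted((abs(xt - xc), j) for j, xc in enumerate(controls))
        if caliper is not None:
            dists = [(d, j) for d, j in dists if d < caliper]
        if len(dists) > k:
            cutoff = dists[k - 1][0]            # distance of the k-th neighbor
            dists = [(d, j) for d, j in dists if d <= cutoff]
        matches[i] = [j for _, j in dists]
    return matches

# Hypothetical scores (integers avoid floating-point ties)
treated = [30, 70]
controls = [25, 35, 50, 90]

plain = nn_match(treated, controls, k=1)
with_caliper = nn_match(treated, controls, k=1, caliper=10)
```

In this example both treated units have two equally close controls, so both ties are kept. With the caliper, the second treated unit has no control within the threshold and remains unmatched.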
Mahalanobis Matching
Radius matching
- Use all controls as matches for which MD is smaller than some threshold c.
Kernel matching
- Like radius matching, but give larger weight to controls for which MD is small (using some kernel function such as, e.g., the Epanechnikov kernel).
In addition, since matching is no longer exact, it may make sense to refine the estimates by applying regression adjustment to the matched data (also known as "bias adjustment" in the context of nearest-neighbor matching).
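The difference between radius and kernel matching lies only in the weights given to controls inside the threshold. A minimal Python sketch, assuming distances have already been computed and using hypothetical function names, could look as follows.

```python
def epanechnikov(u):
    """Epanechnikov kernel: positive for |u| < 1, zero otherwise."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0

def uniform(u):
    """Uniform kernel: equal weight for |u| < 1 (radius matching)."""
    return 0.5 if abs(u) < 1.0 else 0.0

def matching_weights(distances, bandwidth, kernel):
    """Normalized matching weights of the controls for one treated unit."""
    raw = [kernel(d / bandwidth) for d in distances]
    total = sum(raw)
    if total == 0:
        return [0.0] * len(raw)      # no control within the bandwidth
    return [r / total for r in raw]

# Hypothetical distances from one treated unit to four controls
distances = [0.0, 0.5, 1.0, 2.0]
w_kernel = matching_weights(distances, 1.0, epanechnikov)
w_radius = matching_weights(distances, 1.0, uniform)
```

Radius matching weights the two controls inside the threshold equally, while kernel matching gives more weight to the closer one.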
3 Propensity Score Matching
The Propensity Score Theorem (Rosenbaum and Rubin 1983)
If the conditional independence assumption is true, then
$$\Pr(D_i = 1 \mid Y_i^0, Y_i^1, X_i) = \Pr(D_i = 1 \mid X_i) = \pi(X_i)$$
where π(X) is called the propensity score.
That is,
$$(Y^0, Y^1) \perp\!\!\!\perp D \mid X$$
implies
$$(Y^0, Y^1) \perp\!\!\!\perp D \mid \pi(X)$$
so that under strong ignorability the average causal effect can be estimated by conditioning on the propensity score π(X) instead of X.
This is remarkable, because the information in X, which may include many variables, can be reduced to just one dimension. This greatly simplifies the matching task.
Propensity Score Matching (PSM)
Instead of computing multivariate distances, we can thus simply match on the (one-dimensional) propensity score.
Procedure
- Step 1: Estimate the propensity score, e.g. using a Logit model.
- Step 2: Apply a matching algorithm using differences in the propensity score, |π(X_i) − π(X_j)|, instead of multivariate distances.
PSM is tremendously popular
- https://scholar.google.ch/scholar?q=propensity+score+AND+(matching+OR+matched+OR+match)
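The two steps of the procedure can be sketched in Python with a single covariate. The logit model is fitted with a hand-written Newton-Raphson routine to keep the example self-contained; the data and the names `fit_logit`, `ps` and `pairs` are hypothetical, and real applications would rely on a statistics package.

```python
import math

def fit_logit(x, d, iterations=25):
    """Newton-Raphson for Pr(D = 1 | x) = 1 / (1 + exp(-(a + b*x)))."""
    a = b = 0.0
    for _ in range(iterations):
        p = [1.0 / (1.0 + math.exp(-(a + b * xi))) for xi in x]
        # Score vector (gradient of the log likelihood)
        g0 = sum(di - pi for di, pi in zip(d, p))
        g1 = sum((di - pi) * xi for di, pi, xi in zip(d, p, x))
        # Information matrix (negative Hessian)
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: (a, b) += H^{-1} g
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

# Hypothetical data: covariate x and treatment indicator d
x = [1, 2, 3, 4, 5, 6, 7, 8]
d = [0, 0, 1, 0, 1, 0, 1, 1]

# Step 1: estimate the propensity score
a, b = fit_logit(x, d)
ps = [1.0 / (1.0 + math.exp(-(a + b * xi))) for xi in x]

# Step 2: one-nearest-neighbor matching (with replacement) on the score
treated = [i for i, di in enumerate(d) if di == 1]
controls = [i for i, di in enumerate(d) if di == 0]
pairs = {i: min(controls, key=lambda j: abs(ps[i] - ps[j])) for i in treated}
```

Because the matching in step 2 only looks at the estimated score, any covariate information not reflected in π(X) is ignored when choosing matches.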
4 King and Nielsen's "Why Propensity Scores Should Not Be Used for Matching"
King and Nielsen
In 2015/2016, Gary King and Richard Nielsen circulated a paper that created quite some concern among applied researchers.
The basic message of the paper is that PSM is really, really bad and should be discarded.
- The paper: http://j.mp/1sexgVw
- Slides: https://gking.harvard.edu/presentations/why-propensity-scores-should-not-be-used-matching-6
- Watch it: https://www.youtube.com/watch?v=rBv39pK1iEs
King and Nielsen
The story goes about as follows.
Argument 1
- Model dependence (i.e., dependence of results on modeling decisions made by the researcher) is bad, because it leads to bias (people are selective in their decisions, even if they try not to be).
- Matching is good, because it reduces model dependence.
[Figure: "Matching to Reduce Model Dependence" (Ho, Imai, King, Stuart 2007, fig. 1, Political Analysis). Scatter plot of Outcome against Education (years) for treated (T) and control (C) units, shown as a sequence of overlays. Slide 3/23 of the slides by King and Nielsen.]
King and Nielsen
Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

[Embedded slide 6/23 of the slides by King and Nielsen: "Matching: Finding Hidden Randomized Experiments"]
Types of Experiments

Balance of covariates | Complete randomization | Fully blocked
Observed              | On average             | Exact
Unobserved            | On average             | On average

Fully blocked dominates complete randomization for: imbalance, model dependence, power, efficiency, bias, research costs, robustness. E.g., Imai, King, Nall 2009: SEs 600% smaller.
Goal of Each Matching Method (in Observational Data)
• PSM: complete randomization
• Other methods: fully blocked
• Other matching methods dominate PSM (wait, it gets worse)
[Figure: "Best Case: Mahalanobis Distance Matching". Scatter plot of Age against Education (years) for treated (T) and control (C) units, shown in two overlays (all controls, then a reduced set of controls). Slide 9/23 of the slides by King and Nielsen.]
[Figure: "Best Case: Propensity Score Matching". Scatter plot of Age against Education (years) for treated (T) and control (C) units, with an additional Propensity Score axis running from 0 to 1. Slide 15/23 of the slides by King and Nielsen.]
[Figure: "Best Case: Propensity Score Matching is Suboptimal". Scatter plot of Age against Education (years) showing the treated (T) units and a reduced set of control (C) units. Slide 15/23 of the slides by King and Nielsen.]
King and Nielsen
Argument 3
- Random pruning (deleting observations at random) increases imbalance. This is because the sample size decreases, so that variance increases (large differences become more likely).
- More imbalance/variance means more model dependence and researcher discretion.
- Because PSM approximates complete randomization, it engages in random pruning.
- PSM Paradox ("when you do 'better', you do worse")
  - When matching is made more strict (e.g. by decreasing the size of the caliper), PSM, like other matching methods, typically reduces imbalance. But soon the PSM Paradox kicks in, such that further pruning quickly increases imbalance.
  - If the data is such that there are no big differences between treated and untreated to begin with, the PSM Paradox kicks in almost immediately.
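The first point of the argument, that random pruning increases imbalance simply because the sample gets smaller, can be checked with a small simulation. The following Python sketch is an illustration under simple assumptions (one standard normal covariate, no true difference between groups); the function name and all settings are hypothetical.

```python
import random

def imbalance_after_random_pruning(keep, n=200, reps=500, seed=2017):
    """Average absolute difference in covariate means between treated and
    controls after randomly keeping `keep` of `n` observations per group."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        # Both groups come from the same distribution (balance in expectation)
        treated = [rng.gauss(0.0, 1.0) for _ in range(n)]
        controls = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Random pruning: keep a random subset of each group
        t_kept = rng.sample(treated, keep)
        c_kept = rng.sample(controls, keep)
        total += abs(sum(t_kept) / keep - sum(c_kept) / keep)
    return total / reps

no_pruning = imbalance_after_random_pruning(keep=200)
some_pruning = imbalance_after_random_pruning(keep=50)
heavy_pruning = imbalance_after_random_pruning(keep=10)
```

The expected imbalance grows as more observations are discarded at random, although nothing about the groups has changed.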
[Figure: "PSM Increases Model Dependence & Bias". Two panels comparing MDM and PSM by Number of Units Pruned (0 to 160). Panel "Model Dependence": Variance. Panel "Bias": Maximum Coefficient across 512 Specifications, with true effect = 2. Simulated data: $Y_i = 2T_i + X_{1i} + X_{2i} + \epsilon_i$, $\epsilon_i \sim N(0, 1)$. Slide 20/23 of the slides by King and Nielsen.]
[Figure: "The Propensity Score Paradox in Real Data". Imbalance against Number of units pruned for CEM, MDM, PSM (with the 1/4 SD caliper marked), Raw, and Random. Left panel: Finkel et al. (JOP 2012); right panel: Nielsen et al. (AJPS 2011). Note on the slide: "Similar pattern for > 20 other real data sets we checked". Slide 21/23 of the slides by King and Nielsen.]
5 Are King and Nielsen right?
Are King and Nielsen right?
Argument 1
- Model dependence (i.e., dependence of results on modeling decisions made by the researcher) is bad, because it leads to bias (people are selective in their decisions, even if they try not to be).
- Matching is good, because it reduces model dependence.
I fully agree.
My view, however, may be somewhat less pessimistic. I believe that research results can be credible if researchers are well educated, so that they know what they are doing, and if modeling decisions are made transparent and robustness of results is evaluated (and documented).
Are King and Nielsen right?
Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.
That fully blocked randomization is more efficient than complete randomization, given the sample size, is of course true (how large the efficiency gains are depends on the strength of the relation between X and Y).
However, if blocking reduces the sample size, it is not a priori clear whether estimates from the blocked sample are more efficient than estimates from the full sample (although often they will be).
That PSM approximates complete randomization is only partially true. PSM approximates complete randomization within observations with the same propensity score. Hence, PSM is somewhere between complete randomization and fully blocked randomization.
- If the X variables have no relation to T (treatment), then all observations have the same propensity score. Hence, we end up with complete randomization.
- If the X variables have a strong effect on T, there is lots of blocking.
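The two limiting cases can be made concrete with a short Python sketch of a logit propensity score with two covariates. The coefficients and the function name `propensity` are hypothetical and serve only to illustrate the argument.

```python
import math

def propensity(x1, x2, b1, b2=0.0, b0=0.0):
    """Logit propensity score with two covariates (hypothetical coefficients)."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x1 + b2 * x2)))

# Case 1: X unrelated to T (all slopes zero). Every unit has the same score,
# so matching on the score cannot distinguish units: complete randomization.
same_a = propensity(1.0, 5.0, b1=0.0)
same_b = propensity(-3.0, 0.0, b1=0.0)

# Case 2: only x1 affects T. The score blocks on x1, while units that
# differ only in x2 still share one score ("local" complete randomization).
block_a = propensity(1.0, -2.0, b1=2.0)
block_b = propensity(1.0, 7.0, b1=2.0)
other_block = propensity(2.0, -2.0, b1=2.0)
```

In the second case the score separates units with different x1 and treats x2 as if it had been randomized, which is the sense in which PSM lies between the two designs.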
Are King and Nielsen right?
Argument 3
- Random pruning ⇒ imbalance ⇒ more model dependence
- PSM ⇒ complete randomization ⇒ lots of random pruning
- PSM Paradox: "when you do 'better', you do worse"
That random pruning makes things worse is of course true, because it unnecessarily reduces the sample size (without changing anything else).
As argued above, that PSM applies random pruning is only true for X variables unrelated to T (so that we are in a "local" complete randomization situation, although something similar can probably also happen if effects from several X's cancel each other out).
Furthermore, it is only true if you employ a matching algorithm that throws away good matches. King and Nielsen's results seem to be based on the worst possible algorithm: one-to-one matching without replacement.
If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or also nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
- Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
- Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.
It is true that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more "free" variance in X that can be exploited by modeling decisions).
In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you'd need to apply PSM stratified by subgroups in this case).
6 Illustration using kmatch
The Command
kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).
Some key features:
- Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
- Optional exact matching
- Optional regression adjustment / bias correction
- Kernel matching, ridge matching, or nearest-neighbor matching
- Automatic bandwidth selection for kernel/ridge matching
- Flexible specification of scaling matrix for MDM
- Joint analysis of multiple subgroups and multiple outcome variables
- Various post-estimation commands for balancing and common-support diagnostics
- Computationally efficient implementation
Some Examples
Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.

. sysuse nlsw88, clear
(NLSW, 1988 extract)
. drop if industry==2
(4 observations deleted)

Mahalanobis-distance kernel matching

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment : union = 1
Metric    : mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |      Matched        |      Controls        |  Band-
           |   Yes    No   Total |  Used  Unused  Total |  width
   Treated |   432    25     457 |  1105     291   1396 | 1.3394

Treatment-effects estimation
        wage |     Coef.
         ATT |  .6059013
        NATE |  1.432913
Some Examples
Some balancing statistics

. kmatch summarize
(refitting the model using the generate() option)

             |             Raw               |        Matched (ATT)
       Means |  Treated  Untrea~d    StdDif  |  Treated  Untrea~d    StdDif
    collgrad |  .321663   .224212   .219912  |  .319444   .319444         0
     ttl_exp |  13.2685   12.7323   .117584  |  13.3205   13.1425   .039036
      tenure |  7.89205   6.17658    .29735  |  7.91744   7.58347   .057888
  3.industry |  .006565   .012178  -.058246  |   .00463    .00463         0
  4.industry |  .183807   .166905   .044425  |  .185185   .185185         0
  5.industry |  .105033   .027937   .312944  |  .085648   .085648         0
  6.industry |  .045952   .169771  -.407129  |  .048611   .048611         0
  7.industry |  .019694   .102436  -.350657  |  .020833   .020833         0
  8.industry |  .017505   .035817  -.113785  |  .009259   .009259         0
  9.industry |  .010941   .040115  -.185669  |  .011574   .011574         0
 10.industry |  .004376   .008596  -.052551  |  .002315   .002315         0
 11.industry |  .479212   .356734   .250073  |  .506944   .506944         0
 12.industry |  .122538    .07235   .169707  |   .12037    .12037         0
      2.race |  .330416   .244986   .189418  |    .3125     .3125         0
      3.race |  .017505   .011461   .050566  |  .006944   .006944         0
       south |  .297593   .466332  -.352408  |  .291667   .291667         0

             |             Raw               |        Matched (ATT)
   Variances |  Treated  Untrea~d     Ratio  |  Treated  Untrea~d     Ratio
    collgrad |  .218674   .174066   1.25628  |  .217904   .217904         1
     ttl_exp |  20.5898   21.0001   .980459  |  19.8177   18.2323   1.08696
      tenure |  37.2044   29.3629   1.26706  |  37.0399   34.9543   1.05966
  3.industry |  .006536   .012038   .542928  |  .004619   .004619         1
  4.industry |  .150351   .139148   1.08052  |  .151242   .151242         1
  5.industry |  .094207   .027176   3.46656  |  .078494   .078494         1
  6.industry |  .043936    .14105   .311496  |  .046355   .046355         1
  7.industry |  .019348   .092008   .210287  |  .020447   .020447         1
  8.industry |  .017237   .034559   .498769  |  .009195   .009195         1
  9.industry |  .010845   .038533   .281445  |  .011467   .011467         1
 10.industry |  .004367   .008528   .512039  |  .002315   .002315         1
 11.industry |  .250115   .229639   1.08917  |  .250532   .250532         1
 12.industry |  .107758   .067163   1.60443  |  .106127   .106127         1
      2.race |  .221726     .1851   1.19787  |  .215342   .215342         1
      3.race |  .017237   .011338   1.52025  |  .006912   .006912         1
       south |   .20949   .249045   .841173  |  .207077   .207077         1
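The balancing statistics can be recomputed by hand. The Python sketch below assumes that the standardized difference is the difference in means divided by the square root of the average of the two group variances; this formula is an assumption (it is not stated on the slide), but it reproduces the reported raw-sample values for collgrad. The input numbers are the collgrad means and variances from the raw sample, with decimal points as in Stata output.

```python
import math

def standardized_difference(mean_t, mean_c, var_t, var_c):
    """Difference in means scaled by the pooled standard deviation,
    sqrt((var_t + var_c) / 2). Assumed formula, see lead-in."""
    return (mean_t - mean_c) / math.sqrt((var_t + var_c) / 2.0)

def variance_ratio(var_t, var_c):
    """Ratio of the treated to the untreated variance."""
    return var_t / var_c

# collgrad in the raw sample (means and variances from the output above)
std_dif = standardized_difference(0.321663, 0.224212, 0.218674, 0.174066)
ratio = variance_ratio(0.218674, 0.174066)
```

Good balance corresponds to a standardized difference near 0 and a variance ratio near 1, which is what the matched columns show.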
Some Examples
Make a graph of the balancing stats

. mat M = r(M)
. mat V = r(V)
. coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || , ///
>     bylabels("Std. mean difference" "Variance ratio") ///
>     noci nolabels byopts(xrescale)
. addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))
. addplot 2: , xline(1) norescaling

[Figure: coefplot of the balancing statistics for collgrad, ttl_exp, tenure, the industry and race indicators, and south. Left panel "Std. mean difference" (x-axis from -.4 to .4), right panel "Variance ratio" (x-axis from 0 to 4). Series: Raw, Matched.]
Some Examples
Propensity-score kernel matching

. kmatch ps union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage), nate att
(computing bandwidth ... done)

Propensity-score kernel matching           Number of obs = 1853
                                           Kernel        = epan
Treatment : union = 1
Covariates: collgrad ttl_exp tenure i.industry i.race south
PS model  : logit (pr)

Matching statistics
           |      Matched        |      Controls        |  Band-
           |   Yes    No   Total |  Used  Unused  Total |  width
   Treated |   431    26     457 |  1214     182   1396 | .00188

Treatment-effects estimation
        wage |     Coef.
         ATT |  .3887224
        NATE |  1.432913
Some Examples
Kernel density balancing plot

. kmatch density, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)
(applying 0-1 boundary correction to density estimation of propensity score)
(bandwidth for propensity score = .06803989)

[Figure: kernel density of the Propensity score (x-axis from 0 to .8, y-axis Density from 0 to 3) for Untreated and Treated. Panels: Raw, Matched (ATT).]

Some Examples
Cumulative distribution balancing plot

. kmatch cumul, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)

[Figure: Cumulative probability (0 to 1) of the Propensity score (x-axis from 0 to .8) for Untreated and Treated. Panels: Raw, Matched (ATT).]

Some Examples
Balancing box plot

. kmatch box
(refitting the model using the generate() option)

[Figure: box plots of the Propensity score (0 to .8) for Untreated and Treated. Panels: Raw, Matched (ATT).]
Some Examples
Standard errors

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching      Number of obs = 1853
                                           Replications  = 50
                                           Kernel        = epan
Treatment : union = 1
Metric    : mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |      Matched        |      Controls        |  Band-
           |   Yes    No   Total |  Used  Unused  Total |  width
   Treated |   432    25     457 |  1105     291   1396 | 1.3394
 Untreated |  1386    10    1396 |   455       2    457 | 3.3975
  Combined |  1818    35    1853 |  1560     293   1853 |

Treatment-effects estimation
        |   Observed   Bootstrap                        Normal-based
   wage |      Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
    ATE |   .4095729   .1920853    2.13    0.033    .0330928    .7860531
    ATT |   .6059013   .2472069    2.45    0.014    .1213846    1.090418
    ATC |   .3483797   .1893653    1.84    0.066   -.0227695    .7195289
   NATE |   1.432913   .2333282    6.14    0.000    .9755981    1.890228
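The normal-based columns of the table follow from the coefficient and its bootstrap standard error. As a check on how to read the table, the following Python sketch recomputes z, the two-sided p-value and the 95% confidence interval for the ATE row; the helper name `normal_inference` is hypothetical, and the inputs are the ATE coefficient and standard error with decimal points as in Stata output.

```python
import math

def normal_inference(coef, se, z_crit=1.959964):
    """z statistic, two-sided p-value and confidence interval based on
    the normal approximation (z_crit is the 97.5% normal quantile)."""
    z = coef / se
    p = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided p-value
    return z, p, coef - z_crit * se, coef + z_crit * se

# ATE row: coefficient and bootstrap standard error
z, p, lower, upper = normal_inference(0.4095729, 0.1920853)
```

Small deviations in the last digits are expected, because the inputs are rounded to seven digits.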
Some Examples
Do some tests

. lincom ATT-NATE

 ( 1)  ATT - NATE = 0

   wage |      Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
    (1) |  -.8270117   .1810415   -4.57    0.000   -1.181847   -.4721768

. test ATT = ATC

 ( 1)  ATT - ATC = 0

           chi2(  1) =    2.42
         Prob > chi2 =    0.1200
Some Examples
Nearest-neighbor matching (1 neighbor)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

Multivariate-distance nearest-neighbor matching
                                           Number of obs  = 1853
                                           Neighbors: min = 1
Treatment : union = 1                                 max = 1
Metric    : mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |      Matched        |      Controls        |  Band-
           |   Yes    No   Total |  Used  Unused  Total |  width
   Treated |   457     0     457 |   328    1068   1396 |

Treatment-effects estimation
        wage |     Coef.
         ATT |  .7246969

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

Treatment-effects estimation               Number of obs      = 1853
Estimator      : nearest-neighbor matching Matches: requested = 1
Outcome model  : matching                               min   = 1
Distance metric: Mahalanobis                            max   = 1

                     |            AI Robust
                wage |     Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
ATET                 |
               union |
 (union vs nonunion) |  .7246969   .2942952    2.46    0.014     .147889    1.301505
Some Examples
Nearest-neighbor matching (5 neighbors)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching
                                           Number of obs  = 1853
                                           Neighbors: min = 5
Treatment : union = 1                                 max = 5
Metric    : mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |      Matched        |      Controls        |  Band-
           |   Yes    No   Total |  Used  Unused  Total |  width
   Treated |   457     0     457 |   870     526   1396 |

Treatment-effects estimation
        wage |     Coef.
         ATT |  .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation               Number of obs      = 1853
Estimator      : nearest-neighbor matching Matches: requested = 5
Outcome model  : matching                               min   = 5
Distance metric: Mahalanobis                            max   = 6

                     |            AI Robust
                wage |     Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
ATET                 |
               union |
 (union vs nonunion) |  .5590823   .2381752    2.35    0.019    .0922675    1.025897
Some Examples
Bias correction / regression adjustment

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching
                                           Number of obs  = 1853
                                           Neighbors: min = 5
Treatment : union = 1                                 max = 5
Metric    : mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |      Matched        |      Controls        |  Band-
           |   Yes    No   Total |  Used  Unused  Total |  width
   Treated |   457     0     457 |   870     526   1396 |

Treatment-effects estimation
        wage |     Coef.
         ATT |  .5288023
adjusted for: collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) ///
>     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation               Number of obs      = 1853
Estimator      : nearest-neighbor matching Matches: requested = 5
Outcome model  : matching                               min   = 5
Distance metric: Mahalanobis                            max   = 6

                     |            AI Robust
                wage |     Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
ATET                 |
               union |
 (union vs nonunion) |  .5288023   .2420635    2.18    0.029    .0543666    1.003238
Some Examples
Mahalanobis-distance and propensity-score matching combined

. kmatch md union collgrad ttl_exp tenure (wage), att ///
>     psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment : union = 1
Metric    : mahalanobis (modified)
Covariates: collgrad ttl_exp tenure
PS model  : logit (pr)
PS covars : i.industry i.race south

Matching statistics
           |      Matched        |      Controls        |  Band-
           |   Yes    No   Total |  Used  Unused  Total |  width
   Treated |   439    18     457 |  1258     138   1396 | .83886

Treatment-effects estimation
        wage |     Coef.
         ATT |  .6408443
Some Examples
Exact matching

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment : union = 1
Metric    : mahalanobis
Covariates: collgrad ttl_exp tenure
Exact     : industry race south

Matching statistics
           |      Matched        |      Controls        |  Band-
           |   Yes    No   Total |  Used  Unused  Total |  width
   Treated |   432    25     457 |  1103     293   1396 | 1.3013

Treatment-effects estimation
        wage |     Coef.
         ATT |  .6047374
Some Examples
Bandwidth selection: the default (based on distribution of distances in one-nearest-neighbor matching)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment : union = 1
Metric    : mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |      Matched        |      Controls        |  Band-
           |   Yes    No   Total |  Used  Unused  Total |  width
   Treated |   432    25     457 |  1105     291   1396 | 1.3394

Treatment-effects estimation
        wage |     Coef.
         ATT |  .6059013
Some Examples
Bandwidth selection: cross validation with respect to X

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment : union = 1
Metric    : mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |      Matched        |      Controls        |  Band-
           |   Yes    No   Total |  Used  Unused  Total |  width
   Treated |   448     9     457 |  1184     212   1396 | 1.8888

Treatment-effects estimation
        wage |     Coef.
         ATT |  .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MSE against Bandwidth (x-axis from about 1.5 to 3), with the evaluated bandwidths labeled by index 1 to 15.]
Some Examples
Bandwidth selection: cross validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment : union = 1
Metric    : mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |      Matched        |      Controls        |  Band-
           |   Yes    No   Total |  Used  Unused  Total |  width
   Treated |   453     4     457 |  1289     107   1396 |  2.433

Treatment-effects estimation
        wage |     Coef.
         ATT |  .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MISE against Bandwidth (x-axis from about 1.5 to 3), with the evaluated bandwidths labeled by index 1 to 15.]
Some Examples
Bandwidth selection: weighted cross validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment : union = 1
Metric    : mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |      Matched        |      Controls        |  Band-
           |   Yes    No   Total |  Used  Unused  Total |  width
   Treated |   455     2     457 |  1356      40   1396 | 2.7626

Treatment-effects estimation
        wage |     Coef.
         ATT |  .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of Weighted MISE against Bandwidth (x-axis from 1 to 5), with the evaluated bandwidths labeled by index 1 to 15.]
Some Examples

Common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(0.5)

Multivariate-distance kernel matching        Number of obs = 1,853
                                             Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

             Matched                    Controls               Band-
             Yes     No   Total     Used  Unused   Total       width
Treated      366     91     457      701     695    1396          .5

Treatment-effects estimation

wage         Coef.
ATT          .3303161

. kmatch csummarize
(refitting the model using the generate() option)

Common support (treated)

                          Means                    Standardized difference
               Matched  Unmatched     Total     (1)-(3)    (2)-(3)    (1)-(2)
collgrad       .322404    .318681   .321663     .001585   -.006376    .007962
ttl_exp        13.3929    12.7682   13.2685     .027413   -.110253    .137666
tenure         8.12614    6.95055   7.89205     .038378   -.154356    .192734
3.industry     .002732    .021978   .006565    -.047404    .190657   -.238061
4.industry     .191257    .153846   .183807     .019212   -.077269    .096481
5.industry     .062842    .274725   .105033    -.137462    .552867   -.690329
6.industry     .057377          0   .045952     .054507   -.219225    .273732
7.industry     .019126    .021978   .019694    -.004083    .016423   -.020506
8.industry     .005464    .065934   .017505    -.091714    .368871   -.460585
9.industry     .010929    .010989   .010941    -.000115    .000462   -.000577
10.industry          0    .021978   .004376    -.066227    .266363   -.332589
11.industry    .554645    .175824   .479212      .15083   -.606636    .757467
12.industry    .092896    .241758   .122538    -.090299    .363181    -.45348
2.race         .243169    .681319   .330416    -.185284    .745209   -.930494
3.race         .002732    .076923   .017505    -.112525    .452572   -.565097
south           .29235    .318681   .297593    -.011456    .046074    -.05753

Common support (treated)

                        Variances                          Ratio
               Matched  Unmatched     Total     (1)/(3)    (2)/(3)    (1)/(2)
collgrad       .219058    .219536   .218674     1.00176    1.00394    .997824
ttl_exp        19.4198    25.2474   20.5898     .943177    1.22621     .76918
tenure         38.3324    31.9242   37.2044     1.03032    .858076    1.20073
3.industry     .002732    .021734   .006536     .418045    3.32537    .125714
4.industry     .155101    .131624   .150351     1.03159    .875443    1.17837
5.industry     .059054    .201465   .094207     .626851    2.13854    .293122
6.industry     .054233          0   .043936     1.23435          0          .
7.industry     .018811    .021734   .019348     .972252     1.1233    .865531
8.industry      .00545    .062271   .017237     .316157    3.61269    .087513
9.industry     .010839    .010989   .010845     .999464    1.01328    .986361
10.industry          0    .021734   .004367           0    4.97709          0
11.industry    .247691     .14652   .250115     .990307    .585811    1.69049
12.industry    .084497    .185348   .107758     .784137    1.72003    .455885
2.race         .184542    .219536   .221726     .832297    .990121    .840601
3.race         .002732    .071795   .017237     .158513    4.16522    .038056
south          .207448    .219536    .20949     .990254    1.04796    .944939

(1) matched, (2) unmatched, (3) total
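The standardized differences in the common-support table can be reproduced by hand. The sketch below does this for the collgrad row, assuming the decimal points of the flattened output are read as 0.322404, 0.318681 and so on. That each difference is scaled by the standard deviation in the total group is inferred from the numbers, not stated on the slide.

```python
import math

# Values for collgrad as read from the csummarize tables
# (decimal points restored; this reading is an assumption).
mean_matched, mean_unmatched, mean_total = 0.322404, 0.318681, 0.321663
var_matched, var_total = 0.219058, 0.218674


def std_diff(mean_a, mean_b, var_ref):
    """Difference in means, scaled by the standard deviation of a reference group."""
    return (mean_a - mean_b) / math.sqrt(var_ref)


d_13 = std_diff(mean_matched, mean_total, var_total)       # column (1)-(3)
d_23 = std_diff(mean_unmatched, mean_total, var_total)     # column (2)-(3)
d_12 = std_diff(mean_matched, mean_unmatched, var_total)   # column (1)-(2)
ratio_13 = var_matched / var_total                         # variance ratio (1)/(3)

print(round(d_13, 6), round(d_23, 6), round(d_12, 6), round(ratio_13, 5))
```

Values close to zero in column (1)-(3) indicate that the matched treated resemble the full treated group on that covariate.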
Some Examples

Make a graph of the common-support statistics:

. mat M = r(M)
. coefplot matrix(M[,4]), title(Std. difference) noci nolabels xline(0)

[Figure: coefficient plot titled "Std. difference", one marker per covariate (collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, south), with a reference line at 0]
Some Examples

Multiple outcome variables

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1,852
                                             Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

             Matched                    Controls               Band-
             Yes     No   Total     Used  Unused   Total       width
Treated      432     25     457     1104     291    1395      1.3392

Treatment-effects estimation

             Coef.
wage
  ATT        .6021049
  NATE       1.430823
hours
  ATT        1.263759
  NATE       1.450303
Some Examples

Multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage = collgrad ttl_exp tenure) (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1,852
                                             Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

             Matched                    Controls               Band-
             Yes     No   Total     Used  Unused   Total       width
Treated      432     25     457     1104     291    1395      1.3392

Treatment-effects estimation

             Coef.
wage
  ATT        .5152752
  NATE       1.430823
hours
  ATT        1.263759
  NATE       1.450303

wage: adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race
Some Examples

Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage), att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching        Number of obs = 1,853
                                             Replications  = 50
                                             Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics

             Matched                    Controls               Band-
             Yes     No   Total     Used  Unused   Total       width
0: Treated   306     15     321      625     120     745      1.3199
1: Treated   126     10     136      473     178     651      1.3398

Treatment-effects estimation

             Observed   Bootstrap                       Normal-based
wage            Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
0: ATT       .4586332    .2763358   1.66    0.097    -.082975    1.000241
1: ATT       .9518705     .406903   2.34    0.019    .1543553    1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

           chi2(  1) =    1.23
         Prob > chi2 =  0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

wage            Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
(1)          .4932373    .4452343   1.11    0.268    -.379406    1.365881
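The `test` and `lincom` results above are two views of the same Wald comparison. As a rough check, the following sketch recomputes them from the two subgroup ATTs and the reported standard error of the difference, assuming the decimal points of the flattened output are read as shown in the comments. It uses a plain normal approximation and is not a reproduction of Stata's internals.

```python
import math

# Subgroup ATTs and the bootstrap standard error of their difference as
# reported on the slide (decimal points restored; an assumption).
att_south0 = 0.4586332
att_south1 = 0.9518705
se_diff = 0.4452343

diff = att_south1 - att_south0               # lincom [1]ATT - [0]ATT
z = diff / se_diff                           # z statistic of the difference
chi2 = z ** 2                                # Wald chi2(1), as shown by test
p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal p-value
ci = (diff - 1.959964 * se_diff, diff + 1.959964 * se_diff)

print(round(diff, 7), round(z, 2), round(chi2, 2), round(p_value, 4))
print(tuple(round(bound, 4) for bound in ci))
```

With one restriction the chi-squared statistic is just the squared z statistic, so both commands lead to the same p-value.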
Simulation

Population data from the Swiss census of 2000.

Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using command iskotrei by Hendrickx 2002) (values from 6 to 78; mean 44).

Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.

Control variables: gender, age, and highest educational degree.

Population restricted to people between 24 and 60 years old who are working.

2'308'006 individuals, of which 17.5% belong to the treatment group.

Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.
Simulation

Substantial differences between resident aliens and Swiss nationals on all three covariates.

Propensity score in population (computed from fully stratified data):

[Figure: density of the propensity score for untreated and treated; x-axis: propensity score (0 to 1), y-axis: density]

McFadden R2 = 0.121
Simulation

Raw mean difference in occupational prestige (NATE): −4.79
Population ATT (computed from fully stratified data): −3.96
There is some treatment effect heterogeneity (ATE = −3.51, ATC = −3.41).

[Figure, left panel: mean outcome by propensity score for untreated and treated; x-axis: propensity score, y-axis: outcome]
[Figure, right panel: treatment effect by propensity score; x-axis: propensity score, y-axis: treatment effect]
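As a small worked step that is not on the original slide, the bias of the raw comparison follows directly from the two population figures, assuming they are read as −4.79 and −3.96. The second line is only an assumed definition of the "bias reduction (in percent)" scale used on the results slides; the source does not spell it out.

```latex
\begin{align*}
\text{bias of the raw comparison}
  &= \text{NATE} - \text{ATT} = -4.79 - (-3.96) = -0.83 \\
\text{bias reduction of } \hat{\tau} \text{ (in percent, assumed definition)}
  &= 100 \cdot \frac{\text{NATE} - \hat{\tau}}{\text{NATE} - \text{ATT}}
\end{align*}
```

Under this assumed definition an unbiased estimator scores 100 percent, and values above 100 mean that the estimator overcorrects.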
Results: Variance

[Figure: variance of the ATT estimates for MDM and PSM, each with and without bias correction; rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y); panels: N = 500 and N = 5000]
In this slide we can see that, for the same algorithm, PSM is typically somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are small; additional post-matching regression adjustment further reduces the differences.
Results: Bias reduction (in percent)

[Figure: bias reduction in percent for MDM and PSM, each with and without bias correction; rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y); panels: N = 500 and N = 5000]
Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.
Results: Mean squared error

[Figure: mean squared error of the ATT estimates for MDM and PSM, each with and without bias correction; rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y); panels: N = 500 and N = 5000]
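The mean squared error combines the two preceding results. The standard decomposition, written here with $\tau$ for the population ATT and $\hat{\tau}$ for a matching estimator (notation chosen for this note, not taken from the slides), is:

```latex
\[
\operatorname{MSE}(\hat{\tau})
  = \operatorname{E}\!\left[(\hat{\tau} - \tau)^2\right]
  = \operatorname{Var}(\hat{\tau}) + \left(\operatorname{E}[\hat{\tau}] - \tau\right)^2
\]
```

An estimator with a small remaining bias can therefore still have the lower MSE if its variance is sufficiently smaller.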
Results: Relative standard error

[Figure: relative standard error for MDM and PSM, each with and without bias correction; rows: nearest-neighbor matching with teffects (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap (1 neighbor, 5 neighbors), and kernel matching with bootstrap (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y); panels: N = 500 and N = 5000]
Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals for MDM and PSM, each with and without bias correction; rows: nearest-neighbor matching with teffects (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap (1 neighbor, 5 neighbors), and kernel matching with bootstrap (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y); panels: N = 500 and N = 5000]
7 Conclusions
Conclusions

The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).

Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.
Conclusions

Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research, the choice may not be that clear.

Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see e.g. Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.

Disadvantages of MDM:
- The choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching! (But don't use pair matching anyhow.)
Conclusions

Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly OK for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.
References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2): 295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1): 77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91: 279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4): 465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70: 41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5): 688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4): 472–480.
Matching Algorithms

Various matching algorithms can be employed to find potential matches based on MD and to determine the matching weights $w_{ij}$.

Pair matching (one-to-one matching without replacement)
- For each observation i in the treatment group, find observation j in the control group for which $MD_{ij}$ is smallest. Once observation j is used as a match, do not use it again.

Nearest-neighbor matching
- For each observation i in the treatment group, find the k closest observations in the control group. A single control can be used multiple times as a match. In case of ties (multiple controls with identical MD), use all ties as matches. k is set by the researcher.

Caliper matching
- Like nearest-neighbor matching, but only use controls for which MD is smaller than some threshold c.
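The difference between pair matching and nearest-neighbor matching can be sketched in a few lines of Python. This is an illustrative sketch, not the kmatch implementation: the distance matrix is hypothetical, and the greedy pass through the treated in their given order is an assumption about how "do not use it again" is applied.

```python
def pair_match(dist):
    """One-to-one matching without replacement (greedy, in the order of the treated)."""
    used = set()
    matches = {}
    for i, row in enumerate(dist):
        candidates = [j for j in range(len(row)) if j not in used]
        if not candidates:
            break  # no unused controls left
        j = min(candidates, key=lambda c: row[c])
        used.add(j)
        matches[i] = [j]
    return matches


def nn_match(dist, k=1):
    """k-nearest-neighbor matching with replacement, keeping all ties."""
    matches = {}
    for i, row in enumerate(dist):
        cutoff = sorted(row)[k - 1]  # distance of the k-th closest control
        matches[i] = [j for j, d in enumerate(row) if d <= cutoff]
    return matches


# dist[i][j]: distance between treated i and control j (hypothetical values)
dist = [
    [0.1, 0.5, 0.9],
    [0.2, 0.2, 0.8],
    [0.1, 0.7, 0.6],
]

print(pair_match(dist))
print(nn_match(dist, k=1))
```

In this toy example the third treated observation is closest to control 0, but pair matching has already used that control, so it falls back to a much more distant one. Nearest-neighbor matching reuses control 0 and keeps both tied controls for the second treated observation.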
Mahalanobis Matching
Radius matching
- Use all controls as matches for which MD is smaller than some threshold c.

Kernel matching
- Like radius matching, but give larger weight to controls for which MD is small (using some kernel function such as, e.g., the Epanechnikov kernel).

In addition, since matching is no longer exact, it may make sense to refine the estimates by applying regression adjustment to the matched data (also known as “bias adjustment” in the context of nearest-neighbor matching).
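Radius and kernel matching differ only in how the controls inside the threshold are weighted. A minimal sketch for a single treated observation follows; the distances are hypothetical, the normalization of the weights to sum to one is a choice made for this illustration, and none of this is meant to mirror how kmatch computes its weights internally.

```python
def epanechnikov(u):
    """Epanechnikov kernel; zero outside |u| < 1."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def radius_weights(distances, c):
    """Equal weights for all controls with a distance below threshold c."""
    inside = [1.0 if d < c else 0.0 for d in distances]
    total = sum(inside)
    return [w / total for w in inside] if total > 0 else inside


def kernel_weights(distances, h):
    """Kernel weights with bandwidth h: closer controls receive larger weights."""
    raw = [epanechnikov(d / h) for d in distances]
    total = sum(raw)
    return [w / total for w in raw] if total > 0 else raw


# Hypothetical distances between one treated observation and four controls
distances = [0.0, 0.5, 1.0, 2.0]

w_radius = radius_weights(distances, c=1.5)
w_kernel = kernel_weights(distances, h=1.5)
print(w_radius)
print(w_kernel)
```

Both schemes ignore the control at distance 2.0, but only the kernel weights decline with distance among the three controls that are used.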
3 Propensity Score Matching
The Propensity Score Theorem (Rosenbaum and Rubin 1983)

If the conditional independence assumption is true, then

$$\Pr(D_i = 1 \mid Y_i^0, Y_i^1, X_i) = \Pr(D_i = 1 \mid X_i) = \pi(X_i)$$

where $\pi(X)$ is called the propensity score.

That is,

$$(Y^0, Y^1) \perp\!\!\!\perp D \mid X$$

implies

$$(Y^0, Y^1) \perp\!\!\!\perp D \mid \pi(X)$$

so that, under “strong ignorability”, the average causal effect can be estimated by conditioning on the propensity score $\pi(X)$ instead of X.

This is remarkable because the information in X, which may include many variables, can be reduced to just one dimension. This greatly simplifies the matching task.
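The reason one dimension suffices is the balancing property of the propensity score: among units with the same score, the covariates have the same distribution in the treated and the control group. The sketch below verifies this with exact fractions for a small hypothetical population in which two different covariate cells share the score 1/2. All numbers are invented for illustration.

```python
from collections import defaultdict
from fractions import Fraction as F

# Hypothetical population: covariate cell -> (population share, Pr(D = 1 | X))
cells = {
    (0, 0): (F(4, 10), F(1, 5)),
    (0, 1): (F(3, 10), F(1, 2)),
    (1, 0): (F(1, 10), F(1, 2)),
    (1, 1): (F(2, 10), F(4, 5)),
}


def x_distribution_by_score(cells):
    """Return {score: {d: {x: Pr(X = x | D = d, pi(X) = score)}}}."""
    joint = defaultdict(lambda: {0: {}, 1: {}})
    for x, (share, score) in cells.items():
        joint[score][1][x] = share * score          # Pr(X = x, D = 1)
        joint[score][0][x] = share * (1 - score)    # Pr(X = x, D = 0)
    result = {}
    for score, by_d in joint.items():
        result[score] = {
            d: {x: p / sum(probs.values()) for x, p in probs.items()}
            for d, probs in by_d.items()
        }
    return result


dist = x_distribution_by_score(cells)
balanced = all(by_d[0] == by_d[1] for by_d in dist.values())
print(balanced)
print(dist[F(1, 2)])
```

Within the stratum with score 1/2 the two cells appear in the same proportions among treated and controls, even though the cells themselves differ. Matching on the score therefore balances X without matching on X exactly.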
Propensity Score Matching (PSM)

Instead of computing multivariate distances, we can thus simply match on the (one-dimensional) propensity score.

Procedure
- Step 1: Estimate the propensity score, e.g. using a Logit model.
- Step 2: Apply a matching algorithm using differences in the propensity score, $|\pi(X_i) - \pi(X_j)|$, instead of multivariate distances.

PSM is tremendously popular.
- https://scholar.google.ch/scholar?q=propensity+score+AND+(matching+OR+matched+OR+match)
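The two steps can be sketched as follows. This is a toy illustration under stated assumptions, not the procedure used by kmatch or teffects: the data are hypothetical, the Logit model has a single covariate and is fitted by a hand-written Newton-Raphson loop, and step 2 uses 1-nearest-neighbor matching with replacement.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logit(x, d, iterations=25):
    """Step 1: fit Pr(D = 1 | x) = logistic(a + b * x) by Newton-Raphson."""
    a = b = 0.0
    for _ in range(iterations):
        p = [logistic(a + b * xi) for xi in x]
        # Gradient of the log likelihood
        g0 = sum(di - pi for di, pi in zip(d, p))
        g1 = sum((di - pi) * xi for di, pi, xi in zip(d, p, x))
        # Information matrix (negative Hessian)
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def psm_att(x, d, y):
    """Step 2: match each treated unit to the control with the closest score."""
    a, b = fit_logit(x, d)
    score = [logistic(a + b * xi) for xi in x]
    treated = [i for i, di in enumerate(d) if di == 1]
    controls = [i for i, di in enumerate(d) if di == 0]
    effects = []
    for i in treated:
        j = min(controls, key=lambda c: abs(score[i] - score[c]))
        effects.append(y[i] - y[j])
    return sum(effects) / len(effects), score


# Hypothetical data: one covariate x, treatment d, outcome y = x + 2 * d
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
d = [0, 0, 1, 0, 1, 0, 1, 1]
y = [xi + 2.0 * di for xi, di in zip(x, d)]

att, score = psm_att(x, d, y)
print(att)
```

In this tiny example the estimate does not equal the built-in effect of 2 because the nearest controls are not exact matches on x. This is the situation in which post-matching regression adjustment is meant to help.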
4 King and Nielsen's “Why Propensity Scores Should Not Be Used for Matching”
King and Nielsen

In 2015/2016, Gary King and Richard Nielsen circulated a paper that created quite some concern among applied researchers.

The basic message of the paper is that PSM is really, really bad and should be discarded.

The paper:
- http://j.mp/1sexgVw

Slides:
- https://gking.harvard.edu/presentations/why-propensity-scores-should-not-be-used-matching-6

Watch it:
- https://www.youtube.com/watch?v=rBv39pK1iEs
King and Nielsen

The story goes about as follows.

Argument 1
- Model dependence (i.e. dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions even if they try not to be).
- Matching is good because it reduces model dependence.
Matching to Reduce Model Dependence (Ho, Imai, King, Stuart 2007: fig. 1, Political Analysis)

[Figure, shown in several animation steps: scatter plot of the outcome against education (years) for treated (T) and control (C) units, before and after matching; slide 3/23 of the slides by King and Nielsen]
King and Nielsen

Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

Matching: Finding Hidden Randomized Experiments (slide 6/23 of the slides by King and Nielsen)

Types of Experiments

Balance of covariates    Complete randomization    Fully blocked
Observed                 On average                Exact
Unobserved               On average                On average

Fully blocked dominates complete randomization for: imbalance, model dependence, power, efficiency, bias, research costs, robustness. E.g., Imai, King, Nall 2009: SEs 600% smaller.

Goal of Each Matching Method (in Observational Data)
- PSM: complete randomization
- Other methods: fully blocked
- Other matching methods dominate PSM (wait, it gets worse)
Best Case: Mahalanobis Distance Matching

[Figure, shown in two animation steps: age against education (years) for treated (T) and control (C) units, before and after pruning unmatched controls; slide 9/23 of the slides by King and Nielsen]

Best Case: Propensity Score Matching

[Figure, shown in two animation steps: age against education (years) for treated (T) and control (C) units, with units projected onto a propensity score axis running from 0 to 1; slide 15/23 of the slides by King and Nielsen]

Best Case: Propensity Score Matching is Suboptimal

[Figure: age against education (years) for the treated (T) and the matched control (C) units that remain after propensity score matching; slide 15/23 of the slides by King and Nielsen]
King and Nielsen

Argument 3
- Random pruning (deleting observations at random) increases imbalance. This is because the sample size decreases, so that variance increases (large differences become more likely).
- More imbalance/variance means more model dependence and researcher discretion.
- Because PSM approximates complete randomization, it engages in random pruning.
- PSM Paradox (“when you do ‘better’, you do worse”):
  * When matching is made more strict (e.g. by decreasing the size of the caliper), PSM, like other matching methods, typically reduces imbalance. But soon the PSM Paradox kicks in, such that further pruning quickly increases imbalance.
  * If the data are such that there are no big differences between treated and untreated to begin with, the PSM Paradox kicks in almost immediately.
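The first step of the argument can be made concrete with a short calculation. Assume, purely for illustration, that a covariate X is independently distributed with the same mean and variance $\sigma^2$ in both groups, and that pruning is unrelated to X. The symbols $n_T$, $n_C$, and $\sigma^2$ are introduced here and do not appear on the slide.

```latex
\[
\operatorname{Var}\!\left(\bar{X}_T - \bar{X}_C\right)
  = \sigma^2 \left( \frac{1}{n_T} + \frac{1}{n_C} \right)
\]
\[
n_T = n_C = 100:\; 0.02\,\sigma^2
\qquad\longrightarrow\qquad
n_T = n_C = 50:\; 0.04\,\sigma^2
\]
```

Randomly halving both groups doubles the variance of the mean difference, so large chance imbalances become more likely even though nothing else has changed.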
PSM Increases Model Dependence & Bias

[Figure, left panel "Model Dependence": variance against number of units pruned, for MDM and PSM]
[Figure, right panel "Bias": maximum coefficient across 512 specifications against number of units pruned, for MDM and PSM, with a reference line at the true effect = 2]

Data-generating process: $Y_i = 2T_i + X_{1i} + X_{2i} + \epsilon_i$, with $\epsilon_i \sim N(0, 1)$

(slide 20/23 of the slides by King and Nielsen)

The Propensity Score Paradox in Real Data

[Figure, left panel "Finkel et al. (JOP 2012)": imbalance against number of units pruned for CEM, MDM, and PSM, with markers for the raw data, random pruning, and the 1/4 SD caliper]
[Figure, right panel "Nielsen et al. (AJPS 2011)": imbalance against number of units pruned for CEM, MDM, and PSM, with markers for the raw data, random pruning, and the 1/4 SD caliper]

Similar pattern for > 20 other real data sets we checked.

(slide 21/23 of the slides by King and Nielsen)
5 Are King and Nielsen right?
Are King and Nielsen right?

Argument 1
- Model dependence (i.e. dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions even if they try not to be).
- Matching is good because it reduces model dependence.

I fully agree.

My view, however, may be somewhat less pessimistic. I believe that research results can be credible if researchers are well educated, so that they know what they are doing, and if modeling decisions are made transparent and robustness of results is evaluated (and documented).
Are King and Nielsen right?

Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

That fully blocked randomization is more efficient than complete randomization, given the sample size, is of course true (how large the efficiency gains are depends on the strength of the relation between X and Y).

However, if blocking reduces the sample size, it is not a priori clear whether estimates from the blocked sample are more efficient than estimates from the full sample (although often they will be).
That PSM approximates complete randomization is only partially true. PSM approximates complete randomization within observations with the same propensity score. Hence, PSM is somewhere between complete randomization and fully blocked randomization.
- If the X variables have no relation to T (treatment), then all observations have the same propensity score. Hence, we end up with complete randomization.
- If the X variables have a strong effect on T, there is lots of blocking.
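How much blocking the propensity score provides can be seen by counting its distinct values. The sketch below uses two binary covariates and three hypothetical propensity-score functions; all functions and numbers are invented for illustration.

```python
def n_blocks(cells, score):
    """Return (number of covariate cells, number of distinct score values)."""
    scores = {x: score(x) for x in cells}
    return len(set(cells)), len(set(scores.values()))


# All combinations of two binary covariates (x1, x2)
cells = [(x1, x2) for x1 in (0, 1) for x2 in (0, 1)]


def no_relation(x):
    return 0.3                               # X unrelated to T


def one_covariate(x):
    return 0.2 + 0.5 * x[0]                  # only x1 affects T


def both_covariates(x):
    return 0.1 + 0.5 * x[0] + 0.2 * x[1]     # both covariates affect T


for score in (no_relation, one_covariate, both_covariates):
    print(score.__name__, n_blocks(cells, score))
```

With a constant score there is a single block (complete randomization), with one relevant covariate there are two blocks, and when every cell has its own score the blocking is as fine as exact matching on X.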
Are King and Nielsen right?

Argument 3
- Random pruning ⇒ imbalance ⇒ more model dependence
- PSM ⇒ complete randomization ⇒ lots of random pruning
- PSM Paradox: “when you do ‘better’, you do worse”

That random pruning makes things worse is of course true because it unnecessarily reduces the sample size (without changing anything else).

As argued above, that PSM applies random pruning is only true for X variables unrelated to T (so that we are in a “local” complete randomization situation, although something similar can probably also happen if effects from several X's cancel each other out).

Furthermore, it is only true if you employ a matching algorithm that throws away good matches. King and Nielsen's results seem to be based on the worst possible algorithm: one-to-one matching without replacement.
If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or also nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
- Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
- Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.
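The difference between pruning and averaging can be made concrete with a small sketch. It is an illustration only, not the kmatch implementation: the data, the bandwidth of 0.05, and the function names are hypothetical. Four controls share the treated unit's propensity score; pair matching keeps one of them, whereas kernel matching averages over all four and gives zero weight to the distant control.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) for |u| < 1, else 0."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) < 1, 0.75 * (1 - u**2), 0.0)

def kernel_weights(ps_treated, ps_controls, bandwidth):
    """Weights of all controls for one treated unit, normalized to sum to 1."""
    k = epanechnikov((ps_controls - ps_treated) / bandwidth)
    total = k.sum()
    if total == 0:
        return k  # no control within the bandwidth: treated unit is unmatched
    return k / total

# Hypothetical data: four controls with the same propensity score as the
# treated unit (0.30) and one control far away (0.80).
ps_controls = np.array([0.30, 0.30, 0.30, 0.30, 0.80])
y_controls = np.array([10.0, 12.0, 11.0, 13.0, 50.0])

w = kernel_weights(0.30, ps_controls, bandwidth=0.05)

# Pair matching keeps a single control (here: the first of the tied ones)
pair_match = y_controls[np.argmin(np.abs(ps_controls - 0.30))]
# Kernel matching averages over all equally good controls
kernel_match = float(w @ y_controls)

print(w, pair_match, kernel_match)
```

The pair-matched counterfactual depends on which of the tied controls happens to be picked; the kernel-matched one uses all of them, which is why random pruning is not an issue for such algorithms.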
It is true that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more "free" variance in X that can be exploited by modeling decisions).
In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you'd need to apply PSM stratified by subgroups in this case).
6 Illustration using kmatch
The Command
kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).
Some key features:
- Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
- Optional exact matching
- Optional regression-adjustment bias-correction
- Kernel matching, ridge matching, or nearest-neighbor matching
- Automatic bandwidth selection for kernel/ridge matching
- Flexible specification of scaling matrix for MDM
- Joint analysis of multiple subgroups and multiple outcome variables
- Various post-estimation commands for balancing and common-support diagnostics
- Computationally efficient implementation
Some Examples
Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.

. sysuse nlsw88, clear
(NLSW 1988 extract)
. drop if industry==2
(4 observations deleted)

Mahalanobis-distance kernel matching:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |      Matched       |      Controls       | Band-
           |  Yes    No   Total |  Used Unused  Total | width
   Treated |  432    25     457 |  1105    291   1396 | 1.3394

Treatment-effects estimation
      wage |     Coef.
       ATT |  .6059013
      NATE |  1.432913
Some balancing statistics:

. kmatch summarize
(refitting the model using the generate() option)

            |               Raw              |          Matched (ATT)
      Means |   Treated  Untrea~d    StdDif |   Treated  Untrea~d    StdDif
   collgrad |   .321663   .224212   .219912 |   .319444   .319444         0
    ttl_exp |   13.2685   12.7323   .117584 |   13.3205   13.1425   .039036
     tenure |   7.89205   6.17658    .29735 |   7.91744   7.58347   .057888
 3.industry |   .006565   .012178  -.058246 |    .00463    .00463         0
 4.industry |   .183807   .166905   .044425 |   .185185   .185185         0
 5.industry |   .105033   .027937   .312944 |   .085648   .085648         0
 6.industry |   .045952   .169771  -.407129 |   .048611   .048611         0
 7.industry |   .019694   .102436  -.350657 |   .020833   .020833         0
 8.industry |   .017505   .035817  -.113785 |   .009259   .009259         0
 9.industry |   .010941   .040115  -.185669 |   .011574   .011574         0
10.industry |   .004376   .008596  -.052551 |   .002315   .002315         0
11.industry |   .479212   .356734   .250073 |   .506944   .506944         0
12.industry |   .122538    .07235   .169707 |    .12037    .12037         0
     2.race |   .330416   .244986   .189418 |     .3125     .3125         0
     3.race |   .017505   .011461   .050566 |   .006944   .006944         0
      south |   .297593   .466332  -.352408 |   .291667   .291667         0

            |               Raw              |          Matched (ATT)
  Variances |   Treated  Untrea~d     Ratio |   Treated  Untrea~d     Ratio
   collgrad |   .218674   .174066   1.25628 |   .217904   .217904         1
    ttl_exp |   20.5898   21.0001   .980459 |   19.8177   18.2323   1.08696
     tenure |   37.2044   29.3629   1.26706 |   37.0399   34.9543   1.05966
 3.industry |   .006536   .012038   .542928 |   .004619   .004619         1
 4.industry |   .150351   .139148   1.08052 |   .151242   .151242         1
 5.industry |   .094207   .027176   3.46656 |   .078494   .078494         1
 6.industry |   .043936    .14105   .311496 |   .046355   .046355         1
 7.industry |   .019348   .092008   .210287 |   .020447   .020447         1
 8.industry |   .017237   .034559   .498769 |   .009195   .009195         1
 9.industry |   .010845   .038533   .281445 |   .011467   .011467         1
10.industry |   .004367   .008528   .512039 |   .002315   .002315         1
11.industry |   .250115   .229639   1.08917 |   .250532   .250532         1
12.industry |   .107758   .067163   1.60443 |   .106127   .106127         1
     2.race |   .221726     .1851   1.19787 |   .215342   .215342         1
     3.race |   .017237   .011338   1.52025 |   .006912   .006912         1
      south |    .20949   .249045   .841173 |   .207077   .207077         1
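As a quick plausibility check of the balancing table, the raw statistics for collgrad can be recomputed by hand. The sketch assumes the common definition of the standardized difference (difference in means divided by the square root of the average of the two group variances); the slides do not state the formula, so this is an assumption that happens to reproduce the reported value. The input numbers are the raw means and variances of collgrad with decimal points restored.

```python
import math

def std_diff(mean_t, mean_c, var_t, var_c):
    """Standardized mean difference, scaled by the average group variance."""
    return (mean_t - mean_c) / math.sqrt((var_t + var_c) / 2)

def var_ratio(var_t, var_c):
    """Ratio of the treated variance to the untreated variance."""
    return var_t / var_c

# Raw means and variances of collgrad (treated, untreated) from the table
sd_collgrad = std_diff(0.321663, 0.224212, 0.218674, 0.174066)
vr_collgrad = var_ratio(0.218674, 0.174066)

print(round(sd_collgrad, 6), round(vr_collgrad, 4))
```

Both values agree with the StdDif and Ratio columns of the raw panel up to rounding of the inputs.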
Make a graph of the balancing stats:

. mat M = r(M)
. mat V = r(V)
. coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || ,
>     bylabels("Std. mean difference" "Variance ratio")
>     noci nolabels byopts(xrescale)
. addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))
. addplot 2: , xline(1) norescaling

[Figure: coefficient plot by covariate (collgrad, ttl_exp, tenure, industry indicators, race indicators, south). Left panel: Std. mean difference; right panel: Variance ratio. Series: Raw, Matched.]
Propensity-score kernel matching:

. kmatch ps union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate att
(computing bandwidth ... done)

Propensity-score kernel matching           Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Covariates  : collgrad ttl_exp tenure i.industry i.race south
PS model    : logit (pr)

Matching statistics
           |      Matched       |      Controls       | Band-
           |  Yes    No   Total |  Used Unused  Total | width
   Treated |  431    26     457 |  1214    182   1396 | .00188

Treatment-effects estimation
      wage |     Coef.
       ATT |  .3887224
      NATE |  1.432913
Kernel density balancing plot:

. kmatch density, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)
(applying 0-1 boundary correction to density estimation of propensity score)
(bandwidth for propensity score = .06803989)

[Figure: kernel density of the propensity score for untreated and treated. Panels: Raw, Matched (ATT). Horizontal axis: Propensity score; vertical axis: Density.]
Cumulative distribution balancing plot:

. kmatch cumul, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)

[Figure: cumulative distribution of the propensity score for untreated and treated. Panels: Raw, Matched (ATT). Horizontal axis: Propensity score; vertical axis: Cumulative probability.]
Balancing box plot:

. kmatch box
(refitting the model using the generate() option)

[Figure: box plots of the propensity score for untreated and treated. Panels: Raw, Matched (ATT). Vertical axis: Propensity score.]
Standard errors:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Replications  = 50
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |      Matched       |      Controls       | Band-
           |  Yes    No   Total |  Used Unused  Total | width
   Treated |  432    25     457 |  1105    291   1396 | 1.3394
 Untreated | 1386    10    1396 |   455      2    457 | 3.3975
  Combined | 1818    35    1853 |  1560    293   1853 |

Treatment-effects estimation
           |  Observed  Bootstrap                      Normal-based
      wage |     Coef.  Std. Err.      z    P>|z|   [95% Conf. Interval]
       ATE |  .4095729   .1920853   2.13   0.033    .0330928   .7860531
       ATT |  .6059013   .2472069   2.45   0.014    .1213846   1.090418
       ATC |  .3483797   .1893653   1.84   0.066   -.0227695   .7195289
      NATE |  1.432913   .2333282   6.14   0.000    .9755981   1.890228
Do some tests:

. lincom ATT-NATE
 ( 1)  ATT - NATE = 0

      wage |     Coef.  Std. Err.      z    P>|z|   [95% Conf. Interval]
       (1) | -.8270117   .1810415  -4.57   0.000   -1.181847  -.4721768

. test ATT = ATC
 ( 1)  ATT - ATC = 0

       chi2(  1) =    2.42
     Prob > chi2 =  0.1200
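The lincom line can be checked by hand. The following sketch assumes the usual normal-based Wald statistics (z = coefficient / standard error, 95% interval from the standard normal quantile) and takes the point estimates and the bootstrap standard error from the output, with decimal points restored. It only reproduces the arithmetic; it does not recompute the bootstrap.

```python
from statistics import NormalDist

att, nate = 0.6059013, 1.432913      # point estimates from the kmatch output
se_diff = 0.1810415                  # bootstrap standard error reported by lincom

diff = att - nate                    # ATT - NATE
z = diff / se_diff                   # Wald z statistic
crit = NormalDist().inv_cdf(0.975)   # two-sided 95% critical value (about 1.96)
ci = (diff - crit * se_diff, diff + crit * se_diff)
p_value = 2 * NormalDist().cdf(-abs(z))

print(round(diff, 7), round(z, 2), ci, p_value)
```

The difference, the z statistic, and the confidence limits agree with the lincom output, which confirms that the selection bias in the naive estimate (NATE minus ATT) is clearly different from zero.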
Nearest-neighbor matching (1 neighbor):

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

Multivariate-distance nearest-neighbor matching
                                           Number of obs = 1,853
                                           Neighbors: min = 1
Treatment   : union = 1                               max = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |      Matched       |      Controls       | Band-
           |  Yes    No   Total |  Used Unused  Total | width
   Treated |  457     0     457 |   328   1068   1396 |

Treatment-effects estimation
      wage |     Coef.
       ATT |  .7246969

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

Treatment-effects estimation               Number of obs      = 1,853
Estimator      : nearest-neighbor matching Matches: requested = 1
Outcome model  : matching                                 min = 1
Distance metric: Mahalanobis                              max = 1

                     |           AI Robust
                wage |     Coef.  Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET  union          |
 (union vs nonunion) |  .7246969   .2942952   2.46   0.014     .147889   1.301505
Nearest-neighbor matching (5 neighbors):

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching
                                           Number of obs = 1,853
                                           Neighbors: min = 5
Treatment   : union = 1                               max = 5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |      Matched       |      Controls       | Band-
           |  Yes    No   Total |  Used Unused  Total | width
   Treated |  457     0     457 |   870    526   1396 |

Treatment-effects estimation
      wage |     Coef.
       ATT |  .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation               Number of obs      = 1,853
Estimator      : nearest-neighbor matching Matches: requested = 5
Outcome model  : matching                                 min = 5
Distance metric: Mahalanobis                              max = 6

                     |           AI Robust
                wage |     Coef.  Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET  union          |
 (union vs nonunion) |  .5590823   .2381752   2.35   0.019    .0922675   1.025897
Bias-correction / regression adjustment:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching
                                           Number of obs = 1,853
                                           Neighbors: min = 5
Treatment   : union = 1                               max = 5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |      Matched       |      Controls       | Band-
           |  Yes    No   Total |  Used Unused  Total | width
   Treated |  457     0     457 |   870    526   1396 |

Treatment-effects estimation
      wage |     Coef.
       ATT |  .5288023
adjusted for: collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south)
>     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation               Number of obs      = 1,853
Estimator      : nearest-neighbor matching Matches: requested = 5
Outcome model  : matching                                 min = 5
Distance metric: Mahalanobis                              max = 6

                     |           AI Robust
                wage |     Coef.  Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET  union          |
 (union vs nonunion) |  .5288023   .2420635   2.18   0.029    .0543666   1.003238
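The idea behind the bias correction can be sketched in a few lines. This is a simplified illustration and not the algorithm used by kmatch or teffects: it uses a single nearest neighbor with replacement, a linear outcome model fitted among the controls, and hypothetical noise-free data with a true effect of 3. Because matches are inexact, the raw matching estimate is off; subtracting the predicted outcome difference between each treated unit and its match removes that discrepancy.

```python
import numpy as np

def att_nn_bias_corrected(y, d, X):
    """ATT from 1-nearest-neighbor Mahalanobis matching (with replacement),
    without and with a linear regression adjustment fitted on the controls."""
    y = np.asarray(y, dtype=float)
    d = np.asarray(d, dtype=bool)
    X = np.asarray(X, dtype=float)
    Xt, Xc, yt, yc = X[d], X[~d], y[d], y[~d]

    # Squared Mahalanobis distance of every treated unit to every control
    Vinv = np.linalg.inv(np.atleast_2d(np.cov(X, rowvar=False)))
    diff = Xt[:, None, :] - Xc[None, :, :]
    dist2 = np.einsum("tck,kl,tcl->tc", diff, Vinv, diff)
    j = dist2.argmin(axis=1)  # index of the nearest control per treated unit

    # Linear outcome model among controls: y = a + X b
    A = np.column_stack([np.ones(len(Xc)), Xc])
    beta = np.linalg.lstsq(A, yc, rcond=None)[0]

    def mu0(Z):
        return np.column_stack([np.ones(len(Z)), Z]) @ beta

    raw = np.mean(yt - yc[j])
    adjusted = np.mean(yt - (yc[j] + mu0(Xt) - mu0(Xc[j])))
    return raw, adjusted

# Hypothetical data: outcome exactly linear in X, true treatment effect = 3
Xc = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [2, 1], [1, 2]], dtype=float)
Xt = np.array([[0.2, 0.1], [1.3, 0.9], [1.8, 1.2]])
X = np.vstack([Xc, Xt])
d = np.array([0] * 6 + [1] * 3)
y = 1 + 2 * X[:, 0] - X[:, 1] + 3 * d

raw, adjusted = att_nn_bias_corrected(y, d, X)
print(raw, adjusted)
```

In this constructed example the adjusted estimate recovers the true effect because the outcome model is correctly specified; with real data the adjustment only reduces, rather than removes, the bias from inexact matches.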
Mahalanobis-distance and propensity-score matching combined:

. kmatch md union collgrad ttl_exp tenure (wage), att
>     psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis (modified)
Covariates  : collgrad ttl_exp tenure
PS model    : logit (pr)
PS covars   : i.industry i.race south

Matching statistics
           |      Matched       |      Controls       | Band-
           |  Yes    No   Total |  Used Unused  Total | width
   Treated |  439    18     457 |  1258    138   1396 | .83886

Treatment-effects estimation
      wage |     Coef.
       ATT |  .6408443
Exact matching:

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure
Exact       : industry race south

Matching statistics
           |      Matched       |      Controls       | Band-
           |  Yes    No   Total |  Used Unused  Total | width
   Treated |  432    25     457 |  1103    293   1396 | 1.3013

Treatment-effects estimation
      wage |     Coef.
       ATT |  .6047374
Bandwidth selection: the default (based on the distribution of distances in one-nearest-neighbor matching):

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |      Matched       |      Controls       | Band-
           |  Yes    No   Total |  Used Unused  Total | width
   Treated |  432    25     457 |  1105    291   1396 | 1.3394

Treatment-effects estimation
      wage |     Coef.
       ATT |  .6059013
Bandwidth selection: cross-validation with respect to X:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |      Matched       |      Controls       | Band-
           |  Yes    No   Total |  Used Unused  Total | width
   Treated |  448     9     457 |  1184    212   1396 | 1.8888

Treatment-effects estimation
      wage |     Coef.
       ATT |  .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot; MSE against bandwidth, with points labeled by evaluation index.]
Bandwidth selection: cross-validation with respect to Y:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |      Matched       |      Controls       | Band-
           |  Yes    No   Total |  Used Unused  Total | width
   Treated |  453     4     457 |  1289    107   1396 | 2.433

Treatment-effects estimation
      wage |     Coef.
       ATT |  .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot; MISE against bandwidth, with points labeled by evaluation index.]
Bandwidth selection: weighted cross-validation with respect to Y:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |      Matched       |      Controls       | Band-
           |  Yes    No   Total |  Used Unused  Total | width
   Treated |  455     2     457 |  1356     40   1396 | 2.7626

Treatment-effects estimation
      wage |     Coef.
       ATT |  .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot; weighted MISE against bandwidth, with points labeled by evaluation index.]
Common-support diagnostics:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(0.5)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |      Matched       |      Controls       | Band-
           |  Yes    No   Total |  Used Unused  Total | width
   Treated |  366    91     457 |   701    695   1396 | .5

Treatment-effects estimation
      wage |     Coef.
       ATT |  .3303161

. kmatch csummarize
(refitting the model using the generate() option)

            |   Common support (treated)     |    Standardized difference
      Means |   Matched  Unmatc~d     Total |   (1)-(3)   (2)-(3)   (1)-(2)
   collgrad |   .322404   .318681   .321663 |   .001585  -.006376   .007962
    ttl_exp |   13.3929   12.7682   13.2685 |   .027413  -.110253   .137666
     tenure |   8.12614   6.95055   7.89205 |   .038378  -.154356   .192734
 3.industry |   .002732   .021978   .006565 |  -.047404   .190657  -.238061
 4.industry |   .191257   .153846   .183807 |   .019212  -.077269   .096481
 5.industry |   .062842   .274725   .105033 |  -.137462   .552867  -.690329
 6.industry |   .057377         0   .045952 |   .054507  -.219225   .273732
 7.industry |   .019126   .021978   .019694 |  -.004083   .016423  -.020506
 8.industry |   .005464   .065934   .017505 |  -.091714   .368871  -.460585
 9.industry |   .010929   .010989   .010941 |  -.000115   .000462  -.000577
10.industry |         0   .021978   .004376 |  -.066227   .266363  -.332589
11.industry |   .554645   .175824   .479212 |    .15083  -.606636   .757467
12.industry |   .092896   .241758   .122538 |  -.090299   .363181   -.45348
     2.race |   .243169   .681319   .330416 |  -.185284   .745209  -.930494
     3.race |   .002732   .076923   .017505 |  -.112525   .452572  -.565097
      south |    .29235   .318681   .297593 |  -.011456   .046074   -.05753

            |   Common support (treated)     |             Ratio
  Variances |   Matched  Unmatc~d     Total |   (1)/(3)   (2)/(3)   (1)/(2)
   collgrad |   .219058   .219536   .218674 |   1.00176   1.00394   .997824
    ttl_exp |   19.4198   25.2474   20.5898 |   .943177   1.22621    .76918
     tenure |   38.3324   31.9242   37.2044 |   1.03032   .858076   1.20073
 3.industry |   .002732   .021734   .006536 |   .418045   3.32537   .125714
 4.industry |   .155101   .131624   .150351 |   1.03159   .875443   1.17837
 5.industry |   .059054   .201465   .094207 |   .626851   2.13854   .293122
 6.industry |   .054233         0   .043936 |   1.23435         0         .
 7.industry |   .018811   .021734   .019348 |   .972252    1.1233   .865531
 8.industry |    .00545   .062271   .017237 |   .316157   3.61269   .087513
 9.industry |   .010839   .010989   .010845 |   .999464   1.01328   .986361
10.industry |         0   .021734   .004367 |         0   4.97709         0
11.industry |   .247691    .14652   .250115 |   .990307   .585811   1.69049
12.industry |   .084497   .185348   .107758 |   .784137   1.72003   .455885
     2.race |   .184542   .219536   .221726 |   .832297   .990121   .840601
     3.race |   .002732   .071795   .017237 |   .158513   4.16522   .038056
      south |   .207448   .219536    .20949 |   .990254   1.04796   .944939

(1) matched, (2) unmatched, (3) total
Make a graph of the common-support stats:

. mat M = r(M)
. coefplot matrix(M[,4]), title("Std. difference") noci nolabels xline(0)

[Figure: coefficient plot of the standardized difference by covariate (collgrad, ttl_exp, tenure, industry indicators, race indicators, south). Title: Std. difference.]
Multiple outcome variables:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,852
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |      Matched       |      Controls       | Band-
           |  Yes    No   Total |  Used Unused  Total | width
   Treated |  432    25     457 |  1104    291   1395 | 1.3392

Treatment-effects estimation
           |     Coef.
wage   ATT |  .6021049
      NATE |  1.430823
hours  ATT |  1.263759
      NATE |  1.450303
Multiple outcome variables with different regression-adjustment equations:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure)
>     (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,852
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |      Matched       |      Controls       | Band-
           |  Yes    No   Total |  Used Unused  Total | width
   Treated |  432    25     457 |  1104    291   1395 | 1.3392

Treatment-effects estimation
           |     Coef.
wage   ATT |  .5152752
      NATE |  1.430823
hours  ATT |  1.263759
      NATE |  1.450303
wage: adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race
Treatment effects by subpopulation:

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage),
>     att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Replications  = 50
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race
0: south = 0
1: south = 1

Matching statistics
           |      Matched       |      Controls       | Band-
           |  Yes    No   Total |  Used Unused  Total | width
0  Treated |  306    15     321 |   625    120    745 | 1.3199
1  Treated |  126    10     136 |   473    178    651 | 1.3398

Treatment-effects estimation
           |  Observed  Bootstrap                      Normal-based
      wage |     Coef.  Std. Err.      z    P>|z|   [95% Conf. Interval]
0      ATT |  .4586332   .2763358   1.66   0.097    -.082975   1.000241
1      ATT |  .9518705    .406903   2.34   0.019    .1543553   1.749386

. test [0]ATT = [1]ATT
 ( 1)  [0]ATT - [1]ATT = 0

       chi2(  1) =    1.23
     Prob > chi2 =  0.2679

. lincom [1]ATT - [0]ATT
 ( 1)  - [0]ATT + [1]ATT = 0

      wage |     Coef.  Std. Err.      z    P>|z|   [95% Conf. Interval]
       (1) |  .4932373   .4452343   1.11   0.268    -.379406   1.365881
Simulation
- Population data from the Swiss census of 2000.
- Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using command iskotrei by Hendrickx 2002) (values from 6 to 78; mean 44).
- Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.
- Control variables: gender, age, and highest educational degree.
- Population restricted to people between 24 and 60 years old who are working.
- 2'308'006 individuals, of which 17.5% belong to the treatment group.
- Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.
Simulation
Substantial differences between resident aliens and Swiss nationals on all three covariates.
Propensity score in population (computed from fully stratified data):
[Figure: density of the propensity score (horizontal axis from 0 to 1) for untreated and treated.]
McFadden R2 = 0.121
Simulation
- Raw mean difference in occupational prestige (NATE): -4.79
- Population ATT (computed from fully stratified data): -3.96
- There is some treatment effect heterogeneity (ATE = -3.51, ATC = -3.41).
[Figure: left panel: outcome by propensity score for untreated and treated; right panel: treatment effect by propensity score.]
Results: Variance
[Figure: variance of the estimators for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors); kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]

In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that, across algorithms, PSM can also be much more efficient than MDM. For example, kernel matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.
For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.
Results: Bias reduction (in percent)
[Figure: bias reduction in percent for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors); kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]

Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.
The bias also vanishes once post-matching regression adjustment is applied.
Results: Mean squared error
[Figure: mean squared error for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors); kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]
Results: Relative standard error
[Figure: relative standard error for N = 500 and N = 5000. Rows: nearest-neighbor matching with teffects standard errors (1 neighbor, 5 neighbors); nearest-neighbor matching with bootstrap standard errors (1 neighbor, 5 neighbors); kernel matching with bootstrap standard errors (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]
Results: Coverage of 95% CIs
[Figure: coverage of 95% confidence intervals for N = 500 and N = 5000. Rows: nearest-neighbor matching with teffects standard errors (1 neighbor, 5 neighbors); nearest-neighbor matching with bootstrap standard errors (1 neighbor, 5 neighbors); kernel matching with bootstrap standard errors (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]
7 Conclusions
Conclusions
The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).
Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.
Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research the choice may not be that clear.
Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.
Disadvantages of MDM:
- Choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.
One clear conclusion we can draw, however, is:
Do not use propensity scores for pair matching! (But don't use pair matching anyhow.)
Some conclusions from the simulation:
- For PSM, application of regression-adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression-adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly ok for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.
To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.
References I
Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2):295–313.
Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1):77–90.
Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91:279–290.
Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802. Boston College Department of Economics.
King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw
Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.
References II
Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4):465–472.
Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70:41–55.
Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5):688–701.
Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4):472–480.
Mahalanobis Matching
Radius matching
- Use all controls as matches for which MD is smaller than some threshold c.
Kernel matching
- Like radius matching, but give larger weight to controls for which MD is small (using some kernel function such as, e.g., the Epanechnikov kernel).
In addition, since matching is no longer exact, it may make sense to refine the estimates by applying regression-adjustment to the matched data (also known as "bias-adjustment" in the context of nearest-neighbor matching).
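The two weighting schemes can be sketched as follows. This is a minimal illustration under stated assumptions, not kmatch's implementation: the covariate values are hypothetical, the scaling matrix is set to the identity (so that the Mahalanobis distance reduces to the Euclidean distance and the numbers are easy to check), and the threshold c and the bandwidth h are both set to 1.

```python
import numpy as np

def mahalanobis(x, Xc, Vinv):
    """Mahalanobis distance between one treated unit x and each control in Xc."""
    diff = Xc - x
    return np.sqrt(np.einsum("ck,kl,cl->c", diff, Vinv, diff))

def radius_weights(md, c):
    """Equal weight for every control with distance below the threshold c."""
    w = (md < c).astype(float)
    return w / w.sum() if w.sum() > 0 else w

def kernel_weights(md, h):
    """Epanechnikov weights: closer controls get larger weight."""
    u = md / h
    k = np.where(u < 1, 0.75 * (1 - u**2), 0.0)
    return k / k.sum() if k.sum() > 0 else k

# Hypothetical data: one treated unit and three controls
x_treated = np.array([0.0, 0.0])
X_controls = np.array([[0.0, 0.0], [0.5, 0.0], [0.0, 1.5]])
Vinv = np.eye(2)  # identity scaling matrix, chosen only for illustration

md = mahalanobis(x_treated, X_controls, Vinv)   # distances 0, 0.5, 1.5
w_radius = radius_weights(md, c=1.0)            # 1/2, 1/2, 0
w_kernel = kernel_weights(md, h=1.0)            # 4/7, 3/7, 0

print(md, w_radius, w_kernel)
```

Both schemes drop the control at distance 1.5; radius matching treats the two remaining controls alike, whereas kernel matching favors the exact match.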
3 Propensity Score Matching
The Propensity Score Theorem (Rosenbaum and Rubin 1983)
If the conditional independence assumption is true, then

$$\Pr(D_i = 1 \mid Y_i^0, Y_i^1, X_i) = \Pr(D_i = 1 \mid X_i) = \pi(X_i)$$

where $\pi(X)$ is called the propensity score.
That is, $(Y^0, Y^1) \perp\!\!\!\perp D \mid X$ implies $(Y^0, Y^1) \perp\!\!\!\perp D \mid \pi(X)$, so that under "strong ignorability" the average causal effect can be estimated by conditioning on the propensity score $\pi(X)$ instead of $X$.
This is remarkable, because the information in X, which may include many variables, can be reduced to just one dimension. This greatly simplifies the matching task.
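The step from conditioning on X to conditioning on the propensity score follows from the law of iterated expectations. The following is the standard textbook argument, sketched here for completeness rather than taken from the slides; it uses only that $\pi(X)$ is a function of $X$ and the conditional independence assumption.

```latex
\begin{align*}
\Pr\bigl(D = 1 \mid Y^0, Y^1, \pi(X)\bigr)
  &= E\bigl[\, E[D \mid Y^0, Y^1, X] \,\bigm|\, Y^0, Y^1, \pi(X) \bigr] \\
  &= E\bigl[\, \pi(X) \,\bigm|\, Y^0, Y^1, \pi(X) \bigr] \\
  &= \pi(X)
\end{align*}
```

By the same argument, $\Pr(D = 1 \mid \pi(X)) = \pi(X)$. Since the treatment probability given $\pi(X)$ does not depend on $(Y^0, Y^1)$, treatment is independent of the potential outcomes conditional on $\pi(X)$.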
Propensity Score Matching (PSM)
Instead of computing multivariate distances, we can thus simply match on the (one-dimensional) propensity score.
Procedure
- Step 1: Estimate the propensity score, e.g., using a Logit model.
- Step 2: Apply a matching algorithm using differences in the propensity score, $|\pi(X_i) - \pi(X_j)|$, instead of multivariate distances.
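The two steps can be sketched as follows. This is an illustration on simulated data, not the kmatch algorithm: the data-generating process, the true effect of 2, the seed, and the function names are all hypothetical, and step 2 uses simple one-nearest-neighbor matching with replacement.

```python
import numpy as np

def fit_logit(X, d, iters=25):
    """Step 1: logit coefficients by Newton-Raphson (intercept added)."""
    A = np.column_stack([np.ones(len(X)), X])
    b = np.zeros(A.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-A @ b))
        grad = A.T @ (d - p)
        hess = (A * (p * (1 - p))[:, None]).T @ A
        b += np.linalg.solve(hess, grad)
    return b, 1 / (1 + np.exp(-A @ b))

def att_psm_nn(y, d, ps):
    """Step 2: match each treated unit to the control with the closest
    propensity score (with replacement) and average the differences."""
    t, c = d == 1, d == 0
    gap = np.abs(ps[t][:, None] - ps[c][None, :])
    j = gap.argmin(axis=1)
    return np.mean(y[t] - y[c][j])

# Simulated data (hypothetical): selection on x, constant treatment effect of 2
rng = np.random.default_rng(1)
n = 2000
x = rng.normal(size=(n, 2))
true_ps = 1 / (1 + np.exp(-(0.8 * x[:, 0] - 0.5 * x[:, 1])))
d = (rng.uniform(size=n) < true_ps).astype(int)
y = 1 + x[:, 0] - x[:, 1] + 2.0 * d + rng.normal(size=n)

b, ps = fit_logit(x, d)
att = att_psm_nn(y, d, ps)
naive = y[d == 1].mean() - y[d == 0].mean()
print(att, naive)
```

In this setup the naive mean difference is confounded by x, while the matched estimate should lie close to the true effect of 2.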
PSM is tremendously popular:
- https://scholar.google.ch/scholar?q=propensity+score+AND+(matching+OR+matched+OR+match)
4 King and Nielsen's "Why Propensity Scores Should Not Be Used for Matching"
King and Nielsen
In 2015/2016, Gary King and Richard Nielsen circulated a paper that created quite some concern among applied researchers.
The basic message of the paper is that PSM is really, really bad and should be discarded.
The paper
- http://j.mp/1sexgVw
Slides
- https://gking.harvard.edu/presentations/why-propensity-scores-should-not-be-used-matching-6
Watch it
- https://www.youtube.com/watch?v=rBv39pK1iEs
The story goes about as follows.
Argument 1
- Model dependence (i.e. dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions, even if they try not to be).
- Matching is good because it reduces model dependence.
[Figure: Matching to Reduce Model Dependence (Ho, Imai, King, Stuart 2007, fig. 1, Political Analysis). Scatter plot of the outcome against education (years) for treated (T) and control (C) units, shown as a sequence of overlays. Slide 3/23 of the slides by King and Nielsen.]
King and Nielsen
Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

Matching: Finding Hidden Randomized Experiments (slide 6/23 of the slides by King and Nielsen)
Types of experiments, by balance of covariates:

    Covariates    Complete randomization    Fully blocked
    Observed      On average                Exact
    Unobserved    On average                On average

Fully blocked dominates complete randomization for: imbalance, model dependence, power, efficiency, bias, research costs, robustness. E.g., Imai, King, Nall 2009: SEs 600% smaller.
Goal of each matching method (in observational data)
- PSM: complete randomization
- Other methods: fully blocked
- Other matching methods dominate PSM (wait, it gets worse)
[Figure: Best Case: Mahalanobis Distance Matching. Scatter plot of age against education (years) for treated (T) and control (C) units, shown first with all controls and then with the matched controls only. Slide 9/23 of the slides by King and Nielsen.]
[Figure: Best Case: Propensity Score Matching. Scatter plot of age against education (years) for treated (T) and control (C) units, with the propensity score (0 to 1) as an additional axis. Slide 15/23 of the slides by King and Nielsen.]
[Figure: Best Case: Propensity Score Matching is Suboptimal. Scatter plot of age against education (years) showing the treated (T) units and the remaining control (C) units. Slide 15/23 of the slides by King and Nielsen.]
King and Nielsen
Argument 3
- Random pruning (deleting observations at random) increases imbalance. This is because the sample size decreases, so that variance increases (large differences become more likely).
- More imbalance/variance means more model dependence and researcher discretion.
- Because PSM approximates complete randomization, it engages in random pruning.
- PSM Paradox ("when you do 'better,' you do worse")
    - When matching is made more strict (e.g. by decreasing the size of the caliper), PSM, like other matching methods, typically reduces imbalance. But soon the PSM Paradox kicks in, such that further pruning quickly increases imbalance.
    - If the data are such that there are no big differences between treated and untreated to begin with, the PSM Paradox kicks in almost immediately.
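The first step of the argument, that random pruning increases imbalance, can be illustrated with a small simulation. The Python sketch below is an illustration under simple assumptions of my own (one standard-normal covariate, equal group sizes, a fixed seed); it is not King and Nielsen's simulation design. Both groups are drawn from the same population, so any difference in means is pure chance, and that chance imbalance grows as the groups shrink.

```python
import random
import statistics

def mean_abs_imbalance(n_per_group, reps=1000, seed=12345):
    """Average |mean(X | treated) - mean(X | control)| when both groups are
    random samples of size n_per_group from the same N(0, 1) population."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        treated = [rng.gauss(0, 1) for _ in range(n_per_group)]
        control = [rng.gauss(0, 1) for _ in range(n_per_group)]
        diffs.append(abs(statistics.fmean(treated) - statistics.fmean(control)))
    return statistics.fmean(diffs)

# 200 units per group versus the same design randomly pruned to 20 per group
full = mean_abs_imbalance(200)
pruned = mean_abs_imbalance(20)
print(round(full, 3), round(pruned, 3))
```

Because the standard error of a difference in means scales with the inverse square root of the group size, cutting the groups by a factor of ten should raise the typical imbalance by a factor of roughly three.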
[Figure: PSM Increases Model Dependence & Bias. Panel "Model Dependence": variance by number of units pruned (0 to 160) for MDM and PSM. Panel "Bias": maximum coefficient across 512 specifications by number of units pruned (0 to 160) for MDM and PSM; true effect = 2. Data-generating model: $Y_i = 2T_i + X_{1i} + X_{2i} + \epsilon_i$, $\epsilon_i \sim N(0, 1)$. Slide 20/23 of the slides by King and Nielsen.]
[Figure: The Propensity Score Paradox in Real Data. Two panels, Finkel et al. (JOP 2012) and Nielsen et al. (AJPS 2011), plotting imbalance against the number of units pruned for CEM, MDM, PSM, a 1/4 SD caliper, the raw data, and random pruning. Note on the slide: similar pattern for > 20 other real data sets we checked. Slide 21/23 of the slides by King and Nielsen.]
5 Are King and Nielsen right?
Are King and Nielsen right?
Argument 1
- Model dependence (i.e. dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions, even if they try not to be).
- Matching is good because it reduces model dependence.
I fully agree.
My view, however, may be somewhat less pessimistic. I believe that research results can be credible if researchers are well educated, so that they know what they are doing, and if modeling decisions are made transparent and the robustness of results is evaluated (and documented).
Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.
That fully blocked randomization is more efficient than complete randomization, given the sample size, is of course true (how large the efficiency gains are depends on the strength of the relation between X and Y).
However, if blocking reduces the sample size, it is not a priori clear whether estimates from the blocked sample are more efficient than estimates from the full sample (although often they will be).
That PSM approximates complete randomization is only partially true. PSM approximates complete randomization within observations with the same propensity score. Hence, PSM is somewhere between complete randomization and fully blocked randomization.
- If the X variables have no relation to T (treatment), then all observations have the same propensity score. Hence, we end up with complete randomization.
- If the X variables have a strong effect on T, there is lots of blocking.
Argument 3
- Random pruning ⇒ imbalance ⇒ more model dependence
- PSM ⇒ complete randomization ⇒ lots of random pruning
- PSM Paradox: "when you do 'better,' you do worse"
That random pruning makes things worse is of course true because it unnecessarily reduces the sample size (without changing anything else).
As argued above, that PSM applies random pruning is only true for X variables unrelated to T (so that we are in a "local" complete randomization situation, although something similar can probably also happen if effects from several X's cancel each other out).
Furthermore, it is only true if you employ a matching algorithm that throws away good matches. King and Nielsen's results seem to be based on the worst possible algorithm: one-to-one matching without replacement.
If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or also nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
- Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
- Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.
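The "average where pruning is not necessary" idea can be sketched with a kernel-weighted counterfactual. The Python code below is an illustration only, not the kmatch implementation: it uses an Epanechnikov kernel on a single matching variable (for instance the propensity score), and the data, the bandwidth, and the function names are hypothetical.

```python
def epan(u):
    """Epanechnikov kernel; zero outside |u| < 1."""
    return 0.75 * (1 - u * u) if abs(u) < 1 else 0.0

def kernel_counterfactual(x_t, controls, bandwidth):
    """Kernel-weighted mean outcome of all controls within the bandwidth of x_t.

    controls: list of (x, y) pairs. Returns None if no control lies within
    the bandwidth, i.e. the treated unit stays unmatched.
    """
    weights = [epan((x_c - x_t) / bandwidth) for x_c, _ in controls]
    total = sum(weights)
    if total == 0:
        return None
    return sum(w * y_c for w, (_, y_c) in zip(weights, controls)) / total

# Hypothetical controls: (matching variable, outcome)
controls = [(0.48, 10.0), (0.50, 12.0), (0.52, 14.0), (0.90, 30.0)]

print(kernel_counterfactual(0.50, controls, bandwidth=0.1))
print(kernel_counterfactual(0.10, controls, bandwidth=0.1))
```

For the treated unit at 0.50, all three nearby controls contribute to the counterfactual, whereas one-to-one matching without replacement would keep one of them and discard the other two. The treated unit at 0.10 has no control within the bandwidth and is pruned, which is the kind of pruning that prevents bias.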
It is true that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more "free" variance in X that can be exploited by modeling decisions).
In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you'd need to apply PSM stratified by subgroups in this case).
6 Illustration using kmatch
The Command
kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).
Some key features
- Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
- Optional exact matching
- Optional regression-adjustment bias-correction
- Kernel matching, ridge matching, or nearest-neighbor matching
- Automatic bandwidth selection for kernel/ridge matching
- Flexible specification of scaling matrix for MDM
- Joint analysis of multiple subgroups and multiple outcome variables
- Various post-estimation commands for balancing and common-support diagnostics
- Computationally efficient implementation
Some Examples
Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry:

    . sysuse nlsw88, clear
    (NLSW, 1988 extract)
    . drop if industry==2
    (4 observations deleted)

Mahalanobis-distance kernel matching:

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
    >     (wage), nate att
    (computing bandwidth ... done)

    Multivariate-distance kernel matching        Number of obs = 1853
                                                 Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
               |       Matched          |       Controls         |  Band-
               |   Yes     No    Total  |   Used  Unused   Total |  width
       Treated |   432     25      457  |   1105     291    1396 | 1.3394

    Treatment-effects estimation
          wage |     Coef.
           ATT |  .6059013
          NATE |  1.432913
Some balancing statistics:

    . kmatch summarize
    (refitting the model using the generate() option)

                 |             Raw              |         Matched(ATT)
           Means |  Treated  Untrea~d    StdDif |  Treated  Untrea~d    StdDif
        collgrad |  .321663   .224212   .219912 |  .319444   .319444         0
         ttl_exp |  13.2685   12.7323   .117584 |  13.3205   13.1425   .039036
          tenure |  7.89205   6.17658    .29735 |  7.91744   7.58347   .057888
      3.industry |  .006565   .012178  -.058246 |   .00463    .00463         0
      4.industry |  .183807   .166905   .044425 |  .185185   .185185         0
      5.industry |  .105033   .027937   .312944 |  .085648   .085648         0
      6.industry |  .045952   .169771  -.407129 |  .048611   .048611         0
      7.industry |  .019694   .102436  -.350657 |  .020833   .020833         0
      8.industry |  .017505   .035817  -.113785 |  .009259   .009259         0
      9.industry |  .010941   .040115  -.185669 |  .011574   .011574         0
     10.industry |  .004376   .008596  -.052551 |  .002315   .002315         0
     11.industry |  .479212   .356734   .250073 |  .506944   .506944         0
     12.industry |  .122538    .07235   .169707 |   .12037    .12037         0
          2.race |  .330416   .244986   .189418 |    .3125     .3125         0
          3.race |  .017505   .011461   .050566 |  .006944   .006944         0
           south |  .297593   .466332  -.352408 |  .291667   .291667         0

                 |             Raw              |         Matched(ATT)
       Variances |  Treated  Untrea~d     Ratio |  Treated  Untrea~d     Ratio
        collgrad |  .218674   .174066   1.25628 |  .217904   .217904         1
         ttl_exp |  20.5898   21.0001   .980459 |  19.8177   18.2323   1.08696
          tenure |  37.2044   29.3629   1.26706 |  37.0399   34.9543   1.05966
      3.industry |  .006536   .012038   .542928 |  .004619   .004619         1
      4.industry |  .150351   .139148   1.08052 |  .151242   .151242         1
      5.industry |  .094207   .027176   3.46656 |  .078494   .078494         1
      6.industry |  .043936    .14105   .311496 |  .046355   .046355         1
      7.industry |  .019348   .092008   .210287 |  .020447   .020447         1
      8.industry |  .017237   .034559   .498769 |  .009195   .009195         1
      9.industry |  .010845   .038533   .281445 |  .011467   .011467         1
     10.industry |  .004367   .008528   .512039 |  .002315   .002315         1
     11.industry |  .250115   .229639   1.08917 |  .250532   .250532         1
     12.industry |  .107758   .067163   1.60443 |  .106127   .106127         1
          2.race |  .221726     .1851   1.19787 |  .215342   .215342         1
          3.race |  .017237   .011338   1.52025 |  .006912   .006912         1
           south |   .20949   .249045   .841173 |  .207077   .207077         1
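The two balancing statistics can be recomputed by hand. The Python sketch below assumes that the standardized difference is the difference in means divided by the square root of the average of the two group variances; this definition is an assumption on my part, although it reproduces the raw collgrad figures when the means are read as .321663 and .224212 and the variances as .218674 and .174066. The function names are hypothetical.

```python
import math

def std_diff(mean_t, mean_c, var_t, var_c):
    """Difference in means divided by the square root of the average variance."""
    return (mean_t - mean_c) / math.sqrt((var_t + var_c) / 2)

def variance_ratio(var_t, var_c):
    """Variance among the treated divided by variance among the untreated."""
    return var_t / var_c

# Raw collgrad values (treated vs. untreated) as reported by kmatch summarize
print(round(std_diff(0.321663, 0.224212, 0.218674, 0.174066), 6))
print(round(variance_ratio(0.218674, 0.174066), 5))
```

A standardized difference of 0 and a variance ratio of 1 indicate perfect balance on that covariate, which is what the matched columns show for the categorical variables.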
Make a graph of the balancing stats:

    . mat M = r(M)
    . mat V = r(V)
    . coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || , ///
    >     bylabels("Std. mean difference" "Variance ratio") ///
    >     noci nolabels byopts(xrescale)
    . addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))
    . addplot 2: , xline(1) norescaling

[Figure: coefficient plot with two panels, "Std. mean difference" and "Variance ratio", showing raw and matched values for collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, and south.]
Propensity-score kernel matching:

    . kmatch ps union collgrad ttl_exp tenure i.industry i.race south ///
    >     (wage), nate att
    (computing bandwidth ... done)

    Propensity-score kernel matching             Number of obs = 1853
                                                 Kernel        = epan
    Treatment   : union = 1
    Covariates  : collgrad ttl_exp tenure i.industry i.race south
    PS model    : logit (pr)

    Matching statistics
               |       Matched          |       Controls         |  Band-
               |   Yes     No    Total  |   Used  Unused   Total |  width
       Treated |   431     26      457  |   1214     182    1396 | .00188

    Treatment-effects estimation
          wage |     Coef.
           ATT |  .3887224
          NATE |  1.432913
Kernel density balancing plot:

    . kmatch density, lw(*6 *2) lc(*.5 *1)
    (refitting the model using the generate() option)
    (applying 0-1 boundary correction to density estimation of propensity score)
    (bandwidth for propensity score = .06803989)

[Figure: kernel density of the propensity score for untreated and treated, in two panels: Raw and Matched (ATT).]
Cumulative distribution balancing plot:

    . kmatch cumul, lw(*6 *2) lc(*.5 *1)
    (refitting the model using the generate() option)

[Figure: cumulative probability of the propensity score for untreated and treated, in two panels: Raw and Matched (ATT).]
Balancing box plot:

    . kmatch box
    (refitting the model using the generate() option)

[Figure: box plots of the propensity score for untreated and treated, in two panels: Raw and Matched (ATT).]
Standard errors:

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
    >     (wage), nate ate att atc vce(bootstrap)
    (computing bandwidth for treated ... done)
    (computing bandwidth for untreated ... done)
    (running kmatch on estimation sample)

    Bootstrap replications (50)
    ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
    ..................................................    50

    Multivariate-distance kernel matching        Number of obs = 1853
                                                 Replications  = 50
                                                 Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
               |       Matched          |       Controls         |  Band-
               |   Yes     No    Total  |   Used  Unused   Total |  width
       Treated |   432     25      457  |   1105     291    1396 | 1.3394
     Untreated |  1386     10     1396  |    455       2     457 | 3.3975
      Combined |  1818     35     1853  |   1560     293    1853 |

    Treatment-effects estimation
               |  Observed   Bootstrap                       Normal-based
          wage |     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
           ATE |  .4095729   .1920853    2.13   0.033    .0330928   .7860531
           ATT |  .6059013   .2472069    2.45   0.014    .1213846   1.090418
           ATC |  .3483797   .1893653    1.84   0.066   -.0227695   .7195289
          NATE |  1.432913   .2333282    6.14   0.000    .9755981   1.890228
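The z statistic, p-value, and normal-based confidence interval in this output follow directly from the coefficient and its bootstrap standard error. The Python sketch below recomputes them for the ATT row, reading the coefficient as .6059013 and the standard error as .2472069; the function name `normal_based_ci` is hypothetical and the code is only meant to make the arithmetic explicit.

```python
from statistics import NormalDist

def normal_based_ci(coef, se, level=0.95):
    """z statistic, two-sided p-value, and normal-based confidence interval."""
    z = coef / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return z, p, (coef - crit * se, coef + crit * se)

# ATT row: observed coefficient and bootstrap standard error
z, p, (lower, upper) = normal_based_ci(0.6059013, 0.2472069)
print(round(z, 2), round(p, 3), round(lower, 4), round(upper, 4))
```

The interval is symmetric around the observed coefficient because only the standard error, not the bootstrap distribution itself, enters the calculation.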
Do some tests:

    . lincom ATT-NATE
     ( 1)  ATT - NATE = 0

          wage |     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
           (1) | -.8270117   .1810415   -4.57   0.000   -1.181847  -.4721768

    . test ATT = ATC
     ( 1)  ATT - ATC = 0

               chi2(  1) =    2.42
             Prob > chi2 =  0.1200
Nearest-neighbor matching (1 neighbor):

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

    Multivariate-distance nearest-neighbor matching
                                                 Number of obs  = 1853
                                                 Neighbors: min = 1
    Treatment   : union = 1                                 max = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
               |       Matched          |       Controls         |  Band-
               |   Yes     No    Total  |   Used  Unused   Total |  width
       Treated |   457      0      457  |    328    1068    1396 |

    Treatment-effects estimation
          wage |     Coef.
           ATT |  .7246969

    . teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

    Treatment-effects estimation                 Number of obs      = 1853
    Estimator      : nearest-neighbor matching   Matches: requested = 1
    Outcome model  : matching                                   min = 1
    Distance metric: Mahalanobis                                max = 1

                         |             AI Robust
                    wage |     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
    ATET                 |
                   union |
     (union vs nonunion) |  .7246969   .2942952    2.46   0.014     .147889   1.301505
Nearest-neighbor matching (5 neighbors):

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

    Multivariate-distance nearest-neighbor matching
                                                 Number of obs  = 1853
                                                 Neighbors: min = 5
    Treatment   : union = 1                                 max = 5
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
               |       Matched          |       Controls         |  Band-
               |   Yes     No    Total  |   Used  Unused   Total |  width
       Treated |   457      0      457  |    870     526    1396 |

    Treatment-effects estimation
          wage |     Coef.
           ATT |  .5590823

    . teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

    Treatment-effects estimation                 Number of obs      = 1853
    Estimator      : nearest-neighbor matching   Matches: requested = 5
    Outcome model  : matching                                   min = 5
    Distance metric: Mahalanobis                                max = 6

                         |             AI Robust
                    wage |     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
    ATET                 |
                   union |
     (union vs nonunion) |  .5590823   .2381752    2.35   0.019    .0922675   1.025897
Bias-correction (regression adjustment):

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
    >     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

    Multivariate-distance nearest-neighbor matching
                                                 Number of obs  = 1853
                                                 Neighbors: min = 5
    Treatment   : union = 1                                 max = 5
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
               |       Matched          |       Controls         |  Band-
               |   Yes     No    Total  |   Used  Unused   Total |  width
       Treated |   457      0      457  |    870     526    1396 |

    Treatment-effects estimation
          wage |     Coef.
           ATT |  .5288023
    adjusted for: collgrad ttl_exp tenure i.industry i.race south

    . teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) ///
    >     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

    Treatment-effects estimation                 Number of obs      = 1853
    Estimator      : nearest-neighbor matching   Matches: requested = 5
    Outcome model  : matching                                   min = 5
    Distance metric: Mahalanobis                                max = 6

                         |             AI Robust
                    wage |     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
    ATET                 |
                   union |
     (union vs nonunion) |  .5288023   .2420635    2.18   0.029    .0543666   1.003238
Mahalanobis-distance and propensity-score matching combined:

    . kmatch md union collgrad ttl_exp tenure (wage), att ///
    >     psvars(i.industry i.race south) psweight(3)
    (computing bandwidth ... done)

    Multivariate-distance kernel matching        Number of obs = 1853
                                                 Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis (modified)
    Covariates  : collgrad ttl_exp tenure
    PS model    : logit (pr)
    PS covars   : i.industry i.race south

    Matching statistics
               |       Matched          |       Controls         |  Band-
               |   Yes     No    Total  |   Used  Unused   Total |  width
       Treated |   439     18      457  |   1258     138    1396 | .83886

    Treatment-effects estimation
          wage |     Coef.
           ATT |  .6408443
Exact matching:

    . kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
    (computing bandwidth ... done)

    Multivariate-distance kernel matching        Number of obs = 1853
                                                 Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure
    Exact       : industry race south

    Matching statistics
               |       Matched          |       Controls         |  Band-
               |   Yes     No    Total  |   Used  Unused   Total |  width
       Treated |   432     25      457  |   1103     293    1396 | 1.3013

    Treatment-effects estimation
          wage |     Coef.
           ATT |  .6047374
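What exact matching does before any distance is computed can be sketched as follows. This Python illustration is not kmatch's implementation; the records and the helper names `exact_cells` and `candidates` are hypothetical. Controls are grouped into cells defined by the exact-matching variables, and a treated unit may only be matched to controls from its own cell.

```python
from collections import defaultdict

def exact_cells(units, keys):
    """Group units (dicts) by their values on the exact-matching variables."""
    cells = defaultdict(list)
    for unit in units:
        cells[tuple(unit[k] for k in keys)].append(unit)
    return cells

def candidates(treated_unit, control_cells, keys):
    """Controls eligible for a treated unit: identical on all exact variables."""
    return control_cells.get(tuple(treated_unit[k] for k in keys), [])

# Hypothetical records
controls = [
    {"id": 1, "industry": 4, "south": 0, "tenure": 5.0},
    {"id": 2, "industry": 4, "south": 1, "tenure": 5.5},
    {"id": 3, "industry": 11, "south": 0, "tenure": 2.0},
]
treated = {"id": 9, "industry": 4, "south": 0, "tenure": 5.4}
keys = ("industry", "south")

cells = exact_cells(controls, keys)
print([c["id"] for c in candidates(treated, cells, keys)])
```

Control 2 is closest to the treated unit in tenure but is excluded because it differs on south; distance-based matching on the remaining covariates then operates only within the cell.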
Bandwidth selection: the default (based on distribution of distances in one-nearest-neighbor matching):

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
    (computing bandwidth ... done)

    Multivariate-distance kernel matching        Number of obs = 1853
                                                 Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
               |       Matched          |       Controls         |  Band-
               |   Yes     No    Total  |   Used  Unused   Total |  width
       Treated |   432     25      457  |   1105     291    1396 | 1.3394

    Treatment-effects estimation
          wage |     Coef.
           ATT |  .6059013
Bandwidth selection: cross validation with respect to X:

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
    >     att bwidth(cv)
    (computing bandwidth ... done)

    Multivariate-distance kernel matching        Number of obs = 1853
                                                 Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
               |       Matched          |       Controls         |  Band-
               |   Yes     No    Total  |   Used  Unused   Total |  width
       Treated |   448      9      457  |   1184     212    1396 | 1.8888

    Treatment-effects estimation
          wage |     Coef.
           ATT |  .6651578

    . kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MSE against bandwidth (about 1.5 to 3), with the evaluation points labeled by index.]
Bandwidth selection: cross validation with respect to Y:

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
    >     att bwidth(cv wage)
    (computing bandwidth ... done)

    Multivariate-distance kernel matching        Number of obs = 1853
                                                 Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
               |       Matched          |       Controls         |  Band-
               |   Yes     No    Total  |   Used  Unused   Total |  width
       Treated |   453      4      457  |   1289     107    1396 |  2.433

    Treatment-effects estimation
          wage |     Coef.
           ATT |  .6928956

    . kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MISE against bandwidth (about 1.5 to 3), with the evaluation points labeled by index.]
Bandwidth selection: weighted cross validation with respect to Y:

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
    >     att bwidth(cv wage, weighted)
    (computing bandwidth ... done)

    Multivariate-distance kernel matching        Number of obs = 1853
                                                 Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
               |       Matched          |       Controls         |  Band-
               |   Yes     No    Total  |   Used  Unused   Total |  width
       Treated |   455      2      457  |   1356      40    1396 | 2.7626

    Treatment-effects estimation
          wage |     Coef.
           ATT |  .7308166

    . kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of weighted MISE against bandwidth (about 1 to 5), with the evaluation points labeled by index.]
Common-support diagnostics:

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
    >     att bwidth(0.5)

    Multivariate-distance kernel matching        Number of obs = 1853
                                                 Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
               |       Matched          |       Controls         |  Band-
               |   Yes     No    Total  |   Used  Unused   Total |  width
       Treated |   366     91      457  |    701     695    1396 |     .5

    Treatment-effects estimation
          wage |     Coef.
           ATT |  .3303161

    . kmatch csummarize
    (refitting the model using the generate() option)

                 |   Common support (treated)   |    Standardized difference
           Means |  Matched  Unmatc~d     Total |  (1)-(3)   (2)-(3)   (1)-(2)
        collgrad |  .322404   .318681   .321663 |  .001585  -.006376   .007962
         ttl_exp |  13.3929   12.7682   13.2685 |  .027413  -.110253   .137666
          tenure |  8.12614   6.95055   7.89205 |  .038378  -.154356   .192734
      3.industry |  .002732   .021978   .006565 | -.047404   .190657  -.238061
      4.industry |  .191257   .153846   .183807 |  .019212  -.077269   .096481
      5.industry |  .062842   .274725   .105033 | -.137462   .552867  -.690329
      6.industry |  .057377         0   .045952 |  .054507  -.219225   .273732
      7.industry |  .019126   .021978   .019694 | -.004083   .016423  -.020506
      8.industry |  .005464   .065934   .017505 | -.091714   .368871  -.460585
      9.industry |  .010929   .010989   .010941 | -.000115   .000462  -.000577
     10.industry |        0   .021978   .004376 | -.066227   .266363  -.332589
     11.industry |  .554645   .175824   .479212 |   .15083  -.606636   .757467
     12.industry |  .092896   .241758   .122538 | -.090299   .363181   -.45348
          2.race |  .243169   .681319   .330416 | -.185284   .745209  -.930494
          3.race |  .002732   .076923   .017505 | -.112525   .452572  -.565097
           south |   .29235   .318681   .297593 | -.011456   .046074   -.05753

                 |   Common support (treated)   |            Ratio
       Variances |  Matched  Unmatc~d     Total |  (1)/(3)   (2)/(3)   (1)/(2)
        collgrad |  .219058   .219536   .218674 |  1.00176   1.00394   .997824
         ttl_exp |  19.4198   25.2474   20.5898 |  .943177   1.22621    .76918
          tenure |  38.3324   31.9242   37.2044 |  1.03032   .858076   1.20073
      3.industry |  .002732   .021734   .006536 |  .418045   3.32537   .125714
      4.industry |  .155101   .131624   .150351 |  1.03159   .875443   1.17837
      5.industry |  .059054   .201465   .094207 |  .626851   2.13854   .293122
      6.industry |  .054233         0   .043936 |  1.23435         0         .
      7.industry |  .018811   .021734   .019348 |  .972252    1.1233   .865531
      8.industry |   .00545   .062271   .017237 |  .316157   3.61269   .087513
      9.industry |  .010839   .010989   .010845 |  .999464   1.01328   .986361
     10.industry |        0   .021734   .004367 |        0   4.97709         0
     11.industry |  .247691    .14652   .250115 |  .990307   .585811   1.69049
     12.industry |  .084497   .185348   .107758 |  .784137   1.72003   .455885
          2.race |  .184542   .219536   .221726 |  .832297   .990121   .840601
          3.race |  .002732   .071795   .017237 |  .158513   4.16522   .038056
           south |  .207448   .219536    .20949 |  .990254   1.04796   .944939

    (1) matched, (2) unmatched, (3) total
Make a graph of the common-support stats:

    . mat M = r(M)
    . coefplot matrix(M[,4]), title("Std. difference") noci nolabels xline(0)

[Figure: coefficient plot titled "Std. difference" (range about -2 to 2) for collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, and south.]
Multiple outcome variables:

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
    >     (wage hours), nate att
    (computing bandwidth ... done)

    Multivariate-distance kernel matching        Number of obs = 1852
                                                 Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
               |       Matched          |       Controls         |  Band-
               |   Yes     No    Total  |   Used  Unused   Total |  width
       Treated |   432     25      457  |   1104     291    1395 | 1.3392

    Treatment-effects estimation
               |     Coef.
    wage       |
           ATT |  .6021049
          NATE |  1.430823
    hours      |
           ATT |  1.263759
          NATE |  1.450303
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 53
Some Examples

Multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage = collgrad ttl_exp tenure) ///
>     (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,852
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
               Matched                  Controls             Band-
           Yes     No   Total      Used  Unused   Total      width
Treated    432     25     457      1104     291    1395     1.3392

Treatment-effects estimation
               Coef.
wage
   ATT      .5152752
  NATE      1.430823
hours
   ATT      1.263759
  NATE      1.450303

wage: adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race
Some Examples

Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage), ///
>     att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Replications  = 50
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics
               Matched                  Controls             Band-
           Yes     No   Total      Used  Unused   Total      width
0
 Treated   306     15     321       625     120     745     1.3199
1
 Treated   126     10     136       473     178     651     1.3398

Treatment-effects estimation
           Observed   Bootstrap                      Normal-based
wage          Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
0
  ATT      .4586332    .2763358   1.66    0.097    -.082975   1.000241
1
  ATT      .9518705     .406903   2.34    0.019    .1543553   1.749386

. test [0]ATT = [1]ATT
 ( 1)  [0]ATT - [1]ATT = 0
           chi2(  1) =    1.23
         Prob > chi2 =    0.2679

. lincom [1]ATT - [0]ATT
 ( 1)  - [0]ATT + [1]ATT = 0

wage          Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
 (1)       .4932373    .4452343   1.11    0.268    -.379406   1.365881
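The test and lincom results for the subgroup comparison can be checked by hand. The following is a minimal Python sketch, not part of kmatch; it assumes the reported difference and bootstrap standard error read as .4932373 and .4452343 (decimal points restored from the extracted output) and uses a normal approximation.

```python
import math

# Values as reported by `lincom [1]ATT - [0]ATT` (decimal points restored;
# this reading of the extracted output is an assumption).
diff = 0.4932373   # difference between the two subgroup ATTs
se = 0.4452343     # bootstrap standard error of the difference

z = diff / se                                  # Wald z statistic
chi2 = z ** 2                                  # equivalent chi-squared statistic, 1 df
p_value = math.erfc(abs(z) / math.sqrt(2))     # two-sided normal p-value
ci = (diff - 1.959964 * se, diff + 1.959964 * se)

print(round(z, 2), round(chi2, 2), round(p_value, 4), ci)
```

The chi-squared statistic from test is simply the square of the z statistic from lincom, which is why both commands report the same p-value.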
Simulation

Population data from the Swiss census of 2000.

Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using command iskotrei by Hendrickx 2002; values from 6 to 78, mean 44).

Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.

Control variables: gender, age, and highest educational degree.

Population restricted to people between 24 and 60 years old who are working.

2'308'006 individuals, of which 17.5% belong to the treatment group.

Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.
Simulation

Substantial differences between resident aliens and Swiss nationals on all three covariates.

Propensity score in the population (computed from fully stratified data):

[Figure: density of the propensity score (x-axis, 0 to 1) for untreated and treated; McFadden R2 = 0.121]
Simulation

Raw mean difference in occupational prestige (NATE): -4.79
Population ATT (computed from fully stratified data): -3.96
There is some treatment effect heterogeneity (ATE = -3.51, ATC = -3.41).

[Figure, left panel: mean outcome by propensity score for untreated and treated. Right panel: treatment effect by propensity score.]
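The three population effects are tied together by the treated share, since the ATE is the share-weighted average of the ATT and the ATC. A small Python check, assuming the slide values read as a treated share of 17.5%, ATT = -3.96, and ATC = -3.41 (decimal points restored from the extracted text):

```python
# Population quantities from the slides (decimal points restored; this
# reading of the extracted numbers is an assumption).
share_treated = 0.175   # resident aliens
att = -3.96             # effect among the treated
atc = -3.41             # effect among the untreated

# ATE = P(D=1) * ATT + P(D=0) * ATC
ate = share_treated * att + (1 - share_treated) * atc
print(round(ate, 2))
```

The result agrees with the ATE of -3.51 reported on the slide.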
Results: Variance

[Figure: variance of the estimates for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Markers: MDM and PSM, each also with bias correction.]
In this slide we can see that, for the same algorithm, PSM is typically somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are small; additional post-matching regression adjustment further reduces the differences.
Results: Bias reduction (in percent)

[Figure: bias reduction in percent for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Markers: MDM and PSM, each also with bias correction.]
Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.
Results: Mean squared error

[Figure: mean squared error for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Markers: MDM and PSM, each also with bias correction.]

Results: Relative standard error

[Figure: relative standard error for N = 500 and N = 5000. Rows: nearest-neighbor matching (teffects; 1 neighbor, 5 neighbors), nearest-neighbor matching (bootstrap; 1 neighbor, 5 neighbors), and kernel matching (bootstrap; fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Markers: MDM and PSM, each also with bias correction.]

Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals for N = 500 and N = 5000, with the same rows and markers as the relative standard error figure.]
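The result slides report variance, bias reduction, mean squared error, relative standard error, and coverage across simulation replications. The Python sketch below shows one plausible way to compute such summaries. The exact definitions used in the talk are not given in the text, so the formulas for bias reduction and relative standard error are assumptions, and the toy numbers are hypothetical.

```python
from statistics import mean, pvariance

def performance(estimates, std_errors, truth, naive):
    """Summarize Monte Carlo results for one estimator (illustrative definitions)."""
    avg = mean(estimates)
    bias = avg - truth
    variance = pvariance(estimates)
    mse = mean([(e - truth) ** 2 for e in estimates])
    # Assumed definition: share of the raw bias (naive - truth) that is removed.
    bias_reduction = 100 * (naive - avg) / (naive - truth)
    # Assumed definition: average reported SE relative to the sampling SD.
    relative_se = mean(std_errors) / variance ** 0.5
    # Share of normal-based 95% intervals that contain the true value.
    covered = [abs(e - truth) <= 1.96 * s for e, s in zip(estimates, std_errors)]
    coverage = sum(covered) / len(covered)
    return {
        "bias": bias,
        "variance": variance,
        "mse": mse,
        "bias_reduction": bias_reduction,
        "relative_se": relative_se,
        "coverage": coverage,
    }

# Hypothetical estimates from four replications; truth and naive difference
# are set to the population ATT and NATE quoted in the simulation setup.
toy = performance([-4.0, -3.9, -4.1, -3.8], [0.1] * 4, truth=-3.96, naive=-4.79)
print(toy)
```

With these definitions the mean squared error decomposes into variance plus squared bias, which is why an estimator with a small persistent bias can still have a competitive MSE.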
7 Conclusions
Conclusions

The arguments brought forward by King and Nielsen against propensity score matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).

Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.
Conclusions
Overall I agree that MDM has advantages over PSM but it alsohas some disadvantages In applied research the choice may not bethat clear- MDM leaves less scope for post-matching modeling decision biases- Theoretical results (see eg Froumllich 2007) suggest that MDM will
generally tend to outperform PSM in terms of efficiency (butdifferences are likely to be small)
- Less restrictions in terms of possible post-matching analyses Choice of scaling matrix largely arbitrary various suggestions in the
literature (somewhat unclear eg how categorical variables should betreated)
Computational complexity
One clear conclusion we can draw however is
Do not use propensity scores for pair matching(But donrsquot use pair matching anyhow)
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 66
Conclusions

Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly ok for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.
References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2):295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1):77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91:279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802. Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4):465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70:41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5):688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4):472–480.
3 Propensity Score Matching
The Propensity Score Theorem (Rosenbaum and Rubin 1983)

If the conditional independence assumption is true, then

$$\Pr(D_i = 1 \mid Y_i^0, Y_i^1, X_i) = \Pr(D_i = 1 \mid X_i) = \pi(X_i)$$

where $\pi(X)$ is called the propensity score.

That is,

$$(Y^0, Y^1) \perp\!\!\!\perp D \mid X$$

implies

$$(Y^0, Y^1) \perp\!\!\!\perp D \mid \pi(X)$$

so that, under strong ignorability, the average causal effect can be estimated by conditioning on the propensity score $\pi(X)$ instead of $X$.

This is remarkable because the information in $X$, which may include many variables, can be reduced to just one dimension. This greatly simplifies the matching task.
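One way to build intuition for the theorem is the balancing property behind it: among units with the same propensity score, the covariates have the same distribution for treated and untreated. The Python sketch below checks this exactly on a hypothetical population of four covariate strata; the strata, shares, and treatment probabilities are made up for illustration.

```python
from fractions import Fraction as F

# Hypothetical population: for each covariate stratum, its population share
# and its treatment probability. Strata a and b share one propensity score,
# strata c and d share another.
strata = {
    "a": (F(1, 10), F(1, 5)),
    "b": (F(3, 10), F(1, 5)),
    "c": (F(2, 10), F(1, 2)),
    "d": (F(4, 10), F(1, 2)),
}

def x_distribution(score, treated):
    """Distribution of X among treated (or untreated) units with a given score."""
    mass = {
        x: share * (pi if treated else 1 - pi)
        for x, (share, pi) in strata.items()
        if pi == score
    }
    total = sum(mass.values())
    return {x: m / total for x, m in mass.items()}

for score in (F(1, 5), F(1, 2)):
    print(score, x_distribution(score, True), x_distribution(score, False))
```

Within each score level the two distributions coincide, even though strata a and b (or c and d) differ in X. Matching on the score therefore balances X on average within score levels, but does not pair units with identical X.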
Propensity Score Matching (PSM)

Instead of computing multivariate distances, we can thus simply match on the (one-dimensional) propensity score.

Procedure:
- Step 1: Estimate the propensity score, e.g. using a logit model.
- Step 2: Apply a matching algorithm using differences in the propensity score, $|\pi(X_i) - \pi(X_j)|$, instead of multivariate distances.

PSM is tremendously popular:
- https://scholar.google.ch/scholar?q=propensity+score+AND+(matching+OR+matched+OR+match)
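The two steps can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the kmatch implementation: the toy data are hypothetical, the logit model has a single covariate and is fitted by Newton-Raphson, and step 2 uses one-nearest-neighbor matching with replacement.

```python
import math

# Hypothetical toy data: covariate x, treatment indicator d, outcome y.
x = [1, 2, 3, 4, 5, 6, 7, 8]
d = [0, 0, 1, 0, 1, 0, 1, 1]
y = [3.0, 4.1, 7.2, 5.9, 9.1, 8.2, 11.0, 12.3]

def fit_logit(x, d, iterations=25):
    """Step 1: logit of d on a constant and x, fitted by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        p = [1 / (1 + math.exp(-(b0 + b1 * xi))) for xi in x]
        # Score vector
        g0 = sum(di - pi for di, pi in zip(d, p))
        g1 = sum((di - pi) * xi for di, pi, xi in zip(d, p, x))
        # Negative Hessian (2 x 2), inverted analytically
        w = [pi * (1 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (-h01 * g0 + h00 * g1) / det
    return b0, b1

b0, b1 = fit_logit(x, d)
ps = [1 / (1 + math.exp(-(b0 + b1 * xi))) for xi in x]

# Step 2: match each treated unit to the control with the closest score.
treated = [i for i, di in enumerate(d) if di == 1]
controls = [i for i, di in enumerate(d) if di == 0]
matches = {i: min(controls, key=lambda j: abs(ps[i] - ps[j])) for i in treated}

# ATT estimate: mean difference between treated and matched control outcomes.
att = sum(y[i] - y[j] for i, j in matches.items()) / len(treated)
print(matches, round(att, 3))
```

Because matching is done with replacement, a control can serve as the match for several treated units; no control is discarded merely because it was already used.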
4 King and Nielsen's "Why Propensity Scores Should Not Be Used for Matching"
King and Nielsen

In 2015/2016, Gary King and Richard Nielsen circulated a paper that caused considerable concern among applied researchers.

The basic message of the paper is that PSM is really, really bad and should be discarded.

The paper:
- http://j.mp/1sexgVw

Slides:
- https://gking.harvard.edu/presentations/why-propensity-scores-should-not-be-used-matching-6

Watch it:
- https://www.youtube.com/watch?v=rBv39pK1iEs
King and Nielsen

The story goes about as follows.

Argument 1:
- Model dependence (i.e. dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions, even if they try not to be).
- Matching is good because it reduces model dependence.
Matching to Reduce Model Dependence
(Ho, Imai, King, Stuart 2007: fig. 1, Political Analysis)

[Figure, shown in several builds: outcome plotted against education (years) for treated (T) and control (C) units. From slide 3/23 of the slides by King and Nielsen.]
King and Nielsen

Argument 2:
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

Matching: Finding Hidden Randomized Experiments (slide 6/23 of the slides by King and Nielsen)

Types of Experiments

Balance         Complete          Fully
Covariates:     Randomization     Blocked
Observed        On average        Exact
Unobserved      On average        On average

Fully blocked dominates complete randomization for: imbalance, model dependence, power, efficiency, bias, research costs, robustness. E.g., Imai, King, Nall 2009: SEs 600% smaller!

Goal of Each Matching Method (in Observational Data)
- PSM: complete randomization
- Other methods: fully blocked
- Other matching methods dominate PSM (wait, it gets worse)
Best Case: Mahalanobis Distance Matching

[Figure, shown in two builds: age plotted against education (years) for treated (T) and control (C) units, first with all controls and then with the matched controls only. From slide 9/23 of the slides by King and Nielsen.]

Best Case: Propensity Score Matching

[Figure, shown in two builds: age plotted against education (years) for treated (T) and control (C) units, together with their positions on the propensity score scale (0 to 1). From slide 15/23 of the slides by King and Nielsen.]

Best Case: Propensity Score Matching is Suboptimal

[Figure: age plotted against education (years) for the treated (T) units and the matched control (C) units. From slide 15/23 of the slides by King and Nielsen.]
King and Nielsen

Argument 3:
- Random pruning (deleting observations at random) increases imbalance. This is because the sample size decreases, so that variance increases (large differences become more likely).
- More imbalance/variance means more model dependence and researcher discretion.
- Because PSM approximates complete randomization, it engages in random pruning.
- PSM Paradox ("when you do 'better', you do worse"):
  - When matching is made more strict (e.g. by decreasing the size of the caliper), PSM, like other matching methods, typically reduces imbalance. But soon the PSM Paradox kicks in, such that further pruning quickly increases imbalance.
  - If the data are such that there are no big differences between treated and untreated to begin with, the PSM Paradox kicks in almost immediately.
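The first point of Argument 3 can be illustrated with a small simulation. The Python sketch below is an illustration under simple assumptions, not a replication of King and Nielsen's analysis: treated and control units are drawn from the same normal distribution (so there is no systematic imbalance), units are deleted at random, and imbalance is measured as the absolute difference in covariate means. The sample sizes, seed, and number of replications are arbitrary choices.

```python
import random

random.seed(12345)

def average_imbalance(keep, n=200, replications=500):
    """Mean absolute difference in covariate means after random pruning."""
    gaps = []
    for _ in range(replications):
        # Both groups come from the same distribution, so any remaining
        # imbalance is pure sampling noise.
        treated = [random.gauss(0, 1) for _ in range(n)]
        controls = [random.gauss(0, 1) for _ in range(n)]
        # Random pruning: keep a random subset of each group.
        kept_t = random.sample(treated, keep)
        kept_c = random.sample(controls, keep)
        gaps.append(abs(sum(kept_t) / keep - sum(kept_c) / keep))
    return sum(gaps) / len(gaps)

results = {keep: average_imbalance(keep) for keep in (200, 50, 10)}
for keep, gap in results.items():
    print(keep, round(gap, 3))
```

The fewer units survive the random pruning, the larger the typical imbalance, which is the mechanism Argument 3 relies on.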
PSM Increases Model Dependence & Bias

[Figure, left panel "Model Dependence": variance plotted against the number of units pruned, for MDM and PSM. Right panel "Bias": maximum coefficient across 512 specifications plotted against the number of units pruned, for MDM and PSM, with the true effect of 2 marked. From slide 20/23 of the slides by King and Nielsen.]

$$Y_i = 2T_i + X_{1i} + X_{2i} + \epsilon_i, \qquad \epsilon_i \sim N(0, 1)$$

The Propensity Score Paradox in Real Data

[Figure, two panels, Finkel et al. (JOP 2012) and Nielsen et al. (AJPS 2011): imbalance plotted against the number of units pruned for CEM, MDM, PSM, and random pruning, with the raw data and the 1/4 SD caliper marked. From slide 21/23 of the slides by King and Nielsen.]

Similar pattern for > 20 other real data sets we checked.
5 Are King and Nielsen right?
Are King and Nielsen right?

Argument 1:
- Model dependence (i.e. dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions, even if they try not to be).
- Matching is good because it reduces model dependence.

I fully agree.

My view, however, may be somewhat less pessimistic. I believe that research results can be credible if researchers are well educated, so that they know what they are doing, and if modeling decisions are made transparent and the robustness of results is evaluated (and documented).
Are King and Nielsen right?

Argument 2:
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

That fully blocked randomization is more efficient than complete randomization, given the sample size, is of course true (how large the efficiency gains are depends on the strength of the relation between X and Y).

However, if blocking reduces the sample size, it is not a priori clear whether estimates from the blocked sample are more efficient than estimates from the full sample (although often they will be).

That PSM approximates complete randomization is only partially true. PSM approximates complete randomization within observations with the same propensity score. Hence, PSM is somewhere between complete randomization and fully blocked randomization.
- If the X variables have no relation to T (treatment), then all observations have the same propensity score. Hence, we end up with complete randomization.
- If the X variables have a strong effect on T, there is lots of blocking.
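To make the earlier remark about efficiency gains concrete, here is a rough back-of-the-envelope comparison. It rests on simplifying assumptions that are not stated on the slides: two groups of equal size $n$, a constant treatment effect, blocks that are homogeneous in $X$, and $R^2$ denoting the share of the outcome variance $\sigma_Y^2$ explained by $X$.

```latex
\operatorname{Var}_{\text{complete}}(\hat\delta) \approx \frac{2\sigma_Y^2}{n},
\qquad
\operatorname{Var}_{\text{blocked}}(\hat\delta) \approx \frac{2\sigma_{Y|X}^2}{n}
  = (1 - R^2)\,\frac{2\sigma_Y^2}{n}
```

Under these assumptions the gain from blocking vanishes when $X$ is unrelated to $Y$ ($R^2 = 0$) and grows as $X$ predicts $Y$ better. If blocking also shrinks $n$, the factor $1 - R^2$ has to outweigh the loss in sample size for the blocked estimate to be more efficient.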
Are King and Nielsen right?

Argument 3:
- Random pruning ⇒ imbalance ⇒ more model dependence
- PSM ⇒ complete randomization ⇒ lots of random pruning
- PSM Paradox: "when you do 'better', you do worse"

That random pruning makes things worse is of course true, because it unnecessarily reduces the sample size (without changing anything else).

As argued above, that PSM applies random pruning is only true for X variables unrelated to T (so that we are in a "local" complete randomization situation, although something similar can probably also happen if effects from several X's cancel each other out).

Furthermore, it is only true if you employ a matching algorithm that throws away good matches. King and Nielsen's results seem to be based on the worst possible algorithm: one-to-one matching without replacement.

If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or also nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
- Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
- Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.

It is true that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more "free" variance in X that can be exploited by modeling decisions).

In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you'd need to apply PSM stratified by subgroups in this case).
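The idea that kernel matching averages over all good matches rather than throwing them away can be sketched as follows. This Python example is a simplified illustration, not the kmatch implementation: the function names, the control data, and the bandwidth are hypothetical, and the Epanechnikov kernel is used because the kmatch output in this talk reports "Kernel = epan".

```python
def epanechnikov(u):
    """Epanechnikov kernel; zero outside the bandwidth."""
    return 0.75 * (1 - u * u) if abs(u) < 1 else 0.0

def counterfactual(ps_treated, controls, bandwidth):
    """Kernel-weighted mean outcome of the controls near a treated unit.

    controls is a list of (propensity score, outcome) pairs. Returns None
    if no control lies within the bandwidth (treated unit is unmatched).
    """
    weights = [epanechnikov((ps - ps_treated) / bandwidth) for ps, _ in controls]
    total = sum(weights)
    if total == 0:
        return None
    return sum(w * y for w, (_, y) in zip(weights, controls)) / total

# Hypothetical controls: (propensity score, outcome)
controls = [(0.35, 1.0), (0.48, 2.0), (0.52, 4.0), (0.70, 9.0)]
y0_hat = counterfactual(0.50, controls, bandwidth=0.10)
print(y0_hat)
```

Both controls within the bandwidth contribute to the counterfactual, while the two distant controls receive zero weight. Pruning thus happens only where controls are too far away to be useful.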
6 Illustration using kmatch
The Command

kmatch: new matching software for Stata, written over the last few months; available from SSC (ssc install kmatch).

Some key features:
- Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
- Optional exact matching
- Optional regression-adjustment bias correction
- Kernel matching, ridge matching, or nearest-neighbor matching
- Automatic bandwidth selection for kernel/ridge matching
- Flexible specification of scaling matrix for MDM
- Joint analysis of multiple subgroups and multiple outcome variables
- Various post-estimation commands for balancing and common-support diagnostics
- Computationally efficient implementation
Some Examples

Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.

. sysuse nlsw88, clear
(NLSW, 1988 extract)

. drop if industry==2
(4 observations deleted)

Mahalanobis-distance kernel matching

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
               Matched                  Controls             Band-
           Yes     No   Total      Used  Unused   Total      width
Treated    432     25     457      1105     291    1396     1.3394

Treatment-effects estimation
wage           Coef.
   ATT      .6059013
  NATE      1.432913
Some Examples

Some balancing statistics

. kmatch summarize
(refitting the model using the generate() option)

                          Raw                          Matched (ATT)
Means          Treated  Untreated     StdDif    Treated  Untreated     StdDif
collgrad       .321663    .224212    .219912    .319444    .319444          0
ttl_exp        13.2685    12.7323    .117584    13.3205    13.1425    .039036
tenure         7.89205    6.17658     .29735    7.91744    7.58347    .057888
3.industry     .006565    .012178   -.058246     .00463     .00463          0
4.industry     .183807    .166905    .044425    .185185    .185185          0
5.industry     .105033    .027937    .312944    .085648    .085648          0
6.industry     .045952    .169771   -.407129    .048611    .048611          0
7.industry     .019694    .102436   -.350657    .020833    .020833          0
8.industry     .017505    .035817   -.113785    .009259    .009259          0
9.industry     .010941    .040115   -.185669    .011574    .011574          0
10.industry    .004376    .008596   -.052551    .002315    .002315          0
11.industry    .479212    .356734    .250073    .506944    .506944          0
12.industry    .122538     .07235    .169707     .12037     .12037          0
2.race         .330416    .244986    .189418      .3125      .3125          0
3.race         .017505    .011461    .050566    .006944    .006944          0
south          .297593    .466332   -.352408    .291667    .291667          0

                          Raw                          Matched (ATT)
Variances      Treated  Untreated      Ratio    Treated  Untreated      Ratio
collgrad       .218674    .174066    1.25628    .217904    .217904          1
ttl_exp        20.5898    21.0001    .980459    19.8177    18.2323    1.08696
tenure         37.2044    29.3629    1.26706    37.0399    34.9543    1.05966
3.industry     .006536    .012038    .542928    .004619    .004619          1
4.industry     .150351    .139148    1.08052    .151242    .151242          1
5.industry     .094207    .027176    3.46656    .078494    .078494          1
6.industry     .043936     .14105    .311496    .046355    .046355          1
7.industry     .019348    .092008    .210287    .020447    .020447          1
8.industry     .017237    .034559    .498769    .009195    .009195          1
9.industry     .010845    .038533    .281445    .011467    .011467          1
10.industry    .004367    .008528    .512039    .002315    .002315          1
11.industry    .250115    .229639    1.08917    .250532    .250532          1
12.industry    .107758    .067163    1.60443    .106127    .106127          1
2.race         .221726      .1851    1.19787    .215342    .215342          1
3.race         .017237    .011338    1.52025    .006912    .006912          1
south           .20949    .249045    .841173    .207077    .207077          1
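The StdDif and Ratio columns can be recomputed from the reported means and variances. The Python sketch below uses a common definition of the standardized difference (mean difference divided by the square root of the average of the two raw variances). Whether kmatch uses exactly this definition is an assumption, although it appears to reproduce the reported raw value for collgrad when the extracted numbers are read as .321663, .224212, .218674, and .174066.

```python
import math

def std_difference(mean_t, mean_c, var_t, var_c):
    """Standardized mean difference, scaled by the pooled raw standard deviation."""
    return (mean_t - mean_c) / math.sqrt((var_t + var_c) / 2)

# Raw values for collgrad as reported by `kmatch summarize`
# (decimal points restored; this reading is an assumption).
sd_raw = std_difference(0.321663, 0.224212, 0.218674, 0.174066)
ratio_raw = 0.218674 / 0.174066   # variance ratio, treated / untreated

print(round(sd_raw, 4), round(ratio_raw, 4))
```

A standardized difference close to 0 and a variance ratio close to 1 after matching indicate good balance on that covariate.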
Some Examples

Make a graph of the balancing stats:

. mat M = r(M)
. mat V = r(V)
. coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || , ///
>     bylabels("Std. mean difference" "Variance ratio") ///
>     noci nolabels byopts(xrescale)
. addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))
. addplot 2: , xline(1) norescaling

[Figure, two panels, "Std. mean difference" and "Variance ratio": raw and matched values for collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, and south]
Some Examples

Propensity-score kernel matching

. kmatch ps union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage), nate att
(computing bandwidth ... done)

Propensity-score kernel matching                Number of obs     =      1,853
                                                Kernel            =       epan
Treatment   : union = 1
Covariates  : collgrad ttl_exp tenure i.industry i.race south
PS model    : logit (pr)

Matching statistics

                    Matched                     Controls             Band-
                Yes      No   Total       Used  Unused   Total       width

     Treated    431      26     457       1214     182    1396      .00188

Treatment-effects estimation

        wage       Coef.

         ATT    .3887224
        NATE    1.432913
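To make the output above easier to read, here is a minimal sketch of what a kernel matching estimator of the ATT does in principle: each treated unit is compared with a kernel-weighted average of the controls whose propensity score lies within the bandwidth, and treated units without any control inside the bandwidth are left unmatched (the "Matched: No" column). The Epanechnikov kernel corresponds to "Kernel = epan"; the function names and toy data are hypothetical, and kmatch's actual implementation (bandwidth selection, weighting details) may differ.

```python
def epanechnikov(u):
    """Epanechnikov kernel; zero outside |u| < 1."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def kernel_att(ps_t, y_t, ps_c, y_c, bandwidth):
    """ATT by kernel matching on the propensity score.

    Returns (att, number of matched treated, number of unmatched treated).
    """
    effects = []
    unmatched = 0
    for p, y in zip(ps_t, y_t):
        weights = [epanechnikov((p - q) / bandwidth) for q in ps_c]
        total = sum(weights)
        if total == 0.0:
            # no control within the bandwidth: outside common support
            unmatched += 1
            continue
        y0_hat = sum(w * yc for w, yc in zip(weights, y_c)) / total
        effects.append(y - y0_hat)
    if not effects:
        raise ValueError("no treated unit could be matched")
    return sum(effects) / len(effects), len(effects), unmatched


# Toy data (hypothetical): propensity scores and outcomes
ps_t = [0.30, 0.50, 0.90]
y_t = [10.0, 12.0, 15.0]
ps_c = [0.28, 0.32, 0.50, 0.60]
y_c = [8.0, 9.0, 11.0, 11.5]
att, n_matched, n_unmatched = kernel_att(ps_t, y_t, ps_c, y_c, bandwidth=0.05)
print(att, n_matched, n_unmatched)
```

In the toy data the third treated unit has no control within the bandwidth and is dropped, which mirrors how the estimand is implicitly restricted to the region of common support.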
Some Examples

Kernel density balancing plot

. kmatch density, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)
(applying 0-1 boundary correction to density estimation of propensity score)
(bandwidth for propensity score = .06803989)

[Figure: kernel density of the propensity score for Untreated and Treated; panels "Raw" and "Matched (ATT)"]
Some Examples

Cumulative distribution balancing plot

. kmatch cumul, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)

[Figure: cumulative probability against the propensity score for Untreated and Treated; panels "Raw" and "Matched (ATT)"]
Some Examples

Balancing box plot

. kmatch box
(refitting the model using the generate() option)

[Figure: box plots of the propensity score for Untreated and Treated; panels "Raw" and "Matched (ATT)"]
Some Examples

Standard errors

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching           Number of obs     =      1,853
                                                Replications      =         50
                                                Kernel            =       epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                    Matched                     Controls             Band-
                Yes      No   Total       Used  Unused   Total       width

     Treated    432      25     457       1105     291    1396      1.3394
   Untreated   1386      10    1396        455       2     457      3.3975
    Combined   1818      35    1853       1560     293    1853

Treatment-effects estimation

              Observed   Bootstrap                         Normal-based
        wage     Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]

         ATE  .4095729   .1920853    2.13   0.033     .0330928    .7860531
         ATT  .6059013   .2472069    2.45   0.014     .1213846    1.090418
         ATC  .3483797   .1893653    1.84   0.066    -.0227695    .7195289
        NATE  1.432913   .2333282    6.14   0.000     .9755981    1.890228
Some Examples

Do some tests

. lincom ATT-NATE

 ( 1)  ATT - NATE = 0

        wage     Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]

         (1) -.8270117   .1810415   -4.57   0.000    -1.181847   -.4721768

. test ATT = ATC

 ( 1)  ATT - ATC = 0

           chi2(  1) =    2.42
         Prob > chi2 =    0.1200
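As a quick check of the lincom result, the point estimate and the z statistic follow directly from the bootstrap output of the preceding example (ATT = .6059013, NATE = 1.432913) and the reported standard error of the difference (.1810415):

```latex
\begin{align*}
\widehat{\mathrm{ATT}} - \widehat{\mathrm{NATE}} &= 0.6059013 - 1.432913 = -0.8270117 \\
z &= \frac{-0.8270117}{0.1810415} \approx -4.57
\end{align*}
```

The difference between the naive and the matched estimate is thus clearly different from zero, whereas the test of ATT = ATC (p = 0.12) gives no clear evidence of effect heterogeneity between the two groups.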
Some Examples

Nearest-neighbor matching (1 neighbor)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

Multivariate-distance nearest-neighbor matching

                                                Number of obs     =      1,853
                                                Neighbors:    min =          1
Treatment   : union = 1                                       max =          1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                    Matched                     Controls             Band-
                Yes      No   Total       Used  Unused   Total       width

     Treated    457       0     457        328    1068    1396           .

Treatment-effects estimation

        wage       Coef.

         ATT    .7246969

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

Treatment-effects estimation                   Number of obs      =      1,853
Estimator      : nearest-neighbor matching     Matches: requested =          1
Outcome model  : matching                                     min =          1
Distance metric: Mahalanobis                                  max =          1

                          AI Robust
        wage     Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]

ATET
       union
(union vs nonunion)
              .7246969   .2942952    2.46   0.014      .147889    1.301505
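The idea behind Mahalanobis-distance nearest-neighbor matching can be sketched in a few lines. The example below is a simplified illustration, assuming two covariates, the pooled sample covariance matrix as scaling matrix, and matching with replacement; function names and toy data are hypothetical, and neither kmatch nor teffects nnmatch is claimed to work exactly this way (for instance, with respect to ties or the choice of scaling matrix).

```python
import math


def covariance_2d(rows):
    """Sample covariance matrix of two covariates, returned as (s00, s01, s11)."""
    n = len(rows)
    m0 = sum(r[0] for r in rows) / n
    m1 = sum(r[1] for r in rows) / n
    s00 = sum((r[0] - m0) ** 2 for r in rows) / (n - 1)
    s11 = sum((r[1] - m1) ** 2 for r in rows) / (n - 1)
    s01 = sum((r[0] - m0) * (r[1] - m1) for r in rows) / (n - 1)
    return s00, s01, s11


def mahalanobis_2d(a, b, cov):
    """Mahalanobis distance between points a and b (closed-form 2x2 inverse)."""
    s00, s01, s11 = cov
    det = s00 * s11 - s01 * s01
    d0, d1 = a[0] - b[0], a[1] - b[1]
    return math.sqrt((s11 * d0 * d0 - 2.0 * s01 * d0 * d1 + s00 * d1 * d1) / det)


def nn_match_att(x_t, y_t, x_c, y_c):
    """One-nearest-neighbor matching with replacement; returns (att, matches)."""
    cov = covariance_2d(x_t + x_c)  # pooled scaling matrix (an assumption)
    matches = []
    for xt in x_t:
        dists = [mahalanobis_2d(xt, xc, cov) for xc in x_c]
        matches.append(min(range(len(x_c)), key=dists.__getitem__))
    effects = [yt - y_c[j] for yt, j in zip(y_t, matches)]
    return sum(effects) / len(effects), matches


# Toy data (hypothetical): covariates are (education, age)
x_t = [(12, 30), (16, 50)]
y_t = [20.0, 30.0]
x_c = [(12, 32), (16, 48), (20, 30), (10, 55)]
y_c = [18.0, 27.0, 40.0, 15.0]
att, matches = nn_match_att(x_t, y_t, x_c, y_c)
print(att, matches)
```

Only two of the four controls are used here, which corresponds to the "Controls: Used/Unused" columns in the matching statistics above.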
Some Examples

Nearest-neighbor matching (5 neighbors)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching

                                                Number of obs     =      1,853
                                                Neighbors:    min =          5
Treatment   : union = 1                                       max =          5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                    Matched                     Controls             Band-
                Yes      No   Total       Used  Unused   Total       width

     Treated    457       0     457        870     526    1396           .

Treatment-effects estimation

        wage       Coef.

         ATT    .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation                   Number of obs      =      1,853
Estimator      : nearest-neighbor matching     Matches: requested =          5
Outcome model  : matching                                     min =          5
Distance metric: Mahalanobis                                  max =          6

                          AI Robust
        wage     Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]

ATET
       union
(union vs nonunion)
              .5590823   .2381752    2.35   0.019     .0922675    1.025897
Some Examples

Bias-correction / regression adjustment

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching

                                                Number of obs     =      1,853
                                                Neighbors:    min =          5
Treatment   : union = 1                                       max =          5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                    Matched                     Controls             Band-
                Yes      No   Total       Used  Unused   Total       width

     Treated    457       0     457        870     526    1396           .

Treatment-effects estimation

        wage       Coef.

         ATT    .5288023

adjusted for: collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) ///
>     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation                   Number of obs      =      1,853
Estimator      : nearest-neighbor matching     Matches: requested =          5
Outcome model  : matching                                     min =          5
Distance metric: Mahalanobis                                  max =          6

                          AI Robust
        wage     Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]

ATET
       union
(union vs nonunion)
              .5288023   .2420635    2.18   0.029     .0543666    1.003238
Some Examples

Mahalanobis-distance and propensity-score matching combined

. kmatch md union collgrad ttl_exp tenure (wage), att ///
>     psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching           Number of obs     =      1,853
                                                Kernel            =       epan
Treatment   : union = 1
Metric      : mahalanobis (modified)
Covariates  : collgrad ttl_exp tenure
PS model    : logit (pr)
PS covars   : i.industry i.race south

Matching statistics

                    Matched                     Controls             Band-
                Yes      No   Total       Used  Unused   Total       width

     Treated    439      18     457       1258     138    1396      .83886

Treatment-effects estimation

        wage       Coef.

         ATT    .6408443
Some Examples

Exact matching

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching           Number of obs     =      1,853
                                                Kernel            =       epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure
Exact       : industry race south

Matching statistics

                    Matched                     Controls             Band-
                Yes      No   Total       Used  Unused   Total       width

     Treated    432      25     457       1103     293    1396      1.3013

Treatment-effects estimation

        wage       Coef.

         ATT    .6047374
Some Examples

Bandwidth selection: the default (based on the distribution of distances in one-nearest-neighbor matching)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching           Number of obs     =      1,853
                                                Kernel            =       epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                    Matched                     Controls             Band-
                Yes      No   Total       Used  Unused   Total       width

     Treated    432      25     457       1105     291    1396      1.3394

Treatment-effects estimation

        wage       Coef.

         ATT    .6059013
Some Examples

Bandwidth selection: cross validation with respect to X

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching           Number of obs     =      1,853
                                                Kernel            =       epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                    Matched                     Controls             Band-
                Yes      No   Total       Used  Unused   Total       width

     Treated    448       9     457       1184     212    1396      1.8888

Treatment-effects estimation

        wage       Coef.

         ATT    .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MSE against bandwidth; points are labeled with the index of the search step]
Some Examples

Bandwidth selection: cross validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching           Number of obs     =      1,853
                                                Kernel            =       epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                    Matched                     Controls             Band-
                Yes      No   Total       Used  Unused   Total       width

     Treated    453       4     457       1289     107    1396       2.433

Treatment-effects estimation

        wage       Coef.

         ATT    .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MISE against bandwidth; points are labeled with the index of the search step]
Some Examples

Bandwidth selection: weighted cross validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching           Number of obs     =      1,853
                                                Kernel            =       epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                    Matched                     Controls             Band-
                Yes      No   Total       Used  Unused   Total       width

     Treated    455       2     457       1356      40    1396      2.7626

Treatment-effects estimation

        wage       Coef.

         ATT    .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of weighted MISE against bandwidth; points are labeled with the index of the search step]
Some Examples

Common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(0.5)

Multivariate-distance kernel matching           Number of obs     =      1,853
                                                Kernel            =       epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                    Matched                     Controls             Band-
                Yes      No   Total       Used  Unused   Total       width

     Treated    366      91     457        701     695    1396          .5

Treatment-effects estimation

        wage       Coef.

         ATT    .3303161

. kmatch csummarize
(refitting the model using the generate() option)

                  Common support (treated)          Standardized difference
       Means    Matched   Unmatc~d      Total     (1)-(3)    (2)-(3)    (1)-(2)

    collgrad    .322404    .318681    .321663     .001585   -.006376    .007962
     ttl_exp    13.3929    12.7682    13.2685     .027413   -.110253    .137666
      tenure    8.12614    6.95055    7.89205     .038378   -.154356    .192734
  3.industry    .002732    .021978    .006565    -.047404    .190657   -.238061
  4.industry    .191257    .153846    .183807     .019212   -.077269    .096481
  5.industry    .062842    .274725    .105033    -.137462    .552867   -.690329
  6.industry    .057377          0    .045952     .054507   -.219225    .273732
  7.industry    .019126    .021978    .019694    -.004083    .016423   -.020506
  8.industry    .005464    .065934    .017505    -.091714    .368871   -.460585
  9.industry    .010929    .010989    .010941    -.000115    .000462   -.000577
 10.industry          0    .021978    .004376    -.066227    .266363   -.332589
 11.industry    .554645    .175824    .479212      .15083   -.606636    .757467
 12.industry    .092896    .241758    .122538    -.090299    .363181    -.45348
      2.race    .243169    .681319    .330416    -.185284    .745209   -.930494
      3.race    .002732    .076923    .017505    -.112525    .452572   -.565097
       south     .29235    .318681    .297593    -.011456    .046074    -.05753

                  Common support (treated)                   Ratio
   Variances    Matched   Unmatc~d      Total     (1)/(3)    (2)/(3)    (1)/(2)

    collgrad    .219058    .219536    .218674     1.00176    1.00394    .997824
     ttl_exp    19.4198    25.2474    20.5898     .943177    1.22621     .76918
      tenure    38.3324    31.9242    37.2044     1.03032    .858076    1.20073
  3.industry    .002732    .021734    .006536     .418045    3.32537    .125714
  4.industry    .155101    .131624    .150351     1.03159    .875443    1.17837
  5.industry    .059054    .201465    .094207     .626851    2.13854    .293122
  6.industry    .054233          0    .043936     1.23435          0          .
  7.industry    .018811    .021734    .019348     .972252     1.1233    .865531
  8.industry     .00545    .062271    .017237     .316157    3.61269    .087513
  9.industry    .010839    .010989    .010845     .999464    1.01328    .986361
 10.industry          0    .021734    .004367           0    4.97709          0
 11.industry    .247691     .14652    .250115     .990307    .585811    1.69049
 12.industry    .084497    .185348    .107758     .784137    1.72003    .455885
      2.race    .184542    .219536    .221726     .832297    .990121    .840601
      3.race    .002732    .071795    .017237     .158513    4.16522    .038056
       south    .207448    .219536     .20949     .990254    1.04796    .944939

(1) matched, (2) unmatched, (3) total
Some Examples

Make a graph of the common-support stats

. mat M = r(M)
. coefplot matrix(M[,4]), title("Std. difference") noci nolabels xline(0)

[Figure: coefficient plot of the standardized difference between matched treated and all treated, one row per covariate (collgrad, ttl_exp, tenure, industry and race indicators, south)]
Some Examples

Multiple outcome variables

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching           Number of obs     =      1,852
                                                Kernel            =       epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                    Matched                     Controls             Band-
                Yes      No   Total       Used  Unused   Total       width

     Treated    432      25     457       1104     291    1395      1.3392

Treatment-effects estimation

                   Coef.

wage
         ATT    .6021049
        NATE    1.430823

hours
         ATT    1.263759
        NATE    1.450303
Some Examples

Multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage = collgrad ttl_exp tenure) ///
>     (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching           Number of obs     =      1,852
                                                Kernel            =       epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                    Matched                     Controls             Band-
                Yes      No   Total       Used  Unused   Total       width

     Treated    432      25     457       1104     291    1395      1.3392

Treatment-effects estimation

                   Coef.

wage
         ATT    .5152752
        NATE    1.430823

hours
         ATT    1.263759
        NATE    1.450303

wage adjusted for: collgrad ttl_exp tenure
hours adjusted for: i.industry i.race
Some Examples

Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage), ///
>     att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching           Number of obs     =      1,853
                                                Replications      =         50
                                                Kernel            =       epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics

                    Matched                     Controls             Band-
                Yes      No   Total       Used  Unused   Total       width

0
     Treated    306      15     321        625     120     745      1.3199
1
     Treated    126      10     136        473     178     651      1.3398

Treatment-effects estimation

              Observed   Bootstrap                         Normal-based
        wage     Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]

0
         ATT  .4586332   .2763358    1.66   0.097     -.082975    1.000241
1
         ATT  .9518705    .406903    2.34   0.019     .1543553    1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

           chi2(  1) =    1.23
         Prob > chi2 =    0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

        wage     Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]

         (1)  .4932373   .4452343    1.11   0.268     -.379406    1.365881
Simulation

Population data from the Swiss census of 2000.

Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using command iskotrei by Hendrickx 2002) (values from 6 to 78; mean 44).

Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.

Control variables: gender, age, and highest educational degree.

Population restricted to people between 24 and 60 years old who are working.

2'308'006 individuals, of which 17.5% belong to the treatment group.

Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.
Simulation

Substantial differences between resident aliens and Swiss nationals on all three covariates.

Propensity score in the population (computed from fully stratified data):

[Figure: density of the propensity score for Untreated and Treated; McFadden R² = 0.121]
Simulation

Raw mean difference in occupational prestige (NATE): −4.79
Population ATT (computed from fully stratified data): −3.96
There is some treatment effect heterogeneity (ATE = −3.51, ATC = −3.41).

[Figure, left panel: outcome against propensity score for Untreated and Treated]
[Figure, right panel: treatment effect against propensity score]
Results: Variance

[Figure: variance of the ATT estimate by estimator, for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction]

In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that, across algorithms, PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.
Results: Bias reduction (in percent)

[Figure: bias reduction in percent by estimator, for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction]

Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3) small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.
Results: Mean squared error

[Figure: mean squared error by estimator, for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction]
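The three result slides (variance, bias, mean squared error) are linked by the standard decomposition of the mean squared error. Writing $\hat\theta$ for a matching estimator and $\mathrm{ATT}$ for the population value it targets (this notation is an assumption of the sketch and does not appear on the slides):

```latex
\begin{align*}
\mathrm{MSE}(\hat\theta)
  &= E\big[(\hat\theta - \mathrm{ATT})^2\big] \\
  &= \underbrace{\big(E[\hat\theta] - \mathrm{ATT}\big)^2}_{\text{squared bias}}
   + \underbrace{E\big[(\hat\theta - E[\hat\theta])^2\big]}_{\text{variance}}
\end{align*}
```

An estimator with somewhat larger bias can therefore still have a smaller MSE if its variance is sufficiently lower, and vice versa.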
Results: Relative standard error

[Figure: relative standard error by estimator, for N = 500 and N = 5000. Rows: nearest-neighbor matching (teffects; 1 neighbor, 5 neighbors), nearest-neighbor matching (bootstrap; 1 neighbor, 5 neighbors), and kernel matching (bootstrap; fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction]
Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals by estimator, for N = 500 and N = 5000. Rows: nearest-neighbor matching (teffects; 1 neighbor, 5 neighbors), nearest-neighbor matching (bootstrap; 1 neighbor, 5 neighbors), and kernel matching (bootstrap; fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction]
Conclusions

The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).

Other PSM matching algorithms perform much better because they are less affected by the "random pruning" problem.
Conclusions

Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research, the choice may not be that clear.

Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.

Disadvantages of MDM:
- The choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching! (But don't use pair matching anyhow.)
Conclusions

Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly OK for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.
References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2): 295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1): 77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91: 279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4): 465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70: 41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5): 688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4): 472–480.
The Propensity Score Theorem (Rosenbaum and Rubin 1983)

If the conditional independence assumption is true, then

$$\Pr(D_i = 1 \mid Y_i^0, Y_i^1, X_i) = \Pr(D_i = 1 \mid X_i) = \pi(X_i)$$

where $\pi(X)$ is called the propensity score.

That is,

$$(Y^0, Y^1) \perp\!\!\!\perp D \mid X$$

implies

$$(Y^0, Y^1) \perp\!\!\!\perp D \mid \pi(X)$$

so that, under "strong ignorability", the average causal effect can be estimated by conditioning on the propensity score $\pi(X)$ instead of $X$.

This is remarkable, because the information in $X$, which may include many variables, can be reduced to just one dimension. This greatly simplifies the matching task.
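The step from conditioning on $X$ to conditioning on $\pi(X)$ can be made explicit with the law of iterated expectations. The following is a brief sketch of the standard argument, using the fact that $\pi(X)$ is a function of $X$:

```latex
\begin{align*}
\Pr\big(D = 1 \mid Y^0, Y^1, \pi(X)\big)
  &= E\big[\, E[D \mid Y^0, Y^1, X] \,\big|\, Y^0, Y^1, \pi(X) \big] \\
  &= E\big[\, \pi(X) \,\big|\, Y^0, Y^1, \pi(X) \big]
     && \text{(conditional independence given } X\text{)} \\
  &= \pi(X)
\end{align*}
```

The same reasoning gives $\Pr(D = 1 \mid \pi(X)) = E[\pi(X) \mid \pi(X)] = \pi(X)$. Since the treatment probability given $\pi(X)$ does not depend on the potential outcomes, $D$ is independent of $(Y^0, Y^1)$ conditional on $\pi(X)$.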
Propensity Score Matching (PSM)

Instead of computing multivariate distances, we can thus simply match on the (one-dimensional) propensity score.

Procedure:
- Step 1: Estimate the propensity score, e.g. using a Logit model.
- Step 2: Apply a matching algorithm using differences in the propensity score, $|\pi(X_i) - \pi(X_j)|$, instead of multivariate distances.

PSM is tremendously popular:
- https://scholar.google.ch/scholar?q=propensity+score+AND+(matching+OR+matched+OR+match)
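The two steps of the procedure can be sketched as follows. This is a minimal illustration, assuming a single covariate, a logit model fitted by Newton-Raphson, and one-nearest-neighbor matching with replacement on the estimated score; the function names and the toy data are hypothetical and do not come from the slides.

```python
import math


def fit_logit(x, d, iterations=25):
    """Step 1: fit Pr(D=1|x) = 1/(1+exp(-(a+b*x))) by Newton-Raphson."""
    a = b = 0.0
    for _ in range(iterations):
        p = [1.0 / (1.0 + math.exp(-(a + b * xi))) for xi in x]
        # score vector (gradient of the log likelihood)
        g0 = sum(di - pi for di, pi in zip(d, p))
        g1 = sum((di - pi) * xi for di, pi, xi in zip(d, p, x))
        # information matrix (negative Hessian)
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def propensity(x, a, b):
    """Estimated propensity score for each observation."""
    return [1.0 / (1.0 + math.exp(-(a + b * xi))) for xi in x]


def match_on_score(ps, d):
    """Step 2: for each treated unit, find the control with the closest score."""
    controls = [j for j, dj in enumerate(d) if dj == 0]
    return {i: min(controls, key=lambda j: abs(ps[i] - ps[j]))
            for i, di in enumerate(d) if di == 1}


# Toy data (hypothetical): one covariate and a treatment indicator
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
d = [0, 0, 1, 0, 1, 0, 1, 1]
a, b = fit_logit(x, d)
ps = propensity(x, a, b)
matches = match_on_score(ps, d)
print(matches)
```

With more than one covariate the matching step stays exactly the same, because all covariate information enters only through the one-dimensional score; this is the simplification the slide refers to.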
King and Nielsen

In 2015/2016 Gary King and Richard Nielsen circulated a paper that created quite some concern among applied researchers.

The basic message of the paper is that PSM is really, really bad and should be discarded.

The paper:
- http://j.mp/1sexgVw

Slides:
- https://gking.harvard.edu/presentations/why-propensity-scores-should-not-be-used-matching-6

Watch it:
- https://www.youtube.com/watch?v=rBv39pK1iEs
King and Nielsen

The story goes about as follows.

Argument 1
- Model dependence (i.e. dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions, even if they try not to be).
- Matching is good because it reduces model dependence.
Matching to Reduce Model Dependence (Ho, Imai, King, Stuart 2007: fig. 1, Political Analysis)

[Figure, shown in several build steps: outcome against education (years) for treated (T) and control (C) units, before and after matching; slide 3/23 from the slides by King and Nielsen]
King and Nielsen

Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

Matching: Finding Hidden Randomized Experiments (slide 6/23 from the slides by King and Nielsen)

Types of Experiments

Balance            Complete           Fully
Covariates:        Randomization      Blocked

Observed           On average         Exact
Unobserved         On average         On average

Fully blocked dominates complete randomization for: imbalance, model dependence, power, efficiency, bias, research costs, robustness. E.g., Imai, King, Nall 2009: SEs 600% smaller!

Goal of Each Matching Method (in Observational Data)
- PSM: complete randomization
- Other methods: fully blocked
- Other matching methods dominate PSM (wait, it gets worse)
Best Case: Mahalanobis Distance Matching

[Figure, shown in two build steps: age against education (years) for treated (T) and control (C) units, before and after pruning unmatched controls; slide 9/23 from the slides by King and Nielsen]

Best Case: Propensity Score Matching

[Figure, shown in two build steps: age against education (years) for treated (T) and control (C) units, with units projected onto a propensity score axis from 0 to 1; slide 15/23 from the slides by King and Nielsen]
Best Case: Propensity Score Matching is Suboptimal
[Figure: education (years) by age plot showing the treated (T) units and a reduced set of control (C) units. Slide 15/23 of the slides by King and Nielsen.]
King and Nielsen

Argument 3
- Random pruning (deleting observations at random) increases imbalance. This is because the sample size decreases, so that variance increases (large differences become more likely).
- More imbalance/variance means more model dependence and researcher discretion.
- Because PSM approximates complete randomization, it engages in random pruning.
- PSM Paradox ("when you do 'better,' you do worse"):
  - When matching is made more strict (e.g., by decreasing the size of the caliper), PSM, like other matching methods, typically reduces imbalance. But soon the PSM Paradox kicks in, such that further pruning quickly increases imbalance.
  - If the data are such that there are no big differences between treated and untreated to begin with, the PSM Paradox kicks in almost immediately.
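The first bullet of Argument 3 can be illustrated with a small simulation. This is a minimal sketch in Python (standard library only), not part of the original slides: the sample sizes, the number of replications, and the function name simulate_random_pruning are assumptions chosen for illustration. It draws a covariate that is unrelated to treatment, records the absolute difference in group means, then deletes units at random and records the difference again.

```python
import random
import statistics


def simulate_random_pruning(n=200, keep=20, replications=1000, seed=1):
    """Average absolute mean difference in X before and after random pruning.

    X is drawn independently of treatment status, so any imbalance is chance
    imbalance of the kind that arises under complete randomization.
    """
    rng = random.Random(seed)
    before, after = [], []
    for _ in range(replications):
        treated = [rng.gauss(0.0, 1.0) for _ in range(n)]
        control = [rng.gauss(0.0, 1.0) for _ in range(n)]
        before.append(abs(statistics.fmean(treated) - statistics.fmean(control)))
        # Random pruning: keep a random subset of each group.
        treated_kept = rng.sample(treated, keep)
        control_kept = rng.sample(control, keep)
        after.append(abs(statistics.fmean(treated_kept) - statistics.fmean(control_kept)))
    return statistics.fmean(before), statistics.fmean(after)


imbalance_full, imbalance_pruned = simulate_random_pruning()
print("average imbalance, full sample:  ", round(imbalance_full, 3))
print("average imbalance, pruned sample:", round(imbalance_pruned, 3))
```

Because the standard error of a mean difference grows as the group size shrinks, the pruned samples show larger chance imbalance on average, which is the mechanism the argument relies on.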
PSM Increases Model Dependence & Bias
[Figure, two panels, each comparing MDM and PSM against the number of units pruned (0 to 160). Panel "Model Dependence": variance. Panel "Bias": maximum coefficient across 512 specifications; true effect = 2. Slide 20/23 of the slides by King and Nielsen.]
Data-generating process: $Y_i = 2T_i + X_{1i} + X_{2i} + \epsilon_i$, $\epsilon_i \sim N(0, 1)$
The Propensity Score Paradox in Real Data
[Figure, two panels: Finkel et al. (JOP 2012) and Nielsen et al. (AJPS 2011). Each plots imbalance against the number of units pruned for CEM, MDM, PSM, and random pruning, with markers for the raw data and for a 1/4 SD caliper. Slide 21/23 of the slides by King and Nielsen.]
Similar pattern for > 20 other real data sets we checked
Are King and Nielsen right?

Argument 1
- Model dependence (i.e., dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions, even if they try not to be).
- Matching is good because it reduces model dependence.

I fully agree.

My view, however, may be somewhat less pessimistic. I believe that research results can be credible if researchers are well educated, so that they know what they are doing, and if modeling decisions are made transparent and the robustness of results is evaluated (and documented).
Are King and Nielsen right?

Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

That fully blocked randomization is more efficient than complete randomization, given the sample size, is of course true (how large the efficiency gains are depends on the strength of the relation between X and Y).

However, if blocking reduces the sample size, it is not a priori clear whether estimates from the blocked sample are more efficient than estimates from the full sample (although often they will be).

That PSM approximates complete randomization is only partially true. PSM approximates complete randomization within observations with the same propensity score. Hence, PSM is somewhere between complete randomization and fully blocked randomization.
- If the X variables have no relation to T (treatment), then all observations have the same propensity score. Hence, we end up with complete randomization.
- If the X variables have a strong effect on T, there is lots of blocking.
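The two cases above can be made concrete with a toy example. The following is a minimal Python sketch, not taken from the slides: the data, the cell labels, and the function name propensity_blocks are hypothetical. It computes the propensity score as the treated share within each covariate cell (a fully stratified estimate) and groups cells by that score; each distinct score acts as one block.

```python
from collections import defaultdict
from fractions import Fraction


def propensity_blocks(units):
    """Group covariate cells by their propensity score.

    units: list of (x, d) pairs, x a covariate cell label, d in {0, 1}.
    Returns a dict mapping each propensity score to the set of cells sharing it.
    """
    n_total = defaultdict(int)
    n_treated = defaultdict(int)
    for x, d in units:
        n_total[x] += 1
        n_treated[x] += d
    blocks = defaultdict(set)
    for x in n_total:
        blocks[Fraction(n_treated[x], n_total[x])].add(x)
    return dict(blocks)


# Hypothetical data: X unrelated to treatment (treated share 1/2 in every cell).
unrelated = [("low", 1), ("low", 0), ("mid", 1), ("mid", 0), ("high", 1), ("high", 0)]

# Hypothetical data: X strongly related to treatment (shares 1/4, 1/2, 3/4).
related = [("low", 1), ("low", 0), ("low", 0), ("low", 0),
           ("mid", 1), ("mid", 0),
           ("high", 1), ("high", 1), ("high", 1), ("high", 0)]

blocks_unrelated = propensity_blocks(unrelated)
blocks_related = propensity_blocks(related)
print(len(blocks_unrelated), "block(s) when X is unrelated to T")
print(len(blocks_related), "block(s) when X is related to T")
```

In the first data set all cells share one score, so matching on the score cannot distinguish them (complete randomization); in the second, each cell has its own score, so matching on the score blocks on X.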
Are King and Nielsen right?

Argument 3
- Random pruning ⇒ imbalance ⇒ more model dependence
- PSM ⇒ complete randomization ⇒ lots of random pruning
- PSM Paradox: "when you do 'better,' you do worse"

That random pruning makes things worse is of course true, because it unnecessarily reduces the sample size (without changing anything else).

As argued above, that PSM applies random pruning is only true for X variables unrelated to T (so that we are in a "local" complete randomization situation, although something similar can probably also happen if effects from several X's cancel each other out).

Furthermore, it is only true if you employ a matching algorithm that throws away good matches. King and Nielsen's results seem to be based on the worst possible algorithm: one-to-one matching without replacement.
If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or also nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
- Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
- Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.
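The difference between an algorithm that throws away good matches and one that averages over them can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not kmatch's implementation: the propensity scores and the bandwidth are made up, and the pair matching is a simple greedy variant.

```python
def epanechnikov(u):
    """Epanechnikov kernel; zero outside |u| < 1."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def pair_match_without_replacement(treated, controls):
    """Greedy one-to-one matching: each control is used at most once."""
    available = set(range(len(controls)))
    pairs = {}
    for i, score in enumerate(treated):
        if not available:
            break
        j = min(available, key=lambda k: abs(controls[k] - score))
        pairs[i] = j
        available.remove(j)
    return pairs


def kernel_match(treated, controls, bandwidth):
    """Normalized kernel weights over all controls within the bandwidth."""
    weights = {}
    for i, score in enumerate(treated):
        raw = {}
        for j, c in enumerate(controls):
            w = epanechnikov((c - score) / bandwidth)
            if w > 0.0:
                raw[j] = w
        total = sum(raw.values())
        if total > 0.0:
            weights[i] = {j: w / total for j, w in raw.items()}
    return weights


# Hypothetical propensity scores (illustration only).
treated = [0.30, 0.31, 0.60]
controls = [0.29, 0.30, 0.32, 0.33, 0.58, 0.61, 0.90]

pairs = pair_match_without_replacement(treated, controls)
kernel = kernel_match(treated, controls, bandwidth=0.05)

used_by_pairs = set(pairs.values())
used_by_kernel = {j for w in kernel.values() for j in w}
print(len(used_by_pairs), "controls used by pair matching")
print(len(used_by_kernel), "controls used by kernel matching")
```

In this toy data pair matching keeps three controls and discards three others that are nearly as close, whereas kernel matching averages over all six close controls and drops only the distant one (score 0.90).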
It is true that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more "free" variance in X that can be exploited by modeling decisions).

In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you'd need to apply PSM stratified by subgroups in this case).
The Command

kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).

Some key features:
- Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
- Optional exact matching
- Optional regression-adjustment bias-correction
- Kernel matching, ridge matching, or nearest-neighbor matching
- Automatic bandwidth selection for kernel/ridge matching
- Flexible specification of scaling matrix for MDM
- Joint analysis of multiple subgroups and multiple outcome variables
- Various post-estimation commands for balancing and common-support diagnostics
- Computationally efficient implementation
Some Examples

Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.

. sysuse nlsw88, clear
(NLSW, 1988 extract)

. drop if industry==2
(4 observations deleted)

Mahalanobis-distance kernel matching

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched                Controls           Band-
              Yes    No  Total     Used  Unused  Total  width
Treated       432    25    457     1105     291   1396  1.3394

Treatment-effects estimation

wage         Coef.
ATT       .6059013
NATE      1.432913
Some Examples

Some balancing statistics

. kmatch summarize
(refitting the model using the generate() option)

                           Raw                      Matched (ATT)
Means          Treated  Untrea~d    StdDif   Treated  Untrea~d    StdDif
collgrad       .321663   .224212   .219912   .319444   .319444         0
ttl_exp        13.2685   12.7323   .117584   13.3205   13.1425   .039036
tenure         7.89205   6.17658    .29735   7.91744   7.58347   .057888
3.industry     .006565   .012178  -.058246    .00463    .00463         0
4.industry     .183807   .166905   .044425   .185185   .185185         0
5.industry     .105033   .027937   .312944   .085648   .085648         0
6.industry     .045952   .169771  -.407129   .048611   .048611         0
7.industry     .019694   .102436  -.350657   .020833   .020833         0
8.industry     .017505   .035817  -.113785   .009259   .009259         0
9.industry     .010941   .040115  -.185669   .011574   .011574         0
10.industry    .004376   .008596  -.052551   .002315   .002315         0
11.industry    .479212   .356734   .250073   .506944   .506944         0
12.industry    .122538    .07235   .169707    .12037    .12037         0
2.race         .330416   .244986   .189418     .3125     .3125         0
3.race         .017505   .011461   .050566   .006944   .006944         0
south          .297593   .466332  -.352408   .291667   .291667         0

                           Raw                      Matched (ATT)
Variances      Treated  Untrea~d     Ratio   Treated  Untrea~d     Ratio
collgrad       .218674   .174066   1.25628   .217904   .217904         1
ttl_exp        20.5898   21.0001   .980459   19.8177   18.2323   1.08696
tenure         37.2044   29.3629   1.26706   37.0399   34.9543   1.05966
3.industry     .006536   .012038   .542928   .004619   .004619         1
4.industry     .150351   .139148   1.08052   .151242   .151242         1
5.industry     .094207   .027176   3.46656   .078494   .078494         1
6.industry     .043936    .14105   .311496   .046355   .046355         1
7.industry     .019348   .092008   .210287   .020447   .020447         1
8.industry     .017237   .034559   .498769   .009195   .009195         1
9.industry     .010845   .038533   .281445   .011467   .011467         1
10.industry    .004367   .008528   .512039   .002315   .002315         1
11.industry    .250115   .229639   1.08917   .250532   .250532         1
12.industry    .107758   .067163   1.60443   .106127   .106127         1
2.race         .221726     .1851   1.19787   .215342   .215342         1
3.race         .017237   .011338   1.52025   .006912   .006912         1
south           .20949   .249045   .841173   .207077   .207077         1
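How the StdDif and Ratio columns relate to the group means and variances can be checked by hand. The sketch below is Python, not Stata, and rests on an assumption: the formula (mean difference divided by the square root of the average of the two raw group variances) is inferred from the reported numbers, which it reproduces for the raw collgrad row; it is not quoted from the kmatch documentation.

```python
import math


def standardized_difference(mean_treated, mean_untreated, var_treated, var_untreated):
    """Mean difference scaled by the root of the average group variance."""
    pooled_sd = math.sqrt((var_treated + var_untreated) / 2.0)
    return (mean_treated - mean_untreated) / pooled_sd


def variance_ratio(var_treated, var_untreated):
    """Ratio of treated to untreated variance; 1 indicates equal spread."""
    return var_treated / var_untreated


# Raw (unmatched) values for collgrad as reported in the tables above.
std_dif = standardized_difference(0.321663, 0.224212, 0.218674, 0.174066)
ratio = variance_ratio(0.218674, 0.174066)
print(round(std_dif, 4), round(ratio, 4))
```

A standardized difference near 0 and a variance ratio near 1 indicate good balance, which is why the matched columns for the dummy variables show 0 and 1.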
Some Examples

Make a graph of the balancing stats

. mat M = r(M)
. mat V = r(V)
. coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || , bylabels("Std. mean difference" "Variance ratio") noci nolabels byopts(xrescale)
. addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))
. addplot 2: , xline(1) norescaling

[Figure: coefficient plot with one row per covariate (collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, south) and two panels, "Std. mean difference" (axis -.4 to .4) and "Variance ratio" (axis 0 to 4), each comparing Raw and Matched.]
Some Examples

Propensity-score kernel matching

. kmatch ps union collgrad ttl_exp tenure i.industry i.race south (wage), nate att
(computing bandwidth ... done)

Propensity-score kernel matching           Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Covariates  : collgrad ttl_exp tenure i.industry i.race south
PS model    : logit (pr)

Matching statistics

              Matched                Controls           Band-
              Yes    No  Total     Used  Unused  Total  width
Treated       431    26    457     1214     182   1396  .00188

Treatment-effects estimation

wage         Coef.
ATT       .3887224
NATE      1.432913
Some Examples

Kernel density balancing plot

. kmatch density, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)
(applying 0-1 boundary correction to density estimation of propensity score)
(bandwidth for propensity score = .06803989)

[Figure: kernel densities of the propensity score (x-axis 0 to .8) for untreated and treated units, in two panels: Raw and Matched (ATT).]
Some Examples

Cumulative distribution balancing plot

. kmatch cumul, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)

[Figure: cumulative probability (0 to 1) of the propensity score (x-axis 0 to .8) for untreated and treated units, in two panels: Raw and Matched (ATT).]
Some Examples

Balancing box plot

. kmatch box
(refitting the model using the generate() option)

[Figure: box plots of the propensity score (0 to .8) for untreated and treated units, in two panels: Raw and Matched (ATT).]
Some Examples

Standard errors

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching      Number of obs = 1853
                                           Replications  = 50
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched                Controls           Band-
              Yes    No  Total     Used  Unused  Total  width
Treated       432    25    457     1105     291   1396  1.3394
Untreated    1386    10   1396      455       2    457  3.3975
Combined     1818    35   1853     1560     293   1853

Treatment-effects estimation

          Observed   Bootstrap                      Normal-based
wage         Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
ATE       .4095729    .1920853    2.13   0.033    .0330928   .7860531
ATT       .6059013    .2472069    2.45   0.014    .1213846   1.090418
ATC       .3483797    .1893653    1.84   0.066   -.0227695   .7195289
NATE      1.432913    .2333282    6.14   0.000    .9755981   1.890228
Some Examples

Do some tests

. lincom ATT-NATE

 ( 1)  ATT - NATE = 0

wage         Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
(1)      -.8270117    .1810415   -4.57   0.000   -1.181847  -.4721768

. test ATT = ATC

 ( 1)  ATT - ATC = 0

           chi2(  1) =    2.42
         Prob > chi2 =    0.1200
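The z statistic, p-value, and normal-based confidence interval in the lincom output follow directly from the coefficient and its bootstrap standard error. The following Python sketch (standard library only; the function names are illustrative, and this is not how Stata is implemented internally) reproduces the reported values and also shows that a one-restriction Wald chi-squared test is the square of a z test.

```python
import math
from statistics import NormalDist


def normal_based_inference(coef, std_err, level=0.95):
    """z statistic, two-sided p-value, and normal-based confidence interval."""
    z = coef / std_err
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    critical = NormalDist().inv_cdf(0.5 + level / 2.0)
    return z, p_value, (coef - critical * std_err, coef + critical * std_err)


def chi2_one_df_p_value(chi2):
    """p-value of a chi-squared statistic with 1 degree of freedom."""
    return 2.0 * (1.0 - NormalDist().cdf(math.sqrt(chi2)))


# ATT - NATE and its bootstrap standard error, as reported by lincom.
z, p_value, (lower, upper) = normal_based_inference(-0.8270117, 0.1810415)
print(round(z, 2), round(lower, 6), round(upper, 6))

# The reported chi2 is rounded to two decimals, so the p-value is approximate.
p_wald = chi2_one_df_p_value(2.42)
print(round(p_wald, 3))
```

Because the interval is built from the bootstrap standard error and a normal quantile, its validity depends on the bootstrap being appropriate for the chosen matching estimator.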
Some Examples

Nearest-neighbor matching (1 neighbor)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

Multivariate-distance nearest-neighbor matching

                                           Number of obs = 1853
                                           Neighbors min = 1
Treatment   : union = 1                              max = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched                Controls           Band-
              Yes    No  Total     Used  Unused  Total  width
Treated       457     0    457      328    1068   1396

Treatment-effects estimation

wage         Coef.
ATT       .7246969

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

Treatment-effects estimation               Number of obs      = 1853
Estimator      : nearest-neighbor matching Matches: requested = 1
Outcome model  : matching                               min   = 1
Distance metric: Mahalanobis                            max   = 1

                                   AI Robust
wage                      Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET
union
(union vs nonunion)    .7246969    .2942952    2.46   0.014     .147889   1.301505
Some Examples

Nearest-neighbor matching (5 neighbors)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching

                                           Number of obs = 1853
                                           Neighbors min = 5
Treatment   : union = 1                              max = 5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched                Controls           Band-
              Yes    No  Total     Used  Unused  Total  width
Treated       457     0    457      870     526   1396

Treatment-effects estimation

wage         Coef.
ATT       .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation               Number of obs      = 1853
Estimator      : nearest-neighbor matching Matches: requested = 5
Outcome model  : matching                               min   = 5
Distance metric: Mahalanobis                            max   = 6

                                   AI Robust
wage                      Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET
union
(union vs nonunion)    .5590823    .2381752    2.35   0.019    .0922675   1.025897
Some Examples

Bias-correction / regression adjustment

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching

                                           Number of obs = 1853
                                           Neighbors min = 5
Treatment   : union = 1                              max = 5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched                Controls           Band-
              Yes    No  Total     Used  Unused  Total  width
Treated       457     0    457      870     526   1396

Treatment-effects estimation

wage         Coef.
ATT       .5288023

adjusted for collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation               Number of obs      = 1853
Estimator      : nearest-neighbor matching Matches: requested = 5
Outcome model  : matching                               min   = 5
Distance metric: Mahalanobis                            max   = 6

                                   AI Robust
wage                      Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET
union
(union vs nonunion)    .5288023    .2420635    2.18   0.029    .0543666   1.003238
Some Examples

Mahalanobis-distance and propensity-score matching combined

. kmatch md union collgrad ttl_exp tenure (wage), att psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis (modified)
Covariates  : collgrad ttl_exp tenure
PS model    : logit (pr)
PS covars   : i.industry i.race south

Matching statistics

              Matched                Controls           Band-
              Yes    No  Total     Used  Unused  Total  width
Treated       439    18    457     1258     138   1396  .83886

Treatment-effects estimation

wage         Coef.
ATT       .6408443
Some Examples

Exact matching

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure
Exact       : industry race south

Matching statistics

              Matched                Controls           Band-
              Yes    No  Total     Used  Unused  Total  width
Treated       432    25    457     1103     293   1396  1.3013

Treatment-effects estimation

wage         Coef.
ATT       .6047374
Some Examples

Bandwidth selection: the default (based on distribution of distances in one-nearest-neighbor matching)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched                Controls           Band-
              Yes    No  Total     Used  Unused  Total  width
Treated       432    25    457     1105     291   1396  1.3394

Treatment-effects estimation

wage         Coef.
ATT       .6059013
Some Examples

Bandwidth selection: cross validation with respect to X

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched                Controls           Band-
              Yes    No  Total     Used  Unused  Total  width
Treated       448     9    457     1184     212   1396  1.8888

Treatment-effects estimation

wage         Coef.
ATT       .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation criterion (MSE) plotted against the bandwidth (about 1.5 to 3); points are labelled with the index of the search step.]
Some Examples

Bandwidth selection: cross validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched                Controls           Band-
              Yes    No  Total     Used  Unused  Total  width
Treated       453     4    457     1289     107   1396  2.433

Treatment-effects estimation

wage         Coef.
ATT       .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation criterion (MISE) plotted against the bandwidth (about 1.5 to 3); points are labelled with the index of the search step.]
Some Examples

Bandwidth selection: weighted cross validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched                Controls           Band-
              Yes    No  Total     Used  Unused  Total  width
Treated       455     2    457     1356      40   1396  2.7626

Treatment-effects estimation

wage         Coef.
ATT       .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation criterion (weighted MISE) plotted against the bandwidth (about 1 to 5); points are labelled with the index of the search step.]
Some Examples

Common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(0.5)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched                Controls           Band-
              Yes    No  Total     Used  Unused  Total  width
Treated       366    91    457      701     695   1396  .5

Treatment-effects estimation

wage         Coef.
ATT       .3303161

. kmatch csummarize
(refitting the model using the generate() option)

                Common support (treated)       Standardized difference
Means          Matched  Unmatc~d     Total   (1)-(3)   (2)-(3)   (1)-(2)
collgrad       .322404   .318681   .321663   .001585  -.006376   .007962
ttl_exp        13.3929   12.7682   13.2685   .027413  -.110253   .137666
tenure         8.12614   6.95055   7.89205   .038378  -.154356   .192734
3.industry     .002732   .021978   .006565  -.047404   .190657  -.238061
4.industry     .191257   .153846   .183807   .019212  -.077269   .096481
5.industry     .062842   .274725   .105033  -.137462   .552867  -.690329
6.industry     .057377         0   .045952   .054507  -.219225   .273732
7.industry     .019126   .021978   .019694  -.004083   .016423  -.020506
8.industry     .005464   .065934   .017505  -.091714   .368871  -.460585
9.industry     .010929   .010989   .010941  -.000115   .000462  -.000577
10.industry          0   .021978   .004376  -.066227   .266363  -.332589
11.industry    .554645   .175824   .479212    .15083  -.606636   .757467
12.industry    .092896   .241758   .122538  -.090299   .363181   -.45348
2.race         .243169   .681319   .330416  -.185284   .745209  -.930494
3.race         .002732   .076923   .017505  -.112525   .452572  -.565097
south           .29235   .318681   .297593  -.011456   .046074   -.05753

                Common support (treated)                Ratio
Variances      Matched  Unmatc~d     Total   (1)/(3)   (2)/(3)   (1)/(2)
collgrad       .219058   .219536   .218674   1.00176   1.00394   .997824
ttl_exp        19.4198   25.2474   20.5898   .943177   1.22621    .76918
tenure         38.3324   31.9242   37.2044   1.03032   .858076   1.20073
3.industry     .002732   .021734   .006536   .418045   3.32537   .125714
4.industry     .155101   .131624   .150351   1.03159   .875443   1.17837
5.industry     .059054   .201465   .094207   .626851   2.13854   .293122
6.industry     .054233         0   .043936   1.23435         0         .
7.industry     .018811   .021734   .019348   .972252    1.1233   .865531
8.industry      .00545   .062271   .017237   .316157   3.61269   .087513
9.industry     .010839   .010989   .010845   .999464   1.01328   .986361
10.industry          0   .021734   .004367         0   4.97709         0
11.industry    .247691    .14652   .250115   .990307   .585811   1.69049
12.industry    .084497   .185348   .107758   .784137   1.72003   .455885
2.race         .184542   .219536   .221726   .832297   .990121   .840601
3.race         .002732   .071795   .017237   .158513   4.16522   .038056
south          .207448   .219536    .20949   .990254   1.04796   .944939

(1) matched, (2) unmatched, (3) total
Some Examples

Make a graph of the common-support stats

. mat M = r(M)
. coefplot matrix(M[,4]), title("Std. difference") noci nolabels xline(0)

[Figure: coefficient plot titled "Std. difference" with one row per covariate (collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, south); axis -.2 to .2.]
Some Examples

Multiple outcome variables

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1852
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched                Controls           Band-
              Yes    No  Total     Used  Unused  Total  width
Treated       432    25    457     1104     291   1395  1.3392

Treatment-effects estimation

             Coef.
wage
  ATT     .6021049
  NATE    1.430823
hours
  ATT     1.263759
  NATE    1.450303
Some Examples

Multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage = collgrad ttl_exp tenure) (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1852
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

              Matched                Controls           Band-
              Yes    No  Total     Used  Unused  Total  width
Treated       432    25    457     1104     291   1395  1.3392

Treatment-effects estimation

             Coef.
wage
  ATT     .5152752
  NATE    1.430823
hours
  ATT     1.263759
  NATE    1.450303

wage: adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race
Some Examples

Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage), att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching      Number of obs = 1853
                                           Replications  = 50
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics

              Matched                Controls           Band-
              Yes    No  Total     Used  Unused  Total  width
0
Treated       306    15    321      625     120    745  1.3199
1
Treated       126    10    136      473     178    651  1.3398

Treatment-effects estimation

          Observed   Bootstrap                      Normal-based
wage         Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
0
ATT       .4586332    .2763358    1.66   0.097    -.082975   1.000241
1
ATT       .9518705     .406903    2.34   0.019    .1543553   1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

           chi2(  1) =    1.23
         Prob > chi2 =    0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

wage         Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
(1)       .4932373    .4452343    1.11   0.268    -.379406   1.365881
Simulation
- Population data from the Swiss census of 2000.
- Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using command iskotrei by Hendrickx 2002) (values from 6 to 78; mean 44).
- Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.
- Control variables: gender, age, and highest educational degree.
- Population restricted to people between 24 and 60 years old who are working.
- 2'308'006 individuals, of which 17.5% belong to the treatment group.
- Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.
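The simulation results reported later (variance, bias, mean squared error) are summaries of the estimates obtained across the repeated samples. The sketch below shows, in Python, how such summaries could be computed for one estimator; the list of estimates is hypothetical, the function name is illustrative, and the population ATT of -3.96 is the value reported for this simulation.

```python
import statistics


def summarize_estimates(estimates, true_value):
    """Bias, variance, and mean squared error of a set of simulation estimates."""
    bias = statistics.fmean(estimates) - true_value
    variance = statistics.pvariance(estimates)
    mse = statistics.fmean([(e - true_value) ** 2 for e in estimates])
    return {"bias": bias, "variance": variance, "mse": mse}


# Hypothetical ATT estimates from four simulated samples (illustration only).
hypothetical_estimates = [-4.2, -3.8, -4.0, -3.6]
summary = summarize_estimates(hypothetical_estimates, true_value=-3.96)
print(summary)
```

With the variance computed over the replications (dividing by their number), the mean squared error equals the variance plus the squared bias, so the three result panels are not independent of each other.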
Simulation

Substantial differences between resident aliens and Swiss nationals on all three covariates.

Propensity score in population (computed from fully stratified data)

[Figure: density of the propensity score (x-axis 0 to 1) for untreated and treated units. McFadden R2 = 0.121.]
Simulation

- Raw mean difference in occupational prestige (NATE): -4.79
- Population ATT (computed from fully stratified data): -3.96
- There is some treatment effect heterogeneity (ATE = -3.51, ATC = -3.41)
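The three population effects are linked by an identity: the ATE is the average of the ATT and the ATC, weighted by the shares of treated and untreated units. A short Python check, assuming the treated share of 17.5 percent given in the simulation setup (the function name is illustrative):

```python
def ate_from_att_atc(att, atc, share_treated):
    """ATE as the share-weighted average of ATT and ATC."""
    return share_treated * att + (1.0 - share_treated) * atc


# Population values reported for the simulation.
ate = ate_from_att_atc(att=-3.96, atc=-3.41, share_treated=0.175)
print(round(ate, 2))
```

The result agrees with the reported ATE up to rounding of the inputs.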
[Figure, two panels, both against the propensity score (0 to .8). First panel: mean outcome (about 30 to 55) for untreated and treated units. Second panel: treatment effect (about -6 to -1).]
Results: Variance

[Figure: variance of the estimates for N = 500 and N = 5000, comparing MDM, MDM with bias correction, PSM, and PSM with bias correction. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y).]

Notes (2017-06-24, Illustration using kmatch):

This slide shows that, for the same algorithm, PSM is typically somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.
Results: Bias reduction (in percent)

[Figure: bias reduction in percent for N = 500 and N = 5000, comparing MDM, MDM with bias correction, PSM, and PSM with bias correction. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y).]

Notes (2017-06-24, Illustration using kmatch):

Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3) small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.
Results: Mean squared error
[Figure: mean squared error by matching algorithm, in two panels (N = 500 and N = 5000). Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]
Results: Relative standard error
[Figure: relative standard error by matching algorithm, in two panels (N = 500 and N = 5000). Rows: nearest-neighbor matching with teffects (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap (1 neighbor, 5 neighbors), and kernel matching with bootstrap (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]
Results: Coverage of 95% CIs
[Figure: coverage of 95% confidence intervals by matching algorithm, in two panels (N = 500 and N = 5000). Rows: nearest-neighbor matching with teffects (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap (1 neighbor, 5 neighbors), and kernel matching with bootstrap (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]
Conclusions
The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).
Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.
Conclusions
Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research, the choice may not be that clear.
Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.
Disadvantages of MDM:
- The choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.
One clear conclusion we can draw, however, is:
Do not use propensity scores for pair matching. (But don't use pair matching anyhow.)
Conclusions
Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly OK for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.
To do:
- Run some simulations comparable to the ones by King and Nielsen, using various matching algorithms.
References
Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2):295–313.
Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1):77–90.
Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91:279–290.
Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.
King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw
Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.
Neyman, J. 1990 [1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4):465–472.
Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70:41–55.
Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5):688–701.
Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4):472–480.
Propensity Score Matching (PSM)
Instead of computing multivariate distances, we can thus simply match on the (one-dimensional) propensity score.
Procedure:
- Step 1: Estimate the propensity score, e.g., using a Logit model.
- Step 2: Apply a matching algorithm using differences in the propensity score, $|\pi(X_i) - \pi(X_j)|$, instead of multivariate distances.
PSM is tremendously popular:
- https://scholar.google.ch/scholar?q=propensity+score+AND+(matching+OR+matched+OR+match)
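Step 2 of the procedure can be illustrated with a minimal Python sketch. It assumes the propensity scores have already been estimated in Step 1; the function name `att_nn_ps_matching` and the toy data are hypothetical and only serve the illustration. The sketch uses one-nearest-neighbor matching with replacement and averages tied controls, which is just one of several possible matching algorithms.

```python
def att_nn_ps_matching(pscore, treated, outcome):
    """ATT from one-nearest-neighbor matching (with replacement) on the propensity score.

    Controls tied at the minimal distance are averaged.
    """
    controls = [j for j, d in enumerate(treated) if d == 0]
    effects = []
    for i, d in enumerate(treated):
        if d != 1:
            continue
        # Distance is the absolute difference in propensity scores.
        dist = {j: abs(pscore[i] - pscore[j]) for j in controls}
        dmin = min(dist.values())
        matches = [j for j in controls if dist[j] == dmin]
        # Counterfactual outcome: mean outcome of the closest control(s).
        y0_hat = sum(outcome[j] for j in matches) / len(matches)
        effects.append(outcome[i] - y0_hat)
    return sum(effects) / len(effects)


# Toy data (hypothetical); pscore is assumed to come from Step 1, e.g. a logit model.
pscore = [0.8, 0.6, 0.75, 0.5, 0.2]
treated = [1, 1, 0, 0, 0]
outcome = [10.0, 8.0, 7.0, 5.0, 1.0]
att = att_nn_ps_matching(pscore, treated, outcome)
print(att)
```

In the toy data, the first treated unit is matched to the control with score 0.75 and the second to the control with score 0.5; the control with score 0.2 is never used.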
King and Nielsen
In 2015/2016, Gary King and Richard Nielsen circulated a paper that created quite some concern among applied researchers.
The basic message of the paper is that PSM is really, really bad and should be discarded.
The paper:
- http://j.mp/1sexgVw
Slides:
- https://gking.harvard.edu/presentations/why-propensity-scores-should-not-be-used-matching-6
Watch it:
- https://www.youtube.com/watch?v=rBv39pK1iEs
King and Nielsen
The story goes about as follows.
Argument 1
- Model dependence (i.e., dependence of results on modeling decisions made by the researcher) is bad, because it leads to bias (people are selective in their decisions, even if they try not to be).
- Matching is good, because it reduces model dependence.
Matching to Reduce Model Dependence (Ho, Imai, King, Stuart 2007, fig. 1, Political Analysis)
[Figure (slide 3/23 by King and Nielsen): scatter plot of Outcome against Education (years) for treated (T) and control (C) units, shown in several animation steps.]
King and Nielsen
Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

Matching: Finding Hidden Randomized Experiments (slide 6/23 by King and Nielsen)
Types of Experiments

  Balance        Complete         Fully
  Covariates:    Randomization    Blocked
  Observed       On average       Exact
  Unobserved     On average       On average

Fully blocked dominates complete randomization for: imbalance, model dependence, power, efficiency, bias, research costs, robustness. E.g., Imai, King, Nall 2009: SEs 600% smaller.
Goal of Each Matching Method (in Observational Data)
- PSM: complete randomization
- Other methods: fully blocked
- Other matching methods dominate PSM (wait, it gets worse)
Best Case: Mahalanobis Distance Matching
[Figure (slide 9/23 by King and Nielsen): scatter plot of Age against Education (years) for treated (T) and control (C) units, shown first with all control units and then with a reduced set of control units.]
Best Case: Propensity Score Matching
[Figure (slide 15/23 by King and Nielsen): scatter plot of Age against Education (years) for treated (T) and control (C) units, together with a propensity score axis running from 0 to 1.]
Best Case: Propensity Score Matching is Suboptimal
[Figure (slide 15/23 by King and Nielsen): the same plot of Age against Education (years), with the treated (T) units and a reduced set of control (C) units.]
King and Nielsen
Argument 3
- Random pruning (deleting observations at random) increases imbalance. This is because the sample size decreases, so that variance increases (large differences become more likely).
- More imbalance/variance means more model dependence and researcher discretion.
- Because PSM approximates complete randomization, it engages in random pruning.
- PSM Paradox ("when you do 'better', you do worse"):
  - When matching is made more strict (e.g., by decreasing the size of the caliper), PSM, like other matching methods, typically reduces imbalance. But soon the PSM Paradox kicks in, such that further pruning quickly increases imbalance.
  - If the data is such that there are no big differences between treated and untreated to begin with, the PSM Paradox kicks in almost immediately.
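The first point of Argument 3 can be checked with a small simulation. The following Python sketch is an illustration under stated assumptions, not a replication of King and Nielsen's analysis: a single covariate X is standard normal and unrelated to treatment (a complete-randomization situation), and pruning keeps a random subset of each group. The function name `mean_abs_imbalance` and all parameter values are hypothetical.

```python
import random
import statistics


def mean_abs_imbalance(n_keep, n_total=200, reps=1000, seed=12345):
    """Average |mean(X | treated) - mean(X | control)| after random pruning.

    X is standard normal and unrelated to treatment; pruning keeps n_keep
    randomly chosen units out of n_total in each group.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        x_t = [rng.gauss(0, 1) for _ in range(n_total)]
        x_c = [rng.gauss(0, 1) for _ in range(n_total)]
        kept_t = rng.sample(x_t, n_keep)  # random pruning of treated
        kept_c = rng.sample(x_c, n_keep)  # random pruning of controls
        diffs.append(abs(statistics.fmean(kept_t) - statistics.fmean(kept_c)))
    return statistics.fmean(diffs)


imbalance_full = mean_abs_imbalance(n_keep=200)   # no pruning
imbalance_pruned = mean_abs_imbalance(n_keep=20)  # 90% pruned at random
print(imbalance_full, imbalance_pruned)
```

With these settings, the average absolute mean difference is clearly larger after pruning, simply because the means are computed from fewer observations.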
PSM Increases Model Dependence & Bias
[Figure (slide 20/23 by King and Nielsen): two panels plotted against the number of units pruned (0 to 160), each comparing MDM and PSM. Panel "Model Dependence": variance. Panel "Bias": maximum coefficient across 512 specifications; true effect = 2.]
Data-generating process: $Y_i = 2T_i + X_{1i} + X_{2i} + \epsilon_i$, $\epsilon_i \sim N(0, 1)$
The Propensity Score Paradox in Real Data
[Figure (slide 21/23 by King and Nielsen): imbalance against the number of units pruned for CEM, MDM, and PSM, with reference markers for the raw data, random pruning, and a 1/4 SD caliper. Panels: Finkel et al. (JOP 2012) and Nielsen et al. (AJPS 2011).]
Similar pattern for > 20 other real data sets we checked.
Are King and Nielsen right?
Argument 1
- Model dependence (i.e., dependence of results on modeling decisions made by the researcher) is bad, because it leads to bias (people are selective in their decisions, even if they try not to be).
- Matching is good, because it reduces model dependence.
I fully agree.
My view, however, may be somewhat less pessimistic. I believe that research results can be credible if researchers are well educated, so that they know what they are doing, and if modeling decisions are made transparent and robustness of results is evaluated (and documented).
Are King and Nielsen right?
Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.
That fully blocked randomization is more efficient than complete randomization, given the sample size, is of course true (how large the efficiency gains are depends on the strength of the relation between X and Y).
However, if blocking reduces the sample size, it is not a priori clear whether estimates from the blocked sample are more efficient than estimates from the full sample (although often they will be).
That PSM approximates complete randomization is only partially true. PSM approximates complete randomization within observations with the same propensity score. Hence, PSM is somewhere between complete randomization and fully blocked randomization.
- If the X variables have no relation to T (treatment), then all observations have the same propensity score. Hence, we end up with complete randomization.
- If the X variables have a strong effect on T, there is lots of blocking.
Are King and Nielsen right?
Argument 3
- Random pruning ⇒ imbalance ⇒ more model dependence
- PSM ⇒ complete randomization ⇒ lots of random pruning
- PSM Paradox: "when you do 'better', you do worse"
That random pruning makes things worse is of course true, because it unnecessarily reduces the sample size (without changing anything else).
As argued above, that PSM applies random pruning is only true for X variables unrelated to T (so that we are in a "local" complete randomization situation, although something similar can probably also happen if effects from several X's cancel each other out).
Furthermore, it is only true if you employ a matching algorithm that throws away good matches. King and Nielsen's results seem to be based on the worst possible algorithm: one-to-one matching without replacement.
If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or also nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
- Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
- Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.
It is true that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more "free" variance in X that can be exploited by modeling decisions).
In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you'd need to apply PSM stratified by subgroups in this case).
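The difference between algorithms that throw away good matches and algorithms that average over them can be made concrete with a small Python sketch of kernel matching on the propensity score. This is a simplified illustration under assumptions (Epanechnikov kernel, a fixed bandwidth supplied by the user, hypothetical toy data); it is not the implementation used by kmatch, and the function names are made up for the example.

```python
def epan(u):
    """Epanechnikov kernel; zero outside |u| < 1."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def att_kernel_matching(pscore, treated, outcome, bandwidth):
    """ATT by kernel matching on the propensity score.

    Each treated unit is compared with a kernel-weighted average of all
    controls within the bandwidth. Treated units without any control in
    range stay unmatched and are dropped.
    Returns (att, number of matched treated units).
    """
    controls = [j for j, d in enumerate(treated) if d == 0]
    effects = []
    for i, d in enumerate(treated):
        if d != 1:
            continue
        w = {j: epan((pscore[i] - pscore[j]) / bandwidth) for j in controls}
        total = sum(w.values())
        if total == 0.0:
            continue  # no control within the bandwidth: prune this unit
        y0_hat = sum(w[j] * outcome[j] for j in controls) / total
        effects.append(outcome[i] - y0_hat)
    if not effects:
        raise ValueError("no treated unit has a control within the bandwidth")
    return sum(effects) / len(effects), len(effects)


# Toy data (hypothetical): two controls are equally close to the first treated unit.
pscore = [0.5, 0.9, 0.45, 0.55, 0.1]
treated = [1, 1, 0, 0, 0]
outcome = [6.0, 9.0, 3.0, 5.0, 1.0]
att, n_matched = att_kernel_matching(pscore, treated, outcome, bandwidth=0.1)
print(att, n_matched)
```

In the toy data, both nearby controls contribute to the counterfactual of the first treated unit instead of one of them being discarded, while the second treated unit has no control within the bandwidth and is pruned.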
The Command
kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).
Some key features:
- Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
- Optional exact matching
- Optional regression-adjustment bias correction
- Kernel matching, ridge matching, or nearest-neighbor matching
- Automatic bandwidth selection for kernel/ridge matching
- Flexible specification of scaling matrix for MDM
- Joint analysis of multiple subgroups and multiple outcome variables
- Various post-estimation commands for balancing and common-support diagnostics
- Computationally efficient implementation
Some Examples
Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.

. sysuse nlsw88, clear
(NLSW, 1988 extract)

. drop if industry==2
(4 observations deleted)

Mahalanobis-distance kernel matching

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |        Matched         |        Controls        |  Band-
           |   Yes     No    Total  |  Used  Unused   Total  |  width
   Treated |   432     25      457  |  1105     291    1396  | 1.3394

Treatment-effects estimation
        wage |     Coef.
         ATT |  .6059013
        NATE |  1.432913
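The output above reports the Mahalanobis metric. As a reminder of what this distance does, here is a minimal Python sketch for two covariates: differences are scaled by the inverse of the sample covariance matrix, so that covariates measured on different scales contribute comparably. The helper names and toy values are hypothetical, and the sketch is not meant to mirror kmatch's internal computations.

```python
import math


def covariance_matrix_2d(rows):
    """Sample covariance of two covariates, returned as (s00, s01, s11)."""
    n = len(rows)
    m0 = sum(r[0] for r in rows) / n
    m1 = sum(r[1] for r in rows) / n
    s00 = sum((r[0] - m0) ** 2 for r in rows) / (n - 1)
    s11 = sum((r[1] - m1) ** 2 for r in rows) / (n - 1)
    s01 = sum((r[0] - m0) * (r[1] - m1) for r in rows) / (n - 1)
    return s00, s01, s11


def mahalanobis_2d(a, b, cov):
    """Mahalanobis distance sqrt(d' S^{-1} d) between points a and b."""
    s00, s01, s11 = cov
    det = s00 * s11 - s01 * s01
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    d0, d1 = a[0] - b[0], a[1] - b[1]
    # S^{-1} = (1/det) * [[s11, -s01], [-s01, s00]]
    q = (s11 * d0 * d0 - 2.0 * s01 * d0 * d1 + s00 * d1 * d1) / det
    return math.sqrt(q)


# Toy data (hypothetical): three observations on two covariates.
rows = [(0.0, 1.0), (2.0, 0.0), (4.0, 2.0)]
cov = covariance_matrix_2d(rows)
dist = mahalanobis_2d(rows[0], rows[1], cov)
print(cov, dist)
```

A matching algorithm would then use such distances in place of the propensity-score differences.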
Some Examples
Some balancing statistics

. kmatch summarize
(refitting the model using the generate() option)

                           Raw                        Matched (ATT)
      Means     Treated  Untrea~d    StdDif     Treated  Untrea~d    StdDif
   collgrad     .321663   .224212   .219912     .319444   .319444         0
    ttl_exp     13.2685   12.7323   .117584     13.3205   13.1425   .039036
     tenure     7.89205   6.17658    .29735     7.91744   7.58347   .057888
 3.industry     .006565   .012178  -.058246      .00463    .00463         0
 4.industry     .183807   .166905   .044425     .185185   .185185         0
 5.industry     .105033   .027937   .312944     .085648   .085648         0
 6.industry     .045952   .169771  -.407129     .048611   .048611         0
 7.industry     .019694   .102436  -.350657     .020833   .020833         0
 8.industry     .017505   .035817  -.113785     .009259   .009259         0
 9.industry     .010941   .040115  -.185669     .011574   .011574         0
10.industry     .004376   .008596  -.052551     .002315   .002315         0
11.industry     .479212   .356734   .250073     .506944   .506944         0
12.industry     .122538    .07235   .169707      .12037    .12037         0
     2.race     .330416   .244986   .189418       .3125     .3125         0
     3.race     .017505   .011461   .050566     .006944   .006944         0
      south     .297593   .466332  -.352408     .291667   .291667         0

                           Raw                        Matched (ATT)
  Variances     Treated  Untrea~d     Ratio     Treated  Untrea~d     Ratio
   collgrad     .218674   .174066   1.25628     .217904   .217904         1
    ttl_exp     20.5898   21.0001   .980459     19.8177   18.2323   1.08696
     tenure     37.2044   29.3629   1.26706     37.0399   34.9543   1.05966
 3.industry     .006536   .012038   .542928     .004619   .004619         1
 4.industry     .150351   .139148   1.08052     .151242   .151242         1
 5.industry     .094207   .027176   3.46656     .078494   .078494         1
 6.industry     .043936    .14105   .311496     .046355   .046355         1
 7.industry     .019348   .092008   .210287     .020447   .020447         1
 8.industry     .017237   .034559   .498769     .009195   .009195         1
 9.industry     .010845   .038533   .281445     .011467   .011467         1
10.industry     .004367   .008528   .512039     .002315   .002315         1
11.industry     .250115   .229639   1.08917     .250532   .250532         1
12.industry     .107758   .067163   1.60443     .106127   .106127         1
     2.race     .221726     .1851   1.19787     .215342   .215342         1
     3.race     .017237   .011338   1.52025     .006912   .006912         1
      south      .20949   .249045   .841173     .207077   .207077         1
Some Examples
Make a graph of the balancing stats

. mat M = r(M)
. mat V = r(V)
. coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || ,
>     bylabels("Std. mean difference" "Variance ratio")
>     noci nolabels byopts(xrescale)
. addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))
. addplot 2: , xline(1) norescaling

[Figure: coefficient plot by covariate (collgrad, ttl_exp, tenure, industry, race, and south indicators) with two panels, "Std. mean difference" and "Variance ratio", comparing Raw and Matched.]
Some Examples
Propensity-score kernel matching

. kmatch ps union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate att
(computing bandwidth ... done)

Propensity-score kernel matching           Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Covariates  : collgrad ttl_exp tenure i.industry i.race south
PS model    : logit (pr)

Matching statistics
           |        Matched         |        Controls        |  Band-
           |   Yes     No    Total  |  Used  Unused   Total  |  width
   Treated |   431     26      457  |  1214     182    1396  | .00188

Treatment-effects estimation
        wage |     Coef.
         ATT |  .3887224
        NATE |  1.432913
Some Examples
Kernel density balancing plot

. kmatch density, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)
(applying 0-1 boundary correction to density estimation of propensity score)
(bandwidth for propensity score = .06803989)

[Figure: density of the propensity score for untreated and treated units; panels Raw and Matched (ATT).]

Some Examples
Cumulative distribution balancing plot

. kmatch cumul, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)

[Figure: cumulative probability of the propensity score for untreated and treated units; panels Raw and Matched (ATT).]

Some Examples
Balancing box plot

. kmatch box
(refitting the model using the generate() option)

[Figure: box plots of the propensity score for untreated and treated units; panels Raw and Matched (ATT).]
Some Examples
Standard errors

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching      Number of obs = 1853
                                           Replications  = 50
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |        Matched         |        Controls        |  Band-
           |   Yes     No    Total  |  Used  Unused   Total  |  width
   Treated |   432     25      457  |  1105     291    1396  | 1.3394
 Untreated |  1386     10     1396  |   455       2     457  | 3.3975
  Combined |  1818     35     1853  |  1560     293    1853  |

Treatment-effects estimation
             |   Observed   Bootstrap                         Normal-based
        wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
         ATE |   .4095729   .1920853     2.13   0.033     .0330928    .7860531
         ATT |   .6059013   .2472069     2.45   0.014     .1213846    1.090418
         ATC |   .3483797   .1893653     1.84   0.066    -.0227695    .7195289
        NATE |   1.432913   .2333282     6.14   0.000     .9755981    1.890228
Some Examples
Do some tests

. lincom ATT - NATE

 ( 1)  ATT - NATE = 0

        wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
         (1) |  -.8270117   .1810415    -4.57   0.000    -1.181847   -.4721768

. test ATT = ATC

 ( 1)  ATT - ATC = 0

           chi2(  1) =    2.42
         Prob > chi2 =    0.1200
Some Examples
Nearest-neighbor matching (1 neighbor)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

Multivariate-distance nearest-neighbor matching
                                           Number of obs = 1853
                                           Neighbors: min = 1
Treatment   : union = 1                               max = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |        Matched         |        Controls        |  Band-
           |   Yes     No    Total  |  Used  Unused   Total  |  width
   Treated |   457      0      457  |   328    1068    1396  |

Treatment-effects estimation
        wage |     Coef.
         ATT |  .7246969

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

Treatment-effects estimation               Number of obs = 1853
Estimator      : nearest-neighbor matching Matches: requested = 1
Outcome model  : matching                                 min = 1
Distance metric: Mahalanobis                              max = 1

                     |              AI Robust
                wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
ATET                 |
               union |
 (union vs nonunion) |   .7246969   .2942952     2.46   0.014      .147889    1.301505
Some Examples
Nearest-neighbor matching (5 neighbors)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching
                                           Number of obs = 1853
                                           Neighbors: min = 5
Treatment   : union = 1                               max = 5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |        Matched         |        Controls        |  Band-
           |   Yes     No    Total  |  Used  Unused   Total  |  width
   Treated |   457      0      457  |   870     526    1396  |

Treatment-effects estimation
        wage |     Coef.
         ATT |  .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation               Number of obs = 1853
Estimator      : nearest-neighbor matching Matches: requested = 5
Outcome model  : matching                                 min = 5
Distance metric: Mahalanobis                              max = 6

                     |              AI Robust
                wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
ATET                 |
               union |
 (union vs nonunion) |   .5590823   .2381752     2.35   0.019     .0922675    1.025897
Some Examples
Bias correction (regression adjustment)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching
                                           Number of obs = 1853
                                           Neighbors: min = 5
Treatment   : union = 1                               max = 5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |        Matched         |        Controls        |  Band-
           |   Yes     No    Total  |  Used  Unused   Total  |  width
   Treated |   457      0      457  |   870     526    1396  |

Treatment-effects estimation
        wage |     Coef.
         ATT |  .5288023
adjusted for: collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south)
>     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation               Number of obs = 1853
Estimator      : nearest-neighbor matching Matches: requested = 5
Outcome model  : matching                                 min = 5
Distance metric: Mahalanobis                              max = 6

                     |              AI Robust
                wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
ATET                 |
               union |
 (union vs nonunion) |   .5288023   .2420635     2.18   0.029     .0543666    1.003238
Some Examples
Mahalanobis-distance and propensity-score matching combined

. kmatch md union collgrad ttl_exp tenure (wage), att
>     psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis (modified)
Covariates  : collgrad ttl_exp tenure
PS model    : logit (pr)
PS covars   : i.industry i.race south

Matching statistics
           |        Matched         |        Controls        |  Band-
           |   Yes     No    Total  |  Used  Unused   Total  |  width
   Treated |   439     18      457  |  1258     138    1396  | .83886

Treatment-effects estimation
        wage |     Coef.
         ATT |  .6408443
Some Examples
Exact matching

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure
Exact       : industry race south

Matching statistics
           |        Matched         |        Controls        |  Band-
           |   Yes     No    Total  |  Used  Unused   Total  |  width
   Treated |   432     25      457  |  1103     293    1396  | 1.3013

Treatment-effects estimation
        wage |     Coef.
         ATT |  .6047374
Some Examples

Bandwidth selection: the default (based on the distribution of distances in one-nearest-neighbor matching)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching
Number of obs = 1,853
Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

               Matched              Controls          Band-
           Yes    No  Total    Used Unused  Total     width
Treated    432    25    457    1105    291   1396    1.3394

Treatment-effects estimation

wage       Coef.
ATT     .6059013
Some Examples

Bandwidth selection: cross validation with respect to X

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching
Number of obs = 1,853
Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

               Matched              Controls          Band-
           Yes    No  Total    Used Unused  Total     width
Treated    448     9    457    1184    212   1396    1.8888

Treatment-effects estimation

wage       Coef.
ATT     .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MSE against bandwidth; points are labeled with their index in the search sequence.]
Some Examples

Bandwidth selection: cross validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching
Number of obs = 1,853
Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

               Matched              Controls          Band-
           Yes    No  Total    Used Unused  Total     width
Treated    453     4    457    1289    107   1396     2.433

Treatment-effects estimation

wage       Coef.
ATT     .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MISE against bandwidth; points are labeled with their index in the search sequence.]
Some Examples

Bandwidth selection: weighted cross validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching
Number of obs = 1,853
Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

               Matched              Controls          Band-
           Yes    No  Total    Used Unused  Total     width
Treated    455     2    457    1356     40   1396    2.7626

Treatment-effects estimation

wage       Coef.
ATT     .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of weighted MISE against bandwidth; points are labeled with their index in the search sequence.]
Some Examples

Common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(0.5)

Multivariate-distance kernel matching
Number of obs = 1,853
Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

               Matched              Controls          Band-
           Yes    No  Total    Used Unused  Total     width
Treated    366    91    457     701    695   1396        .5

Treatment-effects estimation

wage       Coef.
ATT     .3303161

. kmatch csummarize
(refitting the model using the generate() option)

Common support (treated)

                          Means                Standardized difference
               Matched  Unmatched     Total    (1)-(3)   (2)-(3)   (1)-(2)
collgrad       .322404    .318681   .321663    .001585  -.006376   .007962
ttl_exp        13.3929    12.7682   13.2685    .027413  -.110253   .137666
tenure         8.12614    6.95055   7.89205    .038378  -.154356   .192734
3.industry     .002732    .021978   .006565   -.047404   .190657  -.238061
4.industry     .191257    .153846   .183807    .019212  -.077269   .096481
5.industry     .062842    .274725   .105033   -.137462   .552867  -.690329
6.industry     .057377          0   .045952    .054507  -.219225   .273732
7.industry     .019126    .021978   .019694   -.004083   .016423  -.020506
8.industry     .005464    .065934   .017505   -.091714   .368871  -.460585
9.industry     .010929    .010989   .010941   -.000115   .000462  -.000577
10.industry          0    .021978   .004376   -.066227   .266363  -.332589
11.industry    .554645    .175824   .479212     .15083  -.606636   .757467
12.industry    .092896    .241758   .122538   -.090299   .363181   -.45348
2.race         .243169    .681319   .330416   -.185284   .745209  -.930494
3.race         .002732    .076923   .017505   -.112525   .452572  -.565097
south           .29235    .318681   .297593   -.011456   .046074   -.05753

Common support (treated)

                        Variances                       Ratio
               Matched  Unmatched     Total    (1)/(3)   (2)/(3)   (1)/(2)
collgrad       .219058    .219536   .218674    1.00176   1.00394   .997824
ttl_exp        19.4198    25.2474   20.5898    .943177   1.22621    .76918
tenure         38.3324    31.9242   37.2044    1.03032   .858076   1.20073
3.industry     .002732    .021734   .006536    .418045   3.32537   .125714
4.industry     .155101    .131624   .150351    1.03159   .875443   1.17837
5.industry     .059054    .201465   .094207    .626851   2.13854   .293122
6.industry     .054233          0   .043936    1.23435         0         .
7.industry     .018811    .021734   .019348    .972252    1.1233   .865531
8.industry      .00545    .062271   .017237    .316157   3.61269   .087513
9.industry     .010839    .010989   .010845    .999464   1.01328   .986361
10.industry          0    .021734   .004367          0   4.97709         0
11.industry    .247691     .14652   .250115    .990307   .585811   1.69049
12.industry    .084497    .185348   .107758    .784137   1.72003   .455885
2.race         .184542    .219536   .221726    .832297   .990121   .840601
3.race         .002732    .071795   .017237    .158513   4.16522   .038056
south          .207448    .219536    .20949    .990254   1.04796   .944939

(1) matched, (2) unmatched, (3) total
Some Examples

Make a graph of the common-support statistics:

. mat M = r(M)
. coefplot matrix(M[,4]), title(Std. difference) noci nolabels xline(0)

[Figure: coefficient plot titled "Std. difference" showing the standardized differences for collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, and south, with a reference line at 0.]
Some Examples

Multiple outcome variables

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching
Number of obs = 1,852
Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

               Matched              Controls          Band-
           Yes    No  Total    Used Unused  Total     width
Treated    432    25    457    1104    291   1395    1.3392

Treatment-effects estimation (coefficients as extracted; decimal points lost)

            Coef.
wage
  ATT     6021049
  NATE    1430823
hours
  ATT     1263759
  NATE    1450303
Some Examples

Multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage = collgrad ttl_exp tenure) (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching
Number of obs = 1,852
Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

               Matched              Controls          Band-
           Yes    No  Total    Used Unused  Total     width
Treated    432    25    457    1104    291   1395    1.3392

Treatment-effects estimation (coefficients as extracted; decimal points lost)

            Coef.
wage
  ATT     5152752
  NATE    1430823
hours
  ATT     1263759
  NATE    1450303

wage: adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race
Some Examples

Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage), att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)

Multivariate-distance kernel matching
Number of obs = 1,853
Replications  = 50
Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics

                  Matched              Controls          Band-
              Yes    No  Total    Used Unused  Total     width
0: Treated    306    15    321     625    120    745    1.3199
1: Treated    126    10    136     473    178    651    1.3398

Treatment-effects estimation

            Observed   Bootstrap                     Normal-based
wage           Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
0: ATT      .4586332    .2763358   1.66   0.097    -.082975   1.000241
1: ATT      .9518705     .406903   2.34   0.019    .1543553   1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

           chi2(  1) =    1.23
         Prob > chi2 =  0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

wage           Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
(1)         .4932373    .4452343   1.11   0.268    -.379406   1.365881
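The link between the `lincom` and `test` results for the subgroup comparison can be checked by hand. The sketch below is only an illustration in Python, not part of kmatch: it takes the difference and its bootstrap standard error as read from the `lincom` output (decimal points restored) and assumes the usual normal approximation, under which the Wald chi-squared statistic with one degree of freedom is the squared z-value.

```python
import math

# Values read from the lincom output for [1]ATT - [0]ATT
# (decimal points restored from the extracted slide text).
diff = 0.4932373   # difference between the two subgroup ATTs
se = 0.4452343     # bootstrap standard error of that difference

z = diff / se                             # z statistic reported by lincom
chi2 = z ** 2                             # Wald chi2(1) reported by test
p = math.erfc(abs(z) / math.sqrt(2))      # two-sided p-value, normal approximation

print(f"z = {z:.3f}, chi2(1) = {chi2:.3f}, p = {p:.4f}")
```

Both commands therefore report the same test; `lincom` additionally gives the size of the difference and its confidence interval.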
Simulation

Population data from the Swiss census of 2000.

Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using command iskotrei by Hendrickx 2002) (values from 6 to 78, mean 44).

Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.

Control variables: gender, age, and highest educational degree.

Population restricted to people between 24 and 60 years old who are working.

2'308'006 individuals, of which 17.5% belong to the treatment group.

Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.
Simulation

Substantial differences between resident aliens and Swiss nationals on all three covariates.

Propensity score in population (computed from fully stratified data):

[Figure: density of the propensity score for untreated and treated units.]

McFadden R2 = 0.121
Simulation

Raw mean difference in occupational prestige (NATE): −4.79
Population ATT (computed from fully stratified data): −3.96
There is some treatment effect heterogeneity (ATE = −3.51, ATC = −3.41).

[Figure, left panel: outcome against propensity score for untreated and treated units. Right panel: treatment effect against propensity score.]
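As a consistency check on these population values, the ATE is the average of the ATT and the ATC weighted by the treated and untreated shares. The worked line below assumes a treated share of 17.5% (the share of resident aliens reported for the population); the notation $P(D=1)$ follows the treatment indicator $D$ used earlier in the slides.

```latex
\begin{align*}
\text{ATE} &= P(D=1)\,\text{ATT} + P(D=0)\,\text{ATC} \\
           &\approx 0.175 \times (-3.96) + 0.825 \times (-3.41) \\
           &\approx -3.51
\end{align*}
```

The weighted average reproduces the reported ATE, so the three effect estimates and the treated share are mutually consistent.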
Results: Variance

[Figure: variance of the estimators, in two panels (N = 500 and N = 5000). Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]

In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.
Results: Bias reduction (in percent)

[Figure: bias reduction in percent, in two panels (N = 500 and N = 5000). Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]

Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3) small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.
Results: Mean squared error

[Figure: mean squared error of the estimators, in two panels (N = 500 and N = 5000). Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]

Results: Relative standard error

[Figure: relative standard error, in two panels (N = 500 and N = 5000). Rows: nearest-neighbor matching with teffects (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap (1 neighbor, 5 neighbors), and kernel matching with bootstrap (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]

Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals, in two panels (N = 500 and N = 5000), with the same rows and series as in the relative standard error figure.]
Conclusions

The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).

Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.

Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research, the choice may not be that clear.

Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.

Disadvantages of MDM:
- The choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching! (But don't use pair matching anyhow.)

Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly ok for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.
References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2):295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1):77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91:279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802. Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4):465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70:41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5):688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4):472–480.
King and Nielsen

In 2015/2016, Gary King and Richard Nielsen circulated a paper that created quite some concern among applied researchers.

The basic message of the paper is that PSM is really, really bad and should be discarded.

The paper:
- http://j.mp/1sexgVw

Slides:
- https://gking.harvard.edu/presentations/why-propensity-scores-should-not-be-used-matching-6

Watch it:
- https://www.youtube.com/watch?v=rBv39pK1iEs
King and Nielsen

The story goes about as follows.

Argument 1
- Model dependence (i.e., dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions, even if they try not to be).
- Matching is good because it reduces model dependence.
Matching to Reduce Model Dependence
(Ho, Imai, King, Stuart, 2007: fig. 1, Political Analysis)

[Figure, shown in several build steps (slide 3/23 by King and Nielsen): scatter plot of the outcome (0 to 12) against education in years (12 to 28), first with the treated units (T) only and then with control units (C) added.]
King and Nielsen

Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

Matching: Finding Hidden Randomized Experiments (slide 6/23 by King and Nielsen)

Types of Experiments

Balance         Complete          Fully
Covariates      Randomization     Blocked
Observed        On average        Exact
Unobserved      On average        On average

Fully blocked dominates complete randomization for: imbalance, model dependence, power, efficiency, bias, research costs, robustness. E.g., Imai, King, Nall 2009: SEs 600% smaller!

Goal of Each Matching Method (in Observational Data)
• PSM: complete randomization
• Other methods: fully blocked
• Other matching methods dominate PSM (wait, it gets worse)
Best Case: Mahalanobis Distance Matching

[Figure, shown in two build steps (slide 9/23 by King and Nielsen): age (20 to 80) against education in years (12 to 28) for treated (T) and control (C) units, once with all control units and once with a reduced set of control units.]

Best Case: Propensity Score Matching

[Figure (slide 15/23 by King and Nielsen): age against education in years for treated (T) and control (C) units, with an additional propensity-score axis running from 0 to 1.]

Best Case: Propensity Score Matching is Suboptimal

[Figure (slide 15/23 by King and Nielsen): age against education in years for the treated units (T) and a reduced set of control units (C).]
King and Nielsen

Argument 3
- Random pruning (deleting observations at random) increases imbalance. This is because the sample size decreases, so that variance increases (large differences become more likely).
- More imbalance/variance means more model dependence and researcher discretion.
- Because PSM approximates complete randomization, it engages in random pruning.
- PSM Paradox ("when you do 'better', you do worse"):
  - When matching is made more strict (e.g., by decreasing the size of the caliper), PSM, like other matching methods, typically reduces imbalance. But soon the PSM Paradox kicks in, such that further pruning quickly increases imbalance.
  - If the data is such that there are no big differences between treated and untreated to begin with, the PSM Paradox kicks in almost immediately.
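The first point of Argument 3 can be illustrated with a small simulation. This is a minimal sketch under assumed settings, not King and Nielsen's simulation: a single standard-normal covariate with the same distribution in both groups, 200 units per group, and imbalance measured as the absolute difference in group means. The function name `mean_abs_imbalance` and all parameter values are hypothetical.

```python
import random
import statistics


def mean_abs_imbalance(n_keep, n_total=200, reps=1000, seed=1):
    """Average |mean(X | treated) - mean(X | control)| after keeping
    n_keep randomly chosen units per group (random pruning)."""
    rng = random.Random(seed)
    results = []
    for _ in range(reps):
        # Both groups are drawn from the same distribution, so any
        # imbalance is due to chance alone.
        treated = [rng.gauss(0, 1) for _ in range(n_total)]
        control = [rng.gauss(0, 1) for _ in range(n_total)]
        kept_t = rng.sample(treated, n_keep)   # prune at random
        kept_c = rng.sample(control, n_keep)
        results.append(abs(statistics.fmean(kept_t) - statistics.fmean(kept_c)))
    return statistics.fmean(results)


full = mean_abs_imbalance(200)    # no pruning
pruned = mean_abs_imbalance(50)   # 150 units per group pruned at random
print(f"no pruning: {full:.3f}   after random pruning: {pruned:.3f}")
```

Since the expected absolute mean difference shrinks with the square root of the group size, keeping a quarter of the units should roughly double the expected imbalance in this setup.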
PSM Increases Model Dependence & Bias

[Figure (slide 20/23 by King and Nielsen), left panel "Model Dependence": variance against number of units pruned (0 to 160) for MDM and PSM. Right panel "Bias": maximum coefficient across 512 specifications against number of units pruned (0 to 160) for MDM and PSM, with the true effect = 2 marked.]

Data-generating process: $Y_i = 2T_i + X_{1i} + X_{2i} + \epsilon_i$, $\epsilon_i \sim N(0, 1)$

The Propensity Score Paradox in Real Data

[Figure (slide 21/23 by King and Nielsen), two panels: Finkel et al. (JOP, 2012) and Nielsen et al. (AJPS, 2011). Each panel plots imbalance against the number of units pruned for CEM, MDM, and PSM, together with the raw data, random pruning, and a 1/4 SD caliper as reference points.]

Similar pattern for > 20 other real data sets we checked.
Are King and Nielsen right?

Argument 1
- Model dependence (i.e., dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions, even if they try not to be).
- Matching is good because it reduces model dependence.

I fully agree.

My view, however, may be somewhat less pessimistic. I believe that research results can be credible if researchers are well educated, so that they know what they are doing, and if modeling decisions are made transparent and robustness of results is evaluated (and documented).
Are King and Nielsen right?

Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

That fully blocked randomization is more efficient than complete randomization, given the sample size, is of course true (how large the efficiency gains are depends on the strength of the relation between X and Y).

However, if blocking reduces the sample size, it is not a priori clear whether estimates from the blocked sample are more efficient than estimates from the full sample (although often they will be).
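The remark that the efficiency gain depends on the strength of the relation between X and Y can be sketched with a small simulation. Everything here is an illustrative assumption rather than anything taken from the slides: a single covariate, a linear outcome `y = beta * x + noise` with a true treatment effect of zero, 40 units, and blocking implemented as randomization within pairs of units adjacent on X. The function name `estimator_variance` is hypothetical.

```python
import random
import statistics


def estimator_variance(beta, blocked, reps=2000, n=40, seed=7):
    """Sampling variance of the difference-in-means estimator under
    complete randomization or pair-blocked randomization on x."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        x = sorted(rng.gauss(0, 1) for _ in range(n))
        if blocked:
            # Pair units adjacent on x; flip a coin within each pair.
            treat = []
            for _ in range(n // 2):
                first = rng.random() < 0.5
                treat += [first, not first]
        else:
            # Complete randomization: half treated, ignoring x.
            treat = [True] * (n // 2) + [False] * (n // 2)
            rng.shuffle(treat)
        # Outcome depends on x with strength beta; true effect is zero.
        y = [beta * xi + rng.gauss(0, 1) for xi in x]
        y1 = [yi for yi, t in zip(y, treat) if t]
        y0 = [yi for yi, t in zip(y, treat) if not t]
        estimates.append(statistics.fmean(y1) - statistics.fmean(y0))
    return statistics.pvariance(estimates)


for beta in (0.0, 2.0):
    v_complete = estimator_variance(beta, blocked=False)
    v_blocked = estimator_variance(beta, blocked=True)
    print(f"beta = {beta}: complete {v_complete:.3f}, blocked {v_blocked:.3f}")
```

With `beta = 0` blocking on X buys essentially nothing, whereas with a strong relation between X and Y the blocked design should be markedly more precise. The sketch holds the sample size fixed, so it does not address the case where blocking discards observations.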
That PSM approximates complete randomization is only partially true. PSM approximates complete randomization within observations with the same propensity score. Hence, PSM is somewhere between complete randomization and fully blocked randomization.
- If the X variables have no relation to T (treatment), then all observations have the same propensity score. Hence, we end up with complete randomization.
- If the X variables have a strong effect on T, there is lots of blocking.
Are King and Nielsen right?

Argument 3
- Random pruning ⇒ imbalance ⇒ more model dependence
- PSM ⇒ complete randomization ⇒ lots of random pruning
- PSM Paradox: "when you do 'better', you do worse"

That random pruning makes things worse is of course true, because it unnecessarily reduces the sample size (without changing anything else).

As argued above, that PSM applies random pruning is only true for X variables unrelated to T (so that we are in a "local" complete randomization situation, although something similar can probably also happen if effects from several X's cancel each other out).

Furthermore, it is only true if you employ a matching algorithm that throws away good matches. King and Nielsen's results seem to be based on the worst possible algorithm: one-to-one matching without replacement.
If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or also nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
- Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
- Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.
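The "average where pruning is not necessary" idea can be sketched for a single treated unit. This is a minimal illustration of kernel weighting on the propensity score with an Epanechnikov kernel, not kmatch's implementation; the propensity scores, outcomes, bandwidth, and the function name `epan_weights` are all made up for the example.

```python
def epan_weights(ps_treated, ps_controls, bw):
    """Normalized Epanechnikov kernel weights of the controls for one
    treated unit; controls farther away than bw get weight zero."""
    raw = []
    for p in ps_controls:
        u = (p - ps_treated) / bw
        raw.append(0.75 * (1 - u * u) if abs(u) < 1 else 0.0)
    total = sum(raw)
    return [w / total for w in raw] if total > 0 else raw


# Hypothetical controls: four close to the treated unit, one far away.
controls_ps = [0.28, 0.30, 0.31, 0.33, 0.60]
controls_y = [10.0, 12.0, 11.0, 13.0, 30.0]

weights = epan_weights(0.31, controls_ps, bw=0.05)
counterfactual = sum(w * y for w, y in zip(weights, controls_y))

print([round(w, 3) for w in weights])
print(round(counterfactual, 3))
```

Pair matching would keep only the single closest control and discard the other three good matches. The kernel estimator averages over all four nearby controls and drops only the distant one, which is where pruning is actually needed to prevent bias.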
Are King and Nielsen right
Argument 3I Random pruning rArr imbalance rArr more model dependenceI PSM rArr complete randomization rArr lots of random pruningI PSM Paradox ldquowhen you do lsquobetterrsquo you do worserdquo
It is true that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more "free" variance in X that can be exploited by modeling decisions).
In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you'd need to apply PSM stratified by subgroups in this case).
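A minimal sketch of PSM stratified by subgroup, assuming the nlsw88 example data used in this talk. The over() option and the [0]ATT/[1]ATT equation names appear in the kmatch examples of this talk for MDM; that over() also re-estimates the propensity-score model within each subgroup for kmatch ps is an assumption here and should be checked against the kmatch help file. The seed value is arbitrary.

```stata
* Sketch only: assumes kmatch is installed (ssc install kmatch)
sysuse nlsw88, clear
drop if industry==2
set seed 12345               // arbitrary seed so the bootstrap is reproducible

* Matching is carried out separately within each category of south
kmatch ps union collgrad ttl_exp tenure i.industry i.race (wage), ///
    att vce(bootstrap) over(south)

* Compare the subgroup-specific ATTs
lincom [1]ATT - [0]ATT
```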
6 Illustration using kmatch
The Command
kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).
Some key features:
- Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
- Optional exact matching
- Optional regression-adjustment bias correction
- Kernel matching, ridge matching, or nearest-neighbor matching
- Automatic bandwidth selection for kernel/ridge matching
- Flexible specification of scaling matrix for MDM
- Joint analysis of multiple subgroups and multiple outcome variables
- Various post-estimation commands for balancing and common-support diagnostics
- Computationally efficient implementation
Some Examples
Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.

    . sysuse nlsw88, clear
    (NLSW 1988 extract)
    . drop if industry==2
    (4 observations deleted)

Mahalanobis-distance kernel matching

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south
    >     (wage), nate att
    (computing bandwidth ... done)

    Multivariate-distance kernel matching     Number of obs = 1,853
                                              Kernel        = epan
    Treatment  : union = 1
    Metric     : mahalanobis
    Covariates : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
                     Matched                 Controls           Band-
                Yes     No   Total     Used  Unused   Total     width
    Treated     432     25     457     1105     291    1396    1.3394

    Treatment-effects estimation
    wage         Coef.
    ATT       .6059013
    NATE      1.432913
Some Examples: some balancing statistics

    . kmatch summarize
    (refitting the model using the generate() option)

                            Raw                       Matched(ATT)
    Means         Treated  Untrea~d    StdDif    Treated  Untrea~d    StdDif
    collgrad      .321663   .224212   .219912    .319444   .319444         0
    ttl_exp       13.2685   12.7323   .117584    13.3205   13.1425   .039036
    tenure        7.89205   6.17658    .29735    7.91744   7.58347   .057888
    3.industry    .006565   .012178  -.058246     .00463    .00463         0
    4.industry    .183807   .166905   .044425    .185185   .185185         0
    5.industry    .105033   .027937   .312944    .085648   .085648         0
    6.industry    .045952   .169771  -.407129    .048611   .048611         0
    7.industry    .019694   .102436  -.350657    .020833   .020833         0
    8.industry    .017505   .035817  -.113785    .009259   .009259         0
    9.industry    .010941   .040115  -.185669    .011574   .011574         0
    10.industry   .004376   .008596  -.052551    .002315   .002315         0
    11.industry   .479212   .356734   .250073    .506944   .506944         0
    12.industry   .122538    .07235   .169707     .12037    .12037         0
    2.race        .330416   .244986   .189418      .3125     .3125         0
    3.race        .017505   .011461   .050566    .006944   .006944         0
    south         .297593   .466332  -.352408    .291667   .291667         0

                            Raw                       Matched(ATT)
    Variances     Treated  Untrea~d     Ratio    Treated  Untrea~d     Ratio
    collgrad      .218674   .174066   1.25628    .217904   .217904         1
    ttl_exp       20.5898   21.0001   .980459    19.8177   18.2323   1.08696
    tenure        37.2044   29.3629   1.26706    37.0399   34.9543   1.05966
    3.industry    .006536   .012038   .542928    .004619   .004619         1
    4.industry    .150351   .139148   1.08052    .151242   .151242         1
    5.industry    .094207   .027176   3.46656    .078494   .078494         1
    6.industry    .043936    .14105   .311496    .046355   .046355         1
    7.industry    .019348   .092008   .210287    .020447   .020447         1
    8.industry    .017237   .034559   .498769    .009195   .009195         1
    9.industry    .010845   .038533   .281445    .011467   .011467         1
    10.industry   .004367   .008528   .512039    .002315   .002315         1
    11.industry   .250115   .229639   1.08917    .250532   .250532         1
    12.industry   .107758   .067163   1.60443    .106127   .106127         1
    2.race        .221726     .1851   1.19787    .215342   .215342         1
    3.race        .017237   .011338   1.52025    .006912   .006912         1
    south          .20949   .249045   .841173    .207077   .207077         1
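The StdDif and Ratio columns can be read with the formulas below. They are a reconstruction that reproduces the reported values; that kmatch uses exactly this definition is an assumption based on that agreement, not something stated on the slide. Here \bar{x} and s^2 denote the mean and variance of a covariate among the treated (T) and untreated (C) in the raw sample, and \bar{x}^{m} denotes a mean in the matched sample.

```latex
\[
\text{StdDif} = \frac{\bar{x}_T - \bar{x}_C}{\sqrt{(s_T^2 + s_C^2)/2}},
\qquad
\text{Ratio} = \frac{s_T^2}{s_C^2}
\]
% Worked check for collgrad (raw sample):
\[
\frac{0.321663 - 0.224212}{\sqrt{(0.218674 + 0.174066)/2}}
  = \frac{0.097451}{0.44314} \approx 0.2199,
\qquad
\frac{0.218674}{0.174066} \approx 1.2563
\]
% Worked check for ttl_exp (matched sample, raw-sample variances in the denominator):
\[
\frac{\bar{x}^{m}_T - \bar{x}^{m}_C}{\sqrt{(s_T^2 + s_C^2)/2}}
  = \frac{13.3205 - 13.1425}{\sqrt{(20.5898 + 21.0001)/2}}
  = \frac{0.1780}{4.5602} \approx 0.0390
\]
```

The last check suggests that the matched standardized differences keep the raw-sample variances in the denominator, so that raw and matched values are on the same scale.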
Some Examples: make a graph of the balancing stats

    . mat M = r(M)
    . mat V = r(V)
    . coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || ,
    >     bylabels("Std. mean difference" "Variance ratio")
    >     noci nolabels byopts(xrescale)
    . addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))
    . addplot 2: , xline(1) norescaling

[Figure: coefficient plot with two panels, "Std. mean difference" and "Variance ratio", showing raw and matched values for each covariate (collgrad, ttl_exp, tenure, industry dummies, race dummies, south).]
Some Examples
Propensity-score kernel matching

    . kmatch ps union collgrad ttl_exp tenure i.industry i.race south
    >     (wage), nate att
    (computing bandwidth ... done)

    Propensity-score kernel matching          Number of obs = 1,853
                                              Kernel        = epan
    Treatment  : union = 1
    Covariates : collgrad ttl_exp tenure i.industry i.race south
    PS model   : logit (pr)

    Matching statistics
                     Matched                 Controls           Band-
                Yes     No   Total     Used  Unused   Total     width
    Treated     431     26     457     1214     182    1396    .00188

    Treatment-effects estimation
    wage         Coef.
    ATT       .3887224
    NATE      1.432913
Some Examples: Kernel density balancing plot

    . kmatch density, lw(*6 *2) lc(*.5 *1)
    (refitting the model using the generate() option)
    (applying 0-1 boundary correction to density estimation of propensity score)
    (bandwidth for propensity score = .06803989)

[Figure: kernel density of the propensity score for untreated and treated observations, in two panels: "Raw" and "Matched (ATT)".]
Some Examples: Cumulative distribution balancing plot

    . kmatch cumul, lw(*6 *2) lc(*.5 *1)
    (refitting the model using the generate() option)

[Figure: cumulative probability of the propensity score for untreated and treated observations, in two panels: "Raw" and "Matched (ATT)".]
Some Examples: Balancing box plot

    . kmatch box
    (refitting the model using the generate() option)

[Figure: box plots of the propensity score for untreated and treated observations, in two panels: "Raw" and "Matched (ATT)".]
Some Examples: Standard errors

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south
    >     (wage), nate ate att atc vce(bootstrap)
    (computing bandwidth for treated ... done)
    (computing bandwidth for untreated ... done)
    (running kmatch on estimation sample)
    Bootstrap replications (50)
    ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
    ..................................................    50

    Multivariate-distance kernel matching     Number of obs = 1,853
                                              Replications  = 50
                                              Kernel        = epan
    Treatment  : union = 1
    Metric     : mahalanobis
    Covariates : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
                     Matched                 Controls           Band-
                Yes     No   Total     Used  Unused   Total     width
    Treated     432     25     457     1105     291    1396    1.3394
    Untreated  1386     10    1396      455       2     457    3.3975
    Combined   1818     35    1853     1560     293    1853

    Treatment-effects estimation
              Observed   Bootstrap                      Normal-based
    wage         Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
    ATE       .4095729   .1920853    2.13    0.033   .0330928   .7860531
    ATT       .6059013   .2472069    2.45    0.014   .1213846   1.090418
    ATC       .3483797   .1893653    1.84    0.066  -.0227695   .7195289
    NATE      1.432913   .2333282    6.14    0.000   .9755981   1.890228
Some Examples
Do some tests

    . lincom ATT-NATE
     ( 1)  ATT - NATE = 0

    wage         Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
    (1)      -.8270117   .1810415   -4.57    0.000  -1.181847  -.4721768

    . test ATT = ATC
     ( 1)  ATT - ATC = 0
               chi2(  1) =    2.42
             Prob > chi2 =    0.1200
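The lincom line can be checked by hand from the bootstrap table shown for the standard-errors example. The arithmetic below uses only numbers printed in the talk, with decimal points read as in standard Stata output; the critical value 1.96 is the usual normal approximation.

```latex
\[
\widehat{\text{ATT}} - \widehat{\text{NATE}} = 0.6059013 - 1.432913 = -0.8270117
\]
\[
z = \frac{-0.8270117}{0.1810415} \approx -4.57,
\qquad
-0.8270117 \pm 1.96 \times 0.1810415 \approx [-1.182,\; -0.472]
\]
```

The standard error of the difference (0.181) is smaller than either individual standard error because both estimates come from the same bootstrap samples and are positively correlated.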
Some Examples: Nearest-neighbor matching (1 neighbor)

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

    Multivariate-distance nearest-neighbor matching
                                              Number of obs  = 1,853
                                              Neighbors: min = 1
    Treatment  : union = 1                               max = 1
    Metric     : mahalanobis
    Covariates : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
                     Matched                 Controls           Band-
                Yes     No   Total     Used  Unused   Total     width
    Treated     457      0     457      328    1068    1396

    Treatment-effects estimation
    wage         Coef.
    ATT       .7246969

    . teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

    Treatment-effects estimation              Number of obs      = 1,853
    Estimator      : nearest-neighbor matching   Matches: requested = 1
    Outcome model  : matching                                 min = 1
    Distance metric: Mahalanobis                              max = 1

                                      AI Robust
    wage                     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
    ATET
      union
      (union vs nonunion) .7246969   .2942952    2.46    0.014    .147889   1.301505
Some Examples: Nearest-neighbor matching (5 neighbors)

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

    Multivariate-distance nearest-neighbor matching
                                              Number of obs  = 1,853
                                              Neighbors: min = 5
    Treatment  : union = 1                               max = 5
    Metric     : mahalanobis
    Covariates : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
                     Matched                 Controls           Band-
                Yes     No   Total     Used  Unused   Total     width
    Treated     457      0     457      870     526    1396

    Treatment-effects estimation
    wage         Coef.
    ATT       .5590823

    . teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

    Treatment-effects estimation              Number of obs      = 1,853
    Estimator      : nearest-neighbor matching   Matches: requested = 5
    Outcome model  : matching                                 min = 5
    Distance metric: Mahalanobis                              max = 6

                                      AI Robust
    wage                     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
    ATET
      union
      (union vs nonunion) .5590823   .2381752    2.35    0.019   .0922675   1.025897
Some Examples: Bias-correction / regression adjustment

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south
    >     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

    Multivariate-distance nearest-neighbor matching
                                              Number of obs  = 1,853
                                              Neighbors: min = 5
    Treatment  : union = 1                               max = 5
    Metric     : mahalanobis
    Covariates : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
                     Matched                 Controls           Band-
                Yes     No   Total     Used  Unused   Total     width
    Treated     457      0     457      870     526    1396

    Treatment-effects estimation
    wage         Coef.
    ATT       .5288023
    adjusted for: collgrad ttl_exp tenure i.industry i.race south

    . teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south)
    >     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

    Treatment-effects estimation              Number of obs      = 1,853
    Estimator      : nearest-neighbor matching   Matches: requested = 5
    Outcome model  : matching                                 min = 5
    Distance metric: Mahalanobis                              max = 6

                                      AI Robust
    wage                     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
    ATET
      union
      (union vs nonunion) .5288023   .2420635    2.18    0.029   .0543666   1.003238
Some Examples
Mahalanobis-distance and propensity-score matching combined

    . kmatch md union collgrad ttl_exp tenure (wage), att
    >     psvars(i.industry i.race south) psweight(3)
    (computing bandwidth ... done)

    Multivariate-distance kernel matching     Number of obs = 1,853
                                              Kernel        = epan
    Treatment  : union = 1
    Metric     : mahalanobis (modified)
    Covariates : collgrad ttl_exp tenure
    PS model   : logit (pr)
    PS covars  : i.industry i.race south

    Matching statistics
                     Matched                 Controls           Band-
                Yes     No   Total     Used  Unused   Total     width
    Treated     439     18     457     1258     138    1396    .83886

    Treatment-effects estimation
    wage         Coef.
    ATT       .6408443
Some Examples
Exact matching

    . kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
    (computing bandwidth ... done)

    Multivariate-distance kernel matching     Number of obs = 1,853
                                              Kernel        = epan
    Treatment  : union = 1
    Metric     : mahalanobis
    Covariates : collgrad ttl_exp tenure
    Exact      : industry race south

    Matching statistics
                     Matched                 Controls           Band-
                Yes     No   Total     Used  Unused   Total     width
    Treated     432     25     457     1103     293    1396    1.3013

    Treatment-effects estimation
    wage         Coef.
    ATT       .6047374
Some Examples
Bandwidth selection: the default (based on distribution of distances in one-nearest-neighbor matching)

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
    (computing bandwidth ... done)

    Multivariate-distance kernel matching     Number of obs = 1,853
                                              Kernel        = epan
    Treatment  : union = 1
    Metric     : mahalanobis
    Covariates : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
                     Matched                 Controls           Band-
                Yes     No   Total     Used  Unused   Total     width
    Treated     432     25     457     1105     291    1396    1.3394

    Treatment-effects estimation
    wage         Coef.
    ATT       .6059013
Some Examples: Bandwidth selection: cross validation with respect to X

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
    >     att bwidth(cv)
    (computing bandwidth ... done)

    Multivariate-distance kernel matching     Number of obs = 1,853
                                              Kernel        = epan
    Treatment  : union = 1
    Metric     : mahalanobis
    Covariates : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
                     Matched                 Controls           Band-
                Yes     No   Total     Used  Unused   Total     width
    Treated     448      9     457     1184     212    1396    1.8888

    Treatment-effects estimation
    wage         Coef.
    ATT       .6651578

    . kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MSE against bandwidth (about 1.5 to 3), with evaluation points labeled by search index.]
Some Examples: Bandwidth selection: cross validation with respect to Y

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
    >     att bwidth(cv wage)
    (computing bandwidth ... done)

    Multivariate-distance kernel matching     Number of obs = 1,853
                                              Kernel        = epan
    Treatment  : union = 1
    Metric     : mahalanobis
    Covariates : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
                     Matched                 Controls           Band-
                Yes     No   Total     Used  Unused   Total     width
    Treated     453      4     457     1289     107    1396     2.433

    Treatment-effects estimation
    wage         Coef.
    ATT       .6928956

    . kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MISE against bandwidth (about 1.5 to 3), with evaluation points labeled by search index.]
Some Examples: Bandwidth selection: weighted cross validation with respect to Y

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
    >     att bwidth(cv wage, weighted)
    (computing bandwidth ... done)

    Multivariate-distance kernel matching     Number of obs = 1,853
                                              Kernel        = epan
    Treatment  : union = 1
    Metric     : mahalanobis
    Covariates : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
                     Matched                 Controls           Band-
                Yes     No   Total     Used  Unused   Total     width
    Treated     455      2     457     1356      40    1396    2.7626

    Treatment-effects estimation
    wage         Coef.
    ATT       .7308166

    . kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of weighted MISE against bandwidth (about 1 to 5), with evaluation points labeled by search index.]
Some Examples: Common-support diagnostics

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
    >     att bwidth(0.5)

    Multivariate-distance kernel matching     Number of obs = 1,853
                                              Kernel        = epan
    Treatment  : union = 1
    Metric     : mahalanobis
    Covariates : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
                     Matched                 Controls           Band-
                Yes     No   Total     Used  Unused   Total     width
    Treated     366     91     457      701     695    1396        .5

    Treatment-effects estimation
    wage         Coef.
    ATT       .3303161

    . kmatch csummarize
    (refitting the model using the generate() option)

                   Common support (treated)        Standardized difference
    Means         Matched  Unmatc~d     Total    (1)-(3)   (2)-(3)   (1)-(2)
    collgrad      .322404   .318681   .321663    .001585  -.006376   .007962
    ttl_exp       13.3929   12.7682   13.2685    .027413  -.110253   .137666
    tenure        8.12614   6.95055   7.89205    .038378  -.154356   .192734
    3.industry    .002732   .021978   .006565   -.047404   .190657  -.238061
    4.industry    .191257   .153846   .183807    .019212  -.077269   .096481
    5.industry    .062842   .274725   .105033   -.137462   .552867  -.690329
    6.industry    .057377         0   .045952    .054507  -.219225   .273732
    7.industry    .019126   .021978   .019694   -.004083   .016423  -.020506
    8.industry    .005464   .065934   .017505   -.091714   .368871  -.460585
    9.industry    .010929   .010989   .010941   -.000115   .000462  -.000577
    10.industry         0   .021978   .004376   -.066227   .266363  -.332589
    11.industry   .554645   .175824   .479212     .15083  -.606636   .757467
    12.industry   .092896   .241758   .122538   -.090299   .363181   -.45348
    2.race        .243169   .681319   .330416   -.185284   .745209  -.930494
    3.race        .002732   .076923   .017505   -.112525   .452572  -.565097
    south          .29235   .318681   .297593   -.011456   .046074   -.05753

                   Common support (treated)                Ratio
    Variances     Matched  Unmatc~d     Total    (1)/(3)   (2)/(3)   (1)/(2)
    collgrad      .219058   .219536   .218674    1.00176   1.00394   .997824
    ttl_exp       19.4198   25.2474   20.5898    .943177   1.22621    .76918
    tenure        38.3324   31.9242   37.2044    1.03032   .858076   1.20073
    3.industry    .002732   .021734   .006536    .418045   3.32537   .125714
    4.industry    .155101   .131624   .150351    1.03159   .875443   1.17837
    5.industry    .059054   .201465   .094207    .626851   2.13854   .293122
    6.industry    .054233         0   .043936    1.23435         0         .
    7.industry    .018811   .021734   .019348    .972252    1.1233   .865531
    8.industry     .00545   .062271   .017237    .316157   3.61269   .087513
    9.industry    .010839   .010989   .010845    .999464   1.01328   .986361
    10.industry         0   .021734   .004367          0   4.97709         0
    11.industry   .247691    .14652   .250115    .990307   .585811   1.69049
    12.industry   .084497   .185348   .107758    .784137   1.72003   .455885
    2.race        .184542   .219536   .221726    .832297   .990121   .840601
    3.race        .002732   .071795   .017237    .158513   4.16522   .038056
    south         .207448   .219536    .20949    .990254   1.04796   .944939
    (1) matched, (2) unmatched, (3) total
Some Examples: make a graph of the common-support stats

    . mat M = r(M)
    . coefplot matrix(M[,4]), title("Std. difference") noci nolabels xline(0)

[Figure: coefficient plot titled "Std. difference" showing, for each covariate, the standardized difference between matched treated observations and all treated observations (x-axis from about -.2 to .2).]
Some Examples
Multiple outcome variables

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south
    >     (wage hours), nate att
    (computing bandwidth ... done)

    Multivariate-distance kernel matching     Number of obs = 1,852
                                              Kernel        = epan
    Treatment  : union = 1
    Metric     : mahalanobis
    Covariates : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
                     Matched                 Controls           Band-
                Yes     No   Total     Used  Unused   Total     width
    Treated     432     25     457     1104     291    1395    1.3392

    Treatment-effects estimation
                 Coef.
    wage
      ATT     .6021049
      NATE    1.430823
    hours
      ATT     1.263759
      NATE    1.450303
Some Examples: Multiple outcome variables with different regression-adjustment equations

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south
    >     (wage = collgrad ttl_exp tenure)
    >     (hours = i.industry i.race), nate att
    (computing bandwidth ... done)

    Multivariate-distance kernel matching     Number of obs = 1,852
                                              Kernel        = epan
    Treatment  : union = 1
    Metric     : mahalanobis
    Covariates : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
                     Matched                 Controls           Band-
                Yes     No   Total     Used  Unused   Total     width
    Treated     432     25     457     1104     291    1395    1.3392

    Treatment-effects estimation
                 Coef.
    wage
      ATT     .5152752
      NATE    1.430823
    hours
      ATT     1.263759
      NATE    1.450303
    wage: adjusted for collgrad ttl_exp tenure
    hours: adjusted for i.industry i.race
Some Examples: Treatment effects by subpopulation

    . kmatch md union collgrad ttl_exp tenure i.industry i.race (wage),
    >     att vce(boot) over(south)
    (south=0: computing bandwidth ... done)
    (south=1: computing bandwidth ... done)
    (running kmatch on estimation sample)
    Bootstrap replications (50)
    ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
    ..................................................    50

    Multivariate-distance kernel matching     Number of obs = 1,853
                                              Replications  = 50
                                              Kernel        = epan
    Treatment  : union = 1
    Metric     : mahalanobis
    Covariates : collgrad ttl_exp tenure i.industry i.race
    0: south = 0
    1: south = 1

    Matching statistics
                     Matched                 Controls           Band-
                Yes     No   Total     Used  Unused   Total     width
    0
      Treated   306     15     321      625     120     745    1.3199
    1
      Treated   126     10     136      473     178     651    1.3398

    Treatment-effects estimation
              Observed   Bootstrap                      Normal-based
    wage         Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
    0
      ATT     .4586332   .2763358    1.66    0.097   -.082975   1.000241
    1
      ATT     .9518705    .406903    2.34    0.019   .1543553   1.749386

    . test [0]ATT = [1]ATT
     ( 1)  [0]ATT - [1]ATT = 0
               chi2(  1) =    1.23
             Prob > chi2 =    0.2679

    . lincom [1]ATT - [0]ATT
     ( 1)  - [0]ATT + [1]ATT = 0

    wage         Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
    (1)       .4932373   .4452343    1.11    0.268   -.379406   1.365881
Simulation
Population data from Swiss census of 2000.
Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using command iskotrei by Hendrickx 2002); values from 6 to 78, mean 44.
Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.
Control variables: gender, age, and highest educational degree.
Population restricted to people between 24 and 60 years old who are working.
2'308'006 individuals, of which 17.5% belong to the treatment group.
Draw random samples (N = 500, 1000, or 5000) from population and compute various matching estimators.
Simulation
Substantial differences between resident aliens and Swiss nationals on all three covariates.
Propensity score in population (computed from fully stratified data):
[Figure: density of the propensity score (x-axis from 0 to 1) for untreated and treated. McFadden R2 = 0.121.]
Simulation
Raw mean difference in occupational prestige (NATE): -4.79
Population ATT (computed from fully stratified data): -3.96
There is some treatment effect heterogeneity (ATE = -3.51, ATC = -3.41).
[Figure, left panel: outcome (about 30 to 55) by propensity score (0 to 0.8) for untreated and treated. Right panel: treatment effect (about -6 to -1) by propensity score (0 to 0.8).]
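The three population effects are tied together by the share of treated individuals. The check below assumes the treatment share of 17.5% reported for the simulation population and reads the effects as -3.96 (ATT) and -3.41 (ATC); the symbol \pi for the treatment share is introduced here for illustration only.

```latex
\[
\text{ATE} = \pi \cdot \text{ATT} + (1 - \pi) \cdot \text{ATC}
\]
\[
0.175 \times (-3.96) + 0.825 \times (-3.41)
  = -0.693 - 2.813 \approx -3.51
\]
```

The result agrees with the reported ATE of -3.51, and the gap between NATE (-4.79) and ATT (-3.96) is the selection bias that the matching estimators are meant to remove.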
Results: Variance
[Figure: variance of the estimators for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]
Notes (Results: Variance):
In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.
For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.
Results: Bias reduction (in percent)
[Figure: bias reduction in percent for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]
Notes (Results: Bias reduction):
Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.
The bias also vanishes once post-matching regression adjustment is applied.
Results: Mean squared error
[Figure: mean squared error of the estimators for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]
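The variance, bias, and mean-squared-error results are linked by the standard decomposition below, stated here for an estimator \hat{\tau} of the population ATT \tau. The notation is introduced for illustration and does not appear in the talk.

```latex
\[
\text{MSE}(\hat{\tau})
  = E\big[(\hat{\tau} - \tau)^2\big]
  = \text{Var}(\hat{\tau}) + \big(E[\hat{\tau}] - \tau\big)^2
\]
```

Because the variance of an estimator shrinks with the sample size while a bias from a misspecified propensity-score model does not, the bias term carries more weight in the MSE at N = 5000 than at N = 500.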
Results: Relative standard error
[Figure: relative standard error for N = 500 and N = 5000. Rows: nearest-neighbor matching with teffects (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap (1 neighbor, 5 neighbors), and kernel matching with bootstrap (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]
Results: Coverage of 95% CIs
[Figure: coverage of 95% confidence intervals for N = 500 and N = 5000. Rows: nearest-neighbor matching with teffects (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap (1 neighbor, 5 neighbors), and kernel matching with bootstrap (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]
7 Conclusions
Conclusions
The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).
Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.
Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research the choice may not be that clear.
Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.
Disadvantages of MDM:
- Choice of scaling matrix largely arbitrary; various suggestions in the literature (somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.
One clear conclusion we can draw, however, is:
Do not use propensity scores for pair matching! (But don't use pair matching anyhow.)
Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly ok for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.
To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.
References
Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2):295–313.
Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1):77–90.
Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91:279–290.
Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.
King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.
Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.
Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4):465–472.
Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70:41–55.
Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5):688–701.
Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4):472–480.
King and Nielsen
In 2015/2016 Gary King and Richard Nielsen circulated a paper that created quite some concern among applied researchers.
The basic message of the paper is that PSM is really, really bad and should be discarded.
The paper:
- http://j.mp/1sexgVw
Slides:
- https://gking.harvard.edu/presentations/why-propensity-scores-should-not-be-used-matching-6
Watch it:
- https://www.youtube.com/watch?v=rBv39pK1iEs
King and Nielsen
The story goes about as follows:
Argument 1
- Model dependence (i.e., dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions even if they try not to be).
- Matching is good because it reduces model dependence.
Matching to Reduce Model Dependence (Ho, Imai, King, Stuart 2007: fig. 1, Political Analysis)
[Figure: scatterplot of outcome (0 to 12) against education in years (12 to 28) for treated (T) and control (C) observations, built up over several overlays; slide 3/23 of the slides by King and Nielsen.]
King and Nielsen

Argument 2
- PSM approximates complete randomization.
- Matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, are better, because complete randomization is less efficient than fully blocked randomization.

Matching: Finding Hidden Randomized Experiments (slide 6/23 of the slides by King and Nielsen)

Types of experiments:

  Balance on covariates   Complete randomization   Fully blocked
  Observed                On average               Exact
  Unobserved              On average               On average

Fully blocked dominates complete randomization for: imbalance, model dependence, power, efficiency, bias, research costs, robustness. E.g., Imai, King, Nall 2009: SEs 600% smaller.

Goal of each matching method (in observational data):
- PSM: complete randomization
- Other methods: fully blocked
- Other matching methods dominate PSM (wait, it gets worse)

Best Case: Mahalanobis Distance Matching

[Figure, shown in two animation steps: scatter plot of Age (20 to 80) against Education in years (12 to 28), with treated (T) and control (C) units before and after pruning of unmatched controls. Slide 9/23 of the slides by King and Nielsen.]

Best Case: Propensity Score Matching

[Figure: scatter plot of Age (20 to 80) against Education in years (12 to 28), with treated (T) and control (C) units, together with a propensity score axis running from 0 to 1. Slide 15/23 of the slides by King and Nielsen.]

Best Case: Propensity Score Matching is Suboptimal

[Figure: scatter plot of Age (20 to 80) against Education in years (12 to 28), showing the treated (T) units and the control (C) units retained by propensity score matching. Slide 15/23 of the slides by King and Nielsen.]

King and Nielsen

Argument 3
- Random pruning (deleting observations at random) increases imbalance. This is because the sample size decreases, so that variance increases (large differences become more likely).
- More imbalance/variance means more model dependence and researcher discretion.
- Because PSM approximates complete randomization, it engages in random pruning.
- PSM Paradox ("when you do 'better', you do worse"):
  - When matching is made more strict (e.g., by decreasing the size of the caliper), PSM, like other matching methods, typically reduces imbalance. But soon the PSM Paradox kicks in, such that further pruning quickly increases imbalance.
  - If the data are such that there are no big differences between treated and untreated to begin with, the PSM Paradox kicks in almost immediately.

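The first point of Argument 3 (random pruning increases imbalance) can be illustrated with a small simulation. This is only a sketch under assumptions that are not part of the slides: a single covariate drawn from N(0, 1) for both groups, 200 units per group, and pruning down to 50 units per group at random. The function name `pruning_imbalance` and all parameter values are hypothetical.

```python
import random
from statistics import fmean


def pruning_imbalance(n=200, keep=50, reps=1000, seed=1):
    """Return the average |difference in means| before and after random pruning."""
    rng = random.Random(seed)
    before, after = [], []
    for _ in range(reps):
        # Treated and control values come from the same distribution,
        # so any imbalance is due to chance alone.
        treated = [rng.gauss(0, 1) for _ in range(n)]
        control = [rng.gauss(0, 1) for _ in range(n)]
        before.append(abs(fmean(treated) - fmean(control)))
        # Random pruning: keep a random subset of each group.
        treated_kept = rng.sample(treated, keep)
        control_kept = rng.sample(control, keep)
        after.append(abs(fmean(treated_kept) - fmean(control_kept)))
    return fmean(before), fmean(after)


imbalance_full, imbalance_pruned = pruning_imbalance()
print(f"full sample:   {imbalance_full:.3f}")
print(f"after pruning: {imbalance_pruned:.3f}")
```

Because the standard error of a mean difference scales with the inverse square root of the group size, cutting each group to a quarter of its size should roughly double the expected imbalance in this setup.
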
PSM Increases Model Dependence & Bias

[Figure, two panels, each plotted against the number of units pruned (0 to 160) for MDM and PSM. Left panel, "Model Dependence": variance. Right panel, "Bias": maximum coefficient across 512 specifications, with the true effect of 2 marked. Slide 20/23 of the slides by King and Nielsen.]

Data-generating process: $Y_i = 2T_i + X_{1i} + X_{2i} + \epsilon_i$, with $\epsilon_i \sim N(0, 1)$.

The Propensity Score Paradox in Real Data

[Figure, two panels: imbalance against the number of units pruned for CEM, MDM, PSM, and random pruning, with the raw data and the 1/4 SD caliper marked. Left panel: Finkel et al. (JOP 2012), 0 to 3000 units pruned. Right panel: Nielsen et al. (AJPS 2011), 0 to 2500 units pruned. Slide 21/23 of the slides by King and Nielsen.]

Similar pattern for > 20 other real data sets we checked.

Are King and Nielsen right?

Argument 1
- Model dependence (i.e., dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions, even if they try not to be).
- Matching is good because it reduces model dependence.

I fully agree.

My view, however, may be somewhat less pessimistic. I believe that research results can be credible if researchers are well educated, so that they know what they are doing, and if modeling decisions are made transparent and robustness of results is evaluated (and documented).

Are King and Nielsen right?

Argument 2
- PSM approximates complete randomization.
- Matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, are better, because complete randomization is less efficient than fully blocked randomization.

That fully blocked randomization is more efficient than complete randomization, given the sample size, is of course true (how large the efficiency gains are depends on the strength of the relation between X and Y).

However, if blocking reduces the sample size, it is not a priori clear whether estimates from the blocked sample are more efficient than estimates from the full sample (although often they will be).

That PSM approximates complete randomization is only partially true. PSM approximates complete randomization within observations with the same propensity score. Hence, PSM is somewhere between complete randomization and fully blocked randomization.
- If the X variables have no relation to T (treatment), then all observations have the same propensity score. Hence, we end up with complete randomization.
- If the X variables have a strong effect on T, there is lots of blocking.

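The two limiting cases can be made concrete with a toy example. The sketch below computes a propensity score from fully stratified data (the share of treated units within each covariate cell) and counts the distinct score values, which correspond to the blocks within which PSM treats units as interchangeable. The data, cell labels, and function names are made up for illustration and are not taken from the slides.

```python
from collections import defaultdict


def stratified_propensity(rows):
    """Share of treated units within each covariate cell.

    rows: iterable of (x, t) pairs with a categorical x and a binary t.
    """
    n = defaultdict(int)
    n_treated = defaultdict(int)
    for x, t in rows:
        n[x] += 1
        n_treated[x] += t
    return {x: n_treated[x] / n[x] for x in n}


def make_rows(treated_per_cell, cell_size=10):
    """Build toy data with a given number of treated units per cell."""
    rows = []
    for x, k in treated_per_cell.items():
        rows += [(x, 1)] * k + [(x, 0)] * (cell_size - k)
    return rows


# X unrelated to T: the treated share is the same in every cell.
unrelated = make_rows({"low": 3, "mid": 3, "high": 3})
# X strongly related to T: the treated share differs across cells.
related = make_rows({"low": 1, "mid": 5, "high": 9})

ps_unrelated = stratified_propensity(unrelated)
ps_related = stratified_propensity(related)

# Number of distinct propensity score values = number of blocks.
blocks_unrelated = len(set(ps_unrelated.values()))
blocks_related = len(set(ps_related.values()))
print(blocks_unrelated, blocks_related)
```

In the first data set all cells share one propensity score, so matching on the score amounts to complete randomization; in the second, each cell has its own score and matching on the score blocks on X.
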
Are King and Nielsen right?

Argument 3
- Random pruning ⇒ imbalance ⇒ more model dependence
- PSM ⇒ complete randomization ⇒ lots of random pruning
- PSM Paradox: "when you do 'better', you do worse"

That random pruning makes things worse is of course true, because it unnecessarily reduces the sample size (without changing anything else).

As argued above, that PSM applies random pruning is only true for X variables unrelated to T (so that we are in a "local" complete randomization situation, although something similar can probably also happen if effects from several X's cancel each other out).

Furthermore, it is only true if you employ a matching algorithm that throws away good matches. King and Nielsen's results seem to be based on the worst possible algorithm: one-to-one matching without replacement.

If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or also nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
- Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
- Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.

It is true that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more "free" variance in X that can be exploited by modeling decisions).

In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you'd need to apply PSM stratified by subgroups in this case).

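How a kernel matching estimator "averages where pruning is not necessary" can be sketched in a few lines. The code below is a simplified illustration, not kmatch's implementation: it assumes an Epanechnikov kernel on a one-dimensional score, a fixed bandwidth, and made-up data. The function names `epan` and `kernel_att` are hypothetical.

```python
def epan(u):
    """Epanechnikov kernel: positive weight only for |u| < 1."""
    return 0.75 * (1 - u * u) if abs(u) < 1 else 0.0


def kernel_att(treated, controls, bandwidth):
    """Kernel matching estimate of the ATT on a one-dimensional score.

    treated, controls: lists of (score, outcome) pairs.
    Returns (att, number of matched treated units).
    """
    effects = []
    for score_t, y_t in treated:
        weights = [epan((score_c - score_t) / bandwidth) for score_c, _ in controls]
        total = sum(weights)
        if total == 0:
            # No control within the bandwidth: the treated unit stays unmatched.
            continue
        # All controls within the bandwidth contribute, weighted by closeness.
        y0_hat = sum(w * y_c for w, (_, y_c) in zip(weights, controls)) / total
        effects.append(y_t - y0_hat)
    if not effects:
        raise ValueError("no treated unit has a control within the bandwidth")
    return sum(effects) / len(effects), len(effects)


controls = [(0.2, 10.0), (0.3, 12.0), (0.8, 20.0)]
treated = [(0.25, 14.0), (0.5, 18.0)]
att, n_matched = kernel_att(treated, controls, bandwidth=0.1)
print(att, n_matched)
```

In this toy data the first treated unit is compared with the average of its two equally distant controls, while the second treated unit has no control within the bandwidth and is dropped. No control that qualifies as a good match is discarded, in contrast to one-to-one matching without replacement.
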
The Command

kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).

Some key features:
- Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
- Optional exact matching
- Optional regression-adjustment bias-correction
- Kernel matching, ridge matching, or nearest-neighbor matching
- Automatic bandwidth selection for kernel/ridge matching
- Flexible specification of scaling matrix for MDM
- Joint analysis of multiple subgroups and multiple outcome variables
- Various post-estimation commands for balancing and common-support diagnostics
- Computationally efficient implementation

Some Examples

Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.

. sysuse nlsw88, clear
(NLSW, 1988 extract)

. drop if industry==2
(4 observations deleted)

Mahalanobis-distance kernel matching:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched        |        Controls        | Band-
           |  Yes     No   Total  |  Used  Unused   Total  | width
   Treated |  432     25     457  |  1105     291    1396  | 1.3394

Treatment-effects estimation

   wage |     Coef.
    ATT |  .6059013
   NATE |  1.432913

Some Examples

Some balancing statistics:

. kmatch summarize
(refitting the model using the generate() option)

              |             Raw               |        Matched (ATT)
        Means |  Treated  Untrea~d    StdDif  |  Treated  Untrea~d    StdDif
     collgrad |  .321663   .224212   .219912  |  .319444   .319444         0
      ttl_exp |  13.2685   12.7323   .117584  |  13.3205   13.1425   .039036
       tenure |  7.89205   6.17658    .29735  |  7.91744   7.58347   .057888
   3.industry |  .006565   .012178  -.058246  |   .00463    .00463         0
   4.industry |  .183807   .166905   .044425  |  .185185   .185185         0
   5.industry |  .105033   .027937   .312944  |  .085648   .085648         0
   6.industry |  .045952   .169771  -.407129  |  .048611   .048611         0
   7.industry |  .019694   .102436  -.350657  |  .020833   .020833         0
   8.industry |  .017505   .035817  -.113785  |  .009259   .009259         0
   9.industry |  .010941   .040115  -.185669  |  .011574   .011574         0
  10.industry |  .004376   .008596  -.052551  |  .002315   .002315         0
  11.industry |  .479212   .356734   .250073  |  .506944   .506944         0
  12.industry |  .122538    .07235   .169707  |   .12037    .12037         0
       2.race |  .330416   .244986   .189418  |    .3125     .3125         0
       3.race |  .017505   .011461   .050566  |  .006944   .006944         0
        south |  .297593   .466332  -.352408  |  .291667   .291667         0

              |             Raw               |        Matched (ATT)
    Variances |  Treated  Untrea~d     Ratio  |  Treated  Untrea~d     Ratio
     collgrad |  .218674   .174066   1.25628  |  .217904   .217904         1
      ttl_exp |  20.5898   21.0001   .980459  |  19.8177   18.2323   1.08696
       tenure |  37.2044   29.3629   1.26706  |  37.0399   34.9543   1.05966
   3.industry |  .006536   .012038   .542928  |  .004619   .004619         1
   4.industry |  .150351   .139148   1.08052  |  .151242   .151242         1
   5.industry |  .094207   .027176   3.46656  |  .078494   .078494         1
   6.industry |  .043936    .14105   .311496  |  .046355   .046355         1
   7.industry |  .019348   .092008   .210287  |  .020447   .020447         1
   8.industry |  .017237   .034559   .498769  |  .009195   .009195         1
   9.industry |  .010845   .038533   .281445  |  .011467   .011467         1
  10.industry |  .004367   .008528   .512039  |  .002315   .002315         1
  11.industry |  .250115   .229639   1.08917  |  .250532   .250532         1
  12.industry |  .107758   .067163   1.60443  |  .106127   .106127         1
       2.race |  .221726     .1851   1.19787  |  .215342   .215342         1
       3.race |  .017237   .011338   1.52025  |  .006912   .006912         1
        south |   .20949   .249045   .841173  |  .207077   .207077         1

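The two balance measures in this output can be recomputed by hand. The sketch below assumes the common definitions (difference in means divided by the square root of the average of the two group variances, and treated variance divided by untreated variance); that these are the formulas used by kmatch summarize is an assumption here, although they reproduce the reported raw values for collgrad up to rounding. The inputs are the collgrad row, read with decimal points as means .321663 and .224212 and variances .218674 and .174066.

```python
from math import sqrt


def std_mean_difference(mean_t, mean_c, var_t, var_c):
    """Difference in means, scaled by the root of the average group variance."""
    return (mean_t - mean_c) / sqrt((var_t + var_c) / 2)


def variance_ratio(var_t, var_c):
    """Variance among the treated relative to the variance among the untreated."""
    return var_t / var_c


# Raw (unmatched) statistics for collgrad.
std_dif = std_mean_difference(0.321663, 0.224212, 0.218674, 0.174066)
ratio = variance_ratio(0.218674, 0.174066)
print(f"StdDif = {std_dif:.6f}, Ratio = {ratio:.5f}")
```

A standardized difference of 0 and a variance ratio of 1 indicate perfect balance on the first two moments, which is what the matched columns show for the categorical covariates.
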
Some Examples

Make a graph of the balancing statistics:

. mat M = r(M)
. mat V = r(V)
. coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || , bylabels("Std. mean difference" "Variance ratio") noci nolabels byopts(xrescale)
. addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))
. addplot 2: , xline(1) norescaling

[Figure, two panels: standardized mean difference (-.4 to .4) and variance ratio (0 to 4) for each covariate (collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, south), raw versus matched.]

Some Examples

Propensity-score kernel matching:

. kmatch ps union collgrad ttl_exp tenure i.industry i.race south (wage), nate att
(computing bandwidth ... done)

Propensity-score kernel matching           Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Covariates  : collgrad ttl_exp tenure i.industry i.race south
PS model    : logit (pr)

Matching statistics

           |       Matched        |        Controls        | Band-
           |  Yes     No   Total  |  Used  Unused   Total  | width
   Treated |  431     26     457  |  1214     182    1396  | .00188

Treatment-effects estimation

   wage |     Coef.
    ATT |  .3887224
   NATE |  1.432913

Some Examples

Kernel density balancing plot:

. kmatch density, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)
(applying 0-1 boundary correction to density estimation of propensity score)
(bandwidth for propensity score = .06803989)

[Figure, two panels (Raw, Matched (ATT)): density (0 to 3) of the propensity score (0 to .8) for untreated and treated.]

Some Examples

Cumulative distribution balancing plot:

. kmatch cumul, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)

[Figure, two panels (Raw, Matched (ATT)): cumulative probability (0 to 1) of the propensity score (0 to .8) for untreated and treated.]

Some Examples

Balancing box plot:

. kmatch box
(refitting the model using the generate() option)

[Figure, two panels (Raw, Matched (ATT)): box plots of the propensity score (0 to .8) for untreated and treated.]

Some Examples

Standard errors:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Replications  =    50
                                           Kernel        =  epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched        |        Controls        | Band-
           |  Yes     No   Total  |  Used  Unused   Total  | width
   Treated |  432     25     457  |  1105     291    1396  | 1.3394
 Untreated | 1386     10    1396  |   455       2     457  | 3.3975
  Combined | 1818     35    1853  |  1560     293    1853  |

Treatment-effects estimation

        |   Observed   Bootstrap                       Normal-based
   wage |      Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
    ATE |   .4095729    .1920853    2.13   0.033    .0330928   .7860531
    ATT |   .6059013    .2472069    2.45   0.014    .1213846   1.090418
    ATC |   .3483797    .1893653    1.84   0.066   -.0227695   .7195289
   NATE |   1.432913    .2333282    6.14   0.000    .9755981   1.890228

Some Examples

Do some tests:

. lincom ATT-NATE

 ( 1)  ATT - NATE = 0

   wage |      Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
    (1) |  -.8270117    .1810415   -4.57   0.000   -1.181847  -.4721768

. test ATT = ATC

 ( 1)  ATT - ATC = 0

           chi2(  1) =    2.42
         Prob > chi2 =  0.1200

Some Examples

Nearest-neighbor matching (1 neighbor):

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

Multivariate-distance nearest-neighbor matching

                                           Number of obs  = 1,853
                                           Neighbors: min =     1
Treatment   : union = 1                               max =     1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched        |        Controls        | Band-
           |  Yes     No   Total  |  Used  Unused   Total  | width
   Treated |  457      0     457  |   328    1068    1396  |

Treatment-effects estimation

   wage |     Coef.
    ATT |  .7246969

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

Treatment-effects estimation               Number of obs      = 1,853
Estimator      : nearest-neighbor matching Matches: requested =     1
Outcome model  : matching                                 min =     1
Distance metric: Mahalanobis                              max =     1

                       |            AI Robust
                  wage |     Coef.  Std. Err.     z    P>|z|   [95% Conf. Interval]
 ATET                  |
   union               |
   (union vs nonunion) |  .7246969   .2942952   2.46   0.014     .147889   1.301505

Some Examples

Nearest-neighbor matching (5 neighbors):

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching

                                           Number of obs  = 1,853
                                           Neighbors: min =     5
Treatment   : union = 1                               max =     5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched        |        Controls        | Band-
           |  Yes     No   Total  |  Used  Unused   Total  | width
   Treated |  457      0     457  |   870     526    1396  |

Treatment-effects estimation

   wage |     Coef.
    ATT |  .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation               Number of obs      = 1,853
Estimator      : nearest-neighbor matching Matches: requested =     5
Outcome model  : matching                                 min =     5
Distance metric: Mahalanobis                              max =     6

                       |            AI Robust
                  wage |     Coef.  Std. Err.     z    P>|z|   [95% Conf. Interval]
 ATET                  |
   union               |
   (union vs nonunion) |  .5590823   .2381752   2.35   0.019    .0922675   1.025897

Some Examples

Bias-correction / regression adjustment:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching

                                           Number of obs  = 1,853
                                           Neighbors: min =     5
Treatment   : union = 1                               max =     5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched        |        Controls        | Band-
           |  Yes     No   Total  |  Used  Unused   Total  | width
   Treated |  457      0     457  |   870     526    1396  |

Treatment-effects estimation

   wage |     Coef.
    ATT |  .5288023

adjusted for: collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation               Number of obs      = 1,853
Estimator      : nearest-neighbor matching Matches: requested =     5
Outcome model  : matching                                 min =     5
Distance metric: Mahalanobis                              max =     6

                       |            AI Robust
                  wage |     Coef.  Std. Err.     z    P>|z|   [95% Conf. Interval]
 ATET                  |
   union               |
   (union vs nonunion) |  .5288023   .2420635   2.18   0.029    .0543666   1.003238

Some Examples

Mahalanobis-distance and propensity-score matching combined:

. kmatch md union collgrad ttl_exp tenure (wage), att psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis (modified)
Covariates  : collgrad ttl_exp tenure
PS model    : logit (pr)
PS covars   : i.industry i.race south

Matching statistics

           |       Matched        |        Controls        | Band-
           |  Yes     No   Total  |  Used  Unused   Total  | width
   Treated |  439     18     457  |  1258     138    1396  | .83886

Treatment-effects estimation

   wage |     Coef.
    ATT |  .6408443

Some Examples

Exact matching:

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure
Exact       : industry race south

Matching statistics

           |       Matched        |        Controls        | Band-
           |  Yes     No   Total  |  Used  Unused   Total  | width
   Treated |  432     25     457  |  1103     293    1396  | 1.3013

Treatment-effects estimation

   wage |     Coef.
    ATT |  .6047374

Some Examples

Bandwidth selection: the default (based on the distribution of distances in one-nearest-neighbor matching):

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched        |        Controls        | Band-
           |  Yes     No   Total  |  Used  Unused   Total  | width
   Treated |  432     25     457  |  1105     291    1396  | 1.3394

Treatment-effects estimation

   wage |     Coef.
    ATT |  .6059013

Some Examples

Bandwidth selection: cross validation with respect to X:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched        |        Controls        | Band-
           |  Yes     No   Total  |  Used  Unused   Total  | width
   Treated |  448      9     457  |  1184     212    1396  | 1.8888

Treatment-effects estimation

   wage |     Coef.
    ATT |  .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: MSE against bandwidth (1.5 to 3), with the evaluated bandwidths labeled by their search index.]

Some Examples

Bandwidth selection: cross validation with respect to Y:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched        |        Controls        | Band-
           |  Yes     No   Total  |  Used  Unused   Total  | width
   Treated |  453      4     457  |  1289     107    1396  |  2.433

Treatment-effects estimation

   wage |     Coef.
    ATT |  .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: MISE against bandwidth (1.5 to 3), with the evaluated bandwidths labeled by their search index.]

Some Examples

Bandwidth selection: weighted cross validation with respect to Y:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched        |        Controls        | Band-
           |  Yes     No   Total  |  Used  Unused   Total  | width
   Treated |  455      2     457  |  1356      40    1396  | 2.7626

Treatment-effects estimation

   wage |     Coef.
    ATT |  .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: weighted MISE against bandwidth (1 to 5), with the evaluated bandwidths labeled by their search index.]

Some Examples

Common-support diagnostics:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(0.5)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched        |        Controls        | Band-
           |  Yes     No   Total  |  Used  Unused   Total  | width
   Treated |  366     91     457  |   701     695    1396  |     .5

Treatment-effects estimation

   wage |     Coef.
    ATT |  .3303161

. kmatch csummarize
(refitting the model using the generate() option)

              |   Common support (treated)    |    Standardized difference
        Means |  Matched  Unmatc~d     Total  |  (1)-(3)   (2)-(3)   (1)-(2)
     collgrad |  .322404   .318681   .321663  |  .001585  -.006376   .007962
      ttl_exp |  13.3929   12.7682   13.2685  |  .027413  -.110253   .137666
       tenure |  8.12614   6.95055   7.89205  |  .038378  -.154356   .192734
   3.industry |  .002732   .021978   .006565  | -.047404   .190657  -.238061
   4.industry |  .191257   .153846   .183807  |  .019212  -.077269   .096481
   5.industry |  .062842   .274725   .105033  | -.137462   .552867  -.690329
   6.industry |  .057377         0   .045952  |  .054507  -.219225   .273732
   7.industry |  .019126   .021978   .019694  | -.004083   .016423  -.020506
   8.industry |  .005464   .065934   .017505  | -.091714   .368871  -.460585
   9.industry |  .010929   .010989   .010941  | -.000115   .000462  -.000577
  10.industry |        0   .021978   .004376  | -.066227   .266363  -.332589
  11.industry |  .554645   .175824   .479212  |   .15083  -.606636   .757467
  12.industry |  .092896   .241758   .122538  | -.090299   .363181   -.45348
       2.race |  .243169   .681319   .330416  | -.185284   .745209  -.930494
       3.race |  .002732   .076923   .017505  | -.112525   .452572  -.565097
        south |   .29235   .318681   .297593  | -.011456   .046074   -.05753

              |   Common support (treated)    |            Ratio
    Variances |  Matched  Unmatc~d     Total  |  (1)/(3)   (2)/(3)   (1)/(2)
     collgrad |  .219058   .219536   .218674  |  1.00176   1.00394   .997824
      ttl_exp |  19.4198   25.2474   20.5898  |  .943177   1.22621    .76918
       tenure |  38.3324   31.9242   37.2044  |  1.03032   .858076   1.20073
   3.industry |  .002732   .021734   .006536  |  .418045   3.32537   .125714
   4.industry |  .155101   .131624   .150351  |  1.03159   .875443   1.17837
   5.industry |  .059054   .201465   .094207  |  .626851   2.13854   .293122
   6.industry |  .054233         0   .043936  |  1.23435         0         .
   7.industry |  .018811   .021734   .019348  |  .972252    1.1233   .865531
   8.industry |   .00545   .062271   .017237  |  .316157   3.61269   .087513
   9.industry |  .010839   .010989   .010845  |  .999464   1.01328   .986361
  10.industry |        0   .021734   .004367  |        0   4.97709         0
  11.industry |  .247691    .14652   .250115  |  .990307   .585811   1.69049
  12.industry |  .084497   .185348   .107758  |  .784137   1.72003   .455885
       2.race |  .184542   .219536   .221726  |  .832297   .990121   .840601
       3.race |  .002732   .071795   .017237  |  .158513   4.16522   .038056
        south |  .207448   .219536    .20949  |  .990254   1.04796   .944939

(1) matched, (2) unmatched, (3) total

Some Examples

Make a graph of the common-support statistics:

. mat M = r(M)
. coefplot matrix(M[,4]), title("Std. difference") noci nolabels xline(0)

[Figure: standardized difference (-.2 to .2) for each covariate (collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, south).]

Some Examples

Multiple outcome variables:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,852
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched        |        Controls        | Band-
           |  Yes     No   Total  |  Used  Unused   Total  | width
   Treated |  432     25     457  |  1104     291    1395  | 1.3392

Treatment-effects estimation

         |     Coef.
 wage    |
     ATT |  .6021049
    NATE |  1.430823
 hours   |
     ATT |  1.263759
    NATE |  1.450303

Some Examples

Multiple outcome variables with different regression-adjustment equations:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage = collgrad ttl_exp tenure) (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,852
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched        |        Controls        | Band-
           |  Yes     No   Total  |  Used  Unused   Total  | width
   Treated |  432     25     457  |  1104     291    1395  | 1.3392

Treatment-effects estimation

         |     Coef.
 wage    |
     ATT |  .5152752
    NATE |  1.430823
 hours   |
     ATT |  1.263759
    NATE |  1.450303

wage: adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race

Some Examples

Treatment effects by subpopulation:

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage), att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Replications  =    50
                                           Kernel        =  epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics

           |       Matched        |        Controls        | Band-
           |  Yes     No   Total  |  Used  Unused   Total  | width
 0         |
   Treated |  306     15     321  |   625     120     745  | 1.3199
 1         |
   Treated |  126     10     136  |   473     178     651  | 1.3398

Treatment-effects estimation

        |   Observed   Bootstrap                       Normal-based
   wage |      Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
 0      |
    ATT |   .4586332    .2763358    1.66   0.097    -.082975   1.000241
 1      |
    ATT |   .9518705     .406903    2.34   0.019    .1543553   1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

           chi2(  1) =    1.23
         Prob > chi2 =  0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

   wage |      Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
    (1) |   .4932373    .4452343    1.11   0.268    -.379406   1.365881

Simulation

- Population data from the Swiss census of 2000.
- Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using the command iskotrei by Hendrickx 2002); values from 6 to 78, mean 44.
- Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.
- Control variables: gender, age, and highest educational degree.
- Population restricted to people between 24 and 60 years old who are working.
- 2'308'006 individuals, of which 17.5% belong to the treatment group.
- Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.

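The logic of this simulation design (fixed population, repeated random samples, one estimate per sample, then mean and variance of the estimates) can be sketched as follows. Everything in the code is a stand-in and not the design used in the slides: the population is synthetic rather than census data, there is a single categorical covariate, the treatment effect is a constant -4, and the estimator is a simple fully stratified (exact matching) ATT. All names and parameter values are hypothetical.

```python
import random
from statistics import fmean, pvariance


def make_population(size=20000, seed=7):
    """Synthetic population of (x, t, y) with a constant treatment effect of -4."""
    rng = random.Random(seed)
    population = []
    for _ in range(size):
        x = rng.randrange(3)                           # categorical covariate
        t = 1 if rng.random() < 0.1 + 0.2 * x else 0   # treatment depends on x
        y = 40 + 3 * x - 4 * t + rng.gauss(0, 5)       # outcome
        population.append((x, t, y))
    return population


def stratified_att(sample):
    """ATT from exact matching on x: cell-wise differences in mean outcomes,
    weighted by the number of treated units in each cell."""
    cells = {}
    for x, t, y in sample:
        cells.setdefault(x, ([], []))[t].append(y)     # index 0: controls, 1: treated
    total, n_treated = 0.0, 0
    for controls, treated in cells.values():
        if controls and treated:                       # cell is on common support
            total += len(treated) * (fmean(treated) - fmean(controls))
            n_treated += len(treated)
    return total / n_treated


def monte_carlo(population, n, reps=200, seed=11):
    """Mean and variance of the estimator across repeated samples of size n."""
    rng = random.Random(seed)
    estimates = [stratified_att(rng.sample(population, n)) for _ in range(reps)]
    return fmean(estimates), pvariance(estimates)


population = make_population()
mean_500, var_500 = monte_carlo(population, 500)
mean_2000, var_2000 = monte_carlo(population, 2000)
print(f"N = 500:  mean {mean_500:.2f}, variance {var_500:.3f}")
print(f"N = 2000: mean {mean_2000:.2f}, variance {var_2000:.3f}")
```

Replacing `stratified_att` with other estimators while keeping the same samples is what allows a comparison of their variances, which is the comparison the results slides report for MDM and PSM.
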
Simulation

- Substantial differences between resident aliens and Swiss nationals on all three covariates.
- Propensity score in the population (computed from fully stratified data): McFadden R2 = 0.121.

[Figure: density of the propensity score (0 to 1) for untreated and treated.]
Simulation

- Raw mean difference in occupational prestige (NATE): -4.79
- Population ATT (computed from fully stratified data): -3.96
- There is some treatment effect heterogeneity (ATE = -3.51, ATC = -3.41).

[Figure, two panels: outcome against propensity score for untreated and treated; treatment effect against propensity score.]
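To make "computed from fully stratified data" concrete, here is a minimal sketch of an ATT obtained by exact stratification, next to the raw mean difference (NATE). The function names and the toy numbers are hypothetical and are not the census data; the sketch only assumes that each covariate combination forms one stratum.

```python
from collections import defaultdict
from statistics import fmean

def stratified_att(rows):
    """rows: (stratum, treated 0/1, outcome). ATT by exact stratification."""
    cells = defaultdict(lambda: {0: [], 1: []})
    for stratum, d, y in rows:
        cells[stratum][d].append(y)
    # Keep only strata that contain both treated and untreated units.
    supported = [c for c in cells.values() if c[0] and c[1]]
    n_treated = sum(len(c[1]) for c in supported)
    # Weight each within-stratum difference by its share of treated units.
    return sum(len(c[1]) / n_treated * (fmean(c[1]) - fmean(c[0]))
               for c in supported)

def nate(rows):
    """Raw (naive) mean difference between treated and untreated."""
    treated = [y for _, d, y in rows if d == 1]
    control = [y for _, d, y in rows if d == 0]
    return fmean(treated) - fmean(control)

# Hypothetical toy data: treated units sit mostly in the "low" stratum.
toy = [("low", 1, 40.0), ("low", 1, 42.0), ("low", 0, 45.0),
       ("high", 1, 50.0), ("high", 0, 52.0), ("high", 0, 54.0),
       ("high", 0, 53.0), ("high", 0, 53.0)]

print("NATE:", nate(toy))
print("ATT :", stratified_att(toy))
```

In the toy data the raw difference is much larger in absolute value than the stratified ATT, which mirrors the gap between NATE and ATT reported on the slide.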
Results: Variance

[Figure: variance of the ATT estimates for N = 500 and N = 5000, comparing MDM and PSM, each with and without bias correction. Estimators shown: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y).]
In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.
For kernel matching, the efficiency differences between PSM and MDM are small; additional post-matching regression adjustment further reduces the differences.
Results: Bias reduction (in percent)

[Figure: bias reduction in percent for N = 500 and N = 5000, comparing MDM and PSM, each with and without bias correction. Estimators shown: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y).]
Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.
The bias also vanishes once post-matching regression adjustment is applied.
Results: Mean squared error

[Figure: mean squared error of the ATT estimates for N = 500 and N = 5000, comparing MDM and PSM, each with and without bias correction. Estimators shown: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y).]
Results: Relative standard error

[Figure: relative standard error for N = 500 and N = 5000, comparing MDM and PSM, each with and without bias correction. Estimators shown: nearest-neighbor matching with teffects standard errors (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap standard errors (1 neighbor, 5 neighbors), and kernel matching with bootstrap standard errors (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y).]
Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals for N = 500 and N = 5000, comparing MDM and PSM, each with and without bias correction. Estimators shown: nearest-neighbor matching with teffects standard errors (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap standard errors (1 neighbor, 5 neighbors), and kernel matching with bootstrap standard errors (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y).]
Conclusions

- The arguments brought forward by King and Nielsen against propensity score matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).
- Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.
Conclusions

- Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research the choice may not be that clear.
  Advantages of MDM:
  - MDM leaves less scope for post-matching modeling decision biases.
  - Theoretical results (see e.g. Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
  - Fewer restrictions in terms of possible post-matching analyses.
  Disadvantages of MDM:
  - The choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
  - Computational complexity.
- One clear conclusion we can draw, however, is: Do not use propensity scores for pair matching. (But don't use pair matching anyhow.)
Conclusions

- Some conclusions from the simulation:
  - For PSM, applying regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
  - Bootstrap standard error/confidence interval estimation seems to be mostly OK for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.
- To do:
  - Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.
References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2):295–313.
Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1):77–90.
Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91:279–290.
Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.
King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.
Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.
Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4):465–472.
Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70:41–55.
Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5):688–701.
Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4):472–480.
King and Nielsen

The story goes about as follows.

Argument 1
- Model dependence (i.e. dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions, even if they try not to be).
- Matching is good because it reduces model dependence.
Matching to Reduce Model Dependence (Ho, Imai, King, Stuart 2007, fig. 1, Political Analysis)

[Figure, shown in several animation steps: scatter plot of the outcome against education (years), first with the treated units (T) only, then with the control units (C) added, and finally with a reduced set of control units. Slide 3/23 of the slides by King and Nielsen.]
King and Nielsen

Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

Matching: Finding Hidden Randomized Experiments (slide 6/23 of the slides by King and Nielsen)

Types of experiments:

Balance on covariates    Complete randomization    Fully blocked
Observed                 On average                Exact
Unobserved               On average                On average

Fully blocked dominates complete randomization for: imbalance, model dependence, power, efficiency, bias, research costs, robustness. E.g., Imai, King, Nall 2009: SEs 600% smaller.

Goal of each matching method (in observational data):
- PSM: complete randomization
- Other methods: fully blocked
- Other matching methods dominate PSM (wait, it gets worse)
Best Case: Mahalanobis Distance Matching

[Figure, shown in two animation steps: age against education (years) for treated (T) and control (C) units, first with all control units and then with the matched control units only. Slide 9/23 of the slides by King and Nielsen.]
Best Case: Propensity Score Matching

[Figure, shown in two animation steps: age against education (years) for treated (T) and control (C) units, together with a propensity score axis running from 0 to 1. Slide 15/23 of the slides by King and Nielsen.]
Best Case: Propensity Score Matching is Suboptimal

[Figure: age against education (years) for the treated units (T) and the control units (C) retained by propensity score matching. Slide 15/23 of the slides by King and Nielsen.]
King and Nielsen

Argument 3
- Random pruning (deleting observations at random) increases imbalance. This is because the sample size decreases, so that variance increases (large differences become more likely).
- More imbalance/variance means more model dependence and researcher discretion.
- Because PSM approximates complete randomization, it engages in random pruning.
- PSM Paradox ("when you do 'better', you do worse"):
  - When matching is made more strict (e.g. by decreasing the size of the caliper), PSM, like other matching methods, typically reduces imbalance. But soon the PSM Paradox kicks in, such that further pruning quickly increases imbalance.
  - If the data are such that there are no big differences between treated and untreated to begin with, the PSM Paradox kicks in almost immediately.
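The first point of Argument 3 can be illustrated with a small simulation. The sketch below is not taken from King and Nielsen; it assumes a single standard-normal covariate that is unrelated to treatment (the complete-randomization case) and hypothetical group sizes of 100 units pruned at random to 25.

```python
import random
from statistics import fmean

def imbalance(treated, control):
    """Absolute difference in covariate means between the two groups."""
    return abs(fmean(treated) - fmean(control))

def simulate_pruning(n_full=100, n_kept=25, n_reps=2000, seed=1):
    """Average imbalance before and after deleting units at random."""
    rng = random.Random(seed)
    full, pruned = [], []
    for _ in range(n_reps):
        # X is unrelated to treatment, so both groups share one distribution.
        t = [rng.gauss(0, 1) for _ in range(n_full)]
        c = [rng.gauss(0, 1) for _ in range(n_full)]
        full.append(imbalance(t, c))
        # Random pruning: keep a random subset of each group.
        pruned.append(imbalance(rng.sample(t, n_kept), rng.sample(c, n_kept)))
    return fmean(full), fmean(pruned)

full, pruned = simulate_pruning()
print(f"mean imbalance, full sample:   {full:.3f}")
print(f"mean imbalance, pruned sample: {pruned:.3f}")
```

Because nothing but the sample size changes, the expected imbalance grows roughly with the square root of the pruning factor, here by a factor of about two.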
PSM Increases Model Dependence & Bias

[Figure, two panels, slide 20/23 of the slides by King and Nielsen. Left panel (Model Dependence): variance against number of units pruned (0 to 160) for MDM and PSM. Right panel (Bias): maximum coefficient across 512 specifications against number of units pruned (0 to 160) for MDM and PSM; true effect = 2.]

Data-generating process: $Y_i = 2T_i + X_{1i} + X_{2i} + \epsilon_i$, $\epsilon_i \sim N(0, 1)$
The Propensity Score Paradox in Real Data

[Figure, two panels, slide 21/23 of the slides by King and Nielsen: imbalance against number of units pruned for CEM, MDM, and PSM, with reference marks for the raw data, random pruning, and a 1/4 SD caliper. Left panel: Finkel et al. (JOP 2012). Right panel: Nielsen et al. (AJPS 2011).]

Similar pattern for > 20 other real data sets we checked.
Are King and Nielsen right?

Argument 1
- Model dependence (i.e. dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions, even if they try not to be).
- Matching is good because it reduces model dependence.

I fully agree.

My view, however, may be somewhat less pessimistic. I believe that research results can be credible if researchers are well educated, so that they know what they are doing, and if modeling decisions are made transparent and the robustness of results is evaluated (and documented).
Are King and Nielsen right?

Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

That fully blocked randomization is more efficient than complete randomization, given the sample size, is of course true (how large the efficiency gains are depends on the strength of the relation between X and Y).

However, if blocking reduces the sample size, it is not a priori clear whether estimates from the blocked sample are more efficient than estimates from the full sample (although often they will be).
That PSM approximates complete randomization is only partially true: PSM approximates complete randomization within observations with the same propensity score. Hence, PSM is somewhere between complete randomization and fully blocked randomization.
- If the X variables have no relation to T (treatment), then all observations have the same propensity score. Hence, we end up with complete randomization.
- If the X variables have a strong effect on T, there is lots of blocking.
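The two cases can be made concrete with a propensity score computed from fully stratified data, i.e. as the share of treated units within each covariate cell. The sketch below uses hypothetical toy data and a hypothetical helper name; it only illustrates how many distinct score values (blocks) result.

```python
from collections import Counter

def stratified_propensity(rows):
    """rows: (covariate cell, treated 0/1). Returns the share treated per cell."""
    n, n_treated = Counter(), Counter()
    for cell, d in rows:
        n[cell] += 1
        n_treated[cell] += d
    return {cell: n_treated[cell] / n[cell] for cell in n}

# X unrelated to T: the same share of treated units in every cell.
unrelated = [("a", 1), ("a", 1), ("a", 0), ("a", 0),
             ("b", 1), ("b", 1), ("b", 0), ("b", 0)]
# X related to T: the share of treated units differs between cells.
related = [("a", 1), ("a", 0), ("a", 0), ("a", 0),
           ("b", 1), ("b", 1), ("b", 1), ("b", 0)]

ps_unrelated = stratified_propensity(unrelated)
ps_related = stratified_propensity(related)
print(ps_unrelated)   # one distinct score: a single block
print(ps_related)     # two distinct scores: matching on the score blocks on X
```

With a single score value, matching on the score cannot distinguish the cells, which corresponds to complete randomization; with distinct values, matching on the score separates the cells.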
Are King and Nielsen right?

Argument 3
- Random pruning ⇒ imbalance ⇒ more model dependence
- PSM ⇒ complete randomization ⇒ lots of random pruning
- PSM Paradox: "when you do 'better', you do worse"

That random pruning makes things worse is of course true, because it unnecessarily reduces the sample size (without changing anything else).

As argued above, that PSM applies random pruning is only true for X variables unrelated to T (so that we are in a "local" complete randomization situation, although something similar can probably also happen if effects from several X's cancel each other out).

Furthermore, it is only true if you employ a matching algorithm that throws away good matches. King and Nielsen's results seem to be based on the worst possible algorithm: one-to-one matching without replacement.
If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or also nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
- Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
- Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.
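How kernel matching averages instead of pruning can be sketched in a few lines. The following is a simplified illustration with an Epanechnikov kernel on a scalar score; the function names, the toy data, and the bandwidth of 0.05 are assumptions made for this example, and the code is not the kmatch implementation.

```python
from statistics import fmean

def epan(u):
    """Epanechnikov kernel; zero weight outside the bandwidth."""
    return 0.75 * (1 - u * u) if abs(u) < 1 else 0.0

def kernel_match_att(treated, controls, bw):
    """treated, controls: lists of (score, outcome).
    Returns (ATT, number of matched treated, number of controls used)."""
    effects, used = [], set()
    for s_t, y_t in treated:
        weights = [epan((s_c - s_t) / bw) for s_c, _ in controls]
        total = sum(weights)
        if total == 0:
            continue  # no control within the bandwidth: treated unit unmatched
        # Counterfactual outcome: kernel-weighted mean over ALL nearby controls.
        y0_hat = sum(w * y for w, (_, y) in zip(weights, controls)) / total
        effects.append(y_t - y0_hat)
        used.update(i for i, w in enumerate(weights) if w > 0)
    return fmean(effects), len(effects), len(used)

# Hypothetical toy data: two treated units, five controls.
treated = [(0.30, 10.0), (0.60, 12.0)]
controls = [(0.28, 8.0), (0.32, 9.0), (0.58, 11.0), (0.62, 11.0), (0.95, 20.0)]

att, n_matched, n_used = kernel_match_att(treated, controls, bw=0.05)
print(att, n_matched, n_used)
```

All four controls close to a treated unit contribute to the estimate, whereas one-to-one matching without replacement would keep only two of them; only the distant control (score 0.95) is left unused.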
It is true that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more "free" variance in X that can be exploited by modeling decisions).

In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you'd need to apply PSM stratified by subgroups in this case).
The Command

kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).

Some key features:
- Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
- Optional exact matching
- Optional regression-adjustment bias correction
- Kernel matching, ridge matching, or nearest-neighbor matching
- Automatic bandwidth selection for kernel/ridge matching
- Flexible specification of scaling matrix for MDM
- Joint analysis of multiple subgroups and multiple outcome variables
- Various post-estimation commands for balancing and common-support diagnostics
- Computationally efficient implementation
Some Examples

Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.

. sysuse nlsw88, clear
(NLSW, 1988 extract)

. drop if industry==2
(4 observations deleted)

Mahalanobis-distance kernel matching

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1,853
                                             Kernel        =  epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                  Matched                 Controls           Band-
              Yes     No   Total    Used  Unused   Total     width
Treated       432     25     457    1105     291    1396    1.3394

Treatment-effects estimation

wage          Coef.
ATT        .6059013
NATE       1.432913
Some Examples: some balancing statistics

. kmatch summarize
(refitting the model using the generate() option)

                           Raw                        Matched (ATT)
Means          Treated  Untrea~d    StdDif    Treated  Untrea~d    StdDif
collgrad       .321663   .224212   .219912    .319444   .319444         0
ttl_exp        13.2685   12.7323   .117584    13.3205   13.1425   .039036
tenure         7.89205   6.17658    .29735    7.91744   7.58347   .057888
3.industry     .006565   .012178  -.058246     .00463    .00463         0
4.industry     .183807   .166905   .044425    .185185   .185185         0
5.industry     .105033   .027937   .312944    .085648   .085648         0
6.industry     .045952   .169771  -.407129    .048611   .048611         0
7.industry     .019694   .102436  -.350657    .020833   .020833         0
8.industry     .017505   .035817  -.113785    .009259   .009259         0
9.industry     .010941   .040115  -.185669    .011574   .011574         0
10.industry    .004376   .008596  -.052551    .002315   .002315         0
11.industry    .479212   .356734   .250073    .506944   .506944         0
12.industry    .122538    .07235   .169707     .12037    .12037         0
2.race         .330416   .244986   .189418      .3125     .3125         0
3.race         .017505   .011461   .050566    .006944   .006944         0
south          .297593   .466332  -.352408    .291667   .291667         0

                           Raw                        Matched (ATT)
Variances      Treated  Untrea~d     Ratio    Treated  Untrea~d     Ratio
collgrad       .218674   .174066   1.25628    .217904   .217904         1
ttl_exp        20.5898   21.0001   .980459    19.8177   18.2323   1.08696
tenure         37.2044   29.3629   1.26706    37.0399   34.9543   1.05966
3.industry     .006536   .012038   .542928    .004619   .004619         1
4.industry     .150351   .139148   1.08052    .151242   .151242         1
5.industry     .094207   .027176   3.46656    .078494   .078494         1
6.industry     .043936    .14105   .311496    .046355   .046355         1
7.industry     .019348   .092008   .210287    .020447   .020447         1
8.industry     .017237   .034559   .498769    .009195   .009195         1
9.industry     .010845   .038533   .281445    .011467   .011467         1
10.industry    .004367   .008528   .512039    .002315   .002315         1
11.industry    .250115   .229639   1.08917    .250532   .250532         1
12.industry    .107758   .067163   1.60443    .106127   .106127         1
2.race         .221726     .1851   1.19787    .215342   .215342         1
3.race         .017237   .011338   1.52025    .006912   .006912         1
south           .20949   .249045   .841173    .207077   .207077         1
Some Examples: make a graph of the balancing stats

. mat M = r(M)
. mat V = r(V)
. coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || , ///
>     bylabels("Std. mean difference" "Variance ratio") ///
>     noci nolabels byopts(xrescale)
. addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))
. addplot 2: , xline(1) norescaling

[Figure, two panels: standardized mean difference (left) and variance ratio (right) for each covariate (collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, south), raw versus matched.]
Some Examples

Propensity-score kernel matching

. kmatch ps union collgrad ttl_exp tenure i.industry i.race south (wage), nate att
(computing bandwidth ... done)

Propensity-score kernel matching             Number of obs = 1,853
                                             Kernel        =  epan
Treatment   : union = 1
Covariates  : collgrad ttl_exp tenure i.industry i.race south
PS model    : logit (pr)

Matching statistics

                  Matched                 Controls           Band-
              Yes     No   Total    Used  Unused   Total     width
Treated       431     26     457    1214     182    1396    .00188

Treatment-effects estimation

wage          Coef.
ATT        .3887224
NATE       1.432913
Some Examples: Kernel density balancing plot

. kmatch density, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)
(applying 0-1 boundary correction to density estimation of propensity score)
(bandwidth for propensity score = .06803989)

[Figure, two panels (Raw, Matched (ATT)): density of the propensity score for untreated and treated.]
Some Examples: Cumulative distribution balancing plot

. kmatch cumul, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)

[Figure, two panels (Raw, Matched (ATT)): cumulative probability against the propensity score for untreated and treated.]
Some Examples: Balancing box plot

. kmatch box
(refitting the model using the generate() option)

[Figure, two panels (Raw, Matched (ATT)): box plots of the propensity score for untreated and treated.]
Some Examples: Standard errors

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)

Multivariate-distance kernel matching        Number of obs = 1,853
                                             Replications  =    50
                                             Kernel        =  epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                  Matched                 Controls           Band-
              Yes     No   Total    Used  Unused   Total     width
Treated       432     25     457    1105     291    1396    1.3394
Untreated    1386     10    1396     455       2     457    3.3975
Combined     1818     35    1853    1560     293    1853

Treatment-effects estimation

           Observed   Bootstrap                       Normal-based
wage          Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
ATE        .4095729   .1920853    2.13   0.033    .0330928   .7860531
ATT        .6059013   .2472069    2.45   0.014    .1213846   1.090418
ATC        .3483797   .1893653    1.84   0.066   -.0227695   .7195289
NATE       1.432913   .2333282    6.14   0.000    .9755981   1.890228
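The "Normal-based" columns follow directly from the coefficient and its bootstrap standard error. As a minimal sketch, the ATT row can be recomputed as follows; the only inputs are the two numbers from the table, and the normal approximation is the assumption stated in the output header.

```python
from statistics import NormalDist

# ATT and bootstrap standard error copied from the output on the slide.
coef, se = 0.6059013, 0.2472069

normal = NormalDist()
z = coef / se                               # z statistic
p_value = 2 * (1 - normal.cdf(abs(z)))      # two-sided p-value
z_crit = normal.inv_cdf(0.975)              # critical value for a 95% interval
lower, upper = coef - z_crit * se, coef + z_crit * se

print(f"z = {z:.2f}, p = {p_value:.3f}, 95% CI = [{lower:.4f}, {upper:.4f}]")
```

Note that only the standard error comes from the bootstrap here; the interval itself is symmetric around the observed coefficient rather than based on bootstrap percentiles.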
Some Examples

Do some tests

. lincom ATT-NATE

 ( 1)  ATT - NATE = 0

wage          Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
(1)       -.8270117   .1810415   -4.57   0.000   -1.181847  -.4721768

. test ATT = ATC

 ( 1)  ATT - ATC = 0

       chi2(  1) =    2.42
     Prob > chi2 =    0.1200
Some Examples: Nearest-neighbor matching (1 neighbor)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

Multivariate-distance nearest-neighbor matching    Number of obs  = 1,853
                                                   Neighbors: min =     1
                                                              max =     1
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                  Matched                 Controls           Band-
              Yes     No   Total    Used  Unused   Total     width
Treated       457      0     457     328    1068    1396

Treatment-effects estimation

wage          Coef.
ATT        .7246969

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

Treatment-effects estimation                   Number of obs      = 1,853
Estimator      : nearest-neighbor matching     Matches: requested =     1
Outcome model  : matching                                     min =     1
Distance metric: Mahalanobis                                  max =     1

                                    AI Robust
wage                      Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET
  union
  (union vs nonunion)  .7246969   .2942952    2.46   0.014    .147889   1.301505
Some Examples: Nearest-neighbor matching (5 neighbors)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching    Number of obs  = 1,853
                                                   Neighbors: min =     5
                                                              max =     5
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                  Matched                 Controls           Band-
              Yes     No   Total    Used  Unused   Total     width
Treated       457      0     457     870     526    1396

Treatment-effects estimation

wage          Coef.
ATT        .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation                   Number of obs      = 1,853
Estimator      : nearest-neighbor matching     Matches: requested =     5
Outcome model  : matching                                     min =     5
Distance metric: Mahalanobis                                  max =     6

                                    AI Robust
wage                      Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET
  union
  (union vs nonunion)  .5590823   .2381752    2.35   0.019   .0922675   1.025897
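Nearest-neighbor matching with replacement that keeps all ties can be sketched as follows. This is a simplified illustration on a single covariate with absolute distance; the function name and the toy data are hypothetical, and the sketch is neither the kmatch nor the teffects implementation.

```python
from statistics import fmean

def nn_match_att(treated, controls, k=1):
    """treated, controls: lists of (x, outcome).
    k-nearest-neighbor matching with replacement; ties at the k-th
    smallest distance are all kept.
    Returns (ATT, largest number of matches used for any treated unit)."""
    effects, max_matches = [], 0
    for x_t, y_t in treated:
        distances = sorted(abs(x_c - x_t) for x_c, _ in controls)
        cutoff = distances[k - 1]            # distance of the k-th neighbor
        matched = [y for x_c, y in controls if abs(x_c - x_t) <= cutoff]
        effects.append(y_t - fmean(matched))
        max_matches = max(max_matches, len(matched))
    return fmean(effects), max_matches

# Hypothetical toy data: two controls are equally close to the treated unit.
treated = [(5, 10.0)]
controls = [(4, 7.0), (6, 9.0), (9, 20.0)]

att, max_matches = nn_match_att(treated, controls, k=1)
print(att, max_matches)
```

With ties kept, a treated unit can end up with more matches than requested, so no equally good control is discarded arbitrarily.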
Some Examples: Bias-correction regression adjustment

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching    Number of obs  = 1,853
                                                   Neighbors: min =     5
                                                              max =     5
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                  Matched                 Controls           Band-
              Yes     No   Total    Used  Unused   Total     width
Treated       457      0     457     870     526    1396

Treatment-effects estimation

wage          Coef.
ATT        .5288023

wage: adjusted for collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) ///
>     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation                   Number of obs      = 1,853
Estimator      : nearest-neighbor matching     Matches: requested =     5
Outcome model  : matching                                     min =     5
Distance metric: Mahalanobis                                  max =     6

                                    AI Robust
wage                      Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET
  union
  (union vs nonunion)  .5288023   .2420635    2.18   0.029   .0543666   1.003238
Some Examples

Mahalanobis-distance and propensity-score matching combined

. kmatch md union collgrad ttl_exp tenure (wage), att psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching

  Number of obs = 1,853          Kernel = epan
  Treatment:  union = 1
  Metric:     mahalanobis (modified)
  Covariates: collgrad ttl_exp tenure
  PS model:   logit (pr)
  PS covars:  i.industry i.race south

Matching statistics

            |       Matched        |        Controls        | Band-
            |   Yes    No   Total  |   Used  Unused   Total | width
    Treated |   439    18     457  |   1258     138    1396 | .83886

Treatment-effects estimation

       wage |     Coef.
        ATT |  .6408443
Some Examples

Exact matching

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching

  Number of obs = 1,853          Kernel = epan
  Treatment:  union = 1
  Metric:     mahalanobis
  Covariates: collgrad ttl_exp tenure
  Exact:      industry race south

Matching statistics

            |       Matched        |        Controls        | Band-
            |   Yes    No   Total  |   Used  Unused   Total | width
    Treated |   432    25     457  |   1103     293    1396 | 1.3013

Treatment-effects estimation

       wage |     Coef.
        ATT |  .6047374
Some Examples

Bandwidth selection: the default (based on distribution of distances in one-nearest-neighbor matching)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching

  Number of obs = 1,853          Kernel = epan
  Treatment:  union = 1
  Metric:     mahalanobis
  Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics

            |       Matched        |        Controls        | Band-
            |   Yes    No   Total  |   Used  Unused   Total | width
    Treated |   432    25     457  |   1105     291    1396 | 1.3394

Treatment-effects estimation

       wage |     Coef.
        ATT |  .6059013
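To make the role of the bandwidth concrete, here is a minimal plain-Python sketch of kernel matching with an Epanechnikov kernel (the "epan" kernel named in the output). It is an illustration under simplifying assumptions, not kmatch's implementation: the matching distance is reduced to a single scalar covariate, and the toy data and function names are hypothetical.

```python
def epan(u):
    """Epanechnikov kernel; zero for |u| >= 1."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0

def att_kernel(treated, controls, bandwidth):
    """Kernel-matching ATT.

    treated, controls: lists of (x, y), where the scalar x stands in for
    the matching distance. Returns (att, number of unmatched treated).
    """
    effects = []
    unmatched = 0
    for x_t, y_t in treated:
        weights = [epan((x_c - x_t) / bandwidth) for x_c, _ in controls]
        total = sum(weights)
        if total == 0.0:
            # No control within the bandwidth: the treated unit is off support.
            unmatched += 1
            continue
        # Counterfactual = kernel-weighted average of control outcomes.
        y0_hat = sum(w * y_c for w, (_, y_c) in zip(weights, controls)) / total
        effects.append(y_t - y0_hat)
    return sum(effects) / len(effects), unmatched

# Hypothetical toy data
treated = [(1.0, 5.0), (10.0, 9.0)]
controls = [(0.5, 3.0), (1.5, 4.0), (3.0, 8.0)]
att, n_unmatched = att_kernel(treated, controls, bandwidth=1.0)
print(att, n_unmatched)
```

All controls within the bandwidth contribute to the counterfactual, which is why far more controls are "Used" here than in one-nearest-neighbor matching. Treated units with no control inside the bandwidth are dropped, which corresponds to the "Matched: No" column.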
Some Examples

Bandwidth selection: cross validation with respect to X

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching

  Number of obs = 1,853          Kernel = epan
  Treatment:  union = 1
  Metric:     mahalanobis
  Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics

            |       Matched        |        Controls        | Band-
            |   Yes    No   Total  |   Used  Unused   Total | width
    Treated |   448     9     457  |   1184     212    1396 | 1.8888

Treatment-effects estimation

       wage |     Coef.
        ATT |  .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MSE against bandwidth, with the evaluation points labeled by their index]
Some Examples

Bandwidth selection: cross validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching

  Number of obs = 1,853          Kernel = epan
  Treatment:  union = 1
  Metric:     mahalanobis
  Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics

            |       Matched        |        Controls        | Band-
            |   Yes    No   Total  |   Used  Unused   Total | width
    Treated |   453     4     457  |   1289     107    1396 | 2.433

Treatment-effects estimation

       wage |     Coef.
        ATT |  .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MISE against bandwidth, with the evaluation points labeled by their index]
Some Examples

Bandwidth selection: weighted cross validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching

  Number of obs = 1,853          Kernel = epan
  Treatment:  union = 1
  Metric:     mahalanobis
  Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics

            |       Matched        |        Controls        | Band-
            |   Yes    No   Total  |   Used  Unused   Total | width
    Treated |   455     2     457  |   1356      40    1396 | 2.7626

Treatment-effects estimation

       wage |     Coef.
        ATT |  .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of weighted MISE against bandwidth, with the evaluation points labeled by their index]
Some Examples

Common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(0.5)

Multivariate-distance kernel matching

  Number of obs = 1,853          Kernel = epan
  Treatment:  union = 1
  Metric:     mahalanobis
  Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics

            |       Matched        |        Controls        | Band-
            |   Yes    No   Total  |   Used  Unused   Total | width
    Treated |   366    91     457  |    701     695    1396 |     .5

Treatment-effects estimation

       wage |     Coef.
        ATT |  .3303161

. kmatch csummarize
(refitting the model using the generate() option)

Common support (treated)

               |             Means              |    Standardized difference
               |  Matched  Unmatched     Total  |  (1)-(3)   (2)-(3)   (1)-(2)
      collgrad |  .322404    .318681   .321663  |  .001585  -.006376   .007962
       ttl_exp |  13.3929    12.7682   13.2685  |  .027413  -.110253   .137666
        tenure |  8.12614    6.95055   7.89205  |  .038378  -.154356   .192734
    3.industry |  .002732    .021978   .006565  | -.047404   .190657  -.238061
    4.industry |  .191257    .153846   .183807  |  .019212  -.077269   .096481
    5.industry |  .062842    .274725   .105033  | -.137462   .552867  -.690329
    6.industry |  .057377          0   .045952  |  .054507  -.219225   .273732
    7.industry |  .019126    .021978   .019694  | -.004083   .016423  -.020506
    8.industry |  .005464    .065934   .017505  | -.091714   .368871  -.460585
    9.industry |  .010929    .010989   .010941  | -.000115   .000462  -.000577
   10.industry |        0    .021978   .004376  | -.066227   .266363  -.332589
   11.industry |  .554645    .175824   .479212  |   .15083  -.606636   .757467
   12.industry |  .092896    .241758   .122538  | -.090299   .363181   -.45348
        2.race |  .243169    .681319   .330416  | -.185284   .745209  -.930494
        3.race |  .002732    .076923   .017505  | -.112525   .452572  -.565097
         south |   .29235    .318681   .297593  | -.011456   .046074   -.05753

Common support (treated)

               |           Variances            |             Ratio
               |  Matched  Unmatched     Total  |  (1)/(3)   (2)/(3)   (1)/(2)
      collgrad |  .219058    .219536   .218674  |  1.00176   1.00394   .997824
       ttl_exp |  19.4198    25.2474   20.5898  |  .943177   1.22621    .76918
        tenure |  38.3324    31.9242   37.2044  |  1.03032   .858076   1.20073
    3.industry |  .002732    .021734   .006536  |  .418045   3.32537   .125714
    4.industry |  .155101    .131624   .150351  |  1.03159   .875443   1.17837
    5.industry |  .059054    .201465   .094207  |  .626851   2.13854   .293122
    6.industry |  .054233          0   .043936  |  1.23435         0         .
    7.industry |  .018811    .021734   .019348  |  .972252    1.1233   .865531
    8.industry |   .00545    .062271   .017237  |  .316157   3.61269   .087513
    9.industry |  .010839    .010989   .010845  |  .999464   1.01328   .986361
   10.industry |        0    .021734   .004367  |        0   4.97709         0
   11.industry |  .247691     .14652   .250115  |  .990307   .585811   1.69049
   12.industry |  .084497    .185348   .107758  |  .784137   1.72003   .455885
        2.race |  .184542    .219536   .221726  |  .832297   .990121   .840601
        3.race |  .002732    .071795   .017237  |  .158513   4.16522   .038056
         south |  .207448    .219536    .20949  |  .990254   1.04796   .944939

(1) matched, (2) unmatched, (3) total
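As a reading aid for the common-support table, the following plain-Python check recomputes the collgrad row from its means and variances. The input values are the table's figures with decimal points restored. That the differences are standardized by the standard deviation of all treated observations (column 3) is an inference from the numbers themselves, not something taken from the kmatch documentation.

```python
import math

# collgrad row of the csummarize table: (1) matched, (2) unmatched, (3) total
mean_matched, mean_unmatched, mean_total = 0.322404, 0.318681, 0.321663
var_matched, var_unmatched, var_total = 0.219058, 0.219536, 0.218674

# Assumption inferred from the table: all differences are divided by the
# standard deviation of the total treated group.
sd_total = math.sqrt(var_total)

std_diff_1_3 = (mean_matched - mean_total) / sd_total
std_diff_2_3 = (mean_unmatched - mean_total) / sd_total
std_diff_1_2 = (mean_matched - mean_unmatched) / sd_total

# Variance ratios are plain quotients of the variance columns.
ratio_1_3 = var_matched / var_total
ratio_2_3 = var_unmatched / var_total

print(std_diff_1_3, std_diff_2_3, std_diff_1_2, ratio_1_3, ratio_2_3)
```

Under this reading, standardized differences near zero and variance ratios near one in the (1)-(3) columns indicate that the matched treated units resemble the full treated group, so restricting to common support changes the estimand only slightly for that covariate.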
Some Examples

Make a graph of the common-support stats:

. mat M = r(M)
. coefplot matrix(M[,4]), title(Std. difference) noci nolabels xline(0)

[Figure: dot plot of the standardized differences, on a scale from -.2 to .2, for collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, and south]
Some Examples

Multiple outcome variables

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching

  Number of obs = 1,852          Kernel = epan
  Treatment:  union = 1
  Metric:     mahalanobis
  Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics

            |       Matched        |        Controls        | Band-
            |   Yes    No   Total  |   Used  Unused   Total | width
    Treated |   432    25     457  |   1104     291    1395 | 1.3392

Treatment-effects estimation

            |     Coef.
  wage      |
        ATT |  .6021049
       NATE |  1.430823
  hours     |
        ATT |  1.263759
       NATE |  1.450303
Some Examples

Multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage = collgrad ttl_exp tenure) (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching

  Number of obs = 1,852          Kernel = epan
  Treatment:  union = 1
  Metric:     mahalanobis
  Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics

            |       Matched        |        Controls        | Band-
            |   Yes    No   Total  |   Used  Unused   Total | width
    Treated |   432    25     457  |   1104     291    1395 | 1.3392

Treatment-effects estimation

            |     Coef.
  wage      |
        ATT |  .5152752
       NATE |  1.430823
  hours     |
        ATT |  1.263759
       NATE |  1.450303
  wage:  adjusted for collgrad ttl_exp tenure
  hours: adjusted for i.industry i.race
Some Examples

Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage), att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)

Multivariate-distance kernel matching

  Number of obs = 1,853          Replications = 50
                                 Kernel = epan
  Treatment:  union = 1
  Metric:     mahalanobis
  Covariates: collgrad ttl_exp tenure i.industry i.race

  0: south = 0
  1: south = 1

Matching statistics

            |       Matched        |        Controls        | Band-
            |   Yes    No   Total  |   Used  Unused   Total | width
  0 Treated |   306    15     321  |    625     120     745 | 1.3199
  1 Treated |   126    10     136  |    473     178     651 | 1.3398

Treatment-effects estimation

            |  Observed   Bootstrap                       Normal-based
       wage |     Coef.   Std. Err.     z   P>|z|   [95% Conf. Interval]
  0     ATT |  .4586332    .2763358  1.66   0.097   -.082975   1.000241
  1     ATT |  .9518705     .406903  2.34   0.019   .1543553   1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

       chi2(  1) =   1.23
     Prob > chi2 = 0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

       wage |     Coef.   Std. Err.     z   P>|z|   [95% Conf. Interval]
        (1) |  .4932373    .4452343  1.11   0.268   -.379406   1.365881
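The test and lincom results above are two views of the same Wald test. A short plain-Python check, using the difference and its bootstrap standard error as read from the lincom output (decimal points restored, so treat the inputs as assumptions), reproduces the test statistic of roughly 1.23 and the p-value of roughly 0.268.

```python
import math

# Values read from the lincom output (decimal points restored).
diff = 0.4932373     # [1]ATT - [0]ATT
se_diff = 0.4452343  # bootstrap standard error of the difference

z = diff / se_diff
chi2 = z ** 2                                 # Wald statistic, 1 degree of freedom
p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value

print(f"z = {z:.2f}, chi2(1) = {chi2:.2f}, p = {p_value:.3f}")
```

With one restriction, the chi-squared statistic is simply the squared z statistic, so both commands lead to the same conclusion: the difference between the two subpopulation ATTs is not statistically significant at conventional levels.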
Simulation

- Population data from the Swiss census of 2000.
- Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using the command iskotrei by Hendrickx 2002; values from 6 to 78, mean 44).
- Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.
- Control variables: gender, age, and highest educational degree.
- Population restricted to people between 24 and 60 years old who are working.
- 2'308'006 individuals, of which 17.5% belong to the treatment group.
- Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.
Simulation

Substantial differences between resident aliens and Swiss nationals on all three covariates.

Propensity score in population (computed from fully stratified data)

[Figure: density of the propensity score for the untreated and the treated; McFadden R2 = 0.121]
Simulation

- Raw mean difference in occupational prestige (NATE): −4.79
- Population ATT (computed from fully stratified data): −3.96
- There is some treatment effect heterogeneity (ATE = −3.51, ATC = −3.41).

[Figure, two panels: mean outcome against the propensity score for the untreated and the treated; treatment effect against the propensity score]
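The three population effects on this slide are linked by a simple identity: the ATE is the average of the ATT and the ATC, weighted by the treated and untreated population shares. The following plain-Python check reads the slide's figures as ATT = −3.96, ATC = −3.41, and a treated share of 17.5% (decimal points and the percent sign restored from the extracted text, so these inputs are assumptions).

```python
# Figures as read from the simulation slides (assumed readings).
share_treated = 0.175
att = -3.96
atc = -3.41

# ATE = P(D=1) * ATT + P(D=0) * ATC
ate = share_treated * att + (1.0 - share_treated) * atc

print(round(ate, 2))
```

The result agrees with the reported ATE of −3.51 up to rounding, which supports this reading of the numbers.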
Results: Variance

[Figure: variance of the estimators for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]
In this slide we can see that, for the same algorithm, PSM is typically somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are small; additional post-matching regression adjustment further reduces the differences.
Results: Bias reduction (in percent)

[Figure: bias reduction in percent for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]
Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3) small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.
Results: Mean squared error

[Figure: mean squared error for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]
Results: Relative standard error

[Figure: relative standard error for N = 500 and N = 5000. Rows: nearest-neighbor matching with teffects (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap (1 neighbor, 5 neighbors), and kernel matching with bootstrap (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]
Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals for N = 500 and N = 5000. Rows: nearest-neighbor matching with teffects (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap (1 neighbor, 5 neighbors), and kernel matching with bootstrap (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]
Conclusions

- The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).
- Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.
Conclusions

Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research, the choice may not be that clear.
+ MDM leaves less scope for post-matching modeling decision biases.
+ Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
+ Fewer restrictions in terms of possible post-matching analyses.
- Choice of scaling matrix largely arbitrary; various suggestions in the literature (somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching! (But don't use pair matching anyhow.)
Conclusions

Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly ok for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.
References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2): 295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1): 77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91: 279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990 [1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4): 465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70: 41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5): 688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4): 472–480.
Matching to Reduce Model Dependence
(Ho, Imai, King, Stuart 2007: fig. 1, Political Analysis)

[Figure, slide 3/23 by King and Nielsen, shown in several build steps: scatter plot of the outcome against education (years) for treated (T) and control (C) units, first with the treated units only, then with all control units, then with the matched control units.]
King and Nielsen

Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

Matching: Finding Hidden Randomized Experiments (slide 6/23 by King and Nielsen)

Types of Experiments

  Balance on covariates | Complete Randomization | Fully Blocked
  Observed              | On average             | Exact
  Unobserved            | On average             | On average

Fully blocked dominates complete randomization for: imbalance, model dependence, power, efficiency, bias, research costs, robustness. E.g., Imai, King, Nall 2009: SEs 600% smaller.

Goal of Each Matching Method (in Observational Data)
- PSM: complete randomization
- Other methods: fully blocked
- Other matching methods dominate PSM (wait, it gets worse)
Best Case: Mahalanobis Distance Matching

[Figure, slide 9/23 by King and Nielsen, shown in two build steps: age against education (years) for treated (T) and control (C) units, first with all control units, then with only the matched control units, which line up with the treated units.]
Best Case: Propensity Score Matching

[Figure, slide 15/23 by King and Nielsen, shown in two build steps: age against education (years) for treated (T) and control (C) units, together with the units' positions on the propensity score scale from 0 to 1.]
Best Case: Propensity Score Matching is Suboptimal

[Figure, slide 15/23 by King and Nielsen: age against education (years) for the treated (T) units and the control (C) units retained by propensity score matching.]
King and Nielsen

Argument 3
- Random pruning (deleting observations at random) increases imbalance. This is because the sample size decreases, so that variance increases (large differences become more likely).
- More imbalance/variance means more model dependence and researcher discretion.
- Because PSM approximates complete randomization, it engages in random pruning.
- PSM Paradox ("when you do 'better', you do worse"):
  - When matching is made more strict (e.g., by decreasing the size of the caliper), PSM, like other matching methods, typically reduces imbalance. But soon the PSM Paradox kicks in, such that further pruning quickly increases imbalance.
  - If the data are such that there are no big differences between treated and untreated to begin with, the PSM Paradox kicks in almost immediately.
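The first point of the argument, that random pruning increases imbalance, can be illustrated with a small simulation. The sketch below is plain Python with made-up settings (one standard normal covariate, 200 units per group, pruned at random to 20 per group); it is not taken from King and Nielsen's replication material.

```python
import random

def mean_abs_imbalance(full_n, keep_n, reps, seed):
    """Average absolute difference in covariate means between groups,
    before and after randomly pruning each group from full_n to keep_n."""
    rng = random.Random(seed)
    full_total = 0.0
    pruned_total = 0.0
    for _ in range(reps):
        # Both groups come from the same distribution, as under
        # complete randomization: any imbalance is pure chance.
        treated = [rng.gauss(0, 1) for _ in range(full_n)]
        control = [rng.gauss(0, 1) for _ in range(full_n)]
        full_total += abs(sum(treated) / full_n - sum(control) / full_n)
        # Random pruning: drop units without regard to their covariates.
        t_kept = rng.sample(treated, keep_n)
        c_kept = rng.sample(control, keep_n)
        pruned_total += abs(sum(t_kept) / keep_n - sum(c_kept) / keep_n)
    return full_total / reps, pruned_total / reps

imbalance_full, imbalance_pruned = mean_abs_imbalance(
    full_n=200, keep_n=20, reps=500, seed=42
)
print(imbalance_full, imbalance_pruned)
```

Because the chance imbalance of a mean difference scales with one over the square root of the group size, the pruned samples are on average clearly less balanced than the full samples, even though nothing systematic has changed.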
PSM Increases Model Dependence & Bias

[Figure, slide 20/23 by King and Nielsen, two panels plotted against the number of units pruned, for MDM and PSM. Panel "Model Dependence": variance. Panel "Bias": maximum coefficient across 512 specifications, with the true effect of 2 marked.]

Data-generating process: $Y_i = 2T_i + X_{1i} + X_{2i} + \epsilon_i$, $\epsilon_i \sim N(0, 1)$
The Propensity Score Paradox in Real Data

[Figure, slide 21/23 by King and Nielsen, two panels: Finkel et al. (JOP 2012) and Nielsen et al. (AJPS 2011). Each panel plots imbalance against the number of units pruned for CEM, MDM, PSM, and random pruning, with the raw data and the 1/4 SD caliper marked.]

Similar pattern for > 20 other real data sets we checked.
Are King and Nielsen right?

Argument 1
- Model dependence (i.e., dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions, even if they try not to be).
- Matching is good because it reduces model dependence.

I fully agree.

My view, however, may be somewhat less pessimistic. I believe that research results can be credible if researchers are well educated, so that they know what they are doing, and if modeling decisions are made transparent and robustness of results is evaluated (and documented).
Are King and Nielsen right?

Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

That fully blocked randomization is more efficient than complete randomization, given the sample size, is of course true (how large the efficiency gains are depends on the strength of the relation between X and Y).

However, if blocking reduces the sample size, it is not a priori clear whether estimates from the blocked sample are more efficient than estimates from the full sample (although often they will be).
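The efficiency claim for a fixed sample size can be checked with a small simulation. The sketch below is plain Python under made-up assumptions: one covariate X, an outcome Y = 2X + noise with no treatment effect, and blocking implemented as pairing neighbors on sorted X and randomizing within each pair. It illustrates the principle only and does not cover the case in which blocking reduces the sample size.

```python
import random

def diff_in_means(y, assign):
    """Difference in mean outcome between treated (True) and control (False)."""
    t = [yi for yi, a in zip(y, assign) if a]
    c = [yi for yi, a in zip(y, assign) if not a]
    return sum(t) / len(t) - sum(c) / len(c)

def simulate(reps=2000, n_pairs=20, seed=1):
    """Variance of the difference-in-means estimator under complete
    randomization and under pairwise blocking on X (true effect is 0)."""
    rng = random.Random(seed)
    n = 2 * n_pairs
    est_complete, est_blocked = [], []
    for _ in range(reps):
        x = sorted(rng.gauss(0, 1) for _ in range(n))
        y = [2 * xi + rng.gauss(0, 1) for xi in x]
        # Complete randomization: half of the units treated, chosen at random.
        chosen = set(rng.sample(range(n), n_pairs))
        est_complete.append(diff_in_means(y, [i in chosen for i in range(n)]))
        # Blocked randomization: neighbors on X form a pair; a coin flip
        # decides which member of each pair is treated.
        assign = []
        for _ in range(n_pairs):
            first = rng.random() < 0.5
            assign += [first, not first]
        est_blocked.append(diff_in_means(y, assign))
    # The true effect is zero, so the mean squared estimate is the variance.
    var_complete = sum(e * e for e in est_complete) / reps
    var_blocked = sum(e * e for e in est_blocked) / reps
    return var_complete, var_blocked

var_complete, var_blocked = simulate()
print(var_complete, var_blocked)
```

The stronger the relation between X and Y (here the coefficient 2), the larger the gap between the two variances; with a coefficient of zero, blocking on X would bring no gain.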
That PSM approximates complete randomization is only partially true. PSM approximates complete randomization within observations with the same propensity score. Hence, PSM is somewhere between complete randomization and fully blocked randomization.
- If the X variables have no relation to T (treatment), then all observations have the same propensity score. Hence, we end up with complete randomization.
- If the X variables have a strong effect on T, there is lots of blocking.
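A tiny example shows what "complete randomization within observations with the same propensity score" means. In the plain-Python sketch below, the logit model and its coefficients are hypothetical. Two units with quite different covariate values receive exactly the same propensity score, so PSM treats them as interchangeable, whereas a multivariate distance keeps them apart.

```python
import math

def propensity(x1, x2):
    """Hypothetical logit propensity-score model (coefficients are made up)."""
    index = -2.0 + 0.5 * x1 + 0.5 * x2
    return 1.0 / (1.0 + math.exp(-index))

unit_a = (1.0, 3.0)
unit_b = (3.0, 1.0)

# Same linear index, hence identical propensity scores.
ps_gap = abs(propensity(*unit_a) - propensity(*unit_b))

# Yet the units are far apart in covariate space.
covariate_distance = math.dist(unit_a, unit_b)

print(ps_gap, covariate_distance)
```

Which of several controls with the same score ends up matched to a treated unit is then a matter of chance with respect to X; this is the sense in which PSM blocks on the score but randomizes within it.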
Are King and Nielsen right?

Argument 3
- Random pruning ⇒ imbalance ⇒ more model dependence
- PSM ⇒ complete randomization ⇒ lots of random pruning
- PSM Paradox: "when you do 'better', you do worse"

That random pruning makes things worse is of course true, because it unnecessarily reduces the sample size (without changing anything else).

As argued above, that PSM applies random pruning is only true for X variables unrelated to T (so that we are in a "local" complete randomization situation, although something similar can probably also happen if effects from several X's cancel each other out).

Furthermore, it is only true if you employ a matching algorithm that throws away good matches. King and Nielsen's results seem to be based on the worst possible algorithm: one-to-one matching without replacement.
Are King and Nielsen right
Argument 3I Random pruning rArr imbalance rArr more model dependenceI PSM rArr complete randomization rArr lots of random pruningI PSM Paradox ldquowhen you do lsquobetterrsquo you do worserdquo
If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
- Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
- Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.
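A minimal sketch of why kernel matching avoids random pruning, assuming an Epanechnikov kernel and made-up propensity scores and outcomes. The function names `epan` and `kernel_counterfactual` are hypothetical and are not part of kmatch; the sketch only shows the weighting idea.

```python
def epan(u):
    """Epanechnikov kernel weight for a scaled distance u."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0

def kernel_counterfactual(ps_treated, controls, bw):
    """controls: list of (propensity score, outcome) pairs.

    Returns (kernel-weighted mean outcome, number of controls used),
    or (None, 0) if no control lies within the bandwidth.
    """
    weighted = [(epan((p - ps_treated) / bw), y) for p, y in controls]
    total = sum(w for w, _ in weighted)
    used = sum(1 for w, _ in weighted if w > 0)
    if total == 0:
        return None, 0
    return sum(w * y for w, y in weighted) / total, used

# Hypothetical example: two controls tie exactly with the treated unit,
# one is close, and one is far away.
controls = [(0.30, 10.0), (0.30, 14.0), (0.31, 12.0), (0.60, 30.0)]
y0_hat, n_used = kernel_counterfactual(0.30, controls, bw=0.05)
print(y0_hat, n_used)
```

Pair matching without replacement would keep only one of the two tied controls (outcome 10 or 14, essentially at random) and discard the other; the kernel estimate averages over all three nearby controls and drops only the distant one.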
It is true that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more "free" variance in X that can be exploited by modeling decisions).
In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you'd need to apply PSM stratified by subgroups in this case).
6 Illustration using kmatch
The Command
kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).
Some key features:
- Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
- Optional exact matching
- Optional regression-adjustment bias correction
- Kernel matching, ridge matching, or nearest-neighbor matching
- Automatic bandwidth selection for kernel/ridge matching
- Flexible specification of scaling matrix for MDM
- Joint analysis of multiple subgroups and multiple outcome variables
- Various post-estimation commands for balancing and common-support diagnostics
- Computationally efficient implementation
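To make the role of the scaling matrix in MDM concrete, here is a minimal sketch of a Mahalanobis distance for two covariates. It is an illustration only and not how kmatch is implemented; the function name `mahalanobis_2d` and the example covariance matrices are assumptions.

```python
import math

def mahalanobis_2d(x, z, cov):
    """Mahalanobis distance between 2-d points x and z.

    cov is the 2x2 scaling matrix ((a, b), (b, c)), here assumed to be
    a covariance matrix of the covariates.
    """
    (a, b), (_, c) = cov
    det = a * c - b * b
    d0, d1 = x[0] - z[0], x[1] - z[1]
    # Quadratic form d' * inverse(cov) * d, written out for the 2x2 case.
    q = (c * d0 * d0 - 2.0 * b * d0 * d1 + a * d1 * d1) / det
    return math.sqrt(q)

# With an identity scaling matrix the distance is Euclidean.
print(mahalanobis_2d((0, 0), (3, 4), ((1, 0), (0, 1))))
# A covariate with variance 4 counts half as much per unit of difference.
print(mahalanobis_2d((0, 0), (2, 0), ((4, 0), (0, 1))))
```

Changing the scaling matrix changes which controls count as "close", which is why the choice of scaling matters for MDM.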
Some Examples
Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.
. sysuse nlsw88, clear
(NLSW, 1988 extract)
. drop if industry==2
(4 observations deleted)
Mahalanobis-distance kernel matching:
. kmatch md union collgrad ttl_exp tenure i.industry i.race south
> (wage), nate att
(computing bandwidth ... done)
Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south
Matching statistics
                  Matched                    Controls          Band-
              Yes     No   Total      Used  Unused   Total     width
Treated       432     25     457      1105     291    1396    1.3394
Treatment-effects estimation
wage          Coef.
ATT        .6059013
NATE       1.432913
Some Examples
Some balancing statistics:
. kmatch summarize
(refitting the model using the generate() option)
                          Raw                        Matched(ATT)
Means          Treated  Untrea~d    StdDif    Treated  Untrea~d    StdDif
collgrad       .321663   .224212   .219912    .319444   .319444         0
ttl_exp        13.2685   12.7323   .117584    13.3205   13.1425   .039036
tenure         7.89205   6.17658    .29735    7.91744   7.58347   .057888
3.industry     .006565   .012178  -.058246     .00463    .00463         0
4.industry     .183807   .166905   .044425    .185185   .185185         0
5.industry     .105033   .027937   .312944    .085648   .085648         0
6.industry     .045952   .169771  -.407129    .048611   .048611         0
7.industry     .019694   .102436  -.350657    .020833   .020833         0
8.industry     .017505   .035817  -.113785    .009259   .009259         0
9.industry     .010941   .040115  -.185669    .011574   .011574         0
10.industry    .004376   .008596  -.052551    .002315   .002315         0
11.industry    .479212   .356734   .250073    .506944   .506944         0
12.industry    .122538    .07235   .169707     .12037    .12037         0
2.race         .330416   .244986   .189418      .3125     .3125         0
3.race         .017505   .011461   .050566    .006944   .006944         0
south          .297593   .466332  -.352408    .291667   .291667         0

                          Raw                        Matched(ATT)
Variances      Treated  Untrea~d     Ratio    Treated  Untrea~d     Ratio
collgrad       .218674   .174066   1.25628    .217904   .217904         1
ttl_exp        20.5898   21.0001   .980459    19.8177   18.2323   1.08696
tenure         37.2044   29.3629   1.26706    37.0399   34.9543   1.05966
3.industry     .006536   .012038   .542928    .004619   .004619         1
4.industry     .150351   .139148   1.08052    .151242   .151242         1
5.industry     .094207   .027176   3.46656    .078494   .078494         1
6.industry     .043936    .14105   .311496    .046355   .046355         1
7.industry     .019348   .092008   .210287    .020447   .020447         1
8.industry     .017237   .034559   .498769    .009195   .009195         1
9.industry     .010845   .038533   .281445    .011467   .011467         1
10.industry    .004367   .008528   .512039    .002315   .002315         1
11.industry    .250115   .229639   1.08917    .250532   .250532         1
12.industry    .107758   .067163   1.60443    .106127   .106127         1
2.race         .221726     .1851   1.19787    .215342   .215342         1
3.race         .017237   .011338   1.52025    .006912   .006912         1
south           .20949   .249045   .841173    .207077   .207077         1
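The two balance measures in these tables can be reproduced by hand. The sketch below assumes that the standardized difference is the mean difference divided by the square root of the average of the two group variances, and that the ratio is treated over untreated variance; these formulas are inferred from the displayed numbers and are not quoted from the kmatch documentation. The inputs are the raw collgrad values (treated mean .321663, untreated mean .224212, variances .218674 and .174066).

```python
import math

def std_diff(mean_t, mean_c, var_t, var_c):
    """Standardized mean difference using the average of both variances."""
    return (mean_t - mean_c) / math.sqrt((var_t + var_c) / 2.0)

def var_ratio(var_t, var_c):
    """Variance ratio, treated over untreated."""
    return var_t / var_c

# Raw collgrad row from the balancing table.
sd = std_diff(0.321663, 0.224212, 0.218674, 0.174066)
vr = var_ratio(0.218674, 0.174066)
print(round(sd, 4), round(vr, 4))
```

Both values agree with the reported StdDif (.219912) and Ratio (1.25628) up to rounding of the displayed inputs. A standardized difference near 0 and a variance ratio near 1 indicate good balance.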
Some Examples
Make a graph of the balancing stats:
. mat M = r(M)
. mat V = r(V)
. coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) ||,
> bylabels("Std. mean difference" "Variance ratio")
> noci nolabels byopts(xrescale)
. addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))
. addplot 2: , xline(1) norescaling
[Figure: standardized mean differences (left panel) and variance ratios (right panel) for each covariate, raw vs. matched]
Some Examples
Propensity-score kernel matching:
. kmatch ps union collgrad ttl_exp tenure i.industry i.race south
> (wage), nate att
(computing bandwidth ... done)
Propensity-score kernel matching           Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Covariates  : collgrad ttl_exp tenure i.industry i.race south
PS model    : logit (pr)
Matching statistics
                  Matched                    Controls          Band-
              Yes     No   Total      Used  Unused   Total     width
Treated       431     26     457      1214     182    1396    .00188
Treatment-effects estimation
wage          Coef.
ATT        .3887224
NATE       1.432913
Some Examples
Kernel density balancing plot:
. kmatch density, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)
(applying 0-1 boundary correction to density estimation of propensity score)
(bandwidth for propensity score = .06803989)
[Figure: kernel density of the propensity score for untreated and treated observations; panels: Raw and Matched (ATT)]
Some Examples
Cumulative distribution balancing plot:
. kmatch cumul, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)
[Figure: cumulative probability of the propensity score for untreated and treated observations; panels: Raw and Matched (ATT)]
Some Examples
Balancing box plot:
. kmatch box
(refitting the model using the generate() option)
[Figure: box plots of the propensity score for untreated and treated observations; panels: Raw and Matched (ATT)]
Some Examples
Standard errors:
. kmatch md union collgrad ttl_exp tenure i.industry i.race south
> (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)
(running kmatch on estimation sample)
Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50
Multivariate-distance kernel matching      Number of obs = 1853
                                           Replications  = 50
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south
Matching statistics
                  Matched                    Controls          Band-
              Yes     No   Total      Used  Unused   Total     width
Treated       432     25     457      1105     291    1396    1.3394
Untreated    1386     10    1396       455       2     457    3.3975
Combined     1818     35    1853      1560     293    1853         .
Treatment-effects estimation
          Observed   Bootstrap                      Normal-based
wage         Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
ATE       .4095729   .1920853    2.13    0.033    .0330928   .7860531
ATT       .6059013   .2472069    2.45    0.014    .1213846   1.090418
ATC       .3483797   .1893653    1.84    0.066   -.0227695   .7195289
NATE      1.432913   .2333282    6.14    0.000    .9755981   1.890228
Some Examples
Do some tests:
. lincom ATT-NATE
 ( 1)  ATT - NATE = 0
wage         Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
(1)      -.8270117   .1810415   -4.57    0.000   -1.181847  -.4721768
. test ATT = ATC
 ( 1)  ATT - ATC = 0
           chi2(  1) =    2.42
         Prob > chi2 =    0.1200
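The z statistic and the normal-based confidence interval of the lincom result follow directly from the coefficient and its standard error. The sketch below redoes that arithmetic with the reported values (coefficient -.8270117, standard error .1810415); the helper name `wald_summary` is hypothetical, and 1.959964 is used as the 97.5% quantile of the standard normal distribution.

```python
import math

def wald_summary(coef, se, crit=1.959964):
    """Return z statistic, two-sided normal p-value, and 95% CI bounds."""
    z = coef / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p-value under normality
    return z, p, coef - crit * se, coef + crit * se

# ATT - NATE and its bootstrap standard error as reported by lincom.
z, p, lower, upper = wald_summary(-0.8270117, 0.1810415)
print(round(z, 2), round(lower, 6), round(upper, 6))
```

The same computation applied to any row of the bootstrap table (for example ATT: .6059013 with standard error .2472069) gives the z value and interval shown there.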
Some Examples
Nearest-neighbor matching (1 neighbor):
. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn
Multivariate-distance nearest-neighbor matching
                                           Number of obs = 1853
                                           Neighbors: min = 1
Treatment   : union = 1                               max = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south
Matching statistics
                  Matched                    Controls          Band-
              Yes     No   Total      Used  Unused   Total     width
Treated       457      0     457       328    1068    1396         .
Treatment-effects estimation
wage          Coef.
ATT        .7246969
. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet
Treatment-effects estimation               Number of obs = 1853
Estimator      : nearest-neighbor matching Matches: requested = 1
Outcome model  : matching                                 min = 1
Distance metric: Mahalanobis                              max = 1
                                  AI Robust
wage                    Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET
union
(union vs nonunion)  .7246969   .2942952    2.46    0.014     .147889   1.301505
Some Examples
Nearest-neighbor matching (5 neighbors):
. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)
Multivariate-distance nearest-neighbor matching
                                           Number of obs = 1853
                                           Neighbors: min = 5
Treatment   : union = 1                               max = 5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south
Matching statistics
                  Matched                    Controls          Band-
              Yes     No   Total      Used  Unused   Total     width
Treated       457      0     457       870     526    1396         .
Treatment-effects estimation
wage          Coef.
ATT        .5590823
. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)
Treatment-effects estimation               Number of obs = 1853
Estimator      : nearest-neighbor matching Matches: requested = 5
Outcome model  : matching                                 min = 5
Distance metric: Mahalanobis                              max = 6
                                  AI Robust
wage                    Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET
union
(union vs nonunion)  .5590823   .2381752    2.35    0.019    .0922675   1.025897
Some Examples
Bias correction (regression adjustment):
. kmatch md union collgrad ttl_exp tenure i.industry i.race south
> (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)
Multivariate-distance nearest-neighbor matching
                                           Number of obs = 1853
                                           Neighbors: min = 5
Treatment   : union = 1                               max = 5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south
Matching statistics
                  Matched                    Controls          Band-
              Yes     No   Total      Used  Unused   Total     width
Treated       457      0     457       870     526    1396         .
Treatment-effects estimation
wage          Coef.
ATT        .5288023
adjusted for: collgrad ttl_exp tenure i.industry i.race south
. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south)
> (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)
Treatment-effects estimation               Number of obs = 1853
Estimator      : nearest-neighbor matching Matches: requested = 5
Outcome model  : matching                                 min = 5
Distance metric: Mahalanobis                              max = 6
                                  AI Robust
wage                    Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET
union
(union vs nonunion)  .5288023   .2420635    2.18    0.029    .0543666   1.003238
Some Examples
Mahalanobis-distance and propensity-score matching combined:
. kmatch md union collgrad ttl_exp tenure (wage), att
> psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)
Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis (modified)
Covariates  : collgrad ttl_exp tenure
PS model    : logit (pr)
PS covars   : i.industry i.race south
Matching statistics
                  Matched                    Controls          Band-
              Yes     No   Total      Used  Unused   Total     width
Treated       439     18     457      1258     138    1396    .83886
Treatment-effects estimation
wage          Coef.
ATT        .6408443
Some Examples
Exact matching:
. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)
Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure
Exact       : industry race south
Matching statistics
                  Matched                    Controls          Band-
              Yes     No   Total      Used  Unused   Total     width
Treated       432     25     457      1103     293    1396    1.3013
Treatment-effects estimation
wage          Coef.
ATT        .6047374
Some Examples
Bandwidth selection: the default (based on the distribution of distances in one-nearest-neighbor matching):
. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)
Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south
Matching statistics
                  Matched                    Controls          Band-
              Yes     No   Total      Used  Unused   Total     width
Treated       432     25     457      1105     291    1396    1.3394
Treatment-effects estimation
wage          Coef.
ATT        .6059013
Some Examples
Bandwidth selection: cross-validation with respect to X:
. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
> att bwidth(cv)
(computing bandwidth ... done)
Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south
Matching statistics
                  Matched                    Controls          Band-
              Yes     No   Total      Used  Unused   Total     width
Treated       448      9     457      1184     212    1396    1.8888
Treatment-effects estimation
wage          Coef.
ATT        .6651578
. kmatch cvplot, ms(o) index mlabposition(1) sort
[Figure: cross-validation plot of MSE against bandwidth (1.5 to 3); markers are labeled with the evaluation index]
Some Examples
Bandwidth selection: cross-validation with respect to Y:
. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
> att bwidth(cv wage)
(computing bandwidth ... done)
Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south
Matching statistics
                  Matched                    Controls          Band-
              Yes     No   Total      Used  Unused   Total     width
Treated       453      4     457      1289     107    1396     2.433
Treatment-effects estimation
wage          Coef.
ATT        .6928956
. kmatch cvplot, ms(o) index mlabposition(1) sort
[Figure: cross-validation plot of MISE against bandwidth (1.5 to 3); markers are labeled with the evaluation index]
Some Examples
Bandwidth selection: weighted cross-validation with respect to Y:
. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
> att bwidth(cv wage, weighted)
(computing bandwidth ... done)
Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south
Matching statistics
                  Matched                    Controls          Band-
              Yes     No   Total      Used  Unused   Total     width
Treated       455      2     457      1356      40    1396    2.7626
Treatment-effects estimation
wage          Coef.
ATT        .7308166
. kmatch cvplot, ms(o) index mlabposition(1) sort
[Figure: cross-validation plot of weighted MISE against bandwidth (1 to 5); markers are labeled with the evaluation index]
Some Examples
Common-support diagnostics:
. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
> att bwidth(0.5)
Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south
Matching statistics
                  Matched                    Controls          Band-
              Yes     No   Total      Used  Unused   Total     width
Treated       366     91     457       701     695    1396        .5
Treatment-effects estimation
wage          Coef.
ATT        .3303161
. kmatch csummarize
(refitting the model using the generate() option)
                Common support (treated)        Standardized difference
Means          Matched  Unmatc~d     Total   (1)-(3)   (2)-(3)   (1)-(2)
collgrad       .322404   .318681   .321663   .001585  -.006376   .007962
ttl_exp        13.3929   12.7682   13.2685   .027413  -.110253   .137666
tenure         8.12614   6.95055   7.89205   .038378  -.154356   .192734
3.industry     .002732   .021978   .006565  -.047404   .190657  -.238061
4.industry     .191257   .153846   .183807   .019212  -.077269   .096481
5.industry     .062842   .274725   .105033  -.137462   .552867  -.690329
6.industry     .057377         0   .045952   .054507  -.219225   .273732
7.industry     .019126   .021978   .019694  -.004083   .016423  -.020506
8.industry     .005464   .065934   .017505  -.091714   .368871  -.460585
9.industry     .010929   .010989   .010941  -.000115   .000462  -.000577
10.industry          0   .021978   .004376  -.066227   .266363  -.332589
11.industry    .554645   .175824   .479212    .15083  -.606636   .757467
12.industry    .092896   .241758   .122538  -.090299   .363181   -.45348
2.race         .243169   .681319   .330416  -.185284   .745209  -.930494
3.race         .002732   .076923   .017505  -.112525   .452572  -.565097
south           .29235   .318681   .297593  -.011456   .046074   -.05753

                Common support (treated)                 Ratio
Variances      Matched  Unmatc~d     Total   (1)/(3)   (2)/(3)   (1)/(2)
collgrad       .219058   .219536   .218674   1.00176   1.00394   .997824
ttl_exp        19.4198   25.2474   20.5898   .943177   1.22621    .76918
tenure         38.3324   31.9242   37.2044   1.03032   .858076   1.20073
3.industry     .002732   .021734   .006536   .418045   3.32537   .125714
4.industry     .155101   .131624   .150351   1.03159   .875443   1.17837
5.industry     .059054   .201465   .094207   .626851   2.13854   .293122
6.industry     .054233         0   .043936   1.23435         0         .
7.industry     .018811   .021734   .019348   .972252    1.1233   .865531
8.industry      .00545   .062271   .017237   .316157   3.61269   .087513
9.industry     .010839   .010989   .010845   .999464   1.01328   .986361
10.industry          0   .021734   .004367         0   4.97709         0
11.industry    .247691    .14652   .250115   .990307   .585811   1.69049
12.industry    .084497   .185348   .107758   .784137   1.72003   .455885
2.race         .184542   .219536   .221726   .832297   .990121   .840601
3.race         .002732   .071795   .017237   .158513   4.16522   .038056
south          .207448   .219536    .20949   .990254   1.04796   .944939
(1) matched, (2) unmatched, (3) total
Some Examples
Make a graph of the common-support stats:
. mat M = r(M)
. coefplot matrix(M[,4]), title("Std. difference") noci nolabels xline(0)
[Figure: standardized difference between matched and total treated observations for each covariate (x-axis from -.2 to .2)]
Some Examples
Multiple outcome variables:
. kmatch md union collgrad ttl_exp tenure i.industry i.race south
> (wage hours), nate att
(computing bandwidth ... done)
Multivariate-distance kernel matching      Number of obs = 1852
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south
Matching statistics
                  Matched                    Controls          Band-
              Yes     No   Total      Used  Unused   Total     width
Treated       432     25     457      1104     291    1395    1.3392
Treatment-effects estimation
              Coef.
wage
  ATT      .6021049
  NATE     1.430823
hours
  ATT      1.263759
  NATE     1.450303
Some Examples
Multiple outcome variables with different regression-adjustment equations:
. kmatch md union collgrad ttl_exp tenure i.industry i.race south
> (wage = collgrad ttl_exp tenure)
> (hours = i.industry i.race), nate att
(computing bandwidth ... done)
Multivariate-distance kernel matching      Number of obs = 1852
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south
Matching statistics
                  Matched                    Controls          Band-
              Yes     No   Total      Used  Unused   Total     width
Treated       432     25     457      1104     291    1395    1.3392
Treatment-effects estimation
              Coef.
wage
  ATT      .5152752
  NATE     1.430823
hours
  ATT      1.263759
  NATE     1.450303
wage: adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race
Some Examples
Treatment effects by subpopulation:
. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage),
> att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)
Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50
Multivariate-distance kernel matching      Number of obs = 1853
                                           Replications  = 50
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race
0: south = 0
1: south = 1
Matching statistics
                  Matched                    Controls          Band-
              Yes     No   Total      Used  Unused   Total     width
0
  Treated     306     15     321       625     120     745    1.3199
1
  Treated     126     10     136       473     178     651    1.3398
Treatment-effects estimation
          Observed   Bootstrap                      Normal-based
wage         Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
0
  ATT     .4586332   .2763358    1.66    0.097    -.082975   1.000241
1
  ATT     .9518705    .406903    2.34    0.019    .1543553   1.749386
. test [0]ATT = [1]ATT
 ( 1)  [0]ATT - [1]ATT = 0
           chi2(  1) =    1.23
         Prob > chi2 =    0.2679
. lincom [1]ATT - [0]ATT
 ( 1)  - [0]ATT + [1]ATT = 0
wage         Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
(1)       .4932373   .4452343    1.11    0.268    -.379406   1.365881
Simulation
- Population data from the Swiss census of 2000.
- Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using command iskotrei by Hendrickx 2002) (values from 6 to 78, mean 44).
- Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.
- Control variables: gender, age, and highest educational degree.
- Population restricted to people between 24 and 60 years old who are working.
- 2'308'006 individuals, of which 17.5% belong to the treatment group.
- Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.
Simulation
Substantial differences between resident aliens and Swiss nationals on all three covariates.
Propensity score in population (computed from fully stratified data):
[Figure: density of the propensity score in the population for untreated and treated individuals; McFadden R2 = 0.121]
Simulation
- Raw mean difference in occupational prestige (NATE): −4.79
- Population ATT (computed from fully stratified data): −3.96
- There is some treatment effect heterogeneity (ATE = −3.51, ATC = −3.41).
[Figure: mean outcome by propensity score for untreated and treated individuals]
[Figure: treatment effect by propensity score]
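The three population effects are tied together by the share of treated individuals: the ATE is the average of ATT and ATC weighted by group size. The sketch below checks this with the reported ATT (−3.96) and ATC (−3.41), assuming a treated share of 17.5 percent; the function name `ate_from_parts` is hypothetical.

```python
def ate_from_parts(att, atc, share_treated):
    """ATE as the group-size-weighted average of ATT and ATC."""
    return share_treated * att + (1.0 - share_treated) * atc

# Reported population values; the treated share is an assumption (17.5%).
ate = ate_from_parts(att=-3.96, atc=-3.41, share_treated=0.175)
print(round(ate, 2))
```

The result agrees with the reported ATE of −3.51 up to rounding, which also shows why ATE lies much closer to ATC than to ATT: the control group is far larger than the treatment group.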
Results: Variance
[Figure: variance of the matching estimators, panels N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each without and with bias correction.]
In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.
For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.
Results: Bias reduction (in percent)
[Figure: bias reduction in percent, panels N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each without and with bias correction.]
Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.
The bias also vanishes once post-matching regression adjustment is applied.
Results: Mean squared error
[Figure: mean squared error of the matching estimators, panels N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each without and with bias correction.]
Results: Relative standard error
[Figure: relative standard error, panels N = 500 and N = 5000. Rows: nearest-neighbor matching with teffects (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap (1 neighbor, 5 neighbors), and kernel matching with bootstrap (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each without and with bias correction.]
Results: Coverage of 95% CIs
[Figure: coverage of 95% confidence intervals, panels N = 500 and N = 5000. Rows: nearest-neighbor matching with teffects (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap (1 neighbor, 5 neighbors), and kernel matching with bootstrap (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each without and with bias correction.]
7 Conclusions
Conclusions
- The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).
- Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.
Conclusions
Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research the choice may not be that clear.
Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.
Disadvantages of MDM:
- The choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.
One clear conclusion we can draw, however, is:
Do not use propensity scores for pair matching. (But don't use pair matching anyhow.)
Conclusions
Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly OK for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.
To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.
References
Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2): 295–313.
Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1): 77–90.
Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91: 279–290.
Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.
King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.
Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.
Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4): 465–472.
Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70: 41–55.
Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5): 688–701.
Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4): 472–480.
Matching to Reduce Model Dependence (Ho, Imai, King, Stuart 2007: fig. 1, Political Analysis)
[Figure: outcome plotted against education (years) for treated (T) and control (C) units; slide 3/23 from the slides by King and Nielsen]
King and Nielsen
Argument 2
• PSM approximates complete randomization.
• Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

Matching: Finding Hidden Randomized Experiments (slide 6/23 by King and Nielsen)

Types of Experiments

Balance on covariates    Complete randomization    Fully blocked
Observed                 On average                Exact
Unobserved               On average                On average

Fully blocked dominates complete randomization for: imbalance, model dependence, power, efficiency, bias, research costs, robustness. E.g., Imai, King, Nall 2009: SEs 600% smaller.

Goal of Each Matching Method (in Observational Data)
• PSM: complete randomization
• Other methods: fully blocked
• Other matching methods dominate PSM (wait, it gets worse)
Best Case: Mahalanobis Distance Matching (slide 9/23 by King and Nielsen)
[Figure: treated (T) and control (C) units plotted by Education (years, 12 to 28) and Age (20 to 80)]

Best Case: Propensity Score Matching (slide 15/23 by King and Nielsen)
[Figure: treated (T) and control (C) units plotted by Education (years) and Age, with a propensity score axis running from 0 to 1]

Best Case: Propensity Score Matching is Suboptimal (slide 15/23 by King and Nielsen)
[Figure: treated (T) and control (C) units plotted by Education (years) and Age]
King and Nielsen
Argument 3
• Random pruning (deleting observations at random) increases imbalance. This is because the sample size decreases, so that variance increases (large differences become more likely).
• More imbalance/variance means more model dependence and researcher discretion.
• Because PSM approximates complete randomization, it engages in random pruning.
• PSM Paradox (“when you do ‘better,’ you do worse”):
  - When matching is made more strict (e.g., by decreasing the size of the caliper), PSM, like other matching methods, typically reduces imbalance. But soon the PSM Paradox kicks in, such that further pruning quickly increases imbalance.
  - If the data are such that there are no big differences between treated and untreated to begin with, the PSM Paradox kicks in almost immediately.
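The claim that random pruning raises imbalance can be checked with a small simulation. The sketch below is an illustration added here, not code from King and Nielsen or from kmatch. It assumes a single standard-normal covariate and two randomly formed groups of equal size, and it relies on the fact that a randomly pruned random sample is simply a smaller random sample. The function name `mean_abs_imbalance` and all parameter values are hypothetical.

```python
import random
import statistics


def mean_abs_imbalance(n_per_group, reps=1000, seed=1):
    """Average absolute difference in covariate means between two
    randomly formed groups of size n_per_group, with X ~ N(0, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        treated = [rng.gauss(0, 1) for _ in range(n_per_group)]
        control = [rng.gauss(0, 1) for _ in range(n_per_group)]
        total += abs(statistics.fmean(treated) - statistics.fmean(control))
    return total / reps


# Smaller groups stand in for a more heavily (randomly) pruned sample.
for n in (200, 50, 10):
    print(n, round(mean_abs_imbalance(n), 3))
```

The average imbalance grows as the groups shrink, which is the mechanism behind the first bullet of Argument 3.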
PSM Increases Model Dependence & Bias (slide 20/23 by King and Nielsen)
[Figure, two panels, each comparing MDM and PSM. “Model Dependence”: variance against number of units pruned (0 to 160). “Bias”: maximum coefficient across 512 specifications against number of units pruned (0 to 160); true effect = 2.]
Data-generating process: $Y_i = 2T_i + X_{1i} + X_{2i} + \epsilon_i$, with $\epsilon_i \sim N(0, 1)$

The Propensity Score Paradox in Real Data (slide 21/23 by King and Nielsen)
[Figure, two panels: imbalance against number of units pruned for CEM, MDM, PSM, and random pruning, with the raw data and the 1/4 SD caliper marked. Panels: Finkel et al. (JOP 2012) and Nielsen et al. (AJPS 2011).]
Similar pattern for > 20 other real data sets we checked.
5 Are King and Nielsen right?
Are King and Nielsen right?
Argument 1
• Model dependence (i.e., dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions even if they try not to be).
• Matching is good because it reduces model dependence.

I fully agree.

My view, however, may be somewhat less pessimistic. I believe that research results can be credible if researchers are well educated, so that they know what they are doing, and if modeling decisions are made transparent and robustness of results is evaluated (and documented).
Are King and Nielsen right?
Argument 2
• PSM approximates complete randomization.
• Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

That fully blocked randomization is more efficient than complete randomization, given the sample size, is of course true (how large the efficiency gains are depends on the strength of the relation between X and Y).

However, if blocking reduces the sample size, it is not a priori clear whether estimates from the blocked sample are more efficient than estimates from the full sample (although often they will be).

That PSM approximates complete randomization is only partially true. PSM approximates complete randomization within observations with the same propensity score. Hence, PSM is somewhere between complete randomization and fully blocked randomization.
• If the X variables have no relation to T (treatment), then all observations have the same propensity score. Hence, we end up with complete randomization.
• If the X variables have a strong effect on T, there is lots of blocking.
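The two limiting cases just described can be made concrete with a toy calculation. The following sketch is an added illustration under an assumed one-covariate logit model for treatment; the helper names `propensity` and `n_blocks` and the coefficient values are hypothetical. It counts how many distinct propensity-score values (that is, blocks) arise among five units with different covariate values.

```python
import math


def propensity(x, beta):
    """Propensity score Pr(T = 1 | x) under an assumed logit model."""
    return 1.0 / (1.0 + math.exp(-beta * x))


def n_blocks(xs, beta, digits=6):
    """Number of distinct propensity-score values among the units."""
    return len({round(propensity(x, beta), digits) for x in xs})


xs = [-2, -1, 0, 1, 2]          # five distinct covariate values
print(n_blocks(xs, beta=0.0))   # X unrelated to T: a single block
print(n_blocks(xs, beta=1.5))   # X strongly related to T: one block per value
```

With no relation between X and T, every unit shares one score, so matching on it amounts to complete randomization; with a strong relation, the score separates the units into as many blocks as there are covariate values.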
Are King and Nielsen right?
Argument 3
• Random pruning ⇒ imbalance ⇒ more model dependence
• PSM ⇒ complete randomization ⇒ lots of random pruning
• PSM Paradox: “when you do ‘better,’ you do worse”

That random pruning makes things worse is of course true, because it unnecessarily reduces the sample size (without changing anything else).

As argued above, that PSM applies random pruning is only true for X variables unrelated to T (so that we are in a “local” complete randomization situation, although something similar can probably also happen if effects from several X’s cancel each other out).

Furthermore, it is only true if you employ a matching algorithm that throws away good matches. King and Nielsen’s results seem to be based on the worst possible algorithm: one-to-one matching without replacement.

If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or also nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
• Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
• Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.

It is true that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more “free” variance in X that can be exploited by modeling decisions).

In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you’d need to apply PSM stratified by subgroups in this case).
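To illustrate what "averaging instead of pruning" means, here is a minimal sketch of kernel matching on a one-dimensional score with an Epanechnikov kernel. It is an added illustration, not kmatch's implementation: the function names `epan` and `kernel_att`, the data, and the bandwidth are all hypothetical. Every control within the bandwidth of a treated unit contributes to the counterfactual, and only treated units with no control in range are dropped.

```python
def epan(u):
    """Epanechnikov kernel; zero outside |u| < 1."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def kernel_att(treated, controls, bandwidth):
    """Kernel-matching estimate of the ATT.

    treated, controls: lists of (score, outcome) pairs.
    Returns (ATT estimate, number of matched treated units)."""
    effects = []
    for s_t, y_t in treated:
        weights = [epan((s_c - s_t) / bandwidth) for s_c, _ in controls]
        wsum = sum(weights)
        if wsum == 0.0:
            continue  # no control within the bandwidth: unit is unmatched
        # Weighted average of ALL controls in range, not a single pick.
        y0_hat = sum(w * y for w, (_, y) in zip(weights, controls)) / wsum
        effects.append(y_t - y0_hat)
    if not effects:
        raise ValueError("no treated unit has a control within the bandwidth")
    return sum(effects) / len(effects), len(effects)


treated = [(0.5, 10.0), (0.9, 12.0)]
controls = [(0.45, 8.0), (0.55, 9.0), (0.2, 5.0)]
att, n_matched = kernel_att(treated, controls, bandwidth=0.1)
print(att, n_matched)
```

In this toy data the first treated unit is compared with the average of its two equally close controls, whereas one-to-one matching without replacement would keep one of them and discard the other, in effect at random.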
6 Illustration using kmatch
The Command
kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).
Some key features:
• Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
• Optional exact matching
• Optional regression-adjustment bias-correction
• Kernel matching, ridge matching, or nearest-neighbor matching
• Automatic bandwidth selection for kernel/ridge matching
• Flexible specification of scaling matrix for MDM
• Joint analysis of multiple subgroups and multiple outcome variables
• Various post-estimation commands for balancing and common-support diagnostics
• Computationally efficient implementation
Some Examples
Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.

. sysuse nlsw88, clear
(NLSW, 1988 extract)
. drop if industry==2
(4 observations deleted)

Mahalanobis-distance kernel matching

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
             Matched                Controls           Band-
           Yes    No  Total    Used  Unused  Total     width
Treated    432    25    457    1105     291   1396    1.3394

Treatment-effects estimation
wage          Coef.
ATT        .6059013
NATE       1.432913
Some Examples
Some balancing statistics

. kmatch summarize
(refitting the model using the generate() option)

                         Raw                        Matched (ATT)
Means          Treated  Untrea~d    StdDif    Treated  Untrea~d    StdDif
collgrad       .321663   .224212   .219912    .319444   .319444         0
ttl_exp        13.2685   12.7323   .117584    13.3205   13.1425   .039036
tenure         7.89205   6.17658    .29735    7.91744   7.58347   .057888
3.industry     .006565   .012178  -.058246     .00463    .00463         0
4.industry     .183807   .166905   .044425    .185185   .185185         0
5.industry     .105033   .027937   .312944    .085648   .085648         0
6.industry     .045952   .169771  -.407129    .048611   .048611         0
7.industry     .019694   .102436  -.350657    .020833   .020833         0
8.industry     .017505   .035817  -.113785    .009259   .009259         0
9.industry     .010941   .040115  -.185669    .011574   .011574         0
10.industry    .004376   .008596  -.052551    .002315   .002315         0
11.industry    .479212   .356734   .250073    .506944   .506944         0
12.industry    .122538    .07235   .169707     .12037    .12037         0
2.race         .330416   .244986   .189418      .3125     .3125         0
3.race         .017505   .011461   .050566    .006944   .006944         0
south          .297593   .466332  -.352408    .291667   .291667         0

                         Raw                        Matched (ATT)
Variances      Treated  Untrea~d     Ratio    Treated  Untrea~d     Ratio
collgrad       .218674   .174066   1.25628    .217904   .217904         1
ttl_exp        20.5898   21.0001   .980459    19.8177   18.2323   1.08696
tenure         37.2044   29.3629   1.26706    37.0399   34.9543   1.05966
3.industry     .006536   .012038   .542928    .004619   .004619         1
4.industry     .150351   .139148   1.08052    .151242   .151242         1
5.industry     .094207   .027176   3.46656    .078494   .078494         1
6.industry     .043936    .14105   .311496    .046355   .046355         1
7.industry     .019348   .092008   .210287    .020447   .020447         1
8.industry     .017237   .034559   .498769    .009195   .009195         1
9.industry     .010845   .038533   .281445    .011467   .011467         1
10.industry    .004367   .008528   .512039    .002315   .002315         1
11.industry    .250115   .229639   1.08917    .250532   .250532         1
12.industry    .107758   .067163   1.60443    .106127   .106127         1
2.race         .221726     .1851   1.19787    .215342   .215342         1
3.race         .017237   .011338   1.52025    .006912   .006912         1
south           .20949   .249045   .841173    .207077   .207077         1
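The two balancing statistics in these tables can be recomputed by hand. The sketch below is an added illustration: it assumes the standardized difference is the mean difference divided by the square root of the average of the two raw variances, a definition that is consistent with the numbers shown but is not quoted from the kmatch documentation. The inputs are the raw collgrad means and variances, read with the decimal point placed before the first digit; the function names are hypothetical.

```python
import math


def std_diff(mean_t, mean_c, var_t, var_c):
    """Standardized mean difference using the pooled (averaged) variance."""
    return (mean_t - mean_c) / math.sqrt((var_t + var_c) / 2.0)


def var_ratio(var_t, var_c):
    """Ratio of the treated to the untreated variance."""
    return var_t / var_c


# Raw collgrad row: means .321663 / .224212, variances .218674 / .174066
sd = std_diff(0.321663, 0.224212, 0.218674, 0.174066)
vr = var_ratio(0.218674, 0.174066)
print(round(sd, 4), round(vr, 4))
```

Both results should land close to the StdDif and Ratio entries reported for collgrad in the raw columns; small discrepancies are expected because the inputs are rounded.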
Some Examples
Make a graph of the balancing stats

. mat M = r(M)
. mat V = r(V)
. coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || ,
>     bylabels("Std. mean difference" "Variance ratio")
>     noci nolabels byopts(xrescale)
. addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))
. addplot 2: , xline(1) norescaling

[Figure, two panels: standardized mean differences and variance ratios for all covariates (collgrad through south), raw vs. matched]
Some Examples
Propensity-score kernel matching

. kmatch ps union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate att
(computing bandwidth ... done)

Propensity-score kernel matching           Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Covariates  : collgrad ttl_exp tenure i.industry i.race south
PS model    : logit (pr)

Matching statistics
             Matched                Controls           Band-
           Yes    No  Total    Used  Unused  Total     width
Treated    431    26    457    1214     182   1396    .00188

Treatment-effects estimation
wage          Coef.
ATT        .3887224
NATE       1.432913
Some Examples
Kernel density balancing plot

. kmatch density, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)
(applying 0-1 boundary correction to density estimation of propensity score)
(bandwidth for propensity score = .06803989)

[Figure, panels Raw and Matched (ATT): density of the propensity score for untreated and treated]

Some Examples
Cumulative distribution balancing plot

. kmatch cumul, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)

[Figure, panels Raw and Matched (ATT): cumulative probability of the propensity score for untreated and treated]

Some Examples
Balancing box plot

. kmatch box
(refitting the model using the generate() option)

[Figure, panels Raw and Matched (ATT): box plots of the propensity score for untreated and treated]
Some Examples
Standard errors

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching      Number of obs = 1853
                                           Replications  = 50
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
               Matched                Controls           Band-
             Yes    No  Total    Used  Unused  Total     width
Treated      432    25    457    1105     291   1396    1.3394
Untreated   1386    10   1396     455       2    457    3.3975
Combined    1818    35   1853    1560     293   1853

Treatment-effects estimation
          Observed   Bootstrap                       Normal-based
wage         Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
ATE       .4095729   .1920853    2.13   0.033    .0330928   .7860531
ATT       .6059013   .2472069    2.45   0.014    .1213846   1.090418
ATC       .3483797   .1893653    1.84   0.066   -.0227695   .7195289
NATE      1.432913   .2333282    6.14   0.000    .9755981   1.890228
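The z statistic, p-value, and normal-based interval in this table follow mechanically from the coefficient and its bootstrap standard error. As a worked check added here (with a hypothetical helper name `normal_based`, and with the ATT row read as coefficient .6059013 and standard error .2472069), the calculation can be sketched as:

```python
from statistics import NormalDist


def normal_based(coef, se, level=0.95):
    """z statistic, two-sided p-value, and normal-based confidence limits."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(0.5 + level / 2.0)
    z = coef / se
    p = 2.0 * (1.0 - std_normal.cdf(abs(z)))
    return z, p, coef - z_crit * se, coef + z_crit * se


z, p, lo, hi = normal_based(0.6059013, 0.2472069)
print(round(z, 2), round(p, 3), round(lo, 6), round(hi, 6))
```

The interval limits should agree with the ATT row up to rounding of the inputs.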
Some Examples
Do some tests

. lincom ATT-NATE

 ( 1)  ATT - NATE = 0

wage         Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
(1)      -.8270117   .1810415   -4.57   0.000   -1.181847  -.4721768

. test ATT = ATC

 ( 1)  ATT - ATC = 0

           chi2(  1) =    2.42
         Prob > chi2 =  0.1200
Some Examples
Nearest-neighbor matching (1 neighbor)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

Multivariate-distance nearest-neighbor matching

Number of obs = 1853                       Neighbors: min = 1
                                                      max = 1
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
             Matched                Controls           Band-
           Yes    No  Total    Used  Unused  Total     width
Treated    457     0    457     328    1068   1396

Treatment-effects estimation
wage          Coef.
ATT        .7246969

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

Treatment-effects estimation               Number of obs      = 1853
Estimator      : nearest-neighbor matching Matches: requested = 1
Outcome model  : matching                                 min = 1
Distance metric: Mahalanobis                              max = 1

                                   AI Robust
wage                      Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET
union
(union vs nonunion)    .7246969   .2942952    2.46   0.014     .147889   1.301505
Some Examples
Nearest-neighbor matching (5 neighbors)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853                       Neighbors: min = 5
                                                      max = 5
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
             Matched                Controls           Band-
           Yes    No  Total    Used  Unused  Total     width
Treated    457     0    457     870     526   1396

Treatment-effects estimation
wage          Coef.
ATT        .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation               Number of obs      = 1853
Estimator      : nearest-neighbor matching Matches: requested = 5
Outcome model  : matching                                 min = 5
Distance metric: Mahalanobis                              max = 6

                                   AI Robust
wage                      Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET
union
(union vs nonunion)    .5590823   .2381752    2.35   0.019    .0922675   1.025897
Some Examples
Bias-correction / regression adjustment

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853                       Neighbors: min = 5
                                                      max = 5
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
             Matched                Controls           Band-
           Yes    No  Total    Used  Unused  Total     width
Treated    457     0    457     870     526   1396

Treatment-effects estimation
wage          Coef.
ATT        .5288023
adjusted for: collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south)
>     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation               Number of obs      = 1853
Estimator      : nearest-neighbor matching Matches: requested = 5
Outcome model  : matching                                 min = 5
Distance metric: Mahalanobis                              max = 6

                                   AI Robust
wage                      Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET
union
(union vs nonunion)    .5288023   .2420635    2.18   0.029    .0543666   1.003238
Some Examples
Mahalanobis-distance and propensity-score matching combined

. kmatch md union collgrad ttl_exp tenure (wage), att
>     psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis (modified)
Covariates  : collgrad ttl_exp tenure
PS model    : logit (pr)
PS covars   : i.industry i.race south

Matching statistics
             Matched                Controls           Band-
           Yes    No  Total    Used  Unused  Total     width
Treated    439    18    457    1258     138   1396    .83886

Treatment-effects estimation
wage          Coef.
ATT        .6408443
Some Examples
Exact matching

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure
Exact       : industry race south

Matching statistics
             Matched                Controls           Band-
           Yes    No  Total    Used  Unused  Total     width
Treated    432    25    457    1103     293   1396    1.3013

Treatment-effects estimation
wage          Coef.
ATT        .6047374
Some Examples
Bandwidth selection: the default (based on distribution of distances in one-nearest-neighbor matching)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
             Matched                Controls           Band-
           Yes    No  Total    Used  Unused  Total     width
Treated    432    25    457    1105     291   1396    1.3394

Treatment-effects estimation
wage          Coef.
ATT        .6059013
Some Examples
Bandwidth selection: cross validation with respect to X

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
             Matched                Controls           Band-
           Yes    No  Total    Used  Unused  Total     width
Treated    448     9    457    1184     212   1396    1.8888

Treatment-effects estimation
wage          Coef.
ATT        .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: MSE against bandwidth (about 1.5 to 3) for the evaluated bandwidths, labeled by index]
Some Examples
Bandwidth selection: cross validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
             Matched                Controls           Band-
           Yes    No  Total    Used  Unused  Total     width
Treated    453     4    457    1289     107   1396     2.433

Treatment-effects estimation
wage          Coef.
ATT        .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: MISE against bandwidth (about 1.5 to 3) for the evaluated bandwidths, labeled by index]
Some Examples
Bandwidth selection: weighted cross validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
             Matched                Controls           Band-
           Yes    No  Total    Used  Unused  Total     width
Treated    455     2    457    1356      40   1396    2.7626

Treatment-effects estimation
wage          Coef.
ATT        .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: weighted MISE against bandwidth (1 to 5) for the evaluated bandwidths, labeled by index]
Some Examples
Common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(0.5)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
             Matched                Controls           Band-
           Yes    No  Total    Used  Unused  Total     width
Treated    366    91    457     701     695   1396        .5

Treatment-effects estimation
wage          Coef.
ATT        .3303161

. kmatch csummarize
(refitting the model using the generate() option)

                 Common support (treated)       Standardized difference
Means          Matched  Unmatc~d     Total    (1)-(3)   (2)-(3)   (1)-(2)
collgrad       .322404   .318681   .321663    .001585  -.006376   .007962
ttl_exp        13.3929   12.7682   13.2685    .027413  -.110253   .137666
tenure         8.12614   6.95055   7.89205    .038378  -.154356   .192734
3.industry     .002732   .021978   .006565   -.047404   .190657  -.238061
4.industry     .191257   .153846   .183807    .019212  -.077269   .096481
5.industry     .062842   .274725   .105033   -.137462   .552867  -.690329
6.industry     .057377         0   .045952    .054507  -.219225   .273732
7.industry     .019126   .021978   .019694   -.004083   .016423  -.020506
8.industry     .005464   .065934   .017505   -.091714   .368871  -.460585
9.industry     .010929   .010989   .010941   -.000115   .000462  -.000577
10.industry          0   .021978   .004376   -.066227   .266363  -.332589
11.industry    .554645   .175824   .479212     .15083  -.606636   .757467
12.industry    .092896   .241758   .122538   -.090299   .363181   -.45348
2.race         .243169   .681319   .330416   -.185284   .745209  -.930494
3.race         .002732   .076923   .017505   -.112525   .452572  -.565097
south           .29235   .318681   .297593   -.011456   .046074   -.05753

                 Common support (treated)                Ratio
Variances      Matched  Unmatc~d     Total    (1)/(3)   (2)/(3)   (1)/(2)
collgrad       .219058   .219536   .218674    1.00176   1.00394   .997824
ttl_exp        19.4198   25.2474   20.5898    .943177   1.22621    .76918
tenure         38.3324   31.9242   37.2044    1.03032   .858076   1.20073
3.industry     .002732   .021734   .006536    .418045   3.32537   .125714
4.industry     .155101   .131624   .150351    1.03159   .875443   1.17837
5.industry     .059054   .201465   .094207    .626851   2.13854   .293122
6.industry     .054233         0   .043936    1.23435         0         .
7.industry     .018811   .021734   .019348    .972252    1.1233   .865531
8.industry      .00545   .062271   .017237    .316157   3.61269   .087513
9.industry     .010839   .010989   .010845    .999464   1.01328   .986361
10.industry          0   .021734   .004367          0   4.97709         0
11.industry    .247691    .14652   .250115    .990307   .585811   1.69049
12.industry    .084497   .185348   .107758    .784137   1.72003   .455885
2.race         .184542   .219536   .221726    .832297   .990121   .840601
3.race         .002732   .071795   .017237    .158513   4.16522   .038056
south          .207448   .219536    .20949    .990254   1.04796   .944939

(1) matched, (2) unmatched, (3) total
Some Examples
Make a graph of the common-support stats

. mat M = r(M)
. coefplot matrix(M[,4]), title("Std. difference") noci nolabels xline(0)

[Figure: standardized difference between matched treated and all treated, (1)-(3), for all covariates (collgrad through south)]
Some Examples
Multiple outcome variables

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1852
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
             Matched                Controls           Band-
           Yes    No  Total    Used  Unused  Total     width
Treated    432    25    457    1104     291   1395    1.3392

Treatment-effects estimation
                Coef.
wage
  ATT        .6021049
  NATE       1.430823
hours
  ATT        1.263759
  NATE       1.450303
Some Examples
Multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure)
>     (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1852
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
             Matched                Controls           Band-
           Yes    No  Total    Used  Unused  Total     width
Treated    432    25    457    1104     291   1395    1.3392

Treatment-effects estimation
                Coef.
wage
  ATT        .5152752
  NATE       1.430823
hours
  ATT        1.263759
  NATE       1.450303
wage: adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race
Some Examples
Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage),
>     att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching      Number of obs = 1853
                                           Replications  = 50
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics
               Matched                Controls           Band-
             Yes    No  Total    Used  Unused  Total     width
0  Treated   306    15    321     625     120    745    1.3199
1  Treated   126    10    136     473     178    651    1.3398

Treatment-effects estimation
          Observed   Bootstrap                       Normal-based
wage         Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
0  ATT    .4586332   .2763358    1.66   0.097    -.082975   1.000241
1  ATT    .9518705    .406903    2.34   0.019    .1543553   1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

           chi2(  1) =    1.23
         Prob > chi2 =  0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

wage         Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
(1)       .4932373   .4452343    1.11   0.268    -.379406   1.365881
Simulation

Population data from Swiss census of 2000.

Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using command iskotrei by Hendrickx 2002) (values from 6 to 78; mean 44).

Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.

Control variables: gender, age, and highest educational degree.

Population restricted to people between 24 and 60 years old who are working.

2’308’006 individuals, of which 17.5% belong to the treatment group.

Draw random samples (N = 500, 1000, or 5000) from population and compute various matching estimators.
Simulation

Substantial differences between resident aliens and Swiss nationals on all three covariates.

Propensity score in population (computed from fully stratified data):
[Figure: density of the propensity score (0 to 1) for untreated and treated; McFadden R² = 0.121]
Simulation

Raw mean difference in occupational prestige (NATE): -4.79
Population ATT (computed from fully stratified data): -3.96
There is some treatment effect heterogeneity (ATE = -3.51, ATC = -3.41).
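How ATT, ATC, and ATE relate when they are computed from fully stratified data can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the cell counts and means in `cells` are made up, the function name `stratified_effects` is hypothetical, every cell is assumed to contain both treated and untreated observations, and the final check reads the slide values as 17.5% treated, ATT = -3.96, and ATC = -3.41 (decimal points assumed).

```python
def stratified_effects(cells):
    """Exact-stratification estimates from per-cell summaries.

    cells: list of (n_treated, n_untreated, mean_y_treated, mean_y_untreated),
    one tuple per covariate cell (hypothetical data layout).
    """
    n_t = sum(nt for nt, nc, yt, yc in cells)
    n_c = sum(nc for nt, nc, yt, yc in cells)
    # Cell-specific effects, weighted by treated (ATT) or untreated (ATC) counts
    att = sum(nt * (yt - yc) for nt, nc, yt, yc in cells) / n_t
    atc = sum(nc * (yt - yc) for nt, nc, yt, yc in cells) / n_c
    # ATE is the group-size weighted average of ATT and ATC
    ate = (n_t * att + n_c * atc) / (n_t + n_c)
    return att, atc, ate


# Made-up example with two covariate cells
cells = [(10, 90, 40.0, 45.0), (30, 70, 42.0, 44.0)]
att, atc, ate = stratified_effects(cells)
print(att, atc, ate)

# Consistency check with the population values on the slide
# (17.5% treated, ATT = -3.96, ATC = -3.41; decimal points assumed)
ate_slide = 0.175 * (-3.96) + 0.825 * (-3.41)
print(round(ate_slide, 2))
```

Under these readings, the weighted average of ATT and ATC agrees with the ATE of -3.51 reported on the slide.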
[Figure, left panel: outcome by propensity score for untreated and treated. Right panel: treatment effect by propensity score.]
Results: Variance

[Figure: variance of the estimates for N = 500 and N = 5000, by matching algorithm. Nearest-neighbor matching: 1 neighbor, 5 neighbors. Kernel matching: fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y. Series: MDM and PSM, each with and without bias correction.]
In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are small; additional post-matching regression adjustment further reduces the differences.
Results: Bias reduction (in percent)

[Figure: bias reduction in percent for N = 500 and N = 5000, by matching algorithm. Nearest-neighbor matching: 1 neighbor, 5 neighbors. Kernel matching: fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y. Series: MDM and PSM, each with and without bias correction.]
Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.
Results: Mean squared error

[Figure: mean squared error of the estimates for N = 500 and N = 5000, by matching algorithm. Nearest-neighbor matching: 1 neighbor, 5 neighbors. Kernel matching: fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y. Series: MDM and PSM, each with and without bias correction.]

Results: Relative standard error

[Figure: relative standard error for N = 500 and N = 5000. Nearest-neighbor matching (teffects): 1 neighbor, 5 neighbors. Nearest-neighbor matching (bootstrap): 1 neighbor, 5 neighbors. Kernel matching (bootstrap): fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y. Series: MDM and PSM, each with and without bias correction.]

Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals for N = 500 and N = 5000. Nearest-neighbor matching (teffects): 1 neighbor, 5 neighbors. Nearest-neighbor matching (bootstrap): 1 neighbor, 5 neighbors. Kernel matching (bootstrap): fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y. Series: MDM and PSM, each with and without bias correction.]
Conclusions

- The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).
- Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.
Conclusions

- Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research, the choice may not be that clear.
  Advantages of MDM:
    - MDM leaves less scope for post-matching modeling decision biases.
    - Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
    - Fewer restrictions in terms of possible post-matching analyses.
  Disadvantages of MDM:
    - Choice of scaling matrix largely arbitrary; various suggestions in the literature (somewhat unclear, e.g., how categorical variables should be treated).
    - Computational complexity.
- One clear conclusion we can draw, however, is:
  Do not use propensity scores for pair matching. (But don't use pair matching anyhow.)
Conclusions

- Some conclusions from the simulation:
    - For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
    - Bootstrap standard error/confidence interval estimation seems to be mostly OK for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.
- To do:
    - Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.
References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2):295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1):77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91:279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4):465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70:41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5):688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4):472–480.
Matching to Reduce Model Dependence
(Ho, Imai, King, Stuart 2007: fig. 1, Political Analysis)

[Figure: scatter plot of the outcome against education (years) for treated (T) and control (C) units; slide 3/23 of the slides by King and Nielsen.]
King and Nielsen

Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

Matching: Finding Hidden Randomized Experiments (slide 6/23 of the slides by King and Nielsen)

Types of Experiments

Balance       Complete         Fully
Covariates    Randomization    Blocked
Observed      On average       Exact
Unobserved    On average       On average

Fully blocked dominates complete randomization for: imbalance, model dependence, power, efficiency, bias, research costs, robustness. E.g., Imai, King, Nall 2009: SEs 600% smaller.

Goal of Each Matching Method (in Observational Data)
- PSM: complete randomization
- Other methods: fully blocked
- Other matching methods dominate PSM (wait, it gets worse)
Best Case: Mahalanobis Distance Matching

[Figure: treated (T) and control (C) units plotted by education (years) and age; slide 9/23 of the slides by King and Nielsen.]
Best Case: Propensity Score Matching

[Figure: treated (T) and control (C) units plotted by education (years) and age, with the propensity score (0 to 1) shown as an additional axis; slide 15/23 of the slides by King and Nielsen.]
Best Case: Propensity Score Matching is Suboptimal

[Figure: treated (T) and control (C) units plotted by education (years) and age; slide 15/23 of the slides by King and Nielsen.]
King and Nielsen

Argument 3
- Random pruning (deleting observations at random) increases imbalance. This is because the sample size decreases, so that variance increases (large differences become more likely).
- More imbalance/variance means more model dependence and researcher discretion.
- Because PSM approximates complete randomization, it engages in random pruning.
- PSM Paradox ("when you do 'better', you do worse"):
    - When matching is made more strict (e.g., by decreasing the size of the caliper), PSM, like other matching methods, typically reduces imbalance. But soon the PSM Paradox kicks in, such that further pruning quickly increases imbalance.
    - If the data are such that there are no big differences between treated and untreated to begin with, the PSM Paradox kicks in almost immediately.
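The first point, that random pruning increases imbalance simply because fewer observations remain, can be illustrated with a small simulation. This is a minimal sketch and not King and Nielsen's simulation design: the function name `pruning_imbalance`, the standard normal covariate, the group size of 200, and the number of replications are all assumptions chosen for illustration. Treated and controls are drawn from the same distribution, so any remaining difference in means is pure chance.

```python
import random
import statistics


def pruning_imbalance(n_start, keep_sizes, reps, rng):
    """Average absolute difference in covariate means between treated and
    controls after pruning both groups at random down to k units each."""
    totals = {k: 0.0 for k in keep_sizes}
    for _ in range(reps):
        # Both groups come from the same distribution (perfect balance in expectation)
        treated = [rng.gauss(0.0, 1.0) for _ in range(n_start)]
        control = [rng.gauss(0.0, 1.0) for _ in range(n_start)]
        for k in keep_sizes:
            t_kept = rng.sample(treated, k)   # random pruning: keep k units
            c_kept = rng.sample(control, k)
            totals[k] += abs(statistics.fmean(t_kept) - statistics.fmean(c_kept))
    return {k: totals[k] / reps for k in keep_sizes}


result = pruning_imbalance(200, [200, 100, 50, 25], 500, random.Random(2017))
for k, imbalance in result.items():
    print(k, round(imbalance, 3))
```

The average imbalance grows as more units are pruned, roughly in proportion to one over the square root of the number of units kept.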
PSM Increases Model Dependence & Bias

[Figure, left panel "Model Dependence": variance against the number of units pruned (0 to 160), for MDM and PSM. Right panel "Bias": maximum coefficient across 512 specifications against the number of units pruned (0 to 160), for MDM and PSM; true effect = 2. Slide 20/23 of the slides by King and Nielsen.]

Data-generating model: $Y_i = 2T_i + X_{1i} + X_{2i} + \epsilon_i$, $\epsilon_i \sim N(0, 1)$
The Propensity Score Paradox in Real Data

[Figure: imbalance against the number of units pruned for CEM, MDM, and PSM, with reference markers for the raw data, random pruning, and a 1/4 SD caliper. Left panel: Finkel et al. (JOP 2012). Right panel: Nielsen et al. (AJPS 2011). Slide 21/23 of the slides by King and Nielsen.]

Similar pattern for > 20 other real data sets we checked.
Are King and Nielsen right?

Argument 1
- Model dependence (i.e., dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions even if they try not to be).
- Matching is good because it reduces model dependence.

I fully agree.

My view, however, may be somewhat less pessimistic. I believe that research results can be credible if researchers are well educated, so that they know what they are doing, and if modeling decisions are made transparent and robustness of results is evaluated (and documented).
Are King and Nielsen right?

Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

That fully blocked randomization is more efficient than complete randomization, given the sample size, is of course true (how large the efficiency gains are depends on the strength of the relation between X and Y).

However, if blocking reduces the sample size, it is not a priori clear whether estimates from the blocked sample are more efficient than estimates from the full sample (although often they will be).
That PSM approximates complete randomization is only partially true. PSM approximates complete randomization within observations with the same propensity score. Hence, PSM is somewhere between complete randomization and fully blocked randomization.
- If the X variables have no relation to T (treatment), then all observations have the same propensity score. Hence, we end up with complete randomization.
- If the X variables have a strong effect on T, there is lots of blocking.
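The two cases can be made concrete with a toy example in which the propensity score is computed from fully stratified data as the share of treated in each covariate cell. The cell counts and the function name `blocks_by_pscore` are hypothetical; the sketch only shows which covariate cells end up in the same block when matching on the propensity score.

```python
from collections import defaultdict


def blocks_by_pscore(cells):
    """Group covariate cells by their propensity score.

    cells: dict mapping a covariate cell (x1, x2) to (n_treated, n_untreated).
    Returns a dict mapping each propensity score to the list of cells sharing it.
    """
    blocks = defaultdict(list)
    for x, (n_t, n_c) in cells.items():
        p = n_t / (n_t + n_c)            # share of treated in the cell
        blocks[round(p, 10)].append(x)   # rounding guards against float noise
    return dict(blocks)


# x1 affects treatment, x2 does not: PSM blocks on x1 only
cells = {
    (0, 0): (10, 40),
    (0, 1): (10, 40),
    (1, 0): (30, 20),
    (1, 1): (30, 20),
}
mixed = blocks_by_pscore(cells)
print(mixed)

# Neither covariate affects treatment: a single block (complete randomization)
unrelated = blocks_by_pscore({x: (10, 40) for x in cells})
print(unrelated)
```

With four covariate cells, a fully blocked design would distinguish all four; matching on the propensity score distinguishes two blocks in the first case and one in the second.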
Are King and Nielsen right?

Argument 3
- Random pruning ⇒ imbalance ⇒ more model dependence
- PSM ⇒ complete randomization ⇒ lots of random pruning
- PSM Paradox: "when you do 'better', you do worse"

That random pruning makes things worse is of course true, because it unnecessarily reduces the sample size (without changing anything else).

As argued above, that PSM applies random pruning is only true for X variables unrelated to T (so that we are in a "local" complete randomization situation, although something similar can probably also happen if effects from several X's cancel each other out).

Furthermore, it is only true if you employ a matching algorithm that throws away good matches. King and Nielsen's results seem to be based on the worst possible algorithm: one-to-one matching without replacement.
If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or also nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
- Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
- Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.
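How such an algorithm averages over all controls within the bandwidth, rather than keeping a single match, can be sketched as follows. This is a simplified illustration of kernel matching on a one-dimensional score with an Epanechnikov kernel; it is not the kmatch implementation, and the function names, the toy data, and the bandwidth of 0.05 are assumptions.

```python
def epan(u):
    """Epanechnikov kernel up to a constant factor (the factor cancels in the weights)."""
    return max(0.0, 1.0 - u * u)


def kernel_att(treated, controls, bandwidth):
    """ATT by kernel matching on a one-dimensional score.

    treated, controls: lists of (score, outcome) pairs.
    Returns (att, number of matched treated, set of indices of used controls).
    """
    effects = []
    used = set()
    for p_t, y_t in treated:
        weights = [epan((p_c - p_t) / bandwidth) for p_c, _ in controls]
        total = sum(weights)
        if total == 0.0:
            continue  # no control within the bandwidth: unit stays unmatched
        # Weighted average of control outcomes as the counterfactual
        y0_hat = sum(w * y_c for w, (_, y_c) in zip(weights, controls)) / total
        effects.append(y_t - y0_hat)
        used.update(i for i, w in enumerate(weights) if w > 0.0)
    if not effects:
        raise ValueError("no treated unit could be matched")
    return sum(effects) / len(effects), len(effects), used


# Toy data: (propensity score, outcome)
treated = [(0.30, 10.0), (0.50, 12.0), (0.90, 15.0)]
controls = [(0.28, 8.0), (0.32, 9.0), (0.50, 11.0), (0.52, 11.0), (0.10, 5.0)]
att, n_matched, used = kernel_att(treated, controls, bandwidth=0.05)
print(att, n_matched, sorted(used))
```

In the toy data, the treated unit at 0.90 has no control within the bandwidth and is left unmatched, while the control at 0.10 is never used; this corresponds to the "Matched: No" and "Controls: Unused" counts in the kmatch output.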
It is true that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more "free" variance in X that can be exploited by modeling decisions).

In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you'd need to apply PSM stratified by subgroups in this case).
The Command

kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).

Some key features:
- Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
- Optional exact matching
- Optional regression-adjustment bias correction
- Kernel matching, ridge matching, or nearest-neighbor matching
- Automatic bandwidth selection for kernel/ridge matching
- Flexible specification of scaling matrix for MDM
- Joint analysis of multiple subgroups and multiple outcome variables
- Various post-estimation commands for balancing and common-support diagnostics
- Computationally efficient implementation
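The multivariate distance underlying MDM can be sketched for the case of two covariates. The example below computes the Mahalanobis distance with the sample covariance matrix as the scaling matrix; the function name `mahalanobis_2d` and the toy data are assumptions, and kmatch itself allows other scaling matrices.

```python
import math
import statistics


def mahalanobis_2d(a, b, data):
    """Mahalanobis distance between points a and b (two covariates),
    scaled by the sample covariance matrix of `data`."""
    xs = [row[0] for row in data]
    ys = [row[1] for row in data]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    n = len(data)
    var_x = statistics.variance(xs)
    var_y = statistics.variance(ys)
    cov_xy = sum((x - mx) * (y - my) for x, y in data) / (n - 1)
    det = var_x * var_y - cov_xy ** 2
    dx, dy = a[0] - b[0], a[1] - b[1]
    # d' S^{-1} d with the inverse of the 2x2 covariance matrix written out
    q = (var_y * dx * dx - 2.0 * cov_xy * dx * dy + var_x * dy * dy) / det
    return math.sqrt(q)


# Toy data: second covariate is measured on a scale twice as wide as the first
data = [(0.0, 0.0), (2.0, 0.0), (0.0, 4.0), (2.0, 4.0)]
d_x = mahalanobis_2d((0.0, 0.0), (2.0, 0.0), data)    # difference in covariate 1 only
d_y = mahalanobis_2d((0.0, 0.0), (0.0, 4.0), data)    # difference in covariate 2 only
d_xy = mahalanobis_2d((0.0, 0.0), (2.0, 4.0), data)   # difference in both
print(d_x, d_y, d_xy)
```

Because the distance is scaled by the covariance matrix, a difference of 2 units on the first covariate and a difference of 4 units on the second count equally in this example.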
Some Examples

Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.

. sysuse nlsw88, clear
(NLSW, 1988 extract)

. drop if industry==2
(4 observations deleted)

Mahalanobis-distance kernel matching:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

             |      Matched       |      Controls       |  Band-
             |  Yes    No   Total |  Used  Unused Total |  width
     Treated |  432    25     457 |  1105    291   1396 | 1.3394

Treatment-effects estimation

        wage |      Coef.
         ATT |   .6059013
        NATE |   1.432913
Some Examples

Some balancing statistics:

. kmatch summarize
(refitting the model using the generate() option)

             |             Raw              |         Matched(ATT)
       Means |  Treated  Untrea~d    StdDif |  Treated  Untrea~d    StdDif
    collgrad |  .321663   .224212   .219912 |  .319444   .319444         0
     ttl_exp |  13.2685   12.7323   .117584 |  13.3205   13.1425   .039036
      tenure |  7.89205   6.17658    .29735 |  7.91744   7.58347   .057888
  3.industry |  .006565   .012178  -.058246 |   .00463    .00463         0
  4.industry |  .183807   .166905   .044425 |  .185185   .185185         0
  5.industry |  .105033   .027937   .312944 |  .085648   .085648         0
  6.industry |  .045952   .169771  -.407129 |  .048611   .048611         0
  7.industry |  .019694   .102436  -.350657 |  .020833   .020833         0
  8.industry |  .017505   .035817  -.113785 |  .009259   .009259         0
  9.industry |  .010941   .040115  -.185669 |  .011574   .011574         0
 10.industry |  .004376   .008596  -.052551 |  .002315   .002315         0
 11.industry |  .479212   .356734   .250073 |  .506944   .506944         0
 12.industry |  .122538    .07235   .169707 |   .12037    .12037         0
      2.race |  .330416   .244986   .189418 |    .3125     .3125         0
      3.race |  .017505   .011461   .050566 |  .006944   .006944         0
       south |  .297593   .466332  -.352408 |  .291667   .291667         0

             |             Raw              |         Matched(ATT)
   Variances |  Treated  Untrea~d     Ratio |  Treated  Untrea~d     Ratio
    collgrad |  .218674   .174066   1.25628 |  .217904   .217904         1
     ttl_exp |  20.5898   21.0001   .980459 |  19.8177   18.2323   1.08696
      tenure |  37.2044   29.3629   1.26706 |  37.0399   34.9543   1.05966
  3.industry |  .006536   .012038   .542928 |  .004619   .004619         1
  4.industry |  .150351   .139148   1.08052 |  .151242   .151242         1
  5.industry |  .094207   .027176   3.46656 |  .078494   .078494         1
  6.industry |  .043936    .14105   .311496 |  .046355   .046355         1
  7.industry |  .019348   .092008   .210287 |  .020447   .020447         1
  8.industry |  .017237   .034559   .498769 |  .009195   .009195         1
  9.industry |  .010845   .038533   .281445 |  .011467   .011467         1
 10.industry |  .004367   .008528   .512039 |  .002315   .002315         1
 11.industry |  .250115   .229639   1.08917 |  .250532   .250532         1
 12.industry |  .107758   .067163   1.60443 |  .106127   .106127         1
      2.race |  .221726     .1851   1.19787 |  .215342   .215342         1
      3.race |  .017237   .011338   1.52025 |  .006912   .006912         1
       south |   .20949   .249045   .841173 |  .207077   .207077         1
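The StdDif and Ratio columns can be reproduced from the reported means and variances. The sketch below assumes that the standardized difference is the mean difference divided by the square root of the average of the two group variances; this reproduces the raw values for ttl_exp and tenure when the table entries are read with decimal points restored, but whether kmatch computes the statistic in exactly this way is an assumption. The function names are hypothetical.

```python
import math


def std_diff(mean_t, mean_c, var_t, var_c):
    """Standardized mean difference, scaled by the average of the group variances."""
    return (mean_t - mean_c) / math.sqrt((var_t + var_c) / 2.0)


def variance_ratio(var_t, var_c):
    """Ratio of the treated to the untreated variance."""
    return var_t / var_c


# Raw (unmatched) values for ttl_exp and tenure, decimal points assumed
sd_ttl_exp = std_diff(13.2685, 12.7323, 20.5898, 21.0001)
sd_tenure = std_diff(7.89205, 6.17658, 37.2044, 29.3629)
vr_tenure = variance_ratio(37.2044, 29.3629)
print(round(sd_ttl_exp, 4), round(sd_tenure, 4), round(vr_tenure, 4))
```

Values close to 0 for the standardized difference and close to 1 for the variance ratio indicate good balance, which is what the matched columns show.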
Some Examples

Make a graph of the balancing stats:

. mat M = r(M)
. mat V = r(V)
. coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || , bylabels("Std. mean difference" "Variance ratio") noci nolabels byopts(xrescale)
. addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))
. addplot 2: , xline(1) norescaling

[Figure: coefficient plot with two panels, "Std. mean difference" and "Variance ratio", showing raw and matched values for collgrad, ttl_exp, tenure, the industry and race indicators, and south.]
Some Examples

Propensity-score kernel matching:

. kmatch ps union collgrad ttl_exp tenure i.industry i.race south (wage), nate att
(computing bandwidth ... done)

Propensity-score kernel matching           Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Covariates  : collgrad ttl_exp tenure i.industry i.race south
PS model    : logit (pr)

Matching statistics

             |      Matched       |      Controls       |  Band-
             |  Yes    No   Total |  Used  Unused Total |  width
     Treated |  431    26     457 |  1214    182   1396 | .00188

Treatment-effects estimation

        wage |      Coef.
         ATT |   .3887224
        NATE |   1.432913
Some Examples

Kernel density balancing plot:

. kmatch density, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)
(applying 0-1 boundary correction to density estimation of propensity score)
(bandwidth for propensity score = .06803989)

[Figure: kernel density of the propensity score for untreated and treated, in two panels: Raw and Matched (ATT).]
Some Examples

Cumulative distribution balancing plot:

. kmatch cumul, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)

[Figure: cumulative probability of the propensity score for untreated and treated, in two panels: Raw and Matched (ATT).]
Some Examples

Balancing box plot:

. kmatch box
(refitting the model using the generate() option)

[Figure: box plots of the propensity score for untreated and treated, in two panels: Raw and Matched (ATT).]
Some Examples

Standard errors:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Replications  = 50
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

             |      Matched       |      Controls       |  Band-
             |  Yes    No   Total |  Used  Unused Total |  width
     Treated |  432    25     457 |  1105    291   1396 | 1.3394
   Untreated | 1386    10    1396 |   455      2    457 | 3.3975
    Combined | 1818    35    1853 |  1560    293   1853 |

Treatment-effects estimation

             |   Observed   Bootstrap                        Normal-based
        wage |      Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
         ATE |   .4095729    .1920853   2.13    0.033    .0330928    .7860531
         ATT |   .6059013    .2472069   2.45    0.014    .1213846    1.090418
         ATC |   .3483797    .1893653   1.84    0.066   -.0227695    .7195289
        NATE |   1.432913    .2333282   6.14    0.000    .9755981    1.890228
Some Examples

Do some tests:

. lincom ATT-NATE

 ( 1)  ATT - NATE = 0

        wage |      Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
         (1) |  -.8270117    .1810415  -4.57    0.000   -1.181847   -.4721768

. test ATT = ATC

 ( 1)  ATT - ATC = 0

           chi2(  1) =    2.42
         Prob > chi2 =  0.1200
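The z statistic and the normal-based confidence interval in the lincom output follow directly from the coefficient and its bootstrap standard error. The following sketch recomputes them, reading the lincom row as a coefficient of -0.8270117 with a standard error of 0.1810415 (decimal points assumed); the function name `wald_summary` is hypothetical.

```python
from statistics import NormalDist


def wald_summary(coef, se, level=0.95):
    """z statistic, two-sided p-value, and normal-based confidence interval."""
    z = coef / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)   # about 1.96 for 95%
    return z, p, (coef - crit * se, coef + crit * se)


# ATT - NATE from the lincom output (decimal points assumed)
z, p, (lower, upper) = wald_summary(-0.8270117, 0.1810415)
print(round(z, 2), round(p, 4), round(lower, 6), round(upper, 6))
```

The results agree with the reported z of -4.57 and the interval from -1.181847 to -.4721768 up to rounding of the displayed standard error.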
Some Examples

Nearest-neighbor matching (1 neighbor):

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

Multivariate-distance nearest-neighbor matching

                                           Number of obs  = 1,853
                                           Neighbors: min = 1
Treatment   : union = 1                               max = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

             |      Matched       |      Controls       |  Band-
             |  Yes    No   Total |  Used  Unused Total |  width
     Treated |  457     0     457 |   328   1068   1396 |

Treatment-effects estimation

        wage |      Coef.
         ATT |   .7246969

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

Treatment-effects estimation                   Number of obs      = 1,853
Estimator      : nearest-neighbor matching     Matches: requested = 1
Outcome model  : matching                                     min = 1
Distance metric: Mahalanobis                                  max = 1

                     |              AI Robust
                wage |      Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
ATET                 |
               union |
 (union vs nonunion) |   .7246969    .2942952   2.46    0.014     .147889    1.301505
Some Examples

Nearest-neighbor matching (5 neighbors):

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching

                                           Number of obs  = 1,853
                                           Neighbors: min = 5
Treatment   : union = 1                               max = 5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

             |      Matched       |      Controls       |  Band-
             |  Yes    No   Total |  Used  Unused Total |  width
     Treated |  457     0     457 |   870    526   1396 |

Treatment-effects estimation

        wage |      Coef.
         ATT |   .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation                   Number of obs      = 1,853
Estimator      : nearest-neighbor matching     Matches: requested = 5
Outcome model  : matching                                     min = 5
Distance metric: Mahalanobis                                  max = 6

                     |              AI Robust
                wage |      Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
ATET                 |
               union |
 (union vs nonunion) |   .5590823    .2381752   2.35    0.019    .0922675    1.025897
Some Examples

Bias-correction (regression adjustment):

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching

                                           Number of obs  = 1,853
                                           Neighbors: min = 5
Treatment   : union = 1                               max = 5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

             |      Matched       |      Controls       |  Band-
             |  Yes    No   Total |  Used  Unused Total |  width
     Treated |  457     0     457 |   870    526   1396 |

Treatment-effects estimation

        wage |      Coef.
         ATT |   .5288023

adjusted for collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation                   Number of obs      = 1,853
Estimator      : nearest-neighbor matching     Matches: requested = 5
Outcome model  : matching                                     min = 5
Distance metric: Mahalanobis                                  max = 6

                     |              AI Robust
                wage |      Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
ATET                 |
               union |
 (union vs nonunion) |   .5288023    .2420635   2.18    0.029    .0543666    1.003238
Some Examples

Mahalanobis-distance and propensity-score matching combined:

. kmatch md union collgrad ttl_exp tenure (wage), att psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis (modified)
Covariates  : collgrad ttl_exp tenure
PS model    : logit (pr)
PS covars   : i.industry i.race south

Matching statistics

             |      Matched       |      Controls       |  Band-
             |  Yes    No   Total |  Used  Unused Total |  width
     Treated |  439    18     457 |  1258    138   1396 | .83886

Treatment-effects estimation

        wage |      Coef.
         ATT |   .6408443
Some Examples

Exact matching

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching           Number of obs   =   1,853
                                                Kernel          =    epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure
Exact       : industry race south

Matching statistics

               Matched                 Controls            Band-
               Yes     No   Total      Used  Unused  Total  width
  Treated      432     25     457      1103     293   1396  1.3013

Treatment-effects estimation

  wage      Coef.
  ATT    .6047374
Some Examples

Bandwidth selection: the default (based on the distribution of distances in one-nearest-neighbor matching)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching           Number of obs   =   1,853
                                                Kernel          =    epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

               Matched                 Controls            Band-
               Yes     No   Total      Used  Unused  Total  width
  Treated      432     25     457      1105     291   1396  1.3394

Treatment-effects estimation

  wage      Coef.
  ATT    .6059013
Some Examples

Bandwidth selection: cross-validation with respect to X

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching           Number of obs   =   1,853
                                                Kernel          =    epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

               Matched                 Controls            Band-
               Yes     No   Total      Used  Unused  Total  width
  Treated      448      9     457      1184     212   1396  1.8888

Treatment-effects estimation

  wage      Coef.
  ATT    .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot; x-axis: Bandwidth, y-axis: MSE; points labeled by search index]
Some Examples

Bandwidth selection: cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching           Number of obs   =   1,853
                                                Kernel          =    epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

               Matched                 Controls            Band-
               Yes     No   Total      Used  Unused  Total  width
  Treated      453      4     457      1289     107   1396   2.433

Treatment-effects estimation

  wage      Coef.
  ATT    .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot; x-axis: Bandwidth, y-axis: MISE; points labeled by search index]
Some Examples

Bandwidth selection: weighted cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching           Number of obs   =   1,853
                                                Kernel          =    epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

               Matched                 Controls            Band-
               Yes     No   Total      Used  Unused  Total  width
  Treated      455      2     457      1356      40   1396  2.7626

Treatment-effects estimation

  wage      Coef.
  ATT    .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot; x-axis: Bandwidth, y-axis: Weighted MISE; points labeled by search index]
Some Examples

Common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(0.5)

Multivariate-distance kernel matching           Number of obs   =   1,853
                                                Kernel          =    epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

               Matched                 Controls            Band-
               Yes     No   Total      Used  Unused  Total  width
  Treated      366     91     457       701     695   1396      .5

Treatment-effects estimation

  wage      Coef.
  ATT    .3303161

. kmatch csummarize
(refitting the model using the generate() option)

Common support (treated)                       Standardized difference
        Means    Matched  Unmatc~d     Total    (1)-(3)   (2)-(3)   (1)-(2)
     collgrad    .322404   .318681   .321663    .001585  -.006376   .007962
      ttl_exp    13.3929   12.7682   13.2685    .027413  -.110253   .137666
       tenure    8.12614   6.95055   7.89205    .038378  -.154356   .192734
   3.industry    .002732   .021978   .006565   -.047404   .190657  -.238061
   4.industry    .191257   .153846   .183807    .019212  -.077269   .096481
   5.industry    .062842   .274725   .105033   -.137462   .552867  -.690329
   6.industry    .057377         0   .045952    .054507  -.219225   .273732
   7.industry    .019126   .021978   .019694   -.004083   .016423  -.020506
   8.industry    .005464   .065934   .017505   -.091714   .368871  -.460585
   9.industry    .010929   .010989   .010941   -.000115   .000462  -.000577
  10.industry          0   .021978   .004376   -.066227   .266363  -.332589
  11.industry    .554645   .175824   .479212     .15083  -.606636   .757467
  12.industry    .092896   .241758   .122538   -.090299   .363181   -.45348
       2.race    .243169   .681319   .330416   -.185284   .745209  -.930494
       3.race    .002732   .076923   .017505   -.112525   .452572  -.565097
        south     .29235   .318681   .297593   -.011456   .046074   -.05753

Common support (treated)                       Ratio
    Variances    Matched  Unmatc~d     Total    (1)/(3)   (2)/(3)   (1)/(2)
     collgrad    .219058   .219536   .218674    1.00176   1.00394   .997824
      ttl_exp    19.4198   25.2474   20.5898    .943177   1.22621    .76918
       tenure    38.3324   31.9242   37.2044    1.03032   .858076   1.20073
   3.industry    .002732   .021734   .006536    .418045   3.32537   .125714
   4.industry    .155101   .131624   .150351    1.03159   .875443   1.17837
   5.industry    .059054   .201465   .094207    .626851   2.13854   .293122
   6.industry    .054233         0   .043936    1.23435         0         .
   7.industry    .018811   .021734   .019348    .972252    1.1233   .865531
   8.industry     .00545   .062271   .017237    .316157   3.61269   .087513
   9.industry    .010839   .010989   .010845    .999464   1.01328   .986361
  10.industry          0   .021734   .004367          0   4.97709         0
  11.industry    .247691    .14652   .250115    .990307   .585811   1.69049
  12.industry    .084497   .185348   .107758    .784137   1.72003   .455885
       2.race    .184542   .219536   .221726    .832297   .990121   .840601
       3.race    .002732   .071795   .017237    .158513   4.16522   .038056
        south    .207448   .219536    .20949    .990254   1.04796   .944939

(1) matched, (2) unmatched, (3) total
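The standardized differences in the table above can be reproduced from the reported means and variances. The short Python sketch below does this for collgrad. The scaling by the standard deviation of the full treated group, column (3), is inferred from the reported numbers and is an assumption, not a statement about how kmatch is documented. The helper name std_diff is hypothetical.

```python
import math

def std_diff(mean_a, mean_b, variance_ref):
    """Difference in means divided by a reference standard deviation."""
    return (mean_a - mean_b) / math.sqrt(variance_ref)

# collgrad among the treated, as reported in the csummarize output above
matched, unmatched, total = 0.322404, 0.318681, 0.321663
var_total = 0.218674  # assumption: variance of column (3) is the scaling factor

d13 = std_diff(matched, total, var_total)      # column (1)-(3)
d23 = std_diff(unmatched, total, var_total)    # column (2)-(3)
d12 = std_diff(matched, unmatched, var_total)  # column (1)-(2)
print(d13, d23, d12)
```

Up to rounding of the six-digit inputs, the three values agree with the collgrad row (.001585, -.006376, .007962).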
Some Examples

Make a graph of the common-support stats

. mat M = r(M)
. coefplot matrix(M[,4]), title(Std. difference) noci nolabels xline(0)

[Figure: dot plot titled "Std. difference" with one row per covariate (collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, south); x-axis from -.2 to .2 with a reference line at 0]
Some Examples

Multiple outcome variables

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching           Number of obs   =   1,852
                                                Kernel          =    epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

               Matched                 Controls            Band-
               Yes     No   Total      Used  Unused  Total  width
  Treated      432     25     457      1104     291   1395  1.3392

Treatment-effects estimation

               Coef.
  wage
    ATT     .6021049
    NATE    1.430823
  hours
    ATT     1.263759
    NATE    1.450303
Some Examples

Multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure)
>     (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching           Number of obs   =   1,852
                                                Kernel          =    epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

               Matched                 Controls            Band-
               Yes     No   Total      Used  Unused  Total  width
  Treated      432     25     457      1104     291   1395  1.3392

Treatment-effects estimation

               Coef.
  wage
    ATT     .5152752
    NATE    1.430823
  hours
    ATT     1.263759
    NATE    1.450303

  wage: adjusted for collgrad ttl_exp tenure
  hours: adjusted for i.industry i.race
Some Examples

Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage),
>     att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)

(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching           Number of obs   =   1,853
                                                Replications    =      50
                                                Kernel          =    epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics

               Matched                 Controls            Band-
               Yes     No   Total      Used  Unused  Total  width
  0
    Treated    306     15     321       625     120    745  1.3199
  1
    Treated    126     10     136       473     178    651  1.3398

Treatment-effects estimation

            Observed   Bootstrap                      Normal-based
  wage         Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
  0
    ATT     .4586332   .2763358    1.66    0.097   -.082975   1.000241
  1
    ATT     .9518705    .406903    2.34    0.019   .1543553   1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

           chi2(  1) =    1.23
         Prob > chi2 =    0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

  wage         Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
  (1)       .4932373   .4452343    1.11    0.268   -.379406   1.365881
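The test and lincom results above are two views of the same Wald test. The following Python sketch recomputes them from the reported coefficients. It takes the bootstrap standard error of the difference from the lincom output as given, because that value depends on the bootstrap covariance between the two subgroup estimates and cannot be derived from the two separate standard errors alone.

```python
import math

att_south0 = 0.4586332   # ATT for south = 0, from the output above
att_south1 = 0.9518705   # ATT for south = 1, from the output above
se_diff = 0.4452343      # bootstrap SE of the difference, from lincom

diff = att_south1 - att_south0
z = diff / se_diff
chi2 = z ** 2                               # Wald statistic with 1 df
p = math.erfc(abs(z) / math.sqrt(2))        # two-sided normal p-value

print(f"diff = {diff:.7f}, z = {z:.2f}, chi2 = {chi2:.2f}, p = {p:.4f}")
```

The squared z statistic from lincom equals the chi2(1) statistic from test, which is why both report the same p-value.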
Simulation

Population data from the Swiss census of 2000.

Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using command iskotrei by Hendrickx 2002) (values from 6 to 78; mean 44).

Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.

Control variables: gender, age, and highest educational degree.

Population restricted to people between 24 and 60 years old who are working.

2'308'006 individuals, of which 17.5% belong to the treatment group.

Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.
Simulation

Substantial differences between resident aliens and Swiss nationals on all three covariates.

Propensity score in population (computed from fully stratified data):

[Figure: density of the propensity score for untreated and treated; x-axis: Propensity score (0 to 1), y-axis: Density]

McFadden R2 = 0.121
Simulation

Raw mean difference in occupational prestige (NATE): −4.79
Population ATT (computed from fully stratified data): −3.96
There is some treatment effect heterogeneity (ATE = −3.51, ATC = −3.41).

[Figure, left panel: Outcome by Propensity score for untreated and treated; right panel: Treatment effect by Propensity score]
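The population quantities reported for the simulation are linked by two standard identities: the ATE is the average of ATT and ATC weighted by the treated share, and the NATE equals the ATT plus selection bias. A small Python check using the rounded figures from the slides (so results agree only up to rounding):

```python
share_treated = 0.175          # 17.5% of the population is in the treatment group
nate, att, atc = -4.79, -3.96, -3.41

# ATE as the weighted average of the effects on the treated and the untreated
ate = share_treated * att + (1 - share_treated) * atc

# The part of the raw difference that is not a treatment effect on the treated
selection_bias = nate - att

print(round(ate, 2), round(selection_bias, 2))
```

The implied ATE matches the reported −3.51, and the raw difference overstates the ATT by about 0.83 prestige points; this is the bias the matching estimators are meant to remove.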
Results: Variance

[Figure: variance of the estimators for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction]
In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.
Results: Bias reduction (in percent)

[Figure: bias reduction in percent for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction]
Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.
Results: Mean squared error

[Figure: mean squared error of the estimators for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction]
Results: Relative standard error

[Figure: relative standard error for N = 500 and N = 5000. Rows: nearest-neighbor matching (teffects; 1 neighbor, 5 neighbors), nearest-neighbor matching (bootstrap; 1 neighbor, 5 neighbors), and kernel matching (bootstrap; fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction]
Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals for N = 500 and N = 5000. Rows: nearest-neighbor matching (teffects; 1 neighbor, 5 neighbors), nearest-neighbor matching (bootstrap; 1 neighbor, 5 neighbors), and kernel matching (bootstrap; fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction]
Conclusions

The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).

Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.
Conclusions

Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research the choice may not be that clear.
+ MDM leaves less scope for post-matching modeling decision biases.
+ Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
+ Fewer restrictions in terms of possible post-matching analyses.
- Choice of scaling matrix largely arbitrary; various suggestions in the literature (somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching. (But don't use pair matching anyhow.)
Conclusions

Some conclusions from the simulation
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly ok for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.
References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2):295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1):77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91:279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4):465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70:41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5):688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4):472–480.
Matching to Reduce Model Dependence
(Ho, Imai, King, Stuart 2007: fig. 1, Political Analysis)

[Figure: scatter plot of treated (T) and control (C) units, shown in several overlays; x-axis: Education (years), y-axis: Outcome. Slide 3/23 of the slides by King and Nielsen]
King and Nielsen

Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

Matching: Finding Hidden Randomized Experiments (slide 6/23 of the slides by King and Nielsen)

Types of Experiments

  Balance         Complete          Fully
  Covariates:     Randomization     Blocked
  Observed        On average        Exact
  Unobserved      On average        On average

Fully blocked dominates complete randomization for: imbalance, model dependence, power, efficiency, bias, research costs, robustness. E.g., Imai, King, Nall 2009: SEs 600% smaller!

Goal of Each Matching Method (in Observational Data)

• PSM: complete randomization

• Other methods: fully blocked

• Other matching methods dominate PSM (wait, it gets worse)
Best Case: Mahalanobis Distance Matching

[Figure: scatter plot of treated (T) and control (C) units, shown in two overlays (all controls, then only the controls paired with treated units); x-axis: Education (years), y-axis: Age. Slide 9/23 of the slides by King and Nielsen]
Best Case: Propensity Score Matching

[Figure: scatter plot of treated (T) and control (C) units with an additional Propensity Score axis from 0 to 1, shown in two overlays; x-axis: Education (years), y-axis: Age. Slide 15/23 of the slides by King and Nielsen]
Best Case: Propensity Score Matching is Suboptimal

[Figure: scatter plot of the treated (T) units and their matched control (C) units; x-axis: Education (years), y-axis: Age. Slide 15/23 of the slides by King and Nielsen]
King and Nielsen

Argument 3
- Random pruning (deleting observations at random) increases imbalance. This is because the sample size decreases so that variance increases (large differences become more likely).
- More imbalance/variance means more model dependence and researcher discretion.
- Because PSM approximates complete randomization, it engages in random pruning.
- PSM Paradox ("when you do 'better', you do worse"):
  - When matching is made more strict (e.g., by decreasing the size of the caliper), PSM, like other matching methods, typically reduces imbalance. But soon the PSM Paradox kicks in such that further pruning quickly increases imbalance.
  - If the data are such that there are no big differences between treated and untreated to begin with, the PSM Paradox kicks in almost immediately.
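The first point of the argument, that random pruning increases imbalance simply because the sample gets smaller, can be illustrated with a small simulation. The Python sketch below is only an illustration under simple assumptions (one standard-normal covariate, equal group sizes, treatment unrelated to the covariate); it is not a replication of King and Nielsen's analysis, and all names and settings are hypothetical.

```python
import random
import statistics

def mean_gap(a, b):
    """Absolute difference in covariate means between two groups."""
    return abs(statistics.fmean(a) - statistics.fmean(b))

def simulate(n_full=200, n_kept=50, reps=500, seed=12345):
    """Average imbalance before and after randomly pruning each group."""
    rng = random.Random(seed)
    full_gaps, pruned_gaps = [], []
    for _ in range(reps):
        # Both groups come from the same distribution: complete randomization.
        treated = [rng.gauss(0, 1) for _ in range(n_full)]
        control = [rng.gauss(0, 1) for _ in range(n_full)]
        full_gaps.append(mean_gap(treated, control))
        # Random pruning: keep a random subset of each group.
        pruned_gaps.append(mean_gap(rng.sample(treated, n_kept),
                                    rng.sample(control, n_kept)))
    return statistics.fmean(full_gaps), statistics.fmean(pruned_gaps)

full, pruned = simulate()
print(f"mean imbalance, full sample:   {full:.3f}")
print(f"mean imbalance, after pruning: {pruned:.3f}")
```

Because the expected gap shrinks with the square root of the group size, keeping a quarter of the observations roughly doubles the average imbalance in this setup.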
PSM Increases Model Dependence & Bias

[Figure, left panel "Model Dependence": Variance by Number of Units Pruned for MDM and PSM. Right panel "Bias": Maximum Coefficient across 512 Specifications by Number of Units Pruned for MDM and PSM, with a reference line at the true effect = 2. Slide 20/23 of the slides by King and Nielsen]

$Y_i = 2T_i + X_{1i} + X_{2i} + \epsilon_i$, $\epsilon_i \sim N(0, 1)$
The Propensity Score Paradox in Real Data

[Figure, left panel: Finkel et al. (JOP 2012); right panel: Nielsen et al. (AJPS 2011). Each panel shows Imbalance by Number of units pruned for CEM, MDM, PSM, and Random pruning, with markers for the raw data and for a 1/4 SD caliper. Slide 21/23 of the slides by King and Nielsen]

Similar pattern for > 20 other real data sets we checked.
Are King and Nielsen right?

Argument 1
- Model dependence (i.e., dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions, even if they try not to be).
- Matching is good because it reduces model dependence.

I fully agree!

My view, however, may be somewhat less pessimistic. I believe that research results can be credible if researchers are well educated, so that they know what they are doing, and if modeling decisions are made transparent and robustness of results is evaluated (and documented).
Are King and Nielsen right?

Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

That fully blocked randomization is more efficient than complete randomization, given the sample size, is of course true (how large the efficiency gains are depends on the strength of the relation between X and Y).

However, if blocking reduces the sample size, it is not a priori clear whether estimates from the blocked sample are more efficient than estimates from the full sample (although often they will be).
That PSM approximates complete randomization is only partially true. PSM approximates complete randomization within observations with the same propensity score. Hence, PSM is somewhere between complete randomization and fully blocked randomization.
- If the X variables have no relation to T (treatment), then all observations have the same propensity score. Hence, we end up with complete randomization.
- If the X variables have a strong effect on T, there is lots of blocking.
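The two cases can be made concrete with a propensity score computed from fully stratified data, i.e., the share of treated units within each covariate cell. The Python sketch below uses two small hypothetical data sets (the cell labels and counts are invented for illustration) and a hypothetical helper, stratified_propensity.

```python
from collections import defaultdict

def stratified_propensity(units):
    """Share of treated units within each covariate cell.

    units: list of (x, d) pairs, where x is a discrete covariate cell
    and d is 1 for treated and 0 for untreated."""
    counts = defaultdict(lambda: [0, 0])  # cell -> [n_treated, n_total]
    for x, d in units:
        counts[x][0] += d
        counts[x][1] += 1
    return {x: t / n for x, (t, n) in counts.items()}

# Hypothetical data: X unrelated to T (same treated share in every cell)
unrelated = [("a", 1), ("a", 0), ("b", 1), ("b", 0), ("c", 1), ("c", 0)]

# Hypothetical data: X strongly related to T
related = [("a", 1), ("a", 1), ("a", 1), ("a", 0),
           ("b", 1), ("b", 0),
           ("c", 0), ("c", 0), ("c", 0), ("c", 1)]

ps_unrelated = stratified_propensity(unrelated)
ps_related = stratified_propensity(related)
print(ps_unrelated)  # every cell has the same score: one big block
print(ps_related)    # scores differ by cell: matching on the score blocks on X
```

In the first data set, matching on the score cannot distinguish the cells, so any control is as good a match as any other. In the second, each cell has its own score, so matching on the score is equivalent to blocking on X.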
Are King and Nielsen right?

Argument 3
- Random pruning ⇒ imbalance ⇒ more model dependence
- PSM ⇒ complete randomization ⇒ lots of random pruning
- PSM Paradox: "when you do 'better', you do worse"

That random pruning makes things worse is of course true, because it unnecessarily reduces the sample size (without changing anything else).

As argued above, that PSM applies random pruning is only true for X variables unrelated to T (so that we are in a "local" complete randomization situation, although something similar can probably also happen if effects from several X's cancel each other out).

Furthermore, it is only true if you employ a matching algorithm that throws away good matches. King and Nielsen's results seem to be based on the worst possible algorithm: one-to-one matching without replacement.
If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or also nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
- Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
- Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.
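The idea of averaging over all good matches instead of discarding them can be sketched as a kernel matching estimator on a one-dimensional score. The Python code below is a simplified illustration, not the implementation in kmatch: the function names (epan, kernel_att), the bandwidth, and the data are hypothetical, and the Epanechnikov kernel is used only because it is the kernel named in the output shown elsewhere in these slides.

```python
def epan(u):
    """Epanechnikov kernel; zero outside |u| < 1."""
    return 0.75 * (1 - u * u) if abs(u) < 1 else 0.0

def kernel_att(treated, controls, bandwidth):
    """ATT from kernel matching on a one-dimensional score.

    treated, controls: lists of (score, outcome) pairs. Treated units
    with no control inside the bandwidth stay unmatched and are skipped.
    Returns the ATT estimate and the number of matched treated units."""
    effects = []
    for s_t, y_t in treated:
        weights = [epan((s_c - s_t) / bandwidth) for s_c, _ in controls]
        total = sum(weights)
        if total == 0:
            continue  # no comparable control: prune this treated unit
        # Counterfactual outcome: weighted average over ALL close controls
        y0_hat = sum(w * y_c for w, (_, y_c) in zip(weights, controls)) / total
        effects.append(y_t - y0_hat)
    return sum(effects) / len(effects), len(effects)

# Hypothetical scores and outcomes, for illustration only
treated = [(0.50, 10.0), (0.90, 12.0)]
controls = [(0.45, 8.0), (0.55, 9.0), (0.50, 8.5), (0.10, 5.0)]

att, n_matched = kernel_att(treated, controls, bandwidth=0.2)
print(att, n_matched)
```

The first treated unit is compared with the average of all three nearby controls rather than with a single arbitrarily chosen one, while the second treated unit is dropped because no control lies within the bandwidth. Pair matching without replacement would instead keep one of the three close controls and discard the other two.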
It is true that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more "free" variance in X that can be exploited by modeling decisions).

In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you'd need to apply PSM stratified by subgroups in this case).
The Command
kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).
Some key features:
- Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
- Optional exact matching
- Optional regression-adjustment bias correction
- Kernel matching, ridge matching, or nearest-neighbor matching
- Automatic bandwidth selection for kernel/ridge matching
- Flexible specification of scaling matrix for MDM
- Joint analysis of multiple subgroups and multiple outcome variables
- Various post-estimation commands for balancing and common-support diagnostics
- Computationally efficient implementation
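A short installation sketch. The first line is the command given on the slide. The two additional packages are an assumption from memory of the kmatch help file, which lists moremata and kdens as prerequisites; check the help file if in doubt.

```stata
* Install kmatch from SSC (as stated on the slide)
ssc install kmatch, replace

* Assumed prerequisites (verify against the kmatch help file)
ssc install moremata, replace
ssc install kdens, replace

* Open the documentation
help kmatch
```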
Some Examples
Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.

. sysuse nlsw88, clear
(NLSW, 1988 extract)
. drop if industry==2
(4 observations deleted)

Mahalanobis-distance kernel matching

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1853
                                             Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
             Matched                Controls             Band-
             Yes     No   Total     Used  Unused  Total  width
Treated      432     25     457     1105     291   1396  1.3394

Treatment-effects estimation
wage     Coef.
ATT      .6059013
NATE     1.432913
Some Examples
Some balancing statistics

. kmatch summarize
(refitting the model using the generate() option)

                          Raw                        Matched (ATT)
Means          Treated  Untrea~d    StdDif    Treated  Untrea~d    StdDif
collgrad       .321663   .224212   .219912    .319444   .319444         0
ttl_exp        13.2685   12.7323   .117584    13.3205   13.1425   .039036
tenure         7.89205   6.17658    .29735    7.91744   7.58347   .057888
3.industry     .006565   .012178  -.058246     .00463    .00463         0
4.industry     .183807   .166905   .044425    .185185   .185185         0
5.industry     .105033   .027937   .312944    .085648   .085648         0
6.industry     .045952   .169771  -.407129    .048611   .048611         0
7.industry     .019694   .102436  -.350657    .020833   .020833         0
8.industry     .017505   .035817  -.113785    .009259   .009259         0
9.industry     .010941   .040115  -.185669    .011574   .011574         0
10.industry    .004376   .008596  -.052551    .002315   .002315         0
11.industry    .479212   .356734   .250073    .506944   .506944         0
12.industry    .122538    .07235   .169707     .12037    .12037         0
2.race         .330416   .244986   .189418      .3125     .3125         0
3.race         .017505   .011461   .050566    .006944   .006944         0
south          .297593   .466332  -.352408    .291667   .291667         0

                          Raw                        Matched (ATT)
Variances      Treated  Untrea~d     Ratio    Treated  Untrea~d     Ratio
collgrad       .218674   .174066   1.25628    .217904   .217904         1
ttl_exp        20.5898   21.0001   .980459    19.8177   18.2323   1.08696
tenure         37.2044   29.3629   1.26706    37.0399   34.9543   1.05966
3.industry     .006536   .012038   .542928    .004619   .004619         1
4.industry     .150351   .139148   1.08052    .151242   .151242         1
5.industry     .094207   .027176   3.46656    .078494   .078494         1
6.industry     .043936    .14105   .311496    .046355   .046355         1
7.industry     .019348   .092008   .210287    .020447   .020447         1
8.industry     .017237   .034559   .498769    .009195   .009195         1
9.industry     .010845   .038533   .281445    .011467   .011467         1
10.industry    .004367   .008528   .512039    .002315   .002315         1
11.industry    .250115   .229639   1.08917    .250532   .250532         1
12.industry    .107758   .067163   1.60443    .106127   .106127         1
2.race         .221726     .1851   1.19787    .215342   .215342         1
3.race         .017237   .011338   1.52025    .006912   .006912         1
south           .20949   .249045   .841173    .207077   .207077         1
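As a reading aid for the StdDif and Ratio columns: the raw-sample values in the table are consistent with the usual definitions below, in which the mean difference is scaled by the square root of the average of the two raw-sample variances. This is a check against the printed numbers (reading the tenure row as means 7.89205 and 6.17658 and variances 37.2044 and 29.3629), not a statement of how kmatch computes them internally. The symbols are introduced here for illustration only.

```latex
\[
d_k = \frac{\bar{x}_{k,T} - \bar{x}_{k,C}}{\sqrt{\left(s^2_{k,T} + s^2_{k,C}\right)/2}},
\qquad
R_k = \frac{s^2_{k,T}}{s^2_{k,C}}
\]
\[
d_{\text{tenure}} = \frac{7.89205 - 6.17658}{\sqrt{(37.2044 + 29.3629)/2}}
  = \frac{1.71547}{5.7692} \approx 0.2974,
\qquad
R_{\text{tenure}} = \frac{37.2044}{29.3629} \approx 1.2671
\]
```

The matched-sample StdDif values (for example 0.0579 for tenure) also come out when the matched mean difference is divided by the same raw-sample denominator, which suggests that the scaling is held fixed across the raw and matched columns.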
Some Examples
Make a graph of the balancing statistics

. mat M = r(M)
. mat V = r(V)
. coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || ,
>     bylabels("Std. mean difference" "Variance ratio")
>     noci nolabels byopts(xrescale)
. addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))
. addplot 2: , xline(1) norescaling

[Figure: coefficient plot of the balancing statistics for all covariates, Raw vs. Matched; left panel: Std. mean difference; right panel: Variance ratio]
Some Examples
Propensity-score kernel matching

. kmatch ps union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate att
(computing bandwidth ... done)

Propensity-score kernel matching             Number of obs = 1853
                                             Kernel        = epan
Treatment   : union = 1
Covariates  : collgrad ttl_exp tenure i.industry i.race south
PS model    : logit (pr)

Matching statistics
             Matched                Controls             Band-
             Yes     No   Total     Used  Unused  Total  width
Treated      431     26     457     1214     182   1396  .00188

Treatment-effects estimation
wage     Coef.
ATT      .3887224
NATE     1.432913
Some Examples
Kernel density balancing plot

. kmatch density, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)
(applying 0-1 boundary correction to density estimation of propensity score)
(bandwidth for propensity score = .06803989)

[Figure: kernel density of the propensity score, Untreated vs. Treated; panels: Raw and Matched (ATT); x-axis: Propensity score; y-axis: Density]
Some Examples
Cumulative distribution balancing plot

. kmatch cumul, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)

[Figure: cumulative distribution of the propensity score, Untreated vs. Treated; panels: Raw and Matched (ATT); x-axis: Propensity score; y-axis: Cumulative probability]
Some Examples
Balancing box plot

. kmatch box
(refitting the model using the generate() option)

[Figure: box plots of the propensity score, Untreated vs. Treated; panels: Raw and Matched (ATT); y-axis: Propensity score]
Some Examples
Standard errors

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching        Number of obs = 1853
                                             Replications  = 50
                                             Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
             Matched                Controls             Band-
             Yes     No   Total     Used  Unused  Total  width
Treated      432     25     457     1105     291   1396  1.3394
Untreated   1386     10    1396      455       2    457  3.3975
Combined    1818     35    1853     1560     293   1853

Treatment-effects estimation
         Observed   Bootstrap                      Normal-based
wage     Coef.      Std. Err.     z     P>|z|   [95% Conf. Interval]
ATE      .4095729   .1920853    2.13    0.033    .0330928   .7860531
ATT      .6059013   .2472069    2.45    0.014    .1213846   1.090418
ATC      .3483797   .1893653    1.84    0.066   -.0227695   .7195289
NATE     1.432913   .2333282    6.14    0.000    .9755981   1.890228
Some Examples
Do some tests

. lincom ATT-NATE
 ( 1)  ATT - NATE = 0

wage     Coef.      Std. Err.     z     P>|z|   [95% Conf. Interval]
(1)     -.8270117   .1810415   -4.57    0.000   -1.181847  -.4721768

. test ATT = ATC
 ( 1)  ATT - ATC = 0
       chi2(  1) =    2.42
     Prob > chi2 =    0.1200
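The lincom line can be checked by hand from the estimates reported on the standard-errors slide (ATT = 0.6059013, NATE = 1.432913). Only the standard error of the difference, 0.1810415, has to be taken from the lincom output itself, because it depends on the bootstrap covariance between the two estimates. A worked version, using the normal critical value 1.96:

```latex
\[
\widehat{\mathrm{ATT}} - \widehat{\mathrm{NATE}} = 0.6059013 - 1.432913 = -0.8270117
\]
\[
z = \frac{-0.8270117}{0.1810415} \approx -4.57,
\qquad
-0.8270117 \pm 1.96 \times 0.1810415 \approx [-1.1818,\; -0.4722]
\]
```

The difference estimates how much of the raw union wage gap (NATE) is not attributable to the treatment effect on the treated after matching on the covariates.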
Some Examples
Nearest-neighbor matching (1 neighbor)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

Multivariate-distance nearest-neighbor matching
                                             Number of obs = 1853
                                             Neighbors min = 1
Treatment   : union = 1                                max = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
             Matched                Controls             Band-
             Yes     No   Total     Used  Unused  Total  width
Treated      457      0     457      328    1068   1396

Treatment-effects estimation
wage     Coef.
ATT      .7246969

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

Treatment-effects estimation                 Number of obs      = 1853
Estimator      : nearest-neighbor matching   Matches: requested = 1
Outcome model  : matching                                   min = 1
Distance metric: Mahalanobis                                max = 1

                                   AI Robust
wage                   Coef.      Std. Err.     z     P>|z|   [95% Conf. Interval]
ATET
union
(union vs nonunion)    .7246969   .2942952    2.46    0.014    .147889   1.301505
Some Examples
Nearest-neighbor matching (5 neighbors)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching
                                             Number of obs = 1853
                                             Neighbors min = 5
Treatment   : union = 1                                max = 5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
             Matched                Controls             Band-
             Yes     No   Total     Used  Unused  Total  width
Treated      457      0     457      870     526   1396

Treatment-effects estimation
wage     Coef.
ATT      .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation                 Number of obs      = 1853
Estimator      : nearest-neighbor matching   Matches: requested = 5
Outcome model  : matching                                   min = 5
Distance metric: Mahalanobis                                max = 6

                                   AI Robust
wage                   Coef.      Std. Err.     z     P>|z|   [95% Conf. Interval]
ATET
union
(union vs nonunion)    .5590823   .2381752    2.35    0.019    .0922675   1.025897
Some Examples
Bias correction / regression adjustment

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching
                                             Number of obs = 1853
                                             Neighbors min = 5
Treatment   : union = 1                                max = 5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
             Matched                Controls             Band-
             Yes     No   Total     Used  Unused  Total  width
Treated      457      0     457      870     526   1396

Treatment-effects estimation
wage     Coef.
ATT      .5288023
adjusted for: collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south)
>     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation                 Number of obs      = 1853
Estimator      : nearest-neighbor matching   Matches: requested = 5
Outcome model  : matching                                   min = 5
Distance metric: Mahalanobis                                max = 6

                                   AI Robust
wage                   Coef.      Std. Err.     z     P>|z|   [95% Conf. Interval]
ATET
union
(union vs nonunion)    .5288023   .2420635    2.18    0.029    .0543666   1.003238
Some Examples
Mahalanobis-distance and propensity-score matching combined

. kmatch md union collgrad ttl_exp tenure (wage), att
>     psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1853
                                             Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis (modified)
Covariates  : collgrad ttl_exp tenure
PS model    : logit (pr)
PS covars   : i.industry i.race south

Matching statistics
             Matched                Controls             Band-
             Yes     No   Total     Used  Unused  Total  width
Treated      439     18     457     1258     138   1396  .83886

Treatment-effects estimation
wage     Coef.
ATT      .6408443
Some Examples
Exact matching

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1853
                                             Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure
Exact       : industry race south

Matching statistics
             Matched                Controls             Band-
             Yes     No   Total     Used  Unused  Total  width
Treated      432     25     457     1103     293   1396  1.3013

Treatment-effects estimation
wage     Coef.
ATT      .6047374
Some Examples
Bandwidth selection: the default (based on the distribution of distances in one-nearest-neighbor matching)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1853
                                             Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
             Matched                Controls             Band-
             Yes     No   Total     Used  Unused  Total  width
Treated      432     25     457     1105     291   1396  1.3394

Treatment-effects estimation
wage     Coef.
ATT      .6059013
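A hedged sketch of how the default rule might be spelled out explicitly. It assumes, from memory of the kmatch help file and not from these slides, that the default corresponds to bwidth(pm q f) with q = 0.90 and f = 1.5, that is, 1.5 times the 90% quantile of the nearest-neighbor distances. Both the option spelling and the default values should be verified in the help file before relying on them.

```stata
* Sketch only: bwidth(pm ...) syntax and defaults are assumptions
sysuse nlsw88, clear
drop if industry==2

* Assumed to reproduce the default bandwidth reported on the slide
kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
    att bwidth(pm 0.9 1.5)

* A more conservative variant: smaller factor, hence a smaller bandwidth
kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
    att bwidth(pm 0.9 1)
```

The /// line continuation only works in a do-file; in the Command window the command has to be typed on one line. A smaller bandwidth trades lower bias for fewer matched treated observations and higher variance.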
Some Examples
Bandwidth selection: cross-validation with respect to X

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1853
                                             Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
             Matched                Controls             Band-
             Yes     No   Total     Used  Unused  Total  width
Treated      448      9     457     1184     212   1396  1.8888

Treatment-effects estimation
wage     Coef.
ATT      .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot, MSE against Bandwidth, with evaluation points labeled by search index]
Some Examples
Bandwidth selection: cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1853
                                             Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
             Matched                Controls             Band-
             Yes     No   Total     Used  Unused  Total  width
Treated      453      4     457     1289     107   1396  2.433

Treatment-effects estimation
wage     Coef.
ATT      .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot, MISE against Bandwidth, with evaluation points labeled by search index]
Some Examples
Bandwidth selection: weighted cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1853
                                             Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
             Matched                Controls             Band-
             Yes     No   Total     Used  Unused  Total  width
Treated      455      2     457     1356      40   1396  2.7626

Treatment-effects estimation
wage     Coef.
ATT      .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot, Weighted MISE against Bandwidth, with evaluation points labeled by search index]
Some Examples
Common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(0.5)

Multivariate-distance kernel matching        Number of obs = 1853
                                             Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
             Matched                Controls             Band-
             Yes     No   Total     Used  Unused  Total  width
Treated      366     91     457      701     695   1396  .5

Treatment-effects estimation
wage     Coef.
ATT      .3303161

. kmatch csummarize
(refitting the model using the generate() option)

                 Common support (treated)          Standardized difference
Means          Matched  Unmatc~d     Total    (1)-(3)   (2)-(3)   (1)-(2)
collgrad       .322404   .318681   .321663    .001585  -.006376   .007962
ttl_exp        13.3929   12.7682   13.2685    .027413  -.110253   .137666
tenure         8.12614   6.95055   7.89205    .038378  -.154356   .192734
3.industry     .002732   .021978   .006565   -.047404   .190657  -.238061
4.industry     .191257   .153846   .183807    .019212  -.077269   .096481
5.industry     .062842   .274725   .105033   -.137462   .552867  -.690329
6.industry     .057377         0   .045952    .054507  -.219225   .273732
7.industry     .019126   .021978   .019694   -.004083   .016423  -.020506
8.industry     .005464   .065934   .017505   -.091714   .368871  -.460585
9.industry     .010929   .010989   .010941   -.000115   .000462  -.000577
10.industry          0   .021978   .004376   -.066227   .266363  -.332589
11.industry    .554645   .175824   .479212     .15083  -.606636   .757467
12.industry    .092896   .241758   .122538   -.090299   .363181   -.45348
2.race         .243169   .681319   .330416   -.185284   .745209  -.930494
3.race         .002732   .076923   .017505   -.112525   .452572  -.565097
south           .29235   .318681   .297593   -.011456   .046074   -.05753

                 Common support (treated)                  Ratio
Variances      Matched  Unmatc~d     Total    (1)/(3)   (2)/(3)   (1)/(2)
collgrad       .219058   .219536   .218674    1.00176   1.00394   .997824
ttl_exp        19.4198   25.2474   20.5898    .943177   1.22621    .76918
tenure         38.3324   31.9242   37.2044    1.03032   .858076   1.20073
3.industry     .002732   .021734   .006536    .418045   3.32537   .125714
4.industry     .155101   .131624   .150351    1.03159   .875443   1.17837
5.industry     .059054   .201465   .094207    .626851   2.13854   .293122
6.industry     .054233         0   .043936    1.23435         0         .
7.industry     .018811   .021734   .019348    .972252    1.1233   .865531
8.industry      .00545   .062271   .017237    .316157   3.61269   .087513
9.industry     .010839   .010989   .010845    .999464   1.01328   .986361
10.industry          0   .021734   .004367          0   4.97709         0
11.industry    .247691    .14652   .250115    .990307   .585811   1.69049
12.industry    .084497   .185348   .107758    .784137   1.72003   .455885
2.race         .184542   .219536   .221726    .832297   .990121   .840601
3.race         .002732   .071795   .017237    .158513   4.16522   .038056
south          .207448   .219536    .20949    .990254   1.04796   .944939

(1) matched, (2) unmatched, (3) total
Some Examples
Make a graph of the common-support statistics

. mat M = r(M)
. coefplot matrix(M[,4]), title("Std. difference") noci nolabels xline(0)

[Figure: coefficient plot of the standardized difference for all covariates; x-axis: Std. difference]
Some Examples
Multiple outcome variables

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1852
                                             Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
             Matched                Controls             Band-
             Yes     No   Total     Used  Unused  Total  width
Treated      432     25     457     1104     291   1395  1.3392

Treatment-effects estimation
         Coef.
wage
ATT      .6021049
NATE     1.430823
hours
ATT      1.263759
NATE     1.450303
Some Examples
Multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure)
>     (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1852
                                             Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
             Matched                Controls             Band-
             Yes     No   Total     Used  Unused  Total  width
Treated      432     25     457     1104     291   1395  1.3392

Treatment-effects estimation
         Coef.
wage
ATT      .5152752
NATE     1.430823
hours
ATT      1.263759
NATE     1.450303
wage:  adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race
Some Examples
Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage),
>     att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching        Number of obs = 1853
                                             Replications  = 50
                                             Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics
             Matched                Controls             Band-
             Yes     No   Total     Used  Unused  Total  width
0
Treated      306     15     321      625     120    745  1.3199
1
Treated      126     10     136      473     178    651  1.3398

Treatment-effects estimation
         Observed   Bootstrap                      Normal-based
wage     Coef.      Std. Err.     z     P>|z|   [95% Conf. Interval]
0
ATT      .4586332   .2763358    1.66    0.097    -.082975   1.000241
1
ATT      .9518705    .406903    2.34    0.019    .1543553   1.749386

. test [0]ATT = [1]ATT
 ( 1)  [0]ATT - [1]ATT = 0
       chi2(  1) =    1.23
     Prob > chi2 =    0.2679

. lincom [1]ATT - [0]ATT
 ( 1)  - [0]ATT + [1]ATT = 0

wage     Coef.      Std. Err.     z     P>|z|   [95% Conf. Interval]
(1)      .4932373   .4452343    1.11    0.268    -.379406   1.365881
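The test and lincom results above are two views of the same comparison: for a single linear restriction, the Wald chi-squared statistic is the square of the z statistic. A worked check with the reported numbers (subgroup ATTs 0.4586332 and 0.9518705, standard error of the difference 0.4452343 taken from the lincom output):

```latex
\[
z = \frac{0.9518705 - 0.4586332}{0.4452343}
  = \frac{0.4932373}{0.4452343} \approx 1.108,
\qquad
\chi^2(1) = z^2 \approx 1.23
\]
```

Both commands therefore report the same p-value (0.268), so the difference in the union effect between the two regions is not statistically significant in this example.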
Simulation
- Population data from the Swiss census of 2000.
- Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using command iskotrei by Hendrickx 2002) (values from 6 to 78, mean 44).
- Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.
- Control variables: gender, age, and highest educational degree.
- Population restricted to people between 24 and 60 years old who are working.
- 2'308'006 individuals, of which 17.5% belong to the treatment group.
- Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.
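The repeated-sampling design can be sketched as follows. This is not the simulation code behind these slides: the census data are not available here, so the NLSW extract serves as a hypothetical stand-in population, the covariate list is shortened, and the number of replications (20) and the seed are arbitrary illustration values. The file name kmatch_sim is likewise made up.

```stata
* Sketch only: Monte Carlo loop with NLSW as a stand-in "population"
sysuse nlsw88, clear
drop if industry==2 | missing(union, wage, collgrad, ttl_exp, tenure)
set seed 20170623

tempname sims
postfile `sims' att_md att_ps using kmatch_sim, replace

forvalues r = 1/20 {
    preserve
    sample 500, count                 // draw N = 500 without replacement
    quietly kmatch md union collgrad ttl_exp tenure (wage), att
    scalar b_md = _b[ATT]             // MDM kernel-matching estimate
    quietly kmatch ps union collgrad ttl_exp tenure (wage), att
    scalar b_ps = _b[ATT]             // PSM kernel-matching estimate
    post `sims' (b_md) (b_ps)
    restore
}
postclose `sims'

* Mean and spread of the two estimators across the simulated samples
use kmatch_sim, clear
summarize att_md att_ps
```

Comparing the means of att_md and att_ps with a population benchmark gives the bias, and their standard deviations give the sampling variability, which is the kind of comparison the results slides report.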
Simulation
Substantial differences between resident aliens and Swiss nationals on all three covariates.
Propensity score in population (computed from fully stratified data):
[Figure: density of the propensity score, Untreated vs. Treated; x-axis: Propensity score; y-axis: Density]
McFadden R2 = 0.121
Simulation
- Raw mean difference in occupational prestige (NATE): -4.79
- Population ATT (computed from fully stratified data): -3.96
- There is some treatment effect heterogeneity (ATE = -3.51, ATC = -3.41).
[Figure, left panel: mean outcome by propensity score, Untreated vs. Treated; x-axis: Propensity score; y-axis: Outcome]
[Figure, right panel: treatment effect by propensity score; x-axis: Propensity score; y-axis: Treatment effect]
Results: Variance
[Figure: variance of the estimators by matching algorithm; nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y); panels: N = 500 and N = 5000; series: MDM, MDM with bias correction, PSM, PSM with bias correction]

Notes: In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that, across algorithms, PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.
For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.
Results: Bias reduction (in percent)
[Figure: bias reduction in percent by matching algorithm; nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y); panels: N = 500 and N = 5000; series: MDM, MDM with bias correction, PSM, PSM with bias correction]

Notes: Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.
The bias also vanishes once post-matching regression adjustment is applied.
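For reference, the three performance measures reported in the results figures are linked by the standard decomposition of the mean squared error. The first line below is a textbook identity. The second line is only an assumed definition of "bias reduction (in percent)", since the slides do not state the formula; under that assumption, 100 means that the raw difference (NATE) has been moved all the way to the population ATT, and values above 100 indicate overcorrection.

```latex
\[
\operatorname{MSE}(\hat\delta) = \operatorname{Var}(\hat\delta) + \operatorname{Bias}(\hat\delta)^2,
\qquad
\operatorname{Bias}(\hat\delta) = \mathrm{E}[\hat\delta] - \delta_{\mathrm{ATT}}
\]
\[
\text{bias reduction (\%)} = 100 \times
  \frac{\mathrm{NATE} - \mathrm{E}[\hat\delta]}{\mathrm{NATE} - \delta_{\mathrm{ATT}}}
  \quad \text{(assumed definition)}
\]
```

Because the variance shrinks with the sample size while a specification-driven bias does not, a persistent bias comes to dominate the MSE in large samples.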
Results: Mean squared error
[Figure: mean squared error by matching algorithm; nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y); panels: N = 500 and N = 5000; series: MDM, MDM with bias correction, PSM, PSM with bias correction]
Results: Relative standard error
[Figure: relative standard error by matching algorithm; nearest-neighbor matching (teffects) with 1 and 5 neighbors, nearest-neighbor matching (bootstrap) with 1 and 5 neighbors, and kernel matching (bootstrap) with fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, and weighted CV with respect to Y; panels: N = 500 and N = 5000; series: MDM, MDM with bias correction, PSM, PSM with bias correction]
Results: Coverage of 95% CIs
[Figure: coverage of 95% confidence intervals by matching algorithm; nearest-neighbor matching (teffects) with 1 and 5 neighbors, nearest-neighbor matching (bootstrap) with 1 and 5 neighbors, and kernel matching (bootstrap) with fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, and weighted CV with respect to Y; panels: N = 500 and N = 5000; series: MDM, MDM with bias correction, PSM, PSM with bias correction]
Conclusions
- The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).
- Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.
Conclusions
Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research, the choice may not be that clear.
Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.
Disadvantages of MDM:
- Choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:
Do not use propensity scores for pair matching! (But don't use pair matching anyhow.)
Conclusions
Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly OK for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.
To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.
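Put together, the recommendations from the simulation could translate into a call like the one below: propensity-score kernel matching (the kmatch ps default in the examples of these slides) with post-matching regression adjustment and bootstrap standard errors. It only combines syntax elements already shown in the examples; the NLSW data are used purely for illustration, and the choice of adjustment covariates is an assumption.

```stata
* Sketch only: kernel PSM + regression adjustment + bootstrap SEs
sysuse nlsw88, clear
drop if industry==2

kmatch ps union collgrad ttl_exp tenure i.industry i.race south        ///
    (wage = collgrad ttl_exp tenure i.industry i.race south),          ///
    att vce(bootstrap)

* Inspect covariate balance after matching
kmatch summarize
```

The /// continuation requires a do-file. With nearest-neighbor matching instead of kernel matching, the slides caution that bootstrap standard errors are clearly biased, so this combination should not be carried over to the nn option.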
References
Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2):295–313.
Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1):77–90.
Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91:279–290.
Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.
King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.
Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.
Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4):465–472.
Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70:41–55.
Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5):688–701.
Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4):472–480.
Matching to Reduce Model Dependence
(Ho, Imai, King, Stuart 2007: fig. 1, Political Analysis)
[Figure: scatter plot of Outcome against Education (years) for treated (T) and control (C) observations; slides by King and Nielsen]
King and Nielsen
Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

Matching: Finding Hidden Randomized Experiments (slides by King and Nielsen)
Types of Experiments
Balance        Complete         Fully
Covariates     Randomization    Blocked
Observed       On average       Exact
Unobserved     On average       On average

Fully blocked dominates complete randomization for: imbalance, model dependence, power, efficiency, bias, research costs, robustness. E.g., Imai, King, Nall 2009: SEs 600% smaller!
Goal of Each Matching Method (in Observational Data)
• PSM: complete randomization
• Other methods: fully blocked
• Other matching methods dominate PSM (wait, it gets worse)
Best Case: Mahalanobis Distance Matching
[Figure: scatter plot of Age against Education (years) for treated (T) and control (C) observations, shown before and after Mahalanobis distance matching; slides by King and Nielsen]
Best Case Propensity Score Matching
Education (years)
Age
12 16 20 24 28
20
30
40
50
60
70
80
C
CCC
CC
C
C
CC
C
C
C
CC
C
C
C
C
C
CCCC
C
C
C
C
C
C
C
CC
C
C
C
C
C
C
C
C
CC
CC
CCC
CC
C
C
C
CC
CC
C
C
CC
C
CC
C
C
C C
CC
C
CC
CC
C
CC
C
C
CCC
C
C
C
CC
CC
C
C
C
CC
C
C
C
C
C
C
C
C
C
CCC
CC
C
C
C
CC
CC
C
C
CC
C
C
C
C
C
C
C
C
C
CC
C
C
C CC
C
C
C
CC
C
C
C
C
C
C
CC
C
C
C
CC
C
C
C
C
C
CCC
C
C
C
CC
CC
C
C
CC
C
C
CC
C
C
C
C
C
C
C
C
C
C
C
C
CC
C
C
C
C
C
C
C
C C
CC C
CC
CC
C
C
C
C
CC
C
C
C C
CC
CC
C
C
C
C C
CC
C
CC
C
C
C
C C
CC
C
CC
C
C
C
C
C
C
C
C
CC
CC
C
C
CC
CC
C
CC
C
C
C
C
CC
C
CC
C
C
C
CC
C
C
CC
CC
CC
C
C
C C
C
CC C
C
CC
C CC
C
C
C
C
CC
CC
C
C
C
C
CC
C
C
C
C
C
C
C
C
C
C
C
CC
CC C
C
C
C
C
C
C C
C
C
C
C C
C
C
C
CC
CCC
CC
C
C
C
C
C
C
C
CC
C
C
CC
C
C
C
C
C
C
C
C
C
CCC
CC
C
C
C
C
C
C
C
C
C
C
C
C
CC
C
CC
C
C
CC
C C
C
CC
C
C
C
C
C
C
C
C
C
CC
C
C
CC
C
C C
CC C
CCC
C
C
C
C
C
C
C
C
CC
C
C
CC
C
C
C
C
C
CC
C
C C
C
C
C
C
C
C CC
C
C
C
C CC
C
C
C
CC
C
C
C
C
C C
C
C
C
C
C
C
C
C
CC
C
C
CC
C
C
C
C
C C
CC
C
C
T
TTT
TT
T
T
TT
T
T
T
TT
T
T
T
T
T
TT TT
T
T
T
T
T
T
T
TT
T
T
T
T
T
T
T
T
TT
TT
TTT
TT
1
0
PropensityScore
1523 (slid
esby
Kin
gan
dN
ielsen
)
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 24
Best Case: Propensity Score Matching is Suboptimal
[Figure: matched control (C) and treated (T) units plotted by Education (years, 12 to 28) and Age (20 to 80). Slide 15/23 of the slides by King and Nielsen.]
King and Nielsen

Argument 3
- Random pruning (deleting observations at random) increases imbalance. This is because the sample size decreases, so that variance increases (large differences become more likely).
- More imbalance/variance means more model dependence and researcher discretion.
- Because PSM approximates complete randomization, it engages in random pruning.
- PSM Paradox ("when you do 'better', you do worse"):
  - When matching is made more strict (e.g., by decreasing the size of the caliper), PSM, like other matching methods, typically reduces imbalance. But soon the PSM Paradox kicks in, such that further pruning quickly increases imbalance.
  - If the data are such that there are no big differences between treated and untreated to begin with, the PSM Paradox kicks in almost immediately.
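The first point of Argument 3 can be checked with a small simulation. The sketch below is not taken from the slides: it assumes a single standard-normal covariate that is balanced in expectation, and the function names are hypothetical. It keeps a random subset of each group and reports the average absolute difference in covariate means, which should grow as fewer units are kept.

```python
import random
import statistics


def abs_mean_diff(treated, control):
    """Absolute difference in covariate means between the two groups."""
    return abs(statistics.fmean(treated) - statistics.fmean(control))


def average_imbalance(n_keep, n_total=200, reps=500, seed=1):
    """Average imbalance after randomly keeping n_keep units per group."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        # Assumption: the covariate has the same distribution in both groups.
        treated = [rng.gauss(0.0, 1.0) for _ in range(n_total)]
        control = [rng.gauss(0.0, 1.0) for _ in range(n_total)]
        # Random pruning: the kept subset ignores the covariate values.
        total += abs_mean_diff(rng.sample(treated, n_keep),
                               rng.sample(control, n_keep))
    return total / reps


for n_keep in (200, 100, 50, 25):
    print(n_keep, round(average_imbalance(n_keep), 3))
```

Nothing about the units changes between the four runs except how many are kept, so any increase in imbalance is due to the smaller sample alone.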
PSM Increases Model Dependence & Bias
[Figure, two panels, each comparing MDM and PSM against Number of Units Pruned (0 to 160). Panel "Model Dependence": Variance. Panel "Bias": Maximum Coefficient across 512 Specifications; true effect = 2. Data-generating process: $Y_i = 2T_i + X_{1i} + X_{2i} + \epsilon_i$, $\epsilon_i \sim N(0, 1)$. Slide 20/23 of the slides by King and Nielsen.]
The Propensity Score Paradox in Real Data
[Figure, two panels plotting Imbalance against Number of units pruned for CEM, MDM, PSM, and random pruning, with markers for the raw data and for the 1/4 SD caliper. Panels: Finkel et al. (JOP 2012) and Nielsen et al. (AJPS 2011). Slide 21/23 of the slides by King and Nielsen.]
Similar pattern for > 20 other real data sets we checked.
Are King and Nielsen right?

Argument 1
- Model dependence (i.e., dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions even if they try not to be).
- Matching is good because it reduces model dependence.

I fully agree.

My view, however, may be somewhat less pessimistic. I believe that research results can be credible if researchers are well educated, so that they know what they are doing, and if modeling decisions are made transparent and the robustness of results is evaluated (and documented).
Are King and Nielsen right?

Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

That fully blocked randomization is more efficient than complete randomization – given the sample size – is of course true (how large the efficiency gains are depends on the strength of the relation between X and Y).

However, if blocking reduces the sample size, it is not a priori clear whether estimates from the blocked sample are more efficient than estimates from the full sample (although often they will be).
Are King and Nielsen right?

Argument 2 (continued)

That PSM approximates complete randomization is only partially true. PSM approximates complete randomization within observations with the same propensity score. Hence, PSM is somewhere between complete randomization and fully blocked randomization.
- If the X variables have no relation to T (treatment), then all observations have the same propensity score. Hence, we end up with complete randomization.
- If the X variables have a strong effect on T, there is lots of blocking.
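The two limiting cases can be made concrete with a toy example. The sketch below is an illustration only, not the author's procedure: it assumes one discrete covariate, takes the share of treated units in each covariate cell as the propensity score, and uses hypothetical function names. Cells that share a score are pooled into one block, within which matching on the score cannot tell them apart.

```python
from collections import defaultdict


def propensity_by_cell(rows):
    """Share of treated units within each covariate cell.

    rows: iterable of (x, t) pairs with hashable x and t in {0, 1}.
    """
    counts = defaultdict(lambda: [0, 0])  # cell -> [n_treated, n_total]
    for x, t in rows:
        counts[x][0] += t
        counts[x][1] += 1
    return {x: n_t / n for x, (n_t, n) in counts.items()}


def score_blocks(rows):
    """Group covariate cells that share the same propensity score."""
    blocks = defaultdict(list)
    for x, p in propensity_by_cell(rows).items():
        blocks[round(p, 6)].append(x)
    return dict(blocks)


# X unrelated to T: every cell has score 0.5, so there is a single block.
unrelated = [("a", 1), ("a", 0), ("b", 1), ("b", 0), ("c", 1), ("c", 0)]

# X strongly related to T: scores 0.25, 0.5, 0.75, so each cell is a block.
related = ([("a", 1)] * 1 + [("a", 0)] * 3 +
           [("b", 1)] * 2 + [("b", 0)] * 2 +
           [("c", 1)] * 3 + [("c", 0)] * 1)

print(score_blocks(unrelated))
print(score_blocks(related))
```

In the first data set the score carries no information about X, which corresponds to complete randomization; in the second, matching on the score is equivalent to blocking on X.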
Are King and Nielsen right?

Argument 3
- Random pruning ⇒ imbalance ⇒ more model dependence
- PSM ⇒ complete randomization ⇒ lots of random pruning
- PSM Paradox: "when you do 'better', you do worse"

That random pruning makes things worse is of course true, because it unnecessarily reduces the sample size (without changing anything else).

As argued above, that PSM applies random pruning is only true for X variables unrelated to T (so that we are in a "local" complete randomization situation, although something similar can probably also happen if effects from several X's cancel each other out).

Furthermore, it is only true if you employ a matching algorithm that throws away good matches. King and Nielsen's results seem to be based on the worst possible algorithm: one-to-one matching without replacement.
Are King and Nielsen right?

Argument 3 (continued)

If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or also nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
- Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
- Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.
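How such an algorithm "averages where pruning is not necessary" can be sketched as follows. This is a minimal illustration under stated assumptions, not kmatch's implementation: it uses an Epanechnikov kernel on a one-dimensional score, a fixed bandwidth, hypothetical function names, and made-up data. Every control within the bandwidth contributes to the counterfactual, weighted by closeness; controls outside the bandwidth get weight zero.

```python
def epan(u):
    """Epanechnikov kernel; zero outside |u| < 1."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def kernel_weights(treated_score, control_scores, bandwidth):
    """Normalized kernel weights of the controls for one treated unit.

    Returns None if no control lies within the bandwidth.
    """
    raw = [epan((c - treated_score) / bandwidth) for c in control_scores]
    total = sum(raw)
    if total == 0.0:
        return None
    return [w / total for w in raw]


def att_kernel(treated, controls, bandwidth):
    """ATT as the mean of Y minus the weighted mean of matched controls.

    treated, controls: lists of (score, outcome) pairs.
    """
    c_scores = [s for s, _ in controls]
    c_outcomes = [y for _, y in controls]
    diffs = []
    for score, y in treated:
        w = kernel_weights(score, c_scores, bandwidth)
        if w is None:
            continue  # no comparable control: this treated unit is dropped
        diffs.append(y - sum(wi * yi for wi, yi in zip(w, c_outcomes)))
    if not diffs:
        raise ValueError("no treated unit has a control within the bandwidth")
    return sum(diffs) / len(diffs)


# Made-up data: three controls are close to the treated unit, one is far away.
controls = [(0.30, 10.0), (0.31, 12.0), (0.32, 11.0), (0.90, 20.0)]
treated = [(0.31, 14.0)]
print(kernel_weights(0.31, [s for s, _ in controls], 0.05))
print(att_kernel(treated, controls, 0.05))
```

One-to-one matching without replacement would use a single one of the three close controls and discard the other two; the kernel estimator uses all three, which is the efficiency argument made on the slide.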
Are King and Nielsen right?

Argument 3 (continued)

It is true that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more "free" variance in X that can be exploited by modeling decisions).

In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you'd need to apply PSM stratified by subgroups in this case).
6 Illustration using kmatch
The Command

kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).

Some key features:
- Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
- Optional exact matching
- Optional regression-adjustment bias-correction
- Kernel matching, ridge matching, or nearest-neighbor matching
- Automatic bandwidth selection for kernel/ridge matching
- Flexible specification of scaling matrix for MDM
- Joint analysis of multiple subgroups and multiple outcome variables
- Various post-estimation commands for balancing and common-support diagnostics
- Computationally efficient implementation
Some Examples

Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.

    . sysuse nlsw88, clear
    (NLSW, 1988 extract)

    . drop if industry==2
    (4 observations deleted)

Mahalanobis-distance kernel matching:

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south
    >     (wage), nate att
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1853
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

                     Matched               Controls          Band-
               Yes     No   Total     Used  Unused   Total   width
    Treated    432     25     457     1105     291    1396  1.3394

    Treatment-effects estimation

    wage        Coef.
    ATT      .6059013
    NATE     1.432913
Some Examples

Some balancing statistics:

    . kmatch summarize
    (refitting the model using the generate() option)

                            Raw                       Matched(ATT)
    Means        Treated  Untrea~d    StdDif   Treated  Untrea~d    StdDif
    collgrad     .321663   .224212   .219912   .319444   .319444         0
    ttl_exp      13.2685   12.7323   .117584   13.3205   13.1425   .039036
    tenure       7.89205   6.17658    .29735   7.91744   7.58347   .057888
    3.industry   .006565   .012178  -.058246    .00463    .00463         0
    4.industry   .183807   .166905   .044425   .185185   .185185         0
    5.industry   .105033   .027937   .312944   .085648   .085648         0
    6.industry   .045952   .169771  -.407129   .048611   .048611         0
    7.industry   .019694   .102436  -.350657   .020833   .020833         0
    8.industry   .017505   .035817  -.113785   .009259   .009259         0
    9.industry   .010941   .040115  -.185669   .011574   .011574         0
    10.industry  .004376   .008596  -.052551   .002315   .002315         0
    11.industry  .479212   .356734   .250073   .506944   .506944         0
    12.industry  .122538    .07235   .169707    .12037    .12037         0
    2.race       .330416   .244986   .189418     .3125     .3125         0
    3.race       .017505   .011461   .050566   .006944   .006944         0
    south        .297593   .466332  -.352408   .291667   .291667         0

                            Raw                       Matched(ATT)
    Variances    Treated  Untrea~d     Ratio   Treated  Untrea~d     Ratio
    collgrad     .218674   .174066   1.25628   .217904   .217904         1
    ttl_exp      20.5898   21.0001   .980459   19.8177   18.2323   1.08696
    tenure       37.2044   29.3629   1.26706   37.0399   34.9543   1.05966
    3.industry   .006536   .012038   .542928   .004619   .004619         1
    4.industry   .150351   .139148   1.08052   .151242   .151242         1
    5.industry   .094207   .027176   3.46656   .078494   .078494         1
    6.industry   .043936    .14105   .311496   .046355   .046355         1
    7.industry   .019348   .092008   .210287   .020447   .020447         1
    8.industry   .017237   .034559   .498769   .009195   .009195         1
    9.industry   .010845   .038533   .281445   .011467   .011467         1
    10.industry  .004367   .008528   .512039   .002315   .002315         1
    11.industry  .250115   .229639   1.08917   .250532   .250532         1
    12.industry  .107758   .067163   1.60443   .106127   .106127         1
    2.race       .221726     .1851   1.19787   .215342   .215342         1
    3.race       .017237   .011338   1.52025   .006912   .006912         1
    south         .20949   .249045   .841173   .207077   .207077         1
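The StdDif and Ratio columns can be reproduced by hand. The sketch below is a check, not part of the slides: it assumes that the standardized difference divides the mean difference by the square root of the average of the two group variances, which is consistent with the reported figures, and it uses the raw ttl_exp values from the kmatch summarize output with decimal points restored (means 13.2685 and 12.7323, variances 20.5898 and 21.0001). The function names are hypothetical.

```python
import math


def std_diff(mean_t, mean_c, var_t, var_c):
    """Standardized mean difference, scaled by the average group variance."""
    return (mean_t - mean_c) / math.sqrt((var_t + var_c) / 2.0)


def var_ratio(var_t, var_c):
    """Ratio of the treated variance to the untreated variance."""
    return var_t / var_c


# Raw ttl_exp: treated vs. untreated.
print(round(std_diff(13.2685, 12.7323, 20.5898, 21.0001), 4))
print(round(var_ratio(20.5898, 21.0001), 4))
```

Perfect balance corresponds to a standardized difference of 0 and a variance ratio of 1, which is what the matched columns show for the categorical covariates.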
Some Examples

Make a graph of the balancing stats:

    . mat M = r(M)
    . mat V = r(V)
    . coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || ,
    >     bylabels("Std. mean difference" "Variance ratio")
    >     noci nolabels byopts(xrescale)
    . addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))
    . addplot 2: , xline(1) norescaling

[Figure: coefficient plot over the covariates (collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, south) with series Raw and Matched; panel "Std. mean difference" (-.4 to .4) and panel "Variance ratio" (0 to 4).]
Some Examples

Propensity-score kernel matching:

    . kmatch ps union collgrad ttl_exp tenure i.industry i.race south
    >     (wage), nate att
    (computing bandwidth ... done)

    Propensity-score kernel matching           Number of obs = 1853
                                               Kernel        = epan
    Treatment   : union = 1
    Covariates  : collgrad ttl_exp tenure i.industry i.race south
    PS model    : logit (pr)

    Matching statistics

                     Matched               Controls          Band-
               Yes     No   Total     Used  Unused   Total   width
    Treated    431     26     457     1214     182    1396  .00188

    Treatment-effects estimation

    wage        Coef.
    ATT      .3887224
    NATE     1.432913
Some Examples

Kernel density balancing plot:

    . kmatch density, lw(*6 *2) lc(*.5 *1)
    (refitting the model using the generate() option)
    (applying 0-1 boundary correction to density estimation of propensity score)
    (bandwidth for propensity score = .06803989)

[Figure: kernel density of the propensity score (0 to .8) for untreated and treated; panels Raw and Matched (ATT).]
Some Examples

Cumulative distribution balancing plot:

    . kmatch cumul, lw(*6 *2) lc(*.5 *1)
    (refitting the model using the generate() option)

[Figure: cumulative probability (0 to 1) of the propensity score (0 to .8) for untreated and treated; panels Raw and Matched (ATT).]
Some Examples

Balancing box plot:

    . kmatch box
    (refitting the model using the generate() option)

[Figure: box plots of the propensity score (0 to .8) for untreated and treated; panels Raw and Matched (ATT).]
Some Examples

Standard errors:

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south
    >     (wage), nate ate att atc vce(bootstrap)
    (computing bandwidth for treated ... done)
    (computing bandwidth for untreated ... done)
    (running kmatch on estimation sample)

    Bootstrap replications (50)

    Multivariate-distance kernel matching      Number of obs = 1853
                                               Replications  =   50
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

                     Matched               Controls          Band-
               Yes     No   Total     Used  Unused   Total   width
    Treated    432     25     457     1105     291    1396  1.3394
    Untreated 1386     10    1396      455       2     457  3.3975
    Combined  1818     35    1853     1560     293    1853

    Treatment-effects estimation

             Observed  Bootstrap                      Normal-based
    wage        Coef.  Std. Err.      z   P>|z|   [95% Conf. Interval]
    ATE      .4095729   .1920853   2.13   0.033    .0330928   .7860531
    ATT      .6059013   .2472069   2.45   0.014    .1213846   1.090418
    ATC      .3483797   .1893653   1.84   0.066   -.0227695   .7195289
    NATE     1.432913   .2333282   6.14   0.000    .9755981   1.890228
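The z, P>|z|, and interval columns follow from the coefficient and the bootstrap standard error under a normal approximation. As a quick check, the sketch below recomputes them for the ATT row (coefficient .6059013, standard error .2472069); the function name is hypothetical and only the Python standard library is used.

```python
from statistics import NormalDist


def wald_summary(coef, se, level=0.95):
    """z statistic, two-sided p-value, and normal-based confidence interval."""
    z = coef / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    return z, p, (coef - crit * se, coef + crit * se)


z, p, (lower, upper) = wald_summary(0.6059013, 0.2472069)
print(round(z, 2), round(p, 3), round(lower, 4), round(upper, 4))
```

The same function applied to the other rows reproduces their reported intervals, which also confirms where the decimal points belong in the extracted output.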
Some Examples

Do some tests:

    . lincom ATT-NATE

     ( 1)  ATT - NATE = 0

    wage        Coef.  Std. Err.      z   P>|z|   [95% Conf. Interval]
    (1)     -.8270117   .1810415  -4.57   0.000   -1.181847  -.4721768

    . test ATT = ATC

     ( 1)  ATT - ATC = 0

               chi2(  1) =    2.42
             Prob > chi2 =  0.1200
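Both results can be sanity-checked by hand. The sketch below is an illustration with a hypothetical function name: it recomputes the lincom point estimate as the difference of the two coefficients, and it converts the chi-squared statistic into a p-value using the fact that a chi-squared variable with 1 degree of freedom is a squared standard normal. Because the reported statistic (2.42) is rounded, the recomputed p-value matches 0.1200 only approximately.

```python
import math
from statistics import NormalDist


def chi2_1df_pvalue(chi2):
    """Upper-tail p-value of a chi-squared statistic with 1 degree of freedom."""
    return 2.0 * (1.0 - NormalDist().cdf(math.sqrt(chi2)))


# lincom ATT-NATE: point estimate from the two coefficients.
print(round(0.6059013 - 1.432913, 7))

# test ATT = ATC: p-value implied by the rounded statistic chi2(1) = 2.42.
print(round(chi2_1df_pvalue(2.42), 3))
```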
Some Examples

Nearest-neighbor matching (1 neighbor):

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

    Multivariate-distance nearest-neighbor matching

                                               Number of obs = 1853
                                               Neighbors: min =   1
    Treatment   : union = 1                               max =   1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

                     Matched               Controls          Band-
               Yes     No   Total     Used  Unused   Total   width
    Treated    457      0     457      328    1068    1396       .

    Treatment-effects estimation

    wage        Coef.
    ATT      .7246969

    . teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

    Treatment-effects estimation               Number of obs      = 1853
    Estimator      : nearest-neighbor matching Matches: requested =    1
    Outcome model  : matching                                 min =    1
    Distance metric: Mahalanobis                              max =    1

                                   AI Robust
    wage                    Coef.  Std. Err.     z   P>|z|   [95% Conf. Interval]
    ATET
    union
    (union vs nonunion)  .7246969   .2942952  2.46   0.014     .147889   1.301505
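The logic behind one-nearest-neighbor matching on the Mahalanobis distance can be sketched in a few lines. This is an illustration only, not the algorithm used by kmatch or teffects: it assumes two covariates, scales distances by the covariance matrix of the pooled sample (an assumption), matches with replacement, averages over ties, and runs on made-up data. All function names are hypothetical.

```python
def covariance_2d(points):
    """Sample variances and covariance of a list of (x1, x2) pairs."""
    n = len(points)
    m1 = sum(p[0] for p in points) / n
    m2 = sum(p[1] for p in points) / n
    s11 = sum((p[0] - m1) ** 2 for p in points) / (n - 1)
    s22 = sum((p[1] - m2) ** 2 for p in points) / (n - 1)
    s12 = sum((p[0] - m1) * (p[1] - m2) for p in points) / (n - 1)
    return s11, s12, s22


def mahalanobis_sq(a, b, cov):
    """Squared Mahalanobis distance, using the inverse of the 2x2 covariance."""
    s11, s12, s22 = cov
    det = s11 * s22 - s12 * s12
    d1, d2 = a[0] - b[0], a[1] - b[1]
    return (s22 * d1 * d1 - 2.0 * s12 * d1 * d2 + s11 * d2 * d2) / det


def att_nearest_neighbor(treated, controls):
    """ATT from 1-nearest-neighbor matching with replacement; ties are averaged.

    treated, controls: lists of ((x1, x2), outcome) pairs.
    """
    cov = covariance_2d([x for x, _ in treated + controls])
    effects = []
    for x_t, y_t in treated:
        dists = [mahalanobis_sq(x_t, x_c, cov) for x_c, _ in controls]
        d_min = min(dists)
        matched = [y for d, (_, y) in zip(dists, controls) if d == d_min]
        effects.append(y_t - sum(matched) / len(matched))
    return sum(effects) / len(effects)


# Made-up data: (education, age) and an outcome.
treated = [((12, 30), 12.0), ((16, 50), 9.0)]
controls = [((12, 31), 10.0), ((16, 48), 8.0), ((14, 40), 9.5),
            ((16, 30), 11.0), ((12, 50), 7.0)]
print(att_nearest_neighbor(treated, controls))
```

Because matching is with replacement, the same control may serve several treated units, and controls that are nobody's nearest neighbor go unused, as the "Unused" column of the matching statistics reports.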
Some Examples

Nearest-neighbor matching (5 neighbors):

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

    Multivariate-distance nearest-neighbor matching

                                               Number of obs = 1853
                                               Neighbors: min =   5
    Treatment   : union = 1                               max =   5
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

                     Matched               Controls          Band-
               Yes     No   Total     Used  Unused   Total   width
    Treated    457      0     457      870     526    1396       .

    Treatment-effects estimation

    wage        Coef.
    ATT      .5590823

    . teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

    Treatment-effects estimation               Number of obs      = 1853
    Estimator      : nearest-neighbor matching Matches: requested =    5
    Outcome model  : matching                                 min =    5
    Distance metric: Mahalanobis                              max =    6

                                   AI Robust
    wage                    Coef.  Std. Err.     z   P>|z|   [95% Conf. Interval]
    ATET
    union
    (union vs nonunion)  .5590823   .2381752  2.35   0.019    .0922675   1.025897
Some Examples

Bias-correction / regression adjustment:

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south
    >     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

    Multivariate-distance nearest-neighbor matching

                                               Number of obs = 1853
                                               Neighbors: min =   5
    Treatment   : union = 1                               max =   5
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

                     Matched               Controls          Band-
               Yes     No   Total     Used  Unused   Total   width
    Treated    457      0     457      870     526    1396       .

    Treatment-effects estimation

    wage        Coef.
    ATT      .5288023

    adjusted for: collgrad ttl_exp tenure i.industry i.race south

    . teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south)
    >     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

    Treatment-effects estimation               Number of obs      = 1853
    Estimator      : nearest-neighbor matching Matches: requested =    5
    Outcome model  : matching                                 min =    5
    Distance metric: Mahalanobis                              max =    6

                                   AI Robust
    wage                    Coef.  Std. Err.     z   P>|z|   [95% Conf. Interval]
    ATET
    union
    (union vs nonunion)  .5288023   .2420635  2.18   0.029    .0543666   1.003238
Some Examples

Mahalanobis-distance and propensity-score matching combined:

    . kmatch md union collgrad ttl_exp tenure (wage), att
    >     psvars(i.industry i.race south) psweight(3)
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1853
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis (modified)
    Covariates  : collgrad ttl_exp tenure
    PS model    : logit (pr)
    PS covars   : i.industry i.race south

    Matching statistics

                     Matched               Controls          Band-
               Yes     No   Total     Used  Unused   Total   width
    Treated    439     18     457     1258     138    1396   83886  [decimal point lost in extraction]

    Treatment-effects estimation

    wage        Coef.
    ATT      .6408443
Some Examples

Exact matching:

    . kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1853
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure
    Exact       : industry race south

    Matching statistics

                     Matched               Controls          Band-
               Yes     No   Total     Used  Unused   Total   width
    Treated    432     25     457     1103     293    1396  1.3013

    Treatment-effects estimation

    wage        Coef.
    ATT      .6047374
Some Examples

Bandwidth selection: the default (based on the distribution of distances in one-nearest-neighbor matching):

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1853
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

                     Matched               Controls          Band-
               Yes     No   Total     Used  Unused   Total   width
    Treated    432     25     457     1105     291    1396  1.3394

    Treatment-effects estimation

    wage        Coef.
    ATT      .6059013
Some Examples

Bandwidth selection: cross validation with respect to X:

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
    >     att bwidth(cv)
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1853
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

                     Matched               Controls          Band-
               Yes     No   Total     Used  Unused   Total   width
    Treated    448      9     457     1184     212    1396  1.8888

    Treatment-effects estimation

    wage        Coef.
    ATT      .6651578

    . kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MSE against Bandwidth (1.5 to 3); evaluation points are labeled by their index.]
Some Examples

Bandwidth selection: cross validation with respect to Y:

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
    >     att bwidth(cv wage)
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1853
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

                     Matched               Controls          Band-
               Yes     No   Total     Used  Unused   Total   width
    Treated    453      4     457     1289     107    1396   2.433

    Treatment-effects estimation

    wage        Coef.
    ATT      .6928956

    . kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MISE against Bandwidth (1.5 to 3); evaluation points are labeled by their index.]
Some Examples

Bandwidth selection: weighted cross validation with respect to Y:

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
    >     att bwidth(cv wage, weighted)
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1853
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

                     Matched               Controls          Band-
               Yes     No   Total     Used  Unused   Total   width
    Treated    455      2     457     1356      40    1396  2.7626

    Treatment-effects estimation

    wage        Coef.
    ATT      .7308166

    . kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of Weighted MISE against Bandwidth (1 to 5); evaluation points are labeled by their index.]
Some Examples

Common-support diagnostics:

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
    >     att bwidth(0.5)

    Multivariate-distance kernel matching      Number of obs = 1853
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

                     Matched               Controls          Band-
               Yes     No   Total     Used  Unused   Total   width
    Treated    366     91     457      701     695    1396      .5

    Treatment-effects estimation

    wage        Coef.
    ATT      .3303161

    . kmatch csummarize
    (refitting the model using the generate() option)

                   Common support (treated)       Standardized difference
    Means        Matched  Unmatc~d     Total   (1)-(3)   (2)-(3)   (1)-(2)
    collgrad     .322404   .318681   .321663   .001585  -.006376   .007962
    ttl_exp      13.3929   12.7682   13.2685   .027413  -.110253   .137666
    tenure       8.12614   6.95055   7.89205   .038378  -.154356   .192734
    3.industry   .002732   .021978   .006565  -.047404   .190657  -.238061
    4.industry   .191257   .153846   .183807   .019212  -.077269   .096481
    5.industry   .062842   .274725   .105033  -.137462   .552867  -.690329
    6.industry   .057377         0   .045952   .054507  -.219225   .273732
    7.industry   .019126   .021978   .019694  -.004083   .016423  -.020506
    8.industry   .005464   .065934   .017505  -.091714   .368871  -.460585
    9.industry   .010929   .010989   .010941  -.000115   .000462  -.000577
    10.industry        0   .021978   .004376  -.066227   .266363  -.332589
    11.industry  .554645   .175824   .479212    .15083  -.606636   .757467
    12.industry  .092896   .241758   .122538  -.090299   .363181   -.45348
    2.race       .243169   .681319   .330416  -.185284   .745209  -.930494
    3.race       .002732   .076923   .017505  -.112525   .452572  -.565097
    south         .29235   .318681   .297593  -.011456   .046074   -.05753

                   Common support (treated)                Ratio
    Variances    Matched  Unmatc~d     Total   (1)/(3)   (2)/(3)   (1)/(2)
    collgrad     .219058   .219536   .218674   1.00176   1.00394   .997824
    ttl_exp      19.4198   25.2474   20.5898   .943177   1.22621    .76918
    tenure       38.3324   31.9242   37.2044   1.03032   .858076   1.20073
    3.industry   .002732   .021734   .006536   .418045   3.32537   .125714
    4.industry   .155101   .131624   .150351   1.03159   .875443   1.17837
    5.industry   .059054   .201465   .094207   .626851   2.13854   .293122
    6.industry   .054233         0   .043936   1.23435         0         .
    7.industry   .018811   .021734   .019348   .972252    1.1233   .865531
    8.industry    .00545   .062271   .017237   .316157   3.61269   .087513
    9.industry   .010839   .010989   .010845   .999464   1.01328   .986361
    10.industry        0   .021734   .004367         0   4.97709         0
    11.industry  .247691    .14652   .250115   .990307   .585811   1.69049
    12.industry  .084497   .185348   .107758   .784137   1.72003   .455885
    2.race       .184542   .219536   .221726   .832297   .990121   .840601
    3.race       .002732   .071795   .017237   .158513   4.16522   .038056
    south        .207448   .219536    .20949   .990254   1.04796   .944939

    (1) matched, (2) unmatched, (3) total
Some Examples

Make a graph of the common-support stats:

    . mat M = r(M)
    . coefplot matrix(M[,4]), title("Std. difference") noci nolabels xline(0)

[Figure: coefficient plot of the standardized difference (-.2 to .2) over the covariates (collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, south).]
Some Examples

Multiple outcome variables:

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south
    >     (wage hours), nate att
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1852
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

                     Matched               Controls          Band-
               Yes     No   Total     Used  Unused   Total   width
    Treated    432     25     457     1104     291    1395  1.3392

    Treatment-effects estimation

                Coef.
    wage
      ATT    .6021049
      NATE   1.430823
    hours    [decimal points lost in extraction]
      ATT     1263759
      NATE    1450303
Some Examples

Multiple outcome variables with different regression-adjustment equations:

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south
    >     (wage = collgrad ttl_exp tenure)
    >     (hours = i.industry i.race), nate att
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1852
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

                     Matched               Controls          Band-
               Yes     No   Total     Used  Unused   Total   width
    Treated    432     25     457     1104     291    1395  1.3392

    Treatment-effects estimation

                Coef.
    wage
      ATT    .5152752
      NATE   1.430823
    hours    [decimal points lost in extraction]
      ATT     1263759
      NATE    1450303

    wage:  adjusted for collgrad ttl_exp tenure
    hours: adjusted for i.industry i.race
Some Examples

Treatment effects by subpopulation:

    . kmatch md union collgrad ttl_exp tenure i.industry i.race (wage),
    >     att vce(boot) over(south)
    (south=0: computing bandwidth ... done)
    (south=1: computing bandwidth ... done)
    (running kmatch on estimation sample)

    Bootstrap replications (50)

    Multivariate-distance kernel matching      Number of obs = 1853
                                               Replications  =   50
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race

    0: south = 0
    1: south = 1

    Matching statistics

                     Matched               Controls          Band-
               Yes     No   Total     Used  Unused   Total   width
    0
    Treated    306     15     321      625     120     745  1.3199
    1
    Treated    126     10     136      473     178     651  1.3398

    Treatment-effects estimation

             Observed  Bootstrap                      Normal-based
    wage        Coef.  Std. Err.      z   P>|z|   [95% Conf. Interval]
    0
      ATT    .4586332   .2763358   1.66   0.097    -.082975   1.000241
    1
      ATT    .9518705    .406903   2.34   0.019    .1543553   1.749386

    . test [0]ATT = [1]ATT

     ( 1)  [0]ATT - [1]ATT = 0

               chi2(  1) =    1.23
             Prob > chi2 =  0.2679

    . lincom [1]ATT - [0]ATT

     ( 1)  - [0]ATT + [1]ATT = 0

    wage        Coef.  Std. Err.      z   P>|z|   [95% Conf. Interval]
    (1)      .4932373   .4452343   1.11   0.268    -.379406   1.365881
Simulation
- Population data from the Swiss census of 2000
- Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using command iskotrei by Hendrickx 2002) (values from 6 to 78, mean 44)
- Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group
- Control variables: gender, age, and highest educational degree
- Population restricted to people between 24 and 60 years old who are working
- 2'308'006 individuals, of which 17.5% belong to the treatment group
- Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators
Simulation
- Substantial differences between resident aliens and Swiss nationals on all three covariates
- Propensity score in population (computed from fully stratified data)

[Figure: density of the propensity score (0 to 1) for untreated and treated. McFadden R2 = 0.121.]
Simulation
- Raw mean difference in occupational prestige (NATE): -4.79
- Population ATT (computed from fully stratified data): -3.96
- There is some treatment effect heterogeneity (ATE = -3.51, ATC = -3.41)

[Figure, two panels against the propensity score (0 to .8): mean outcome (30 to 55) for untreated and treated, and treatment effect (-6 to -1).]
Results: Variance
[Figure: variance of the estimates by algorithm, for MDM and PSM, each with and without bias correction; panels N = 500 and N = 5000. Algorithms: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y).]

In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.
Results: Bias reduction (in percent)
[Figure: bias reduction in percent by algorithm, for MDM and PSM, each with and without bias correction; panels N = 500 (70 to 130) and N = 5000 (95 to 135). Algorithms: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y).]

Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.
Results: Mean squared error
[Figure: mean squared error by matching algorithm, panels N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Legend: MDM, with bias correction; PSM, with bias correction.]
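The simulation results are reported as bias, variance, and mean squared error. As a reminder of how these quantities relate, the following minimal sketch decomposes the MSE of a set of simulated estimates into squared bias plus variance. The estimates and the true effect are hypothetical numbers chosen for illustration; they are not taken from the simulation shown on the slides.

```python
import statistics


def decompose_mse(estimates, truth):
    """Return (mse, bias, variance) for a list of simulated estimates."""
    bias = statistics.fmean(estimates) - truth
    # Population variance across replications (divides by the number of replications).
    variance = statistics.pvariance(estimates)
    mse = statistics.fmean([(e - truth) ** 2 for e in estimates])
    return mse, bias, variance


# Hypothetical estimates from four simulation replications; true effect assumed to be 2.
estimates = [1.8, 2.1, 2.4, 2.5]
mse, bias, variance = decompose_mse(estimates, truth=2.0)
print(mse, bias ** 2 + variance)
```

With the variance defined over replications, the identity MSE = bias^2 + variance holds exactly, which is why an estimator with a small non-vanishing bias can still have a competitive MSE if its variance is low.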
Results: Relative standard error
[Figure: relative standard error by matching algorithm, panels N = 500 and N = 5000. Rows: nearest-neighbor matching with teffects (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap (1 neighbor, 5 neighbors), and kernel matching with bootstrap (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Legend: MDM, with bias correction; PSM, with bias correction.]
Results: Coverage of 95% CIs
[Figure: coverage of 95% confidence intervals by matching algorithm, panels N = 500 and N = 5000. Rows: nearest-neighbor matching with teffects (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap (1 neighbor, 5 neighbors), and kernel matching with bootstrap (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Legend: MDM, with bias correction; PSM, with bias correction.]
Conclusions
The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).
Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.
Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research the choice may not be that clear.
Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.
Disadvantages of MDM:
- The choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.
One clear conclusion we can draw, however, is:
Do not use propensity scores for pair matching! (But don't use pair matching anyhow.)
Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly ok for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.
To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.
References
Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2):295-313.
Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1):77-90.
Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91:279-290.
Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.
King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.
Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.
Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4):465-472.
Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70:41-55.
Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5):688-701.
Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4):472-480.
Matching to Reduce Model Dependence (Ho, Imai, King, Stuart 2007: fig. 1, Political Analysis)
[Figure: scatter plot of Outcome against Education (years) for treated (T) and control (C) units; slide 3/23 by King and Nielsen.]
King and Nielsen
Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

Matching: Finding Hidden Randomized Experiments (slide 6/23 by King and Nielsen)
Types of Experiments

Balance (covariates)   Complete randomization   Fully blocked
Observed               On average               Exact
Unobserved             On average               On average

Fully blocked dominates complete randomization for: imbalance, model dependence, power, efficiency, bias, research costs, robustness. E.g., Imai, King, Nall 2009: SEs 600% smaller.
Goal of Each Matching Method (in Observational Data)
- PSM: complete randomization
- Other methods: fully blocked
- Other matching methods dominate PSM (wait, it gets worse)
Best Case: Mahalanobis Distance Matching
[Figure: scatter plot of Age against Education (years) with treated (T) and control (C) units, shown in two overlay steps (all units, then treated units with a reduced set of controls); slide 9/23 by King and Nielsen.]
Best Case: Propensity Score Matching
[Figure: scatter plot of Age against Education (years) with treated (T) and control (C) units and an additional propensity score axis running from 0 to 1; slide 15/23 by King and Nielsen.]
Best Case: Propensity Score Matching is Suboptimal
[Figure: scatter plot of Age against Education (years) with the treated (T) units and a reduced set of control (C) units; slide 15/23 by King and Nielsen.]
King and Nielsen
Argument 3
- Random pruning (deleting observations at random) increases imbalance. This is because the sample size decreases, so that variance increases (large differences become more likely).
- More imbalance/variance means more model dependence and researcher discretion.
- Because PSM approximates complete randomization, it engages in random pruning.
- PSM Paradox ("when you do 'better', you do worse"):
  - When matching is made more strict (e.g., by decreasing the size of the caliper), PSM, like other matching methods, typically reduces imbalance. But soon the PSM Paradox kicks in, such that further pruning quickly increases imbalance.
  - If the data is such that there are no big differences between treated and untreated to begin with, the PSM Paradox kicks in almost immediately.
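The first point of Argument 3 can be illustrated with a small simulation. The sketch below is an assumption-laden toy example, not King and Nielsen's simulation: it draws one standard-normal covariate that is unrelated to treatment, so that any difference in means between the two groups is pure chance imbalance, and compares the average absolute mean difference for a large sample and for a randomly pruned (smaller) sample. The sample sizes and the number of replications are arbitrary choices.

```python
import random
import statistics


def mean_imbalance(n_per_group, reps=500, seed=1):
    """Average absolute difference in covariate means between two groups
    of size n_per_group when the covariate is unrelated to treatment."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        treated = [rng.gauss(0, 1) for _ in range(n_per_group)]
        control = [rng.gauss(0, 1) for _ in range(n_per_group)]
        diffs.append(abs(statistics.fmean(treated) - statistics.fmean(control)))
    return statistics.fmean(diffs)


full = mean_imbalance(200)    # before pruning
pruned = mean_imbalance(20)   # after randomly deleting 90% of the units
print(full, pruned)
```

Because random deletion leaves the groups exchangeable, pruning to 20 units per group is equivalent to drawing 20 units per group in the first place; the expected imbalance then grows roughly with the inverse square root of the remaining sample size.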
PSM Increases Model Dependence & Bias
[Figure, two panels, each comparing MDM and PSM by number of units pruned (0 to 160): "Model Dependence" (variance) and "Bias" (maximum coefficient across 512 specifications; true effect = 2). Data-generating process: $Y_i = 2T_i + X_{1i} + X_{2i} + \epsilon_i$, $\epsilon_i \sim N(0, 1)$. Slide 20/23 by King and Nielsen.]
The Propensity Score Paradox in Real Data
[Figure, two panels: Finkel et al. (JOP 2012) and Nielsen et al. (AJPS 2011). Each shows imbalance against the number of units pruned for CEM, MDM, PSM, and random pruning, with the raw data and the 1/4 SD caliper marked. Slide 21/23 by King and Nielsen.]
Similar pattern for > 20 other real data sets we checked.
Are King and Nielsen right?
Argument 1
- Model dependence (i.e., dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions, even if they try not to be).
- Matching is good because it reduces model dependence.
I fully agree.
My view, however, may be somewhat less pessimistic. I believe that research results can be credible if researchers are well educated, so that they know what they are doing, and if modeling decisions are made transparent and robustness of results is evaluated (and documented).
Are King and Nielsen right?
Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.
That fully blocked randomization is more efficient than complete randomization, given the sample size, is of course true (how large the efficiency gains are depends on the strength of the relation between X and Y).
However, if blocking reduces the sample size, it is not a priori clear whether estimates from the blocked sample are more efficient than estimates from the full sample (although often they will be).
That PSM approximates complete randomization is only partially true. PSM approximates complete randomization within observations with the same propensity score. Hence, PSM is somewhere between complete randomization and fully blocked randomization.
- If the X variables have no relation to T (treatment), then all observations have the same propensity score. Hence, we end up with complete randomization.
- If the X variables have a strong effect on T, there is lots of blocking.
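The two limiting cases can be made concrete with a logistic propensity-score model. The following sketch is purely illustrative: the covariate values (ages) and the coefficients are hypothetical, and a one-covariate logit is assumed only to keep the example short.

```python
import math


def propensity(x, intercept, slope):
    """Logistic propensity score Pr(T = 1 | X = x)."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * x)))


ages = [25, 40, 55, 70]  # hypothetical covariate values

# X unrelated to T (slope of zero): every unit gets the same score, so matching
# on the score cannot distinguish units (complete randomization).
no_relation = [propensity(a, intercept=-1.0, slope=0.0) for a in ages]

# X strongly related to T: scores differ across units, so matching on the
# score also blocks on X.
strong = [propensity(a, intercept=-5.0, slope=0.1) for a in ages]

print(no_relation)
print(strong)
```

In the first case the propensity score carries no information about X, so any control is as good a match as any other; in the second case units with similar scores necessarily have similar ages.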
Are King and Nielsen right?
Argument 3
- Random pruning ⇒ imbalance ⇒ more model dependence
- PSM ⇒ complete randomization ⇒ lots of random pruning
- PSM Paradox: "when you do 'better', you do worse"
That random pruning makes things worse is of course true, because it unnecessarily reduces the sample size (without changing anything else).
As argued above, that PSM applies random pruning is only true for X variables unrelated to T (so that we are in a "local" complete randomization situation, although something similar can probably also happen if effects from several X's cancel each other out).
Furthermore, it is only true if you employ a matching algorithm that throws away good matches. King and Nielsen's results seem to be based on the worst possible algorithm: one-to-one matching without replacement.
If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or also nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
- Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
- Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.
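To show what "averaging instead of pruning" means, here is a minimal sketch of kernel matching on a one-dimensional score with an Epanechnikov kernel. It is a simplified illustration under stated assumptions, not the kmatch implementation: the data are hypothetical (score, outcome) pairs, the bandwidth is fixed by hand, and treated units without any control inside the bandwidth are simply dropped as off-support.

```python
def epanechnikov(u):
    """Epanechnikov kernel weight for a scaled distance u."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def kernel_att(treated, controls, bandwidth):
    """ATT by kernel matching; treated and controls are lists of (score, outcome).

    Returns (att, number of treated units that found at least one control)."""
    effects = []
    for score_t, y_t in treated:
        weights = [epanechnikov((score_c - score_t) / bandwidth) for score_c, _ in controls]
        total = sum(weights)
        if total == 0.0:
            continue  # no control within the bandwidth: unit is off support
        # Counterfactual outcome: kernel-weighted average over ALL nearby controls.
        y0 = sum(w * y_c for w, (_, y_c) in zip(weights, controls)) / total
        effects.append(y_t - y0)
    if not effects:
        raise ValueError("no treated unit has a control within the bandwidth")
    return sum(effects) / len(effects), len(effects)


# Hypothetical data: two treated units, three controls.
treated = [(0.5, 3.0), (0.1, 5.0)]
controls = [(0.45, 1.0), (0.55, 2.0), (0.9, 10.0)]
att, n_matched = kernel_att(treated, controls, bandwidth=0.1)
print(att, n_matched)
```

Both controls close to the first treated unit contribute to its counterfactual, whereas pair matching without replacement would keep one of them and discard the other even though both are good matches.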
It is true that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more "free" variance in X that can be exploited by modeling decisions).
In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you'd need to apply PSM stratified by subgroups in this case).
The Command
kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).
Some key features:
- Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
- Optional exact matching
- Optional regression-adjustment bias-correction
- Kernel matching, ridge matching, or nearest-neighbor matching
- Automatic bandwidth selection for kernel/ridge matching
- Flexible specification of scaling matrix for MDM
- Joint analysis of multiple subgroups and multiple outcome variables
- Various post-estimation commands for balancing and common-support diagnostics
- Computationally efficient implementation
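For readers unfamiliar with the scaling matrix in MDM, the following sketch computes a Mahalanobis distance for two covariates using only the Python standard library. It is meant to illustrate the concept, assuming the sample covariance matrix as scaling matrix; it makes no claim about how kmatch computes distances internally, and the data points are hypothetical.

```python
import math
import statistics


def mahalanobis_2d(a, b, data):
    """Mahalanobis distance between points a and b (each an (x, y) pair),
    scaled by the sample covariance matrix of the (x, y) pairs in data."""
    xs = [row[0] for row in data]
    ys = [row[1] for row in data]
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    var_x = statistics.variance(xs)
    var_y = statistics.variance(ys)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in data) / (len(data) - 1)
    det = var_x * var_y - cov_xy ** 2
    dx, dy = a[0] - b[0], a[1] - b[1]
    # Quadratic form d' S^{-1} d, with the 2x2 inverse written out explicitly.
    q = (var_y * dx * dx - 2.0 * cov_xy * dx * dy + var_x * dy * dy) / det
    return math.sqrt(q)


data = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]  # hypothetical covariates
d = mahalanobis_2d((0.0, 0.0), (2.0, 0.0), data)

# Rescaling the first covariate (e.g., a change of units) leaves the distance unchanged.
scaled = [(10.0 * x, y) for x, y in data]
d_scaled = mahalanobis_2d((0.0, 0.0), (20.0, 0.0), scaled)
print(d, d_scaled)
```

The invariance to rescaling is the main reason for using a covariance-based scaling matrix rather than plain Euclidean distance; how to treat categorical covariates in such a matrix remains a judgment call, as noted in the conclusions.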
Some Examples
Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.

. sysuse nlsw88, clear
(NLSW, 1988 extract)

. drop if industry==2
(4 observations deleted)

Mahalanobis-distance kernel matching:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs = 1,853
                                               Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                 Matched                Controls           Band-
              Yes     No  Total     Used  Unused  Total    width
Treated       432     25    457     1105     291   1396   1.3394

Treatment-effects estimation
wage          Coef.
ATT        .6059013
NATE       1.432913
Some Examples
Some balancing statistics:

. kmatch summarize
(refitting the model using the generate() option)

                          Raw                        Matched(ATT)
Means          Treated  Untrea~d    StdDif    Treated  Untrea~d    StdDif
collgrad       .321663   .224212   .219912    .319444   .319444         0
ttl_exp        13.2685   12.7323   .117584    13.3205   13.1425   .039036
tenure         7.89205   6.17658    .29735    7.91744   7.58347   .057888
3.industry     .006565   .012178  -.058246     .00463    .00463         0
4.industry     .183807   .166905   .044425    .185185   .185185         0
5.industry     .105033   .027937   .312944    .085648   .085648         0
6.industry     .045952   .169771  -.407129    .048611   .048611         0
7.industry     .019694   .102436  -.350657    .020833   .020833         0
8.industry     .017505   .035817  -.113785    .009259   .009259         0
9.industry     .010941   .040115  -.185669    .011574   .011574         0
10.industry    .004376   .008596  -.052551    .002315   .002315         0
11.industry    .479212   .356734   .250073    .506944   .506944         0
12.industry    .122538    .07235   .169707     .12037    .12037         0
2.race         .330416   .244986   .189418      .3125     .3125         0
3.race         .017505   .011461   .050566    .006944   .006944         0
south          .297593   .466332  -.352408    .291667   .291667         0

                          Raw                        Matched(ATT)
Variances      Treated  Untrea~d     Ratio    Treated  Untrea~d     Ratio
collgrad       .218674   .174066   1.25628    .217904   .217904         1
ttl_exp        20.5898   21.0001   .980459    19.8177   18.2323   1.08696
tenure         37.2044   29.3629   1.26706    37.0399   34.9543   1.05966
3.industry     .006536   .012038   .542928    .004619   .004619         1
4.industry     .150351   .139148   1.08052    .151242   .151242         1
5.industry     .094207   .027176   3.46656    .078494   .078494         1
6.industry     .043936    .14105   .311496    .046355   .046355         1
7.industry     .019348   .092008   .210287    .020447   .020447         1
8.industry     .017237   .034559   .498769    .009195   .009195         1
9.industry     .010845   .038533   .281445    .011467   .011467         1
10.industry    .004367   .008528   .512039    .002315   .002315         1
11.industry    .250115   .229639   1.08917    .250532   .250532         1
12.industry    .107758   .067163   1.60443    .106127   .106127         1
2.race         .221726     .1851   1.19787    .215342   .215342         1
3.race         .017237   .011338   1.52025    .006912   .006912         1
south           .20949   .249045   .841173    .207077   .207077         1
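The StdDif and Ratio columns can be reproduced from the reported means and variances. The sketch below assumes the common definition of the standardized difference (difference in means divided by the square root of the average of the two group variances); this definition is an assumption on my part, but it matches the raw collgrad row of the table above when the printed values are plugged in.

```python
import math


def standardized_difference(mean_t, mean_c, var_t, var_c):
    """Difference in means scaled by the pooled standard deviation
    (square root of the average of the two group variances)."""
    return (mean_t - mean_c) / math.sqrt((var_t + var_c) / 2.0)


def variance_ratio(var_t, var_c):
    """Ratio of the treated to the untreated variance."""
    return var_t / var_c


# Raw (unmatched) values for collgrad as printed in the balancing table.
std_dif = standardized_difference(0.321663, 0.224212, 0.218674, 0.174066)
ratio = variance_ratio(0.218674, 0.174066)
print(round(std_dif, 6), round(ratio, 5))
```

After matching, a standardized difference near 0 and a variance ratio near 1 indicate good balance on that covariate.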
Some Examples
Make a graph of the balancing stats:

. mat M = r(M)
. mat V = r(V)
. coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || , bylabels("Std. mean difference" "Variance ratio") noci nolabels byopts(xrescale)
. addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))
. addplot 2: , xline(1) norescaling

[Figure: coefficient plot with one row per covariate (collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, south) and two panels, "Std. mean difference" and "Variance ratio"; legend: Raw, Matched.]
Some Examples
Propensity-score kernel matching:

. kmatch ps union collgrad ttl_exp tenure i.industry i.race south (wage), nate att
(computing bandwidth ... done)

Propensity-score kernel matching               Number of obs = 1,853
                                               Kernel        = epan
Treatment   : union = 1
Covariates  : collgrad ttl_exp tenure i.industry i.race south
PS model    : logit (pr)

Matching statistics
                 Matched                Controls           Band-
              Yes     No  Total     Used  Unused  Total    width
Treated       431     26    457     1214     182   1396   .00188

Treatment-effects estimation
wage          Coef.
ATT        .3887224
NATE       1.432913
Some Examples
Kernel density balancing plot:

. kmatch density, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)
(applying 0-1 boundary correction to density estimation of propensity score)
(bandwidth for propensity score = .06803989)

[Figure: kernel density of the propensity score for untreated and treated units; panels: Raw, Matched (ATT).]
Some Examples
Cumulative distribution balancing plot:

. kmatch cumul, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)

[Figure: cumulative probability against propensity score for untreated and treated units; panels: Raw, Matched (ATT).]
Some Examples
Balancing box plot:

. kmatch box
(refitting the model using the generate() option)

[Figure: box plots of the propensity score for untreated and treated units; panels: Raw, Matched (ATT).]
Some Examples
Standard errors:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching          Number of obs = 1,853
                                               Replications  = 50
                                               Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                 Matched                Controls           Band-
              Yes     No  Total     Used  Unused  Total    width
Treated       432     25    457     1105     291   1396   1.3394
Untreated    1386     10   1396      455       2    457   3.3975
Combined     1818     35   1853     1560     293   1853

Treatment-effects estimation
          Observed   Bootstrap                       Normal-based
wage         Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
ATE       .4095729   .1920853    2.13   0.033    .0330928   .7860531
ATT       .6059013   .2472069    2.45   0.014    .1213846   1.090418
ATC       .3483797   .1893653    1.84   0.066   -.0227695   .7195289
NATE      1.432913   .2333282    6.14   0.000    .9755981   1.890228
Some Examples
Do some tests:

. lincom ATT-NATE

 ( 1)  ATT - NATE = 0

wage         Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
(1)      -.8270117   .1810415   -4.57   0.000   -1.181847  -.4721768

. test ATT = ATC

 ( 1)  ATT - ATC = 0

           chi2(  1) =    2.42
         Prob > chi2 =    0.1200
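The test statistics in this output follow from the estimate and its standard error under a normal approximation. The sketch below recomputes the z statistic and confidence interval of the lincom result and the p-value of a chi-squared test with one degree of freedom. It uses only the printed (rounded) numbers, so the results agree with the Stata output only up to rounding; the function names are my own.

```python
import math
from statistics import NormalDist


def wald_z(coef, se, level=0.95):
    """Return (z, two-sided p-value, normal-based confidence interval)."""
    z = coef / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    return z, p, (coef - crit * se, coef + crit * se)


def chi2_1df_pvalue(chi2):
    """P-value of a chi-squared statistic with 1 degree of freedom,
    using the fact that it is the square of a standard normal variable."""
    return 2.0 * (1.0 - NormalDist().cdf(math.sqrt(chi2)))


# lincom ATT-NATE: coefficient and bootstrap standard error as printed.
z, p, ci = wald_z(-0.8270117, 0.1810415)
# test ATT = ATC: chi2(1) as printed (rounded to two decimals).
p_chi2 = chi2_1df_pvalue(2.42)
print(round(z, 2), ci, round(p_chi2, 3))
```

The difference between ATT and NATE is clearly different from zero, whereas the difference between ATT and ATC is not statistically significant at conventional levels.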
Some Examples
Nearest-neighbor matching (1 neighbor):

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

Multivariate-distance nearest-neighbor matching
                                               Number of obs  = 1,853
                                               Neighbors: min = 1
Treatment   : union = 1                                   max = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                 Matched                Controls           Band-
              Yes     No  Total     Used  Unused  Total    width
Treated       457      0    457      328    1068   1396

Treatment-effects estimation
wage          Coef.
ATT        .7246969

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

Treatment-effects estimation                   Number of obs      = 1,853
Estimator      : nearest-neighbor matching     Matches: requested = 1
Outcome model  : matching                                     min = 1
Distance metric: Mahalanobis                                  max = 1

                                  AI Robust
wage                     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET
union
(union vs nonunion)   .7246969   .2942952    2.46   0.014     .147889   1.301505
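As a conceptual counterpart to the commands above, here is a minimal sketch of nearest-neighbor matching with replacement on a single covariate. It is deliberately simplified and hypothetical: it uses absolute distance on one variable instead of the Mahalanobis metric, the data are made up, and ties are broken by list order rather than kept, which differs from the tie handling of the Stata commands.

```python
def nn_att(treated, controls, k=1):
    """ATT by k-nearest-neighbor matching with replacement.

    treated and controls are lists of (x, y) with a scalar covariate x."""
    effects = []
    for x_t, y_t in treated:
        # The same control may serve as a match for several treated units.
        nearest = sorted(controls, key=lambda c: abs(c[0] - x_t))[:k]
        y0 = sum(y_c for _, y_c in nearest) / len(nearest)
        effects.append(y_t - y0)
    return sum(effects) / len(effects)


# Hypothetical data.
treated = [(1.0, 5.0), (2.0, 6.0)]
controls = [(0.9, 4.0), (2.2, 5.0), (1.4, 3.0)]
att_1 = nn_att(treated, controls, k=1)
att_2 = nn_att(treated, controls, k=2)
print(att_1, att_2)
```

Increasing the number of neighbors averages over more controls per treated unit, which typically lowers variance at the price of using less similar matches.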
Some Examples
Nearest-neighbor matching (5 neighbors):

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching
                                               Number of obs  = 1,853
                                               Neighbors: min = 5
Treatment   : union = 1                                   max = 5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                 Matched                Controls           Band-
              Yes     No  Total     Used  Unused  Total    width
Treated       457      0    457      870     526   1396

Treatment-effects estimation
wage          Coef.
ATT        .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation                   Number of obs      = 1,853
Estimator      : nearest-neighbor matching     Matches: requested = 5
Outcome model  : matching                                     min = 5
Distance metric: Mahalanobis                                  max = 6

                                  AI Robust
wage                     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET
union
(union vs nonunion)   .5590823   .2381752    2.35   0.019    .0922675   1.025897
Some Examples
Bias-correction / regression adjustment:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching
                                               Number of obs  = 1,853
                                               Neighbors: min = 5
Treatment   : union = 1                                   max = 5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                 Matched                Controls           Band-
              Yes     No  Total     Used  Unused  Total    width
Treated       457      0    457      870     526   1396

Treatment-effects estimation
wage          Coef.
ATT        .5288023
adjusted for collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation                   Number of obs      = 1,853
Estimator      : nearest-neighbor matching     Matches: requested = 5
Outcome model  : matching                                     min = 5
Distance metric: Mahalanobis                                  max = 6

                                  AI Robust
wage                     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET
union
(union vs nonunion)   .5288023   .2420635    2.18   0.029    .0543666   1.003238
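The idea behind regression-adjustment bias correction can be sketched in a few lines. The example below is a simplified, hypothetical illustration and not the estimator implemented in kmatch or teffects: it uses one covariate, one nearest neighbor, and a linear regression fitted to the controls, and it corrects each matched control outcome for the remaining covariate gap between the treated unit and its match.

```python
def ols_slope(xs, ys):
    """Slope of a simple least-squares regression of ys on xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def matched_att(treated, controls, bias_correct):
    """1-nearest-neighbor ATT on a scalar covariate, optionally bias-corrected."""
    slope = ols_slope([x for x, _ in controls], [y for _, y in controls])
    effects = []
    for x_t, y_t in treated:
        x_c, y_c = min(controls, key=lambda c: abs(c[0] - x_t))
        # Shift the control outcome to where it would be predicted at x_t.
        adjustment = slope * (x_t - x_c) if bias_correct else 0.0
        effects.append(y_t - (y_c + adjustment))
    return sum(effects) / len(effects)


# Hypothetical data: control outcomes follow y = 2x; the treated unit lies 3 above that line.
controls = [(0.0, 0.0), (1.0, 2.0), (3.0, 6.0)]
treated = [(1.5, 6.0)]
raw = matched_att(treated, controls, bias_correct=False)
corrected = matched_att(treated, controls, bias_correct=True)
print(raw, corrected)
```

The uncorrected estimate absorbs the covariate gap between the treated unit and its imperfect match; the corrected estimate removes it, which is why the adjustment matters most when matches are not exact.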
Some Examples
Mahalanobis-distance and propensity-score matching combined:

. kmatch md union collgrad ttl_exp tenure (wage), att psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs = 1,853
                                               Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis (modified)
Covariates  : collgrad ttl_exp tenure
PS model    : logit (pr)
PS covars   : i.industry i.race south

Matching statistics
                 Matched                Controls           Band-
              Yes     No  Total     Used  Unused  Total    width
Treated       439     18    457     1258     138   1396   .83886

Treatment-effects estimation
wage          Coef.
ATT        .6408443
Some Examples
Exact matching:

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs = 1,853
                                               Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure
Exact       : industry race south

Matching statistics
                 Matched                Controls           Band-
              Yes     No  Total     Used  Unused  Total    width
Treated       432     25    457     1103     293   1396   1.3013

Treatment-effects estimation
wage          Coef.
ATT        .6047374
Some Examples
Bandwidth selection: the default (based on distribution of distances in one-nearest-neighbor matching):

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs = 1,853
                                               Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                 Matched                Controls           Band-
              Yes     No  Total     Used  Unused  Total    width
Treated       432     25    457     1105     291   1396   1.3394

Treatment-effects estimation
wage          Coef.
ATT        .6059013
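A bandwidth rule based on the distribution of one-nearest-neighbor distances could look like the sketch below. This is only a plausible reading of the description on the slide: the quantile (0.9) and the multiplication factor (1.5) are illustrative assumptions that are not stated in the slides, the function name is hypothetical, and a single covariate with absolute distance is used instead of the Mahalanobis metric.

```python
def pair_matching_bandwidth(treated_x, control_x, quantile=0.9, factor=1.5):
    """Bandwidth derived from the distances to the nearest control.

    quantile and factor are illustrative assumptions, not documented defaults."""
    # Distance from each treated unit to its closest control (1-NN matching).
    distances = sorted(min(abs(t - c) for c in control_x) for t in treated_x)
    index = min(len(distances) - 1, int(quantile * len(distances)))
    return factor * distances[index]


# Hypothetical covariate values.
treated_x = [0.0, 1.0, 2.0, 3.0]
control_x = [0.1, 1.2, 2.3, 3.4]
bandwidth = pair_matching_bandwidth(treated_x, control_x)
print(bandwidth)
```

The appeal of such a rule is that the bandwidth adapts to how dense the controls are around the treated units, so that most treated units find at least one control within the bandwidth.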
Some Examples
Bandwidth selection: cross validation with respect to X:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs = 1,853
                                               Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                 Matched                Controls           Band-
              Yes     No  Total     Used  Unused  Total    width
Treated       448      9    457     1184     212   1396   1.8888

Treatment-effects estimation
wage          Coef.
ATT        .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MSE against bandwidth, with evaluation points labeled by index.]
Some Examples
Bandwidth selection: cross validation with respect to Y:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs = 1,853
                                               Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                 Matched                Controls           Band-
              Yes     No  Total     Used  Unused  Total    width
Treated       453      4    457     1289     107   1396    2.433

Treatment-effects estimation
wage          Coef.
ATT        .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MISE against bandwidth, with evaluation points labeled by index.]
Some Examples
Bandwidth selection: weighted cross validation with respect to Y:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs = 1,853
                                               Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                 Matched                Controls           Band-
              Yes     No  Total     Used  Unused  Total    width
Treated       455      2    457     1356      40   1396   2.7626

Treatment-effects estimation
wage          Coef.
ATT        .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of weighted MISE against bandwidth, with evaluation points labeled by index.]
Some Examples
Common-support diagnostics:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(0.5)

Multivariate-distance kernel matching          Number of obs = 1,853
                                               Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                 Matched                Controls           Band-
              Yes     No  Total     Used  Unused  Total    width
Treated       366     91    457      701     695   1396       .5

Treatment-effects estimation
wage          Coef.
ATT        .3303161

. kmatch csummarize
(refitting the model using the generate() option)

                  Common support (treated)        Standardized difference
Means          Matched  Unmatc~d     Total    (1)-(3)   (2)-(3)   (1)-(2)
collgrad       .322404   .318681   .321663    .001585  -.006376   .007962
ttl_exp        13.3929   12.7682   13.2685    .027413  -.110253   .137666
tenure         8.12614   6.95055   7.89205    .038378  -.154356   .192734
3.industry     .002732   .021978   .006565   -.047404   .190657  -.238061
4.industry     .191257   .153846   .183807    .019212  -.077269   .096481
5.industry     .062842   .274725   .105033   -.137462   .552867  -.690329
6.industry     .057377         0   .045952    .054507  -.219225   .273732
7.industry     .019126   .021978   .019694   -.004083   .016423  -.020506
8.industry     .005464   .065934   .017505   -.091714   .368871  -.460585
9.industry     .010929   .010989   .010941   -.000115   .000462  -.000577
10.industry          0   .021978   .004376   -.066227   .266363  -.332589
11.industry    .554645   .175824   .479212     .15083  -.606636   .757467
12.industry    .092896   .241758   .122538   -.090299   .363181   -.45348
2.race         .243169   .681319   .330416   -.185284   .745209  -.930494
3.race         .002732   .076923   .017505   -.112525   .452572  -.565097
south           .29235   .318681   .297593   -.011456   .046074   -.05753

                  Common support (treated)                Ratio
Variances      Matched  Unmatc~d     Total    (1)/(3)   (2)/(3)   (1)/(2)
collgrad       .219058   .219536   .218674    1.00176   1.00394   .997824
ttl_exp        19.4198   25.2474   20.5898    .943177   1.22621    .76918
tenure         38.3324   31.9242   37.2044    1.03032   .858076   1.20073
3.industry     .002732   .021734   .006536    .418045   3.32537   .125714
4.industry     .155101   .131624   .150351    1.03159   .875443   1.17837
5.industry     .059054   .201465   .094207    .626851   2.13854   .293122
6.industry     .054233         0   .043936    1.23435         0         .
7.industry     .018811   .021734   .019348    .972252    1.1233   .865531
8.industry      .00545   .062271   .017237    .316157   3.61269   .087513
9.industry     .010839   .010989   .010845    .999464   1.01328   .986361
10.industry          0   .021734   .004367          0   4.97709         0
11.industry    .247691    .14652   .250115    .990307   .585811   1.69049
12.industry    .084497   .185348   .107758    .784137   1.72003   .455885
2.race         .184542   .219536   .221726    .832297   .990121   .840601
3.race         .002732   .071795   .017237    .158513   4.16522   .038056
south          .207448   .219536    .20949    .990254   1.04796   .944939

(1) matched, (2) unmatched, (3) total
Some Examples
Make a graph of the common-support statistics

. mat M = r(M)
. coefplot matrix(M[,4]), title("Std. difference") noci nolabels xline(0)

[Figure: standardized difference by covariate (collgrad, ttl_exp, tenure, industry indicators, race indicators, south); x-axis from -.2 to .2 with a reference line at 0]
Some Examples
Multiple outcome variables

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1,852
                                             Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                 Matched                    Controls            Band-
             Yes      No   Total     Used  Unused   Total       width
Treated      432      25     457     1104     291    1395      1.3392

Treatment-effects estimation
                Coef.
wage
     ATT     .6021049
    NATE     1.430823
hours
     ATT     1.263759
    NATE     1.450303
Some Examples
Multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage = collgrad ttl_exp tenure) ///
>     (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1,852
                                             Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                 Matched                    Controls            Band-
             Yes      No   Total     Used  Unused   Total       width
Treated      432      25     457     1104     291    1395      1.3392

Treatment-effects estimation
                Coef.
wage
     ATT     .5152752
    NATE     1.430823
hours
     ATT     1.263759
    NATE     1.450303

wage: adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race
Some Examples
Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage), ///
>     att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)

(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching        Number of obs = 1,853
                                             Replications  = 50
                                             Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics
                 Matched                    Controls            Band-
             Yes      No   Total     Used  Unused   Total       width
0  Treated   306      15     321      625     120     745      1.3199
1  Treated   126      10     136      473     178     651      1.3398

Treatment-effects estimation
             Observed   Bootstrap                        Normal-based
    wage        Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
0    ATT     .4586332    .2763358   1.66    0.097    -.082975    1.000241
1    ATT     .9518705     .406903   2.34    0.019    .1543553    1.749386

. test [0]ATT = [1]ATT
 ( 1)  [0]ATT - [1]ATT = 0
           chi2(  1) =    1.23
         Prob > chi2 =    0.2679

. lincom [1]ATT - [0]ATT
 ( 1)  - [0]ATT + [1]ATT = 0
    wage        Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
     (1)     .4932373    .4452343   1.11    0.268    -.379406    1.365881
Simulation
Population data from the Swiss census of 2000.
Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using command iskotrei by Hendrickx 2002; values from 6 to 78, mean 44).
Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.
Control variables: gender, age, and highest educational degree.
Population restricted to people between 24 and 60 years old who are working.
2,308,006 individuals, of which 17.5% belong to the treatment group.
Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.
Simulation
Substantial differences between resident aliens and Swiss nationals on all three covariates.
Propensity score in population (computed from fully stratified data):
[Figure: density of the propensity score (x-axis from 0 to 1) for untreated and treated; McFadden R2 = 0.121]
Simulation
Raw mean difference in occupational prestige (NATE): -4.79
Population ATT (computed from fully stratified data): -3.96
There is some treatment effect heterogeneity (ATE = -3.51, ATC = -3.41).
[Figure, two panels by propensity score (x-axis from 0 to .8): mean outcome (30 to 55) for untreated and treated; treatment effect (-6 to -1)]
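The heterogeneity figures on this slide can be cross-checked with the identity ATE = p * ATT + (1 - p) * ATC, where p is the treated share. The following is a minimal sketch, assuming the slide values are read as ATT = -3.96, ATC = -3.41, ATE = -3.51 and a treated share of 17.5%; the function name is illustrative and not part of any package.

```python
def ate_from_att_atc(att: float, atc: float, treated_share: float) -> float:
    """Population ATE as the share-weighted average of ATT and ATC."""
    return treated_share * att + (1.0 - treated_share) * atc


# Values as read from the slide; they are rounded, so agreement with the
# reported ATE can only be approximate.
ate = ate_from_att_atc(att=-3.96, atc=-3.41, treated_share=0.175)
print(round(ate, 2))
```

Because the treated are a small minority, the ATE sits close to the ATC, which is the pattern reported on the slide.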
Results: Variance
[Figure: variance of the ATT estimates for N = 500 and N = 5000 (x-axes from 1.5 to 4.5). Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction]

In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that, across algorithms, PSM can also be much more efficient than MDM. For example, kernel matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.
Results: Bias reduction (in percent)
[Figure: bias reduction in percent for N = 500 (x-axis from 70 to 130) and N = 5000 (x-axis from 95 to 135). Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction]

Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.
Results: Mean squared error
[Figure: mean squared error of the ATT estimates for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction]

Results: Relative standard error
[Figure: relative standard error for N = 500 and N = 5000. Rows: nearest-neighbor matching (teffects; 1 neighbor, 5 neighbors), nearest-neighbor matching (bootstrap; 1 neighbor, 5 neighbors), and kernel matching (bootstrap; fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction]

Results: Coverage of 95% CIs
[Figure: coverage of 95% confidence intervals for N = 500 and N = 5000, with the same rows and series as the relative standard error figure]
Conclusions
The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).
Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.
Conclusions
Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research, the choice may not be that clear.
Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.
Disadvantages of MDM:
- Choice of scaling matrix largely arbitrary; various suggestions in the literature (somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.
One clear conclusion we can draw, however, is:
Do not use propensity scores for pair matching. (But don't use pair matching anyhow.)
Conclusions
Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly OK for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.
To do:
- Run some simulations comparable to the ones by King and Nielsen, using various matching algorithms.
References
Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2): 295–313.
Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1): 77–90.
Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91: 279–290.
Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.
King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw
Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.
Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4): 465–472.
Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70: 41–55.
Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5): 688–701.
Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4): 472–480.
King and Nielsen
Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

Matching: Finding Hidden Randomized Experiments (slides by King and Nielsen)

Types of Experiments

Balance         Complete          Fully
Covariates:     Randomization     Blocked
Observed        On average        Exact
Unobserved      On average        On average

Fully blocked dominates complete randomization for: imbalance, model dependence, power, efficiency, bias, research costs, robustness. E.g., Imai, King, Nall 2009: SEs 600% smaller.

Goal of Each Matching Method (in Observational Data)
• PSM: complete randomization
• Other methods: fully blocked
• Other matching methods dominate PSM (wait, it gets worse)
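The efficiency claim behind Argument 2 can be illustrated with a small simulation. This is a hedged sketch, not taken from King and Nielsen or from the talk: it assumes one covariate X, an outcome that depends on X, a constant treatment effect, and a fixed sample size under both designs. All function names and parameter values are illustrative.

```python
import random
import statistics


def diff_in_means(y, treated):
    """Difference between mean outcomes of treated and untreated units."""
    y_t = [yi for yi, d in zip(y, treated) if d]
    y_c = [yi for yi, d in zip(y, treated) if not d]
    return statistics.fmean(y_t) - statistics.fmean(y_c)


def complete_randomization(n, rng):
    """Assign half of the units to treatment, ignoring covariates."""
    assign = [True] * (n // 2) + [False] * (n - n // 2)
    rng.shuffle(assign)
    return assign


def blocked_randomization(x, rng):
    """Pair units with adjacent covariate values; treat one unit per pair."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    assign = [False] * len(x)
    for k in range(0, len(order) - 1, 2):
        assign[rng.choice((order[k], order[k + 1]))] = True
    return assign


def simulate(reps=2000, n=40, effect=2.0, seed=1):
    """Return the sampling variance of the estimate under each design."""
    rng = random.Random(seed)
    complete, blocked = [], []
    for _ in range(reps):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y0 = [xi + rng.gauss(0, 0.5) for xi in x]  # outcome depends on X
        for assign, store in (
            (complete_randomization(n, rng), complete),
            (blocked_randomization(x, rng), blocked),
        ):
            y = [y0i + effect * d for y0i, d in zip(y0, assign)]
            store.append(diff_in_means(y, assign))
    return statistics.pvariance(complete), statistics.pvariance(blocked)


var_complete, var_blocked = simulate()
print(f"complete: {var_complete:.4f}  blocked: {var_blocked:.4f}")
```

In this setup blocking removes the variation in X between the two groups, so the blocked estimate varies less. The sketch holds the sample size fixed; it says nothing about the case where blocking also discards observations.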
Best Case: Mahalanobis Distance Matching
[Figure (slides by King and Nielsen): scatter plot of Age (20 to 80) against Education (12 to 28 years) with treated (T) and control (C) units, shown first with all controls and then with only the matched controls]
Best Case: Propensity Score Matching
[Figure (slides by King and Nielsen): scatter plot of Age (20 to 80) against Education (12 to 28 years) with treated (T) and control (C) units, alongside a propensity score axis running from 0 to 1]
Best Case: Propensity Score Matching is Suboptimal
[Figure (slides by King and Nielsen): scatter plot of Age (20 to 80) against Education (12 to 28 years) showing the treated (T) units and the control (C) units retained by propensity score matching]
King and Nielsen
Argument 3
- Random pruning (deleting observations at random) increases imbalance. This is because the sample size decreases, so that variance increases (large differences become more likely).
- More imbalance/variance means more model dependence and researcher discretion.
- Because PSM approximates complete randomization, it engages in random pruning.
- PSM Paradox ("when you do 'better', you do worse"):
  - When matching is made more strict (e.g., by decreasing the size of the caliper), PSM, like other matching methods, typically reduces imbalance. But soon the PSM Paradox kicks in, such that further pruning quickly increases imbalance.
  - If the data are such that there are no big differences between treated and untreated to begin with, the PSM Paradox kicks in almost immediately.
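The first point of Argument 3 can be checked numerically. The sketch below assumes treated and control units are drawn from the same covariate distribution, so that any remaining imbalance is pure chance, and then keeps only a random subset of each group. It is an illustration only, not a reproduction of King and Nielsen's analysis; the function name and defaults are hypothetical.

```python
import random
import statistics


def mean_abs_imbalance(n_keep, n_full=200, reps=500, seed=1):
    """Average absolute difference in covariate means after random pruning.

    Both groups start with n_full units drawn from N(0, 1); n_keep units
    per group are retained at random and the rest are pruned.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        treated = [rng.gauss(0, 1) for _ in range(n_full)]
        control = [rng.gauss(0, 1) for _ in range(n_full)]
        kept_t = rng.sample(treated, n_keep)
        kept_c = rng.sample(control, n_keep)
        total += abs(statistics.fmean(kept_t) - statistics.fmean(kept_c))
    return total / reps


# Fewer retained units per group means more random pruning.
for n_keep in (200, 100, 50, 25):
    print(n_keep, round(mean_abs_imbalance(n_keep), 3))
```

The printed imbalance grows as fewer units are kept, which is the mechanism the slide describes: pruning at random shrinks the sample without improving comparability.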
PSM Increases Model Dependence & Bias
[Figure (slides by King and Nielsen), two panels, each plotted against the number of units pruned (0 to 160) for MDM and PSM. Left panel, "Model Dependence": variance. Right panel, "Bias": maximum coefficient across 512 specifications; true effect = 2]
$Y_i = 2T_i + X_{1i} + X_{2i} + \epsilon_i$, $\epsilon_i \sim N(0, 1)$
The Propensity Score Paradox in Real Data
[Figure (slides by King and Nielsen), two panels showing imbalance against the number of units pruned for CEM, MDM, and PSM, with markers for the raw data, random pruning, and a 1/4 SD caliper. Left panel: Finkel et al. (JOP 2012), 0 to 3000 units pruned. Right panel: Nielsen et al. (AJPS 2011), 0 to 2500 units pruned]
Similar pattern for > 20 other real data sets we checked
Are King and Nielsen right?
Argument 1
- Model dependence (i.e., dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions, even if they try not to be).
- Matching is good because it reduces model dependence.
I fully agree.
My view, however, may be somewhat less pessimistic. I believe that research results can be credible if researchers are well educated, so that they know what they are doing, and if modeling decisions are made transparent and robustness of results is evaluated (and documented).
Are King and Nielsen right?
Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.
That fully blocked randomization is more efficient than complete randomization, given the sample size, is of course true (how large the efficiency gains are depends on the strength of the relation between X and Y).
However, if blocking reduces the sample size, it is not a priori clear whether estimates from the blocked sample are more efficient than estimates from the full sample (although often they will be).
That PSM approximates complete randomization is only partially true. PSM approximates complete randomization within observations with the same propensity score. Hence, PSM is somewhere between complete randomization and fully blocked randomization.
- If the X variables have no relation to T (treatment), then all observations have the same propensity score. Hence, we end up with complete randomization.
- If the X variables have a strong effect on T, there is lots of blocking.
Are King and Nielsen right?
Argument 3
- Random pruning ⇒ imbalance ⇒ more model dependence
- PSM ⇒ complete randomization ⇒ lots of random pruning
- PSM Paradox: "when you do 'better', you do worse"
That random pruning makes things worse is of course true, because it unnecessarily reduces the sample size (without changing anything else).
As argued above, that PSM applies random pruning is only true for X variables unrelated to T (so that we are in a "local" complete randomization situation, although something similar can probably also happen if effects from several X's cancel each other out).
Furthermore, it is only true if you employ a matching algorithm that throws away good matches. King and Nielsen's results seem to be based on the worst possible algorithm: one-to-one matching without replacement.
If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or also nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
- Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
- Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.
It is true that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more "free" variance in X that can be exploited by modeling decisions).
In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you'd need to apply PSM stratified by subgroups in this case).
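The idea of an algorithm that "averages where pruning is not necessary" can be sketched as a kernel matching estimator of the ATT on a one-dimensional score. This is a minimal illustration under stated assumptions (Epanechnikov kernel, fixed bandwidth, no bias correction); it is not the kmatch implementation, and the function names and the toy data are hypothetical.

```python
def epanechnikov(u: float) -> float:
    """Epanechnikov kernel weight; zero outside the bandwidth."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def kernel_matching_att(treated, controls, bandwidth):
    """Kernel matching ATT from lists of (score, outcome) pairs.

    Every control within the bandwidth of a treated unit contributes to
    that unit's counterfactual outcome, weighted by its distance. Treated
    units without any such control are off common support and are skipped.
    Returns the ATT estimate and the number of matched treated units.
    """
    effects = []
    for score_t, y_t in treated:
        weights = [epanechnikov((score_c - score_t) / bandwidth)
                   for score_c, _ in controls]
        total = sum(weights)
        if total == 0.0:
            continue  # no control within the bandwidth
        y0_hat = sum(w * y_c for w, (_, y_c) in zip(weights, controls)) / total
        effects.append(y_t - y0_hat)
    if not effects:
        raise ValueError("no treated unit has a control within the bandwidth")
    return sum(effects) / len(effects), len(effects)


# Toy data: the second treated unit has no control nearby and stays unmatched.
toy_treated = [(0.30, 5.0), (0.90, 7.0)]
toy_controls = [(0.28, 4.0), (0.32, 4.0), (0.35, 3.0)]
att, n_matched = kernel_matching_att(toy_treated, toy_controls, bandwidth=0.1)
print(att, n_matched)
```

In contrast to one-to-one matching without replacement, no control that lies within the bandwidth is thrown away here; units are only dropped where no comparable control exists.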
Illustration using kmatch
The Command
kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).
Some key features:
- Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
- Optional exact matching
- Optional regression-adjustment bias correction
- Kernel matching, ridge matching, or nearest-neighbor matching
- Automatic bandwidth selection for kernel/ridge matching
- Flexible specification of scaling matrix for MDM
- Joint analysis of multiple subgroups and multiple outcome variables
- Various post-estimation commands for balancing and common-support diagnostics
- Computationally efficient implementation
Some Examples
Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.

. sysuse nlsw88, clear
(NLSW, 1988 extract)

. drop if industry==2
(4 observations deleted)

Mahalanobis-distance kernel matching

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1,853
                                             Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                 Matched                    Controls            Band-
             Yes      No   Total     Used  Unused   Total       width
Treated      432      25     457     1105     291    1396      1.3394

Treatment-effects estimation
    wage        Coef.
     ATT     .6059013
    NATE     1.432913
Some Examples
Some balancing statistics

. kmatch summarize
(refitting the model using the generate() option)

                            Raw                          Matched (ATT)
        Means    Treated  Untreated     StdDif    Treated  Untreated     StdDif
     collgrad    .321663    .224212    .219912    .319444    .319444          0
      ttl_exp    13.2685    12.7323    .117584    13.3205    13.1425    .039036
       tenure    7.89205    6.17658     .29735    7.91744    7.58347    .057888
   3.industry    .006565    .012178   -.058246     .00463     .00463          0
   4.industry    .183807    .166905    .044425    .185185    .185185          0
   5.industry    .105033    .027937    .312944    .085648    .085648          0
   6.industry    .045952    .169771   -.407129    .048611    .048611          0
   7.industry    .019694    .102436   -.350657    .020833    .020833          0
   8.industry    .017505    .035817   -.113785    .009259    .009259          0
   9.industry    .010941    .040115   -.185669    .011574    .011574          0
  10.industry    .004376    .008596   -.052551    .002315    .002315          0
  11.industry    .479212    .356734    .250073    .506944    .506944          0
  12.industry    .122538     .07235    .169707     .12037     .12037          0
       2.race    .330416    .244986    .189418      .3125      .3125          0
       3.race    .017505    .011461    .050566    .006944    .006944          0
        south    .297593    .466332   -.352408    .291667    .291667          0

                            Raw                          Matched (ATT)
    Variances    Treated  Untreated      Ratio    Treated  Untreated      Ratio
     collgrad    .218674    .174066    1.25628    .217904    .217904          1
      ttl_exp    20.5898    21.0001    .980459    19.8177    18.2323    1.08696
       tenure    37.2044    29.3629    1.26706    37.0399    34.9543    1.05966
   3.industry    .006536    .012038    .542928    .004619    .004619          1
   4.industry    .150351    .139148    1.08052    .151242    .151242          1
   5.industry    .094207    .027176    3.46656    .078494    .078494          1
   6.industry    .043936     .14105    .311496    .046355    .046355          1
   7.industry    .019348    .092008    .210287    .020447    .020447          1
   8.industry    .017237    .034559    .498769    .009195    .009195          1
   9.industry    .010845    .038533    .281445    .011467    .011467          1
  10.industry    .004367    .008528    .512039    .002315    .002315          1
  11.industry    .250115    .229639    1.08917    .250532    .250532          1
  12.industry    .107758    .067163    1.60443    .106127    .106127          1
       2.race    .221726      .1851    1.19787    .215342    .215342          1
       3.race    .017237    .011338    1.52025    .006912    .006912          1
        south     .20949    .249045    .841173    .207077    .207077          1
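The raw balancing statistics can be reproduced by hand. The sketch below assumes the standardized difference is the difference in means divided by the square root of the average of the two group variances; this formula is inferred from the reported numbers rather than taken from the kmatch documentation. The inputs are the raw collgrad row, read as means .321663 and .224212 and variances .218674 and .174066.

```python
import math


def standardized_difference(mean_t, mean_c, var_t, var_c):
    """Mean difference scaled by the square root of the average variance."""
    return (mean_t - mean_c) / math.sqrt((var_t + var_c) / 2.0)


def variance_ratio(var_t, var_c):
    """Ratio of the treated to the untreated variance."""
    return var_t / var_c


# Raw collgrad row as read from the balancing tables.
sd_collgrad = standardized_difference(0.321663, 0.224212, 0.218674, 0.174066)
vr_collgrad = variance_ratio(0.218674, 0.174066)
print(round(sd_collgrad, 4), round(vr_collgrad, 4))
```

Both values agree with the StdDif and Ratio columns reported for collgrad in the raw sample, up to rounding of the inputs.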
Some Examples
Make a graph of the balancing statistics

. mat M = r(M)
. mat V = r(V)
. coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || , ///
>     bylabels("Std. mean difference" "Variance ratio") ///
>     noci nolabels byopts(xrescale)
. addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))
. addplot 2: , xline(1) norescaling

[Figure, two panels by covariate, raw versus matched: standardized mean difference (x-axis from -.4 to .4) and variance ratio (x-axis from 0 to 4)]
Some Examples
Propensity-score kernel matching

. kmatch ps union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage), nate att
(computing bandwidth ... done)

Propensity-score kernel matching             Number of obs = 1,853
                                             Kernel        = epan
Treatment   : union = 1
Covariates  : collgrad ttl_exp tenure i.industry i.race south
PS model    : logit (pr)

Matching statistics
                 Matched                    Controls            Band-
             Yes      No   Total     Used  Unused   Total       width
Treated      431      26     457     1214     182    1396      .00188

Treatment-effects estimation
    wage        Coef.
     ATT     .3887224
    NATE     1.432913
Some Examples
Kernel density balancing plot

. kmatch density, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)
(applying 0-1 boundary correction to density estimation of propensity score)
(bandwidth for propensity score = .06803989)

[Figure, two panels (Raw; Matched (ATT)): density of the propensity score (x-axis from 0 to .8) for untreated and treated]

Some Examples
Cumulative distribution balancing plot

. kmatch cumul, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)

[Figure, two panels (Raw; Matched (ATT)): cumulative probability (0 to 1) of the propensity score (x-axis from 0 to .8) for untreated and treated]

Some Examples
Balancing box plot

. kmatch box
(refitting the model using the generate() option)

[Figure, two panels (Raw; Matched (ATT)): box plots of the propensity score (0 to .8) for untreated and treated]
Some Examples
Standard errors

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)

(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching        Number of obs = 1,853
                                             Replications  = 50
                                             Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                 Matched                    Controls            Band-
             Yes      No   Total     Used  Unused   Total       width
Treated      432      25     457     1105     291    1396      1.3394
Untreated   1386      10    1396      455       2     457      3.3975
Combined    1818      35    1853     1560     293    1853

Treatment-effects estimation
             Observed   Bootstrap                        Normal-based
    wage        Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
     ATE     .4095729    .1920853   2.13    0.033    .0330928    .7860531
     ATT     .6059013    .2472069   2.45    0.014    .1213846    1.090418
     ATC     .3483797    .1893653   1.84    0.066   -.0227695    .7195289
    NATE     1.432913    .2333282   6.14    0.000    .9755981    1.890228
Some Examples
Do some tests

. lincom ATT-NATE

 ( 1)  ATT - NATE = 0

        wage |      Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
-------------+------------------------------------------------------------
         (1) |  -.8270117   .1810415    -4.57   0.000   -1.181847  -.4721768

. test ATT = ATC

 ( 1)  ATT - ATC = 0

           chi2(  1) =    2.42
         Prob > chi2 =    0.1200
Some Examples
Nearest-neighbor matching (1 neighbor)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

Multivariate-distance nearest-neighbor matching
                                               Number of obs  =  1,853
                                               Neighbors: min =      1
Treatment   : union = 1                                   max =      1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched        |       Controls       |  Band-
           |   Yes     No   Total |  Used  Unused  Total |  width
-----------+----------------------+----------------------+--------
   Treated |   457      0     457 |   328    1068   1396 |

Treatment-effects estimation

        wage |      Coef.
-------------+-----------
         ATT |   .7246969

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

Treatment-effects estimation                   Number of obs      =  1,853
Estimator      : nearest-neighbor matching     Matches: requested =      1
Outcome model  : matching                                     min =      1
Distance metric: Mahalanobis                                  max =      1

                     |              AI Robust
                wage |      Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
---------------------+------------------------------------------------------------
ATET                 |
               union |
 (union vs nonunion) |   .7246969   .2942952     2.46   0.014     .147889   1.301505
Some Examples
Nearest-neighbor matching (5 neighbors)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching
                                               Number of obs  =  1,853
                                               Neighbors: min =      5
Treatment   : union = 1                                   max =      5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched        |       Controls       |  Band-
           |   Yes     No   Total |  Used  Unused  Total |  width
-----------+----------------------+----------------------+--------
   Treated |   457      0     457 |   870     526   1396 |

Treatment-effects estimation

        wage |      Coef.
-------------+-----------
         ATT |   .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation                   Number of obs      =  1,853
Estimator      : nearest-neighbor matching     Matches: requested =      5
Outcome model  : matching                                     min =      5
Distance metric: Mahalanobis                                  max =      6

                     |              AI Robust
                wage |      Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
---------------------+------------------------------------------------------------
ATET                 |
               union |
 (union vs nonunion) |   .5590823   .2381752     2.35   0.019    .0922675   1.025897
Some Examples
Bias correction (regression adjustment)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching
                                               Number of obs  =  1,853
                                               Neighbors: min =      5
Treatment   : union = 1                                   max =      5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched        |       Controls       |  Band-
           |   Yes     No   Total |  Used  Unused  Total |  width
-----------+----------------------+----------------------+--------
   Treated |   457      0     457 |   870     526   1396 |

Treatment-effects estimation

        wage |      Coef.
-------------+-----------
         ATT |   .5288023
adjusted for collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south)
>     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation                   Number of obs      =  1,853
Estimator      : nearest-neighbor matching     Matches: requested =      5
Outcome model  : matching                                     min =      5
Distance metric: Mahalanobis                                  max =      6

                     |              AI Robust
                wage |      Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
---------------------+------------------------------------------------------------
ATET                 |
               union |
 (union vs nonunion) |   .5288023   .2420635     2.18   0.029    .0543666   1.003238
Some Examples
Mahalanobis-distance and propensity-score matching combined

. kmatch md union collgrad ttl_exp tenure (wage), att
>     psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs  =  1,853
                                               Kernel         =   epan
Treatment   : union = 1
Metric      : mahalanobis (modified)
Covariates  : collgrad ttl_exp tenure
PS model    : logit (pr)
PS covars   : i.industry i.race south

Matching statistics

           |       Matched        |       Controls       |  Band-
           |   Yes     No   Total |  Used  Unused  Total |  width
-----------+----------------------+----------------------+--------
   Treated |   439     18     457 |  1258     138   1396 | .83886

Treatment-effects estimation

        wage |      Coef.
-------------+-----------
         ATT |   .6408443
Some Examples
Exact matching

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs  =  1,853
                                               Kernel         =   epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure
Exact       : industry race south

Matching statistics

           |       Matched        |       Controls       |  Band-
           |   Yes     No   Total |  Used  Unused  Total |  width
-----------+----------------------+----------------------+--------
   Treated |   432     25     457 |  1103     293   1396 | 1.3013

Treatment-effects estimation

        wage |      Coef.
-------------+-----------
         ATT |   .6047374
Some Examples
Bandwidth selection: the default (based on the distribution of distances in one-nearest-neighbor matching)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs  =  1,853
                                               Kernel         =   epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched        |       Controls       |  Band-
           |   Yes     No   Total |  Used  Unused  Total |  width
-----------+----------------------+----------------------+--------
   Treated |   432     25     457 |  1105     291   1396 | 1.3394

Treatment-effects estimation

        wage |      Coef.
-------------+-----------
         ATT |   .6059013
Some Examples
Bandwidth selection: cross-validation with respect to X

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs  =  1,853
                                               Kernel         =   epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched        |       Controls       |  Band-
           |   Yes     No   Total |  Used  Unused  Total |  width
-----------+----------------------+----------------------+--------
   Treated |   448      9     457 |  1184     212   1396 | 1.8888

Treatment-effects estimation

        wage |      Coef.
-------------+-----------
         ATT |   .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation criterion by bandwidth, points labeled with the search-step index; x-axis: Bandwidth; y-axis: MSE]
Some Examples
Bandwidth selection: cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs  =  1,853
                                               Kernel         =   epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched        |       Controls       |  Band-
           |   Yes     No   Total |  Used  Unused  Total |  width
-----------+----------------------+----------------------+--------
   Treated |   453      4     457 |  1289     107   1396 |  2.433

Treatment-effects estimation

        wage |      Coef.
-------------+-----------
         ATT |   .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation criterion by bandwidth, points labeled with the search-step index; x-axis: Bandwidth; y-axis: MISE]
Some Examples
Bandwidth selection: weighted cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs  =  1,853
                                               Kernel         =   epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched        |       Controls       |  Band-
           |   Yes     No   Total |  Used  Unused  Total |  width
-----------+----------------------+----------------------+--------
   Treated |   455      2     457 |  1356      40   1396 | 2.7626

Treatment-effects estimation

        wage |      Coef.
-------------+-----------
         ATT |   .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation criterion by bandwidth, points labeled with the search-step index; x-axis: Bandwidth; y-axis: Weighted MISE]
Some Examples
Common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(0.5)

Multivariate-distance kernel matching          Number of obs  =  1,853
                                               Kernel         =   epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched        |       Controls       |  Band-
           |   Yes     No   Total |  Used  Unused  Total |  width
-----------+----------------------+----------------------+--------
   Treated |   366     91     457 |   701     695   1396 |     .5

Treatment-effects estimation

        wage |      Coef.
-------------+-----------
         ATT |   .3303161

. kmatch csummarize
(refitting the model using the generate() option)

Common support (treated)

             |     Matched   Unmatc~d      Total |   Standardized difference
       Means |         (1)        (2)        (3) |   (1)-(3)    (2)-(3)    (1)-(2)
-------------+-----------------------------------+--------------------------------
    collgrad |     .322404    .318681    .321663 |   .001585   -.006376    .007962
     ttl_exp |     13.3929    12.7682    13.2685 |   .027413   -.110253    .137666
      tenure |     8.12614    6.95055    7.89205 |   .038378   -.154356    .192734
  3.industry |     .002732    .021978    .006565 |  -.047404    .190657   -.238061
  4.industry |     .191257    .153846    .183807 |   .019212   -.077269    .096481
  5.industry |     .062842    .274725    .105033 |  -.137462    .552867   -.690329
  6.industry |     .057377          0    .045952 |   .054507   -.219225    .273732
  7.industry |     .019126    .021978    .019694 |  -.004083    .016423   -.020506
  8.industry |     .005464    .065934    .017505 |  -.091714    .368871   -.460585
  9.industry |     .010929    .010989    .010941 |  -.000115    .000462   -.000577
 10.industry |           0    .021978    .004376 |  -.066227    .266363   -.332589
 11.industry |     .554645    .175824    .479212 |    .15083   -.606636    .757467
 12.industry |     .092896    .241758    .122538 |  -.090299    .363181    -.45348
      2.race |     .243169    .681319    .330416 |  -.185284    .745209   -.930494
      3.race |     .002732    .076923    .017505 |  -.112525    .452572   -.565097
       south |      .29235    .318681    .297593 |  -.011456    .046074    -.05753

             |     Matched   Unmatc~d      Total |             Ratio
   Variances |         (1)        (2)        (3) |   (1)/(3)    (2)/(3)    (1)/(2)
-------------+-----------------------------------+--------------------------------
    collgrad |     .219058    .219536    .218674 |   1.00176    1.00394    .997824
     ttl_exp |     19.4198    25.2474    20.5898 |   .943177    1.22621     .76918
      tenure |     38.3324    31.9242    37.2044 |   1.03032    .858076    1.20073
  3.industry |     .002732    .021734    .006536 |   .418045    3.32537    .125714
  4.industry |     .155101    .131624    .150351 |   1.03159    .875443    1.17837
  5.industry |     .059054    .201465    .094207 |   .626851    2.13854    .293122
  6.industry |     .054233          0    .043936 |   1.23435          0          .
  7.industry |     .018811    .021734    .019348 |   .972252     1.1233    .865531
  8.industry |      .00545    .062271    .017237 |   .316157    3.61269    .087513
  9.industry |     .010839    .010989    .010845 |   .999464    1.01328    .986361
 10.industry |           0    .021734    .004367 |         0    4.97709          0
 11.industry |     .247691     .14652    .250115 |   .990307    .585811    1.69049
 12.industry |     .084497    .185348    .107758 |   .784137    1.72003    .455885
      2.race |     .184542    .219536    .221726 |   .832297    .990121    .840601
      3.race |     .002732    .071795    .017237 |   .158513    4.16522    .038056
       south |     .207448    .219536     .20949 |   .990254    1.04796    .944939

(1) matched, (2) unmatched, (3) total
Some Examples
Make a graph of the common-support statistics

. mat M = r(M)
. coefplot matrix(M[,4]), title("Std. difference") noci nolabels xline(0)

[Figure: standardized difference, matched vs. total, for each covariate (collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, south); x-axis: Std. difference, with a reference line at 0]
Some Examples
Multiple outcome variables

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs  =  1,852
                                               Kernel         =   epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched        |       Controls       |  Band-
           |   Yes     No   Total |  Used  Unused  Total |  width
-----------+----------------------+----------------------+--------
   Treated |   432     25     457 |  1104     291   1395 | 1.3392

Treatment-effects estimation

             |      Coef.
-------------+-----------
wage         |
         ATT |   .6021049
        NATE |   1.430823
-------------+-----------
hours        |
         ATT |   1.263759
        NATE |   1.450303
Some Examples
Multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure)
>     (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs  =  1,852
                                               Kernel         =   epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched        |       Controls       |  Band-
           |   Yes     No   Total |  Used  Unused  Total |  width
-----------+----------------------+----------------------+--------
   Treated |   432     25     457 |  1104     291   1395 | 1.3392

Treatment-effects estimation

             |      Coef.
-------------+-----------
wage         |
         ATT |   .5152752
        NATE |   1.430823
-------------+-----------
hours        |
         ATT |   1.263759
        NATE |   1.450303

wage: adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race
Some Examples
Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage),
>     att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)

Multivariate-distance kernel matching          Number of obs  =  1,853
                                               Replications   =     50
                                               Kernel         =   epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics

           |       Matched        |       Controls       |  Band-
           |   Yes     No   Total |  Used  Unused  Total |  width
-----------+----------------------+----------------------+--------
0          |
   Treated |   306     15     321 |   625     120    745 | 1.3199
1          |
   Treated |   126     10     136 |   473     178    651 | 1.3398

Treatment-effects estimation

             |   Observed   Bootstrap                       Normal-based
        wage |      Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
-------------+------------------------------------------------------------
0            |
         ATT |   .4586332   .2763358     1.66   0.097    -.082975   1.000241
1            |
         ATT |   .9518705    .406903     2.34   0.019    .1543553   1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

           chi2(  1) =    1.23
         Prob > chi2 =    0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

        wage |      Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
-------------+------------------------------------------------------------
         (1) |   .4932373   .4452343     1.11   0.268    -.379406   1.365881
Simulation

- Population data from the Swiss census of 2000.
- Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using the command iskotrei by Hendrickx 2002; values from 6 to 78, mean 44).
- Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.
- Control variables: gender, age, and highest educational degree.
- Population restricted to people between 24 and 60 years old who are working.
- 2'308'006 individuals, of which 17.5% belong to the treatment group.
- Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.
Simulation

- Substantial differences between resident aliens and Swiss nationals on all three covariates.
- Propensity score in the population (computed from fully stratified data):

[Figure: density of the propensity score for untreated and treated; x-axis: Propensity score (0 to 1); y-axis: Density. McFadden R2 = 0.121]
Simulation

- Raw mean difference in occupational prestige (NATE): -4.79
- Population ATT (computed from fully stratified data): -3.96
- There is some treatment effect heterogeneity (ATE = -3.51, ATC = -3.41).

[Figure, two panels. Left: outcome by propensity score for untreated and treated; x-axis: Propensity score; y-axis: Outcome. Right: treatment effect by propensity score; x-axis: Propensity score; y-axis: Treatment effect]
Results: Variance

[Figure: variance of the ATT estimates by matching algorithm, for MDM, MDM with bias correction, PSM, and PSM with bias correction; panels: N = 500 and N = 5000. Algorithms shown: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y)]

In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.
Results: Bias reduction (in percent)

[Figure: bias reduction (in percent) of the ATT estimates by matching algorithm, for MDM, MDM with bias correction, PSM, and PSM with bias correction; panels: N = 500 and N = 5000. Algorithms shown: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y)]

Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.
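The simulation summaries reported on these slides (variance, bias reduction, mean squared error) can be computed from a set of replicated estimates roughly as follows. This is a minimal Python sketch and not the code used for the talk: the function name `summarize_replications` and the toy estimates are hypothetical, and defining bias reduction as 100 × (1 − bias / raw bias) is an assumption made for illustration. Only the population values -3.96 (ATT) and -4.79 (NATE) come from the simulation description.

```python
import statistics


def summarize_replications(estimates, true_att, raw_difference):
    """Summarize Monte Carlo replications of an ATT estimator.

    Assumption: bias reduction is measured relative to the bias of the
    raw mean difference (NATE), i.e. 100 * (1 - bias / raw_bias).
    """
    estimates = list(estimates)
    bias = statistics.fmean(estimates) - true_att
    variance = statistics.pvariance(estimates)  # spread across replications
    mse = statistics.fmean([(e - true_att) ** 2 for e in estimates])
    raw_bias = raw_difference - true_att
    return {
        "bias": bias,
        "variance": variance,
        "mse": mse,  # identical to variance + bias**2
        "bias_reduction_pct": 100 * (1 - bias / raw_bias),
    }


# Hypothetical estimates from four replications (made-up numbers).
toy_estimates = [-4.0, -3.8, -4.2, -3.9]
result = summarize_replications(toy_estimates, true_att=-3.96, raw_difference=-4.79)
print(result)
```

Under this definition, a value above 100 percent indicates that the estimator overshoots, so that the remaining bias has the opposite sign of the raw bias.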
Results: Mean squared error

[Figure: mean squared error of the ATT estimates by matching algorithm, for MDM, MDM with bias correction, PSM, and PSM with bias correction; panels: N = 500 and N = 5000. Algorithms shown: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y)]
Results: Relative standard error

[Figure: relative standard error by matching algorithm, for MDM, MDM with bias correction, PSM, and PSM with bias correction; panels: N = 500 and N = 5000. Algorithms shown: nearest-neighbor matching with teffects standard errors (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap standard errors (1 neighbor, 5 neighbors), and kernel matching with bootstrap standard errors (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y)]
Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals by matching algorithm, for MDM, MDM with bias correction, PSM, and PSM with bias correction; panels: N = 500 and N = 5000. Algorithms shown: nearest-neighbor matching with teffects standard errors (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap standard errors (1 neighbor, 5 neighbors), and kernel matching with bootstrap standard errors (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y)]
Conclusions

- The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).
- Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.
Conclusions

Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research, the choice may not be that clear.

Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.

Disadvantages of MDM:
- Choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching! (But don't use pair matching anyhow.)
Conclusions

Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly ok for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.
References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2): 295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1): 77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91: 279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4): 465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70: 41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5): 688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4): 472–480.
Best Case: Mahalanobis Distance Matching

[Figure (slide 9/23 of the slides by King and Nielsen): scatter plot of treated (T) and control (C) units; x-axis: Education (years); y-axis: Age. Shown in two steps: first with all control units, then with only a pruned subset of control units remaining]
Best Case: Propensity Score Matching

[Figure (slide 15/23 of the slides by King and Nielsen): scatter plot of treated (T) and control (C) units; x-axis: Education (years); y-axis: Age; with an additional propensity score axis running from 0 to 1]
Best Case: Propensity Score Matching is Suboptimal

[Figure (slide 15/23 of the slides by King and Nielsen): scatter plot of treated (T) units and the remaining control (C) units after pruning; x-axis: Education (years); y-axis: Age]
King and Nielsen

Argument 3
- Random pruning (deleting observations at random) increases imbalance. This is because the sample size decreases, so that variance increases (large differences become more likely).
- More imbalance/variance means more model dependence and researcher discretion.
- Because PSM approximates complete randomization, it engages in random pruning.
- PSM Paradox ("when you do 'better', you do worse"):
  - When matching is made more strict (e.g., by decreasing the size of the caliper), PSM, like other matching methods, typically reduces imbalance. But soon the PSM Paradox kicks in, such that further pruning quickly increases imbalance.
  - If the data are such that there are no big differences between treated and untreated to begin with, the PSM Paradox kicks in almost immediately.
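The first point of Argument 3, that deleting observations at random increases imbalance simply because the sample gets smaller, can be illustrated with a small simulation. The following is a minimal Python sketch under stated assumptions, not a reproduction of King and Nielsen's analysis: a single standard normal covariate X that is unrelated to treatment, equal group sizes, and the hypothetical helper `mean_abs_imbalance`.

```python
import random
import statistics


def mean_abs_imbalance(n_keep, n_total=200, reps=500, seed=1):
    """Average absolute difference in covariate means between treated and
    controls after keeping n_keep randomly chosen units per group.

    Assumption: X ~ N(0, 1) in both groups, so any imbalance is pure chance.
    """
    rng = random.Random(seed)
    gaps = []
    for _ in range(reps):
        treated = [rng.gauss(0, 1) for _ in range(n_total)]
        control = [rng.gauss(0, 1) for _ in range(n_total)]
        # Random pruning: units are dropped without regard to X.
        treated_kept = rng.sample(treated, n_keep)
        control_kept = rng.sample(control, n_keep)
        gaps.append(abs(statistics.fmean(treated_kept) - statistics.fmean(control_kept)))
    return statistics.fmean(gaps)


full = mean_abs_imbalance(200)   # nothing pruned
pruned = mean_abs_imbalance(25)  # 175 of 200 units per group pruned at random
print(f"full sample: {full:.3f}, randomly pruned: {pruned:.3f}")
```

In this setup the expected absolute mean difference scales with the inverse square root of the number of units kept, so pruning from 200 to 25 units per group should increase it by a factor of roughly 2.8.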
PSM Increases Model Dependence & Bias

[Figure (slide 20/23 of the slides by King and Nielsen), two panels comparing MDM and PSM. Panel "Model Dependence": variance (y-axis) against number of units pruned (x-axis, 0 to 160). Panel "Bias": maximum coefficient across 512 specifications (y-axis) against number of units pruned (x-axis, 0 to 160); true effect = 2]

Data-generating process: $Y_i = 2T_i + X_{1i} + X_{2i} + \epsilon_i$, with $\epsilon_i \sim N(0, 1)$
The Propensity Score Paradox in Real Data

[Figure (slide 21/23 of the slides by King and Nielsen), two panels: "Finkel et al. (JOP 2012)" and "Nielsen et al. (AJPS 2011)". Each panel plots imbalance (y-axis) against the number of units pruned (x-axis) for CEM, MDM, PSM, and random pruning, with markers for the raw data and the 1/4 SD caliper]

Similar pattern for > 20 other real data sets we checked.
Are King and Nielsen right?

Argument 1
- Model dependence (i.e., dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions, even if they try not to be).
- Matching is good because it reduces model dependence.

I fully agree.

My view, however, may be somewhat less pessimistic. I believe that research results can be credible if researchers are well educated, so that they know what they are doing, and if modeling decisions are made transparent and the robustness of results is evaluated (and documented).
Are King and Nielsen right?

Argument 2
- PSM approximates complete randomization.
- Matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, are better because complete randomization is less efficient than fully blocked randomization.

That fully blocked randomization is more efficient than complete randomization, given the sample size, is of course true (how large the efficiency gains are depends on the strength of the relation between X and Y).

However, if blocking reduces the sample size, it is not a priori clear whether estimates from the blocked sample are more efficient than estimates from the full sample (although often they will be).

That PSM approximates complete randomization is only partially true. PSM approximates complete randomization within observations with the same propensity score. Hence, PSM is somewhere between complete randomization and fully blocked randomization.
- If the X variables have no relation to T (treatment), then all observations have the same propensity score. Hence, we end up with complete randomization.
- If the X variables have a strong effect on T, there is lots of blocking.
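The efficiency claim for a fixed sample size can be checked with a small simulation that compares the sampling variance of the difference in means under complete randomization and under blocked randomization. This is a minimal Python sketch under stated assumptions: one covariate X, the outcome model Y = 2T + X + noise with standard normal X and noise, and blocking implemented as pairing units with adjacent X values. The function name `simulate` and all parameter values are hypothetical.

```python
import random
import statistics


def simulate(blocked, n=40, reps=1000, effect=2.0, seed=7):
    """Return the mean and variance of the difference-in-means estimate
    across replications of a randomized experiment with n units."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        x = sorted(rng.gauss(0, 1) for _ in range(n))
        treat = [0] * n
        if blocked:
            # Blocked design: pair units with adjacent X, treat one per pair.
            for j in range(0, n, 2):
                treat[j + rng.randrange(2)] = 1
        else:
            # Complete randomization: treat a random half, ignoring X.
            for j in rng.sample(range(n), n // 2):
                treat[j] = 1
        y = [effect * t + xi + rng.gauss(0, 1) for t, xi in zip(treat, x)]
        y1 = [yi for yi, t in zip(y, treat) if t == 1]
        y0 = [yi for yi, t in zip(y, treat) if t == 0]
        estimates.append(statistics.fmean(y1) - statistics.fmean(y0))
    return statistics.fmean(estimates), statistics.variance(estimates)


mean_complete, var_complete = simulate(blocked=False)
mean_blocked, var_blocked = simulate(blocked=True)
print(f"complete: mean {mean_complete:.2f}, variance {var_complete:.3f}")
print(f"blocked:  mean {mean_blocked:.2f}, variance {var_blocked:.3f}")
```

Both designs are unbiased for the effect of 2; the blocked design has the smaller variance because X is balanced by construction. The size of the gain depends on how strongly X is related to Y, which is set by the assumed outcome model here.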
Are King and Nielsen right
Argument 3I Random pruning rArr imbalance rArr more model dependenceI PSM rArr complete randomization rArr lots of random pruningI PSM Paradox ldquowhen you do lsquobetterrsquo you do worserdquo
That random pruning makes things worse is of course true becauseit unnecessarily reduces the sample size (without changing anythingelse)
As argued above that PSM applies random pruning is only true forX variables unrelated to T (so that we are in a ldquolocalrdquo completerandomization situation although something similar can probablyalso happen if effects from several X rsquos cancel each other out)
Furthermore it is only true if you employ a matching algorithm thatthrows away good matches King and Nielsenrsquos results seem to bebased on the worst possible algorithm one-to-one matching withoutreplacement
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 30
Are King and Nielsen right
Argument 3I Random pruning rArr imbalance rArr more model dependenceI PSM rArr complete randomization rArr lots of random pruningI PSM Paradox ldquowhen you do lsquobetterrsquo you do worserdquo
If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or also nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
- Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
- Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.
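How a kernel matching algorithm "averages" instead of discarding good matches can be sketched as follows. This is an illustrative sketch under stated assumptions, not kmatch's implementation: it matches on a one-dimensional score with an Epanechnikov kernel, and the function names and toy data are hypothetical.

```python
def epanechnikov(u):
    """Epanechnikov kernel weight; zero outside the bandwidth."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def kernel_matching_att(treated, controls, bandwidth):
    """Kernel-matching estimate of the ATT on a one-dimensional score.

    treated, controls: lists of (score, outcome) pairs.
    Returns (att, number of matched treated units).
    """
    effects = []
    for score_t, y_t in treated:
        weights = [epanechnikov((score_c - score_t) / bandwidth)
                   for score_c, _ in controls]
        total = sum(weights)
        if total == 0.0:
            continue  # no control within the bandwidth: unit stays unmatched
        # Counterfactual outcome: weighted average over ALL controls in range,
        # so no good match is thrown away.
        y0_hat = sum(w * y_c for w, (_, y_c) in zip(weights, controls)) / total
        effects.append(y_t - y0_hat)
    if not effects:
        raise ValueError("no treated unit has a control within the bandwidth")
    return sum(effects) / len(effects), len(effects)


treated = [(0.50, 3.0), (0.05, 5.0)]
controls = [(0.40, 1.0), (0.60, 2.0), (0.95, 10.0)]
att, n_matched = kernel_matching_att(treated, controls, bandwidth=0.2)
print(att, n_matched)
```

In the toy data, the first treated unit is compared with the average of its two equally distant controls, while the second treated unit has no control within the bandwidth and is pruned.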
It is true that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more "free" variance in X that can be exploited by modeling decisions).

In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you would need to apply PSM stratified by subgroups in this case).
6 Illustration using kmatch
The Command

kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).

Some key features:
- Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
- Optional exact matching
- Optional regression-adjustment bias correction
- Kernel matching, ridge matching, or nearest-neighbor matching
- Automatic bandwidth selection for kernel/ridge matching
- Flexible specification of scaling matrix for MDM
- Joint analysis of multiple subgroups and multiple outcome variables
- Various post-estimation commands for balancing and common-support diagnostics
- Computationally efficient implementation
Some Examples

Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.

. sysuse nlsw88, clear
(NLSW, 1988 extract)

. drop if industry==2
(4 observations deleted)

Mahalanobis-distance kernel matching:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

             Matched                Controls          Band-
           Yes    No  Total     Used  Unused  Total   width
Treated    432    25    457     1105     291   1396  1.3394

Treatment-effects estimation

wage       Coef.
ATT     .6059013
NATE    1.432913
Some Examples: some balancing statistics

. kmatch summarize
(refitting the model using the generate() option)

                        Raw                        Matched (ATT)
Means         Treated  Untreated    StdDif    Treated  Untreated    StdDif
collgrad      .321663    .224212   .219912    .319444    .319444         0
ttl_exp       13.2685    12.7323   .117584    13.3205    13.1425   .039036
tenure        7.89205    6.17658    .29735    7.91744    7.58347   .057888
3.industry    .006565    .012178  -.058246     .00463     .00463         0
4.industry    .183807    .166905   .044425    .185185    .185185         0
5.industry    .105033    .027937   .312944    .085648    .085648         0
6.industry    .045952    .169771  -.407129    .048611    .048611         0
7.industry    .019694    .102436  -.350657    .020833    .020833         0
8.industry    .017505    .035817  -.113785    .009259    .009259         0
9.industry    .010941    .040115  -.185669    .011574    .011574         0
10.industry   .004376    .008596  -.052551    .002315    .002315         0
11.industry   .479212    .356734   .250073    .506944    .506944         0
12.industry   .122538     .07235   .169707     .12037     .12037         0
2.race        .330416    .244986   .189418      .3125      .3125         0
3.race        .017505    .011461   .050566    .006944    .006944         0
south         .297593    .466332  -.352408    .291667    .291667         0

                        Raw                        Matched (ATT)
Variances     Treated  Untreated     Ratio    Treated  Untreated     Ratio
collgrad      .218674    .174066   1.25628    .217904    .217904         1
ttl_exp       20.5898    21.0001   .980459    19.8177    18.2323   1.08696
tenure        37.2044    29.3629   1.26706    37.0399    34.9543   1.05966
3.industry    .006536    .012038   .542928    .004619    .004619         1
4.industry    .150351    .139148   1.08052    .151242    .151242         1
5.industry    .094207    .027176   3.46656    .078494    .078494         1
6.industry    .043936     .14105   .311496    .046355    .046355         1
7.industry    .019348    .092008   .210287    .020447    .020447         1
8.industry    .017237    .034559   .498769    .009195    .009195         1
9.industry    .010845    .038533   .281445    .011467    .011467         1
10.industry   .004367    .008528   .512039    .002315    .002315         1
11.industry   .250115    .229639   1.08917    .250532    .250532         1
12.industry   .107758    .067163   1.60443    .106127    .106127         1
2.race        .221726      .1851   1.19787    .215342    .215342         1
3.race        .017237    .011338   1.52025    .006912    .006912         1
south          .20949    .249045   .841173    .207077    .207077         1
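The balancing statistics can be recomputed by hand. The sketch below is an illustration, not kmatch code: it assumes that the standardized difference divides the difference in means by the square root of the average of the two raw group variances. That formula is an assumption, though it is consistent with the raw-sample row for ttl_exp once the decimal points are restored (means 13.2685 and 12.7323, variances 20.5898 and 21.0001).

```python
import math


def standardized_difference(mean_t, mean_c, var_t, var_c):
    """Difference in means divided by the pooled standard deviation
    (square root of the average of the two group variances)."""
    return (mean_t - mean_c) / math.sqrt((var_t + var_c) / 2.0)


def variance_ratio(var_t, var_c):
    """Variance of the treated divided by the variance of the untreated."""
    return var_t / var_c


# Raw-sample values for ttl_exp, taken from the balancing table.
std_dif = standardized_difference(13.2685, 12.7323, 20.5898, 21.0001)
ratio = variance_ratio(20.5898, 21.0001)
print(f"StdDif = {std_dif:.6f}, Ratio = {ratio:.6f}")
```

Values close to 0 for the standardized difference and close to 1 for the variance ratio indicate good balance, which is what the matched columns of the table show.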
Some Examples: make a graph of the balancing stats

. mat M = r(M)
. mat V = r(V)
. coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || , ///
>     bylabels("Std. mean difference" "Variance ratio") ///
>     noci nolabels byopts(xrescale)
. addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))
. addplot 2: , xline(1) norescaling

[Figure: coefficient plot of the balancing statistics by covariate; panels: Std. mean difference, Variance ratio; series: Raw, Matched]
Some Examples

Propensity-score kernel matching:

. kmatch ps union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage), nate att
(computing bandwidth ... done)

Propensity-score kernel matching           Number of obs = 1853
                                           Kernel        = epan
Treatment  : union = 1
Covariates : collgrad ttl_exp tenure i.industry i.race south
PS model   : logit (pr)

Matching statistics

             Matched                Controls          Band-
           Yes    No  Total     Used  Unused  Total   width
Treated    431    26    457     1214     182   1396  .00188

Treatment-effects estimation

wage       Coef.
ATT     .3887224
NATE    1.432913
Some Examples: kernel density balancing plot

. kmatch density, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)
(applying 0-1 boundary correction to density estimation of propensity score)
(bandwidth for propensity score = .06803989)

[Figure: kernel density of the propensity score for untreated and treated units; panels: Raw, Matched (ATT); x-axis: Propensity score; y-axis: Density]
Some Examples: cumulative distribution balancing plot

. kmatch cumul, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)

[Figure: cumulative distribution of the propensity score for untreated and treated units; panels: Raw, Matched (ATT); x-axis: Propensity score; y-axis: Cumulative probability]
Some Examples: balancing box plot

. kmatch box
(refitting the model using the generate() option)

[Figure: box plots of the propensity score for untreated and treated units; panels: Raw, Matched (ATT); y-axis: Propensity score]
Some Examples: standard errors

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching      Number of obs = 1853
                                           Replications  = 50
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

               Matched                Controls          Band-
             Yes    No  Total     Used  Unused  Total   width
Treated      432    25    457     1105     291   1396  1.3394
Untreated   1386    10   1396      455       2    457  3.3975
Combined    1818    35   1853     1560     293   1853

Treatment-effects estimation

         Observed   Bootstrap                        Normal-based
wage        Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
ATE      .4095729   .1920853    2.13   0.033    .0330928   .7860531
ATT      .6059013   .2472069    2.45   0.014    .1213846   1.090418
ATC      .3483797   .1893653    1.84   0.066   -.0227695   .7195289
NATE     1.432913   .2333282    6.14   0.000    .9755981   1.890228
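The "Normal-based" columns follow from the coefficient and the bootstrap standard error alone. A minimal sketch of that arithmetic, using the ATT row as input (coefficient .6059013, standard error .2472069); the function name `normal_based_ci` is illustrative and not part of kmatch:

```python
from statistics import NormalDist


def normal_based_ci(coef, std_err, level=0.95):
    """z statistic, two-sided p-value, and normal-based confidence interval."""
    std_normal = NormalDist()
    z = coef / std_err
    p_value = 2.0 * (1.0 - std_normal.cdf(abs(z)))
    z_crit = std_normal.inv_cdf(0.5 + level / 2.0)
    return z, p_value, coef - z_crit * std_err, coef + z_crit * std_err


z, p, lower, upper = normal_based_ci(0.6059013, 0.2472069)
print(f"z = {z:.2f}, p = {p:.3f}, CI = [{lower:.7f}, {upper:.6f}]")
```

The interval is only as good as the standard error that goes into it, which is why the simulation later in the talk looks at the coverage of bootstrap confidence intervals.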
Some Examples

Do some tests:

. lincom ATT - NATE

 ( 1)  ATT - NATE = 0

wage        Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
(1)     -.8270117   .1810415   -4.57   0.000   -1.181847  -.4721768

. test ATT = ATC

 ( 1)  ATT - ATC = 0

           chi2(  1) =    2.42
         Prob > chi2 =  0.1200
Some Examples: nearest-neighbor matching (1 neighbor)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

Multivariate-distance nearest-neighbor matching
                                           Number of obs  = 1853
                                           Neighbors: min = 1
Treatment  : union = 1                                max = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

             Matched                Controls          Band-
           Yes    No  Total     Used  Unused  Total   width
Treated    457     0    457      328    1068   1396

Treatment-effects estimation

wage       Coef.
ATT     .7246969

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

Treatment-effects estimation               Number of obs      = 1853
Estimator       : nearest-neighbor matching  Matches: requested = 1
Outcome model   : matching                                 min = 1
Distance metric : Mahalanobis                              max = 1

                                      AI Robust
wage                      Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET
union
(union vs nonunion)    .7246969   .2942952    2.46   0.014     .147889   1.301505
Some Examples: nearest-neighbor matching (5 neighbors)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching
                                           Number of obs  = 1853
                                           Neighbors: min = 5
Treatment  : union = 1                                max = 5
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

             Matched                Controls          Band-
           Yes    No  Total     Used  Unused  Total   width
Treated    457     0    457      870     526   1396

Treatment-effects estimation

wage       Coef.
ATT     .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation               Number of obs      = 1853
Estimator       : nearest-neighbor matching  Matches: requested = 5
Outcome model   : matching                                 min = 5
Distance metric : Mahalanobis                              max = 6

                                      AI Robust
wage                      Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET
union
(union vs nonunion)    .5590823   .2381752    2.35   0.019    .0922675   1.025897
Some Examples: bias correction / regression adjustment

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching
                                           Number of obs  = 1853
                                           Neighbors: min = 5
Treatment  : union = 1                                max = 5
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

             Matched                Controls          Band-
           Yes    No  Total     Used  Unused  Total   width
Treated    457     0    457      870     526   1396

Treatment-effects estimation

wage       Coef.
ATT     .5288023

adjusted for: collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) ///
>     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation               Number of obs      = 1853
Estimator       : nearest-neighbor matching  Matches: requested = 5
Outcome model   : matching                                 min = 5
Distance metric : Mahalanobis                              max = 6

                                      AI Robust
wage                      Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET
union
(union vs nonunion)    .5288023   .2420635    2.18   0.029    .0543666   1.003238
Some Examples

Mahalanobis-distance and propensity-score matching combined:

. kmatch md union collgrad ttl_exp tenure (wage), att ///
>     psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis (modified)
Covariates : collgrad ttl_exp tenure
PS model   : logit (pr)
PS covars  : i.industry i.race south

Matching statistics

             Matched                Controls          Band-
           Yes    No  Total     Used  Unused  Total   width
Treated    439    18    457     1258     138   1396  .83886

Treatment-effects estimation

wage       Coef.
ATT     .6408443
Some Examples

Exact matching:

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure
Exact      : industry race south

Matching statistics

             Matched                Controls          Band-
           Yes    No  Total     Used  Unused  Total   width
Treated    432    25    457     1103     293   1396  1.3013

Treatment-effects estimation

wage       Coef.
ATT     .6047374
Some Examples

Bandwidth selection: the default (based on the distribution of distances in one-nearest-neighbor matching)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

             Matched                Controls          Band-
           Yes    No  Total     Used  Unused  Total   width
Treated    432    25    457     1105     291   1396  1.3394

Treatment-effects estimation

wage       Coef.
ATT     .6059013
Some Examples: bandwidth selection by cross-validation with respect to X

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

             Matched                Controls          Band-
           Yes    No  Total     Used  Unused  Total   width
Treated    448     9    457     1184     212   1396  1.8888

Treatment-effects estimation

wage       Coef.
ATT     .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot; x-axis: Bandwidth; y-axis: MSE; markers labeled with the index of each evaluated bandwidth]
Some Examples: bandwidth selection by cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

             Matched                Controls          Band-
           Yes    No  Total     Used  Unused  Total   width
Treated    453     4    457     1289     107   1396   2.433

Treatment-effects estimation

wage       Coef.
ATT     .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot; x-axis: Bandwidth; y-axis: MISE; markers labeled with the index of each evaluated bandwidth]
Some Examples: bandwidth selection by weighted cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

             Matched                Controls          Band-
           Yes    No  Total     Used  Unused  Total   width
Treated    455     2    457     1356      40   1396  2.7626

Treatment-effects estimation

wage       Coef.
ATT     .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot; x-axis: Bandwidth; y-axis: Weighted MISE; markers labeled with the index of each evaluated bandwidth]
Some Examples: common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(0.5)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

             Matched                Controls          Band-
           Yes    No  Total     Used  Unused  Total   width
Treated    366    91    457      701     695   1396      .5

Treatment-effects estimation

wage       Coef.
ATT     .3303161

. kmatch csummarize
(refitting the model using the generate() option)

               Common support (treated)        Standardized difference
Means         Matched  Unmatched     Total    (1)-(3)    (2)-(3)    (1)-(2)
collgrad      .322404    .318681   .321663    .001585   -.006376    .007962
ttl_exp       13.3929    12.7682   13.2685    .027413   -.110253    .137666
tenure        8.12614    6.95055   7.89205    .038378   -.154356    .192734
3.industry    .002732    .021978   .006565   -.047404    .190657   -.238061
4.industry    .191257    .153846   .183807    .019212   -.077269    .096481
5.industry    .062842    .274725   .105033   -.137462    .552867   -.690329
6.industry    .057377          0   .045952    .054507   -.219225    .273732
7.industry    .019126    .021978   .019694   -.004083    .016423   -.020506
8.industry    .005464    .065934   .017505   -.091714    .368871   -.460585
9.industry    .010929    .010989   .010941   -.000115    .000462   -.000577
10.industry         0    .021978   .004376   -.066227    .266363   -.332589
11.industry   .554645    .175824   .479212     .15083   -.606636    .757467
12.industry   .092896    .241758   .122538   -.090299    .363181    -.45348
2.race        .243169    .681319   .330416   -.185284    .745209   -.930494
3.race        .002732    .076923   .017505   -.112525    .452572   -.565097
south          .29235    .318681   .297593   -.011456    .046074    -.05753

               Common support (treated)                 Ratio
Variances     Matched  Unmatched     Total    (1)/(3)    (2)/(3)    (1)/(2)
collgrad      .219058    .219536   .218674    1.00176    1.00394    .997824
ttl_exp       19.4198    25.2474   20.5898    .943177    1.22621     .76918
tenure        38.3324    31.9242   37.2044    1.03032    .858076    1.20073
3.industry    .002732    .021734   .006536    .418045    3.32537    .125714
4.industry    .155101    .131624   .150351    1.03159    .875443    1.17837
5.industry    .059054    .201465   .094207    .626851    2.13854    .293122
6.industry    .054233          0   .043936    1.23435          0          .
7.industry    .018811    .021734   .019348    .972252     1.1233    .865531
8.industry     .00545    .062271   .017237    .316157    3.61269    .087513
9.industry    .010839    .010989   .010845    .999464    1.01328    .986361
10.industry         0    .021734   .004367          0    4.97709          0
11.industry   .247691     .14652   .250115    .990307    .585811    1.69049
12.industry   .084497    .185348   .107758    .784137    1.72003    .455885
2.race        .184542    .219536   .221726    .832297    .990121    .840601
3.race        .002732    .071795   .017237    .158513    4.16522    .038056
south         .207448    .219536    .20949    .990254    1.04796    .944939

(1) matched, (2) unmatched, (3) total
Some Examples: make a graph of the common-support stats

. mat M = r(M)
. coefplot matrix(M[,4]), title("Std. difference") noci nolabels xline(0)

[Figure: coefficient plot of the standardized difference between matched treated and all treated, by covariate; x-axis: Std. difference]
Some Examples

Multiple outcome variables:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1852
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

             Matched                Controls          Band-
           Yes    No  Total     Used  Unused  Total   width
Treated    432    25    457     1104     291   1395  1.3392

Treatment-effects estimation

             Coef.
wage
  ATT     .6021049
  NATE    1.430823
hours
  ATT     1.263759
  NATE    1.450303
Some Examples: multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage = collgrad ttl_exp tenure) ///
>     (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1852
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

             Matched                Controls          Band-
           Yes    No  Total     Used  Unused  Total   width
Treated    432    25    457     1104     291   1395  1.3392

Treatment-effects estimation

             Coef.
wage
  ATT     .5152752
  NATE    1.430823
hours
  ATT     1.263759
  NATE    1.450303

wage: adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race
Some Examples: treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage), ///
>     att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching      Number of obs = 1853
                                           Replications  = 50
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics

               Matched                Controls          Band-
             Yes    No  Total     Used  Unused  Total   width
0  Treated   306    15    321      625     120    745  1.3199
1  Treated   126    10    136      473     178    651  1.3398

Treatment-effects estimation

         Observed   Bootstrap                        Normal-based
wage        Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
0  ATT   .4586332   .2763358    1.66   0.097    -.082975   1.000241
1  ATT   .9518705    .406903    2.34   0.019    .1543553   1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

           chi2(  1) =    1.23
         Prob > chi2 =  0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

wage        Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
(1)      .4932373   .4452343    1.11   0.268    -.379406   1.365881
Simulation

- Population data from the Swiss census of 2000.
- Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using command iskotrei by Hendrickx 2002) (values from 6 to 78; mean 44).
- Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.
- Control variables: gender, age, and highest educational degree.
- Population restricted to people between 24 and 60 years old who are working.
- 2'308'006 individuals, of which 17.5% belong to the treatment group.
- Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.
Simulation

Substantial differences between resident aliens and Swiss nationals on all three covariates.

Propensity score in population (computed from fully stratified data):

[Figure: density of the propensity score for untreated and treated units; x-axis: Propensity score (0 to 1); y-axis: Density]

McFadden R2 = 0.121
Simulation

- Raw mean difference in occupational prestige (NATE): −4.79
- Population ATT (computed from fully stratified data): −3.96
- There is some treatment effect heterogeneity (ATE = −3.51; ATC = −3.41).

[Figure, two panels: mean outcome by propensity score for untreated and treated units; treatment effect by propensity score]
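The three population effects are tied together by the share of treated units. As a consistency check, not taken from the slides and assuming a treated share of $\pi = 0.175$, the ATE is the share-weighted average of the ATT and the ATC:

```latex
\begin{align*}
\text{ATE} &= \pi \cdot \text{ATT} + (1 - \pi) \cdot \text{ATC} \\
           &= 0.175 \cdot (-3.96) + 0.825 \cdot (-3.41) \\
           &= -0.693 - 2.813 \approx -3.51
\end{align*}
```

The result agrees with the ATE reported for the population.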
Results: Variance

[Figure: variance of the estimators for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each without and with bias correction.]
In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.
Results: Bias reduction (in percent)

[Figure: bias reduction in percent for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each without and with bias correction.]
Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.
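The slides do not spell out how "bias reduction (in percent)" is computed. One common definition, used here purely as an assumption, expresses how much of the gap between the naive estimate (NATE) and the true ATT an estimator closes; the function name is hypothetical.

```python
def bias_reduction_percent(estimate, naive, target):
    """Share of the naive bias (naive - target) removed by an estimator,
    in percent. 100 means unbiased; values above 100 mean overcorrection.

    Assumed definition for illustration; not stated in the slides.
    """
    return 100.0 * (naive - estimate) / (naive - target)


NATE = -4.79  # raw mean difference in the population
ATT = -3.96   # population ATT

print(bias_reduction_percent(ATT, NATE, ATT))    # estimator hits the ATT
print(bias_reduction_percent(NATE, NATE, ATT))   # estimator no better than NATE
print(bias_reduction_percent(-3.80, NATE, ATT))  # estimator overcorrects
```

Under this reading, an estimator that stays above or below 100 percent at N = 5000 is showing a bias that does not vanish with sample size.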
Results: Mean squared error

[Figure: mean squared error of the estimators for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each without and with bias correction.]
Results: Relative standard error

[Figure: relative standard error for N = 500 and N = 5000. Rows: nearest-neighbor matching with teffects standard errors (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap standard errors (1 neighbor, 5 neighbors), and kernel matching with bootstrap standard errors (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each without and with bias correction.]
Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals for N = 500 and N = 5000. Rows: nearest-neighbor matching with teffects standard errors (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap standard errors (1 neighbor, 5 neighbors), and kernel matching with bootstrap standard errors (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each without and with bias correction.]
7 Conclusions
Conclusions
The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).

Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.
Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research, the choice may not be that clear.

Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.

Disadvantages of MDM:
- The choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching! (But don't use pair matching anyhow.)
Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly OK for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.
References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2):295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1):77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91:279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990 [1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4):465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70:41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5):688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4):472–480.
Best Case: Mahalanobis Distance Matching

[Figure: scatter plot of treated (T) and control (C) units; x-axis: Education (years); y-axis: Age. Slide 9/23 of the slides by King and Nielsen.]
Best Case: Propensity Score Matching

[Figure: scatter plot of treated (T) and control (C) units; x-axis: Education (years); y-axis: Age; with an additional propensity-score axis running from 0 to 1. Slide 15/23 of the slides by King and Nielsen.]
Best Case: Propensity Score Matching is Suboptimal
[Figure: the same education-by-age scatter plot (education from 12 to 28 years, age from 20 to 80), now showing only a subset of the control (C) units together with the treated (T) units. Slide 15/23 from the slides by King and Nielsen.]
King and Nielsen
Argument 3
- Random pruning (deleting observations at random) increases imbalance. This is because the sample size decreases, so that variance increases (large differences become more likely).
- More imbalance/variance means more model dependence and researcher discretion.
- Because PSM approximates complete randomization, it engages in random pruning.
- PSM Paradox ("when you do 'better', you do worse"):
  - When matching is made more strict (e.g. by decreasing the size of the caliper), PSM, like other matching methods, typically reduces imbalance. But soon the PSM Paradox kicks in, such that further pruning quickly increases imbalance.
  - If the data are such that there are no big differences between treated and untreated to begin with, the PSM Paradox kicks in almost immediately.
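The first point of Argument 3 can be checked with a small simulation. The following is only a sketch under simple assumptions (one N(0, 1) covariate, equal group sizes, both groups drawn from the same distribution); the function name and the sample sizes are made up for illustration and are not taken from King and Nielsen. Randomly pruning an already balanced sample is equivalent to drawing a smaller sample, so the sketch simply compares two sample sizes.

```python
import random
import statistics


def mean_abs_imbalance(n_per_group, n_reps=2000, seed=1):
    """Average absolute difference in covariate means between two groups
    of size n_per_group that are drawn from the same N(0, 1) distribution."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_reps):
        treated = [rng.gauss(0, 1) for _ in range(n_per_group)]
        control = [rng.gauss(0, 1) for _ in range(n_per_group)]
        diffs.append(abs(statistics.fmean(treated) - statistics.fmean(control)))
    return statistics.fmean(diffs)


full = mean_abs_imbalance(200)   # before pruning
pruned = mean_abs_imbalance(20)  # after randomly deleting 90% of the units
print(f"n = 200 per group: average imbalance {full:.3f}")
print(f"n =  20 per group: average imbalance {pruned:.3f}")
```

Nothing about the groups changes except the sample size, yet the expected imbalance grows roughly with the square root of the pruning factor.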
PSM Increases Model Dependence & Bias
[Figure, two panels comparing MDM and PSM against the number of units pruned (0 to 160). Left panel, "Model Dependence": variance. Right panel, "Bias": maximum coefficient across 512 specifications; true effect = 2.]
Data-generating process: $Y_i = 2T_i + X_{1i} + X_{2i} + \epsilon_i$, $\epsilon_i \sim N(0, 1)$
(Slide 20/23 from the slides by King and Nielsen.)
The Propensity Score Paradox in Real Data
[Figure, two panels plotting imbalance against the number of units pruned for CEM, MDM, PSM, and random pruning, with markers for the raw data and for the 1/4 SD caliper. Left panel: Finkel et al. (JOP 2012). Right panel: Nielsen et al. (AJPS 2011).]
Similar pattern for > 20 other real data sets we checked.
(Slide 21/23 from the slides by King and Nielsen.)
Are King and Nielsen right?
Argument 1
- Model dependence (i.e. dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions, even if they try not to be).
- Matching is good because it reduces model dependence.
I fully agree.
My view, however, may be somewhat less pessimistic. I believe that research results can be credible if researchers are well educated, so that they know what they are doing, and if modeling decisions are made transparent and the robustness of results is evaluated (and documented).
Are King and Nielsen right?
Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.
That fully blocked randomization is more efficient than complete randomization, given the sample size, is of course true (how large the efficiency gains are depends on the strength of the relation between X and Y).
However, if blocking reduces the sample size, it is not a priori clear whether estimates from the blocked sample are more efficient than estimates from the full sample (although often they will be).
That PSM approximates complete randomization is only partially true. PSM approximates complete randomization within observations with the same propensity score. Hence, PSM is somewhere between complete randomization and fully blocked randomization.
- If the X variables have no relation to T (treatment), then all observations have the same propensity score. Hence, we end up with complete randomization.
- If the X variables have a strong effect on T, there is lots of blocking.
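The two limiting cases can be made concrete with a toy calculation. This is a minimal sketch with invented data: the propensity score is computed by full stratification, i.e. as the share of treated units within each value of a single categorical X. The function name and the cell counts are hypothetical.

```python
from collections import defaultdict


def stratified_propensity(rows):
    """rows: iterable of (x, t) pairs with t in {0, 1}.
    Returns {x: share of treated units among observations with that x}."""
    counts = defaultdict(lambda: [0, 0])  # x -> [n, n_treated]
    for x, t in rows:
        counts[x][0] += 1
        counts[x][1] += t
    return {x: n_treated / n for x, (n, n_treated) in counts.items()}


# X unrelated to T: 25% treated in every cell
unrelated = ([("low", 1)] * 10 + [("low", 0)] * 30
             + [("high", 1)] * 5 + [("high", 0)] * 15)
# X strongly related to T: 10% treated in one cell, 80% in the other
related = ([("low", 1)] * 4 + [("low", 0)] * 36
           + [("high", 1)] * 16 + [("high", 0)] * 4)

ps_unrelated = stratified_propensity(unrelated)
ps_related = stratified_propensity(related)
print(ps_unrelated)
print(ps_related)
```

In the first data set every observation has the same score, so matching on the score cannot distinguish "low" from "high" (complete randomization). In the second data set the score separates the two cells, so matching on it blocks on X.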
Are King and Nielsen right?
Argument 3
- Random pruning ⇒ imbalance ⇒ more model dependence
- PSM ⇒ complete randomization ⇒ lots of random pruning
- PSM Paradox: "when you do 'better', you do worse"
That random pruning makes things worse is of course true, because it unnecessarily reduces the sample size (without changing anything else).
As argued above, that PSM applies random pruning is only true for X variables unrelated to T (so that we are in a "local" complete randomization situation, although something similar can probably also happen if effects from several X's cancel each other out).
Furthermore, it is only true if you employ a matching algorithm that throws away good matches. King and Nielsen's results seem to be based on the worst possible algorithm: one-to-one matching without replacement.
If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or also nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
- Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
- Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.
It is true that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more "free" variance in X that can be exploited by modeling decisions).
In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you'd need to apply PSM stratified by subgroups in this case).
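The difference between an algorithm that throws away good matches and one that does not can be illustrated on a handful of scores. The sketch below is not the kmatch implementation; it is a toy version of greedy one-to-one matching without replacement and of Epanechnikov kernel matching on a scalar score, with invented data and a hypothetical bandwidth of 0.05.

```python
def pair_match(treated, controls):
    """Greedy one-to-one nearest-neighbor matching without replacement.
    Returns the indices of the controls that end up being used."""
    available = set(range(len(controls)))
    used = set()
    for score in treated:
        if not available:
            break
        # closest remaining control; ties are broken by the lower index
        best = min(available, key=lambda j: (abs(controls[j] - score), j))
        available.remove(best)
        used.add(best)
    return used


def kernel_match(treated, controls, bandwidth):
    """Epanechnikov kernel matching. Returns the indices of all controls
    that receive a positive weight for at least one treated unit."""
    used = set()
    for score in treated:
        for j, c in enumerate(controls):
            u = (c - score) / bandwidth
            if 1 - u * u > 0:  # Epanechnikov weight is proportional to 1 - u^2
                used.add(j)
    return used


treated = [0.50, 0.52]
controls = [0.49, 0.50, 0.51, 0.52, 0.53, 0.90]

pair_used = pair_match(treated, controls)
kernel_used = kernel_match(treated, controls, bandwidth=0.05)
print("pair matching uses controls:  ", sorted(pair_used))
print("kernel matching uses controls:", sorted(kernel_used))
```

Pair matching keeps two controls and discards three that are almost as close. Kernel matching averages over all five nearby controls and prunes only the one at 0.90, where pruning is needed to prevent bias.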
The Command
kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).
Some key features:
- Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
- Optional exact matching
- Optional regression-adjustment bias-correction
- Kernel matching, ridge matching, or nearest-neighbor matching
- Automatic bandwidth selection for kernel/ridge matching
- Flexible specification of scaling matrix for MDM
- Joint analysis of multiple subgroups and multiple outcome variables
- Various post-estimation commands for balancing and common-support diagnostics
- Computationally efficient implementation
Some Examples
Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.
. sysuse nlsw88, clear
(NLSW, 1988 extract)
. drop if industry==2
(4 observations deleted)
Mahalanobis-distance kernel matching
. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), nate att
(computing bandwidth ... done)
Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south
Matching statistics
               Matched               Controls           Band-
             Yes    No  Total    Used  Unused  Total    width
  Treated    432    25    457    1105     291   1396   1.3394
Treatment-effects estimation
  wage       Coef.
  ATT     .6059013
  NATE    1.432913
Some Examples: some balancing statistics
. kmatch summarize
(refitting the model using the generate() option)
                          Raw                          Matched(ATT)
Means          Treated  Untreated    StdDif    Treated  Untreated    StdDif
collgrad       .321663    .224212   .219912    .319444    .319444         0
ttl_exp        13.2685    12.7323   .117584    13.3205    13.1425   .039036
tenure         7.89205    6.17658    .29735    7.91744    7.58347   .057888
3.industry     .006565    .012178  -.058246     .00463     .00463         0
4.industry     .183807    .166905   .044425    .185185    .185185         0
5.industry     .105033    .027937   .312944    .085648    .085648         0
6.industry     .045952    .169771  -.407129    .048611    .048611         0
7.industry     .019694    .102436  -.350657    .020833    .020833         0
8.industry     .017505    .035817  -.113785    .009259    .009259         0
9.industry     .010941    .040115  -.185669    .011574    .011574         0
10.industry    .004376    .008596  -.052551    .002315    .002315         0
11.industry    .479212    .356734   .250073    .506944    .506944         0
12.industry    .122538     .07235   .169707     .12037     .12037         0
2.race         .330416    .244986   .189418      .3125      .3125         0
3.race         .017505    .011461   .050566    .006944    .006944         0
south          .297593    .466332  -.352408    .291667    .291667         0

                          Raw                          Matched(ATT)
Variances      Treated  Untreated     Ratio    Treated  Untreated     Ratio
collgrad       .218674    .174066   1.25628    .217904    .217904         1
ttl_exp        20.5898    21.0001   .980459    19.8177    18.2323   1.08696
tenure         37.2044    29.3629   1.26706    37.0399    34.9543   1.05966
3.industry     .006536    .012038   .542928    .004619    .004619         1
4.industry     .150351    .139148   1.08052    .151242    .151242         1
5.industry     .094207    .027176   3.46656    .078494    .078494         1
6.industry     .043936     .14105   .311496    .046355    .046355         1
7.industry     .019348    .092008   .210287    .020447    .020447         1
8.industry     .017237    .034559   .498769    .009195    .009195         1
9.industry     .010845    .038533   .281445    .011467    .011467         1
10.industry    .004367    .008528   .512039    .002315    .002315         1
11.industry    .250115    .229639   1.08917    .250532    .250532         1
12.industry    .107758    .067163   1.60443    .106127    .106127         1
2.race         .221726      .1851   1.19787    .215342    .215342         1
3.race         .017237    .011338   1.52025    .006912    .006912         1
south           .20949    .249045   .841173    .207077    .207077         1
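The two balancing statistics can be recomputed by hand. The sketch below uses the ttl_exp row of the output above (means 13.2685 and 12.7323 raw, 13.3205 and 13.1425 matched; raw variances 20.5898 and 21.0001). It assumes that the standardized difference divides the difference in means by the square root of the average of the two raw variances, also in the matched sample. That definition is an assumption, but it reproduces the reported values up to rounding. The function names are illustrative and not part of kmatch.

```python
import math


def std_diff(mean_t, mean_c, raw_var_t, raw_var_c):
    """Standardized difference in means, scaled by the pooled raw SD."""
    return (mean_t - mean_c) / math.sqrt((raw_var_t + raw_var_c) / 2)


def var_ratio(var_t, var_c):
    """Variance ratio, treated over untreated."""
    return var_t / var_c


# ttl_exp, values read off the kmatch summarize output
raw_sd = std_diff(13.2685, 12.7323, 20.5898, 21.0001)
matched_sd = std_diff(13.3205, 13.1425, 20.5898, 21.0001)
raw_ratio = var_ratio(20.5898, 21.0001)

print(f"StdDif raw:     {raw_sd:.6f}")      # reported: .117584
print(f"StdDif matched: {matched_sd:.6f}")  # reported: .039036
print(f"Ratio raw:      {raw_ratio:.6f}")   # reported: .980459
```

A standardized difference close to 0 and a variance ratio close to 1 indicate good balance on the respective covariate.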
Some Examples: make a graph of the balancing stats
. mat M = r(M)
. mat V = r(V)
. coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || , bylabels("Std. mean difference" "Variance ratio") noci nolabels byopts(xrescale)
. addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))
. addplot 2: , xline(1) norescaling
[Figure, two panels with one row per covariate (collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, south), comparing raw and matched samples. Left panel: standardized mean difference (x-axis from -.4 to .4). Right panel: variance ratio (x-axis from 0 to 4).]
Some Examples
Propensity-score kernel matching
. kmatch ps union collgrad ttl_exp tenure i.industry i.race south (wage), nate att
(computing bandwidth ... done)
Propensity-score kernel matching           Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Covariates  : collgrad ttl_exp tenure i.industry i.race south
PS model    : logit (pr)
Matching statistics
               Matched               Controls           Band-
             Yes    No  Total    Used  Unused  Total    width
  Treated    431    26    457    1214     182   1396   .00188
Treatment-effects estimation
  wage       Coef.
  ATT     .3887224
  NATE    1.432913
Some Examples: Kernel density balancing plot
. kmatch density, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)
(applying 0-1 boundary correction to density estimation of propensity score)
(bandwidth for propensity score = .06803989)
[Figure: kernel densities of the propensity score (x-axis from 0 to .8) for untreated and treated units. Left panel: Raw. Right panel: Matched (ATT).]
Some Examples: Cumulative distribution balancing plot
. kmatch cumul, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)
[Figure: cumulative distributions of the propensity score (x-axis from 0 to .8, cumulative probability from 0 to 1) for untreated and treated units. Left panel: Raw. Right panel: Matched (ATT).]
Some Examples: Balancing box plot
. kmatch box
(refitting the model using the generate() option)
[Figure: box plots of the propensity score (0 to .8) for untreated and treated units. Left panel: Raw. Right panel: Matched (ATT).]
Some Examples: Standard errors
. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)
(running kmatch on estimation sample)
Bootstrap replications (50)
Multivariate-distance kernel matching      Number of obs = 1853
                                           Replications  = 50
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south
Matching statistics
               Matched               Controls           Band-
             Yes    No  Total    Used  Unused  Total    width
  Treated    432    25    457    1105     291   1396   1.3394
Untreated   1386    10   1396     455       2    457   3.3975
 Combined   1818    35   1853    1560     293   1853
Treatment-effects estimation
           Observed  Bootstrap                    Normal-based
  wage        Coef.  Std. Err.      z   P>|z|   [95% Conf. Interval]
  ATE      .4095729   .1920853   2.13   0.033    .0330928   .7860531
  ATT      .6059013   .2472069   2.45   0.014    .1213846   1.090418
  ATC      .3483797   .1893653   1.84   0.066   -.0227695   .7195289
  NATE     1.432913   .2333282   6.14   0.000    .9755981   1.890228
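The "Normal-based" columns follow directly from the coefficient and its bootstrap standard error. As a check, here is a short sketch that recomputes the z statistic, the p-value, and the 95% confidence interval for the ATT row. The helper name is made up; the inputs are the reported coefficient and standard error.

```python
from statistics import NormalDist


def normal_based(coef, se, level=0.95):
    """z statistic, two-sided p-value, and normal-based confidence limits."""
    std_normal = NormalDist()
    z = coef / se
    p = 2 * (1 - std_normal.cdf(abs(z)))
    z_crit = std_normal.inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    return z, p, coef - z_crit * se, coef + z_crit * se


z, p, lower, upper = normal_based(0.6059013, 0.2472069)
print(f"z = {z:.2f}, p = {p:.3f}, 95% CI = [{lower:.7f}, {upper:.6f}]")
```

The interval is only as good as the normal approximation and the bootstrap standard error; with 50 replications, as in the example, the standard error itself is still fairly noisy.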
Some Examples
Do some tests
. lincom ATT-NATE
 ( 1)  ATT - NATE = 0
  wage        Coef.  Std. Err.      z   P>|z|   [95% Conf. Interval]
  (1)     -.8270117   .1810415  -4.57   0.000   -1.181847  -.4721768
. test ATT = ATC
 ( 1)  ATT - ATC = 0
           chi2(  1) =    2.42
         Prob > chi2 =  0.1200
Some Examples: Nearest-neighbor matching (1 neighbor)
. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn
Multivariate-distance nearest-neighbor matching
                                           Number of obs  = 1853
                                           Neighbors: min = 1
Treatment   : union = 1                               max = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south
Matching statistics
               Matched               Controls           Band-
             Yes    No  Total    Used  Unused  Total    width
  Treated    457     0    457     328    1068   1396
Treatment-effects estimation
  wage       Coef.
  ATT     .7246969
. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet
Treatment-effects estimation               Number of obs      = 1853
Estimator      : nearest-neighbor matching Matches: requested = 1
Outcome model  : matching                                 min = 1
Distance metric: Mahalanobis                              max = 1
                                      AI Robust
  wage                      Coef.  Std. Err.      z   P>|z|   [95% Conf. Interval]
  ATET union
  (union vs nonunion)    .7246969   .2942952   2.46   0.014     .147889   1.301505
Some Examples: Nearest-neighbor matching (5 neighbors)
. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)
Multivariate-distance nearest-neighbor matching
                                           Number of obs  = 1853
                                           Neighbors: min = 5
Treatment   : union = 1                               max = 5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south
Matching statistics
               Matched               Controls           Band-
             Yes    No  Total    Used  Unused  Total    width
  Treated    457     0    457     870     526   1396
Treatment-effects estimation
  wage       Coef.
  ATT     .5590823
. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)
Treatment-effects estimation               Number of obs      = 1853
Estimator      : nearest-neighbor matching Matches: requested = 5
Outcome model  : matching                                 min = 5
Distance metric: Mahalanobis                              max = 6
                                      AI Robust
  wage                      Coef.  Std. Err.      z   P>|z|   [95% Conf. Interval]
  ATET union
  (union vs nonunion)    .5590823   .2381752   2.35   0.019    .0922675   1.025897
Some Examples: Bias-correction / regression adjustment
. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)
Multivariate-distance nearest-neighbor matching
                                           Number of obs  = 1853
                                           Neighbors: min = 5
Treatment   : union = 1                               max = 5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south
Matching statistics
               Matched               Controls           Band-
             Yes    No  Total    Used  Unused  Total    width
  Treated    457     0    457     870     526   1396
Treatment-effects estimation
  wage       Coef.
  ATT     .5288023
  adjusted for collgrad ttl_exp tenure i.industry i.race south
. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)
Treatment-effects estimation               Number of obs      = 1853
Estimator      : nearest-neighbor matching Matches: requested = 5
Outcome model  : matching                                 min = 5
Distance metric: Mahalanobis                              max = 6
                                      AI Robust
  wage                      Coef.  Std. Err.      z   P>|z|   [95% Conf. Interval]
  ATET union
  (union vs nonunion)    .5288023   .2420635   2.18   0.029    .0543666   1.003238
Some Examples
Mahalanobis-distance and propensity-score matching combined
. kmatch md union collgrad ttl_exp tenure (wage), att psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)
Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis (modified)
Covariates  : collgrad ttl_exp tenure
PS model    : logit (pr)
PS covars   : i.industry i.race south
Matching statistics
               Matched               Controls           Band-
             Yes    No  Total    Used  Unused  Total    width
  Treated    439    18    457    1258     138   1396   .83886
Treatment-effects estimation
  wage       Coef.
  ATT     .6408443
Some Examples
Exact matching
. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)
Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure
Exact       : industry race south
Matching statistics
               Matched               Controls           Band-
             Yes    No  Total    Used  Unused  Total    width
  Treated    432    25    457    1103     293   1396   1.3013
Treatment-effects estimation
  wage       Coef.
  ATT     .6047374
Some Examples
Bandwidth selection: the default (based on distribution of distances in one-nearest-neighbor matching)
. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)
Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south
Matching statistics
               Matched               Controls           Band-
             Yes    No  Total    Used  Unused  Total    width
  Treated    432    25    457    1105     291   1396   1.3394
Treatment-effects estimation
  wage       Coef.
  ATT     .6059013
Some Examples: Bandwidth selection: cross validation with respect to X
. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(cv)
(computing bandwidth ... done)
Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south
Matching statistics
               Matched               Controls           Band-
             Yes    No  Total    Used  Unused  Total    width
  Treated    448     9    457    1184     212   1396   1.8888
Treatment-effects estimation
  wage       Coef.
  ATT     .6651578
. kmatch cvplot, ms(o) index mlabposition(1) sort
[Figure: cross-validation plot of the MSE against the bandwidth (x-axis from 1.5 to 3); the evaluation points are labeled with their search index (1 to 15).]
Some Examples: Bandwidth selection: cross validation with respect to Y
. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(cv wage)
(computing bandwidth ... done)
Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south
Matching statistics
               Matched               Controls           Band-
             Yes    No  Total    Used  Unused  Total    width
  Treated    453     4    457    1289     107   1396    2.433
Treatment-effects estimation
  wage       Coef.
  ATT     .6928956
. kmatch cvplot, ms(o) index mlabposition(1) sort
[Figure: cross-validation plot of the MISE against the bandwidth (x-axis from 1.5 to 3); the evaluation points are labeled with their search index (1 to 15).]
Some Examples: Bandwidth selection: weighted cross validation with respect to Y
. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(cv wage, weighted)
(computing bandwidth ... done)
Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south
Matching statistics
               Matched               Controls           Band-
             Yes    No  Total    Used  Unused  Total    width
  Treated    455     2    457    1356      40   1396   2.7626
Treatment-effects estimation
  wage       Coef.
  ATT     .7308166
. kmatch cvplot, ms(o) index mlabposition(1) sort
[Figure: cross-validation plot of the weighted MISE against the bandwidth (x-axis from 1 to 5); the evaluation points are labeled with their search index (1 to 15).]
Some Examples: Common-support diagnostics
. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(0.5)
Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south
Matching statistics
               Matched               Controls           Band-
             Yes    No  Total    Used  Unused  Total    width
  Treated    366    91    457     701     695   1396       .5
Treatment-effects estimation
  wage       Coef.
  ATT     .3303161
. kmatch csummarize
(refitting the model using the generate() option)
                  Common support (treated)         Standardized difference
Means          Matched  Unmatched     Total    (1)-(3)    (2)-(3)    (1)-(2)
collgrad       .322404    .318681   .321663    .001585   -.006376    .007962
ttl_exp        13.3929    12.7682   13.2685    .027413   -.110253    .137666
tenure         8.12614    6.95055   7.89205    .038378   -.154356    .192734
3.industry     .002732    .021978   .006565   -.047404    .190657   -.238061
4.industry     .191257    .153846   .183807    .019212   -.077269    .096481
5.industry     .062842    .274725   .105033   -.137462    .552867   -.690329
6.industry     .057377          0   .045952    .054507   -.219225    .273732
7.industry     .019126    .021978   .019694   -.004083    .016423   -.020506
8.industry     .005464    .065934   .017505   -.091714    .368871   -.460585
9.industry     .010929    .010989   .010941   -.000115    .000462   -.000577
10.industry          0    .021978   .004376   -.066227    .266363   -.332589
11.industry    .554645    .175824   .479212     .15083   -.606636    .757467
12.industry    .092896    .241758   .122538   -.090299    .363181    -.45348
2.race         .243169    .681319   .330416   -.185284    .745209   -.930494
3.race         .002732    .076923   .017505   -.112525    .452572   -.565097
south           .29235    .318681   .297593   -.011456    .046074    -.05753

                  Common support (treated)                  Ratio
Variances      Matched  Unmatched     Total    (1)/(3)    (2)/(3)    (1)/(2)
collgrad       .219058    .219536   .218674    1.00176    1.00394    .997824
ttl_exp        19.4198    25.2474   20.5898    .943177    1.22621     .76918
tenure         38.3324    31.9242   37.2044    1.03032    .858076    1.20073
3.industry     .002732    .021734   .006536    .418045    3.32537    .125714
4.industry     .155101    .131624   .150351    1.03159    .875443    1.17837
5.industry     .059054    .201465   .094207    .626851    2.13854    .293122
6.industry     .054233          0   .043936    1.23435          0          .
7.industry     .018811    .021734   .019348    .972252     1.1233    .865531
8.industry      .00545    .062271   .017237    .316157    3.61269    .087513
9.industry     .010839    .010989   .010845    .999464    1.01328    .986361
10.industry          0    .021734   .004367          0    4.97709          0
11.industry    .247691     .14652   .250115    .990307    .585811    1.69049
12.industry    .084497    .185348   .107758    .784137    1.72003    .455885
2.race         .184542    .219536   .221726    .832297    .990121    .840601
3.race         .002732    .071795   .017237    .158513    4.16522    .038056
south          .207448    .219536    .20949    .990254    1.04796    .944939
(1) matched, (2) unmatched, (3) total
Some Examples: make a graph of the common-support stats
. mat M = r(M)
. coefplot matrix(M[,4]), title("Std. difference") noci nolabels xline(0)
[Figure: standardized differences (x-axis from -.2 to .2), one row per covariate (collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, south).]
Some Examples
Multiple outcome variables
. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage hours), nate att
(computing bandwidth ... done)
Multivariate-distance kernel matching      Number of obs = 1852
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south
Matching statistics
               Matched               Controls           Band-
             Yes    No  Total    Used  Unused  Total    width
  Treated    432    25    457    1104     291   1395   1.3392
Treatment-effects estimation
               Coef.
  wage
    ATT     .6021049
    NATE    1.430823
  hours
    ATT     1.263759
    NATE    1.450303
Some Examples: Multiple outcome variables with different regression-adjustment equations
. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage = collgrad ttl_exp tenure) (hours = i.industry i.race), nate att
(computing bandwidth ... done)
Multivariate-distance kernel matching      Number of obs = 1852
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south
Matching statistics
               Matched               Controls           Band-
             Yes    No  Total    Used  Unused  Total    width
  Treated    432    25    457    1104     291   1395   1.3392
Treatment-effects estimation
               Coef.
  wage
    ATT     .5152752
    NATE    1.430823
  hours
    ATT     1.263759
    NATE    1.450303
  wage: adjusted for collgrad ttl_exp tenure
  hours: adjusted for i.industry i.race
Some Examples: Treatment effects by subpopulation
. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage), att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)
Bootstrap replications (50)
Multivariate-distance kernel matching      Number of obs = 1853
                                           Replications  = 50
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race
0: south = 0
1: south = 1
Matching statistics
               Matched               Controls           Band-
             Yes    No  Total    Used  Unused  Total    width
0 Treated    306    15    321     625     120    745   1.3199
1 Treated    126    10    136     473     178    651   1.3398
Treatment-effects estimation
           Observed  Bootstrap                    Normal-based
  wage        Coef.  Std. Err.      z   P>|z|   [95% Conf. Interval]
  0 ATT    .4586332   .2763358   1.66   0.097    -.082975   1.000241
  1 ATT    .9518705    .406903   2.34   0.019    .1543553   1.749386
. test [0]ATT = [1]ATT
 ( 1)  [0]ATT - [1]ATT = 0
           chi2(  1) =    1.23
         Prob > chi2 =  0.2679
. lincom [1]ATT - [0]ATT
 ( 1)  - [0]ATT + [1]ATT = 0
  wage        Coef.  Std. Err.      z   P>|z|   [95% Conf. Interval]
  (1)      .4932373   .4452343   1.11   0.268    -.379406   1.365881
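The test and lincom results above carry the same information: for a single linear restriction, the Wald chi-squared statistic is the square of the z statistic, and the two p-values coincide. A quick check using the reported lincom coefficient and standard error (the function name is illustrative):

```python
import math


def wald_single_restriction(estimate, se):
    """Wald chi2(1) statistic and p-value for H0: estimate = 0."""
    z = estimate / se
    chi2 = z * z
    # upper tail of chi2 with 1 df equals the two-sided normal tail of |z|
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, chi2, p


z, chi2, p = wald_single_restriction(0.4932373, 0.4452343)
print(f"z = {z:.2f}, chi2(1) = {chi2:.2f}, p = {p:.4f}")
```

So the difference between the two subgroup ATTs is about 0.49, but with 50 bootstrap replications it is not statistically distinguishable from zero.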
Simulation
- Population data from the Swiss census of 2000.
- Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using command iskotrei by Hendrickx 2002; values from 6 to 78, mean 44).
- Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.
- Control variables: gender, age, and highest educational degree.
- Population restricted to people between 24 and 60 years old who are working.
- 2'308'006 individuals, of which 17.5% belong to the treatment group.
- Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.
Simulation
- Substantial differences between resident aliens and Swiss nationals on all three covariates.
- Propensity score in population (computed from fully stratified data):
[Figure: density of the propensity score (x-axis from 0 to 1) for untreated and treated units. McFadden R2 = 0.121.]
Simulation
- Raw mean difference in occupational prestige (NATE): -4.79
- Population ATT (computed from fully stratified data): -3.96
- There is some treatment effect heterogeneity (ATE = -3.51, ATC = -3.41).
[Figure, two panels against the propensity score (x-axis from 0 to .8). Left panel: mean outcome (30 to 55) for untreated and treated units. Right panel: treatment effect (-6 to -1).]
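How NATE, ATE, ATT, and ATC can differ in fully stratified data is easy to see in a toy example. The sketch below uses two invented strata (not the census data); within each stratum the effect is the difference in mean outcomes, and the three effects weight the strata by the number of treated units, untreated units, or all units. The dictionary keys and the function name are hypothetical.

```python
def effects_from_strata(strata):
    """strata: list of dicts with group sizes (n_t, n_c) and mean outcomes
    (y_t, y_c) per stratum. Returns NATE, ATE, ATT and ATC."""
    n_t = sum(s["n_t"] for s in strata)
    n_c = sum(s["n_c"] for s in strata)
    # naive comparison: overall treated mean minus overall untreated mean
    nate = (sum(s["n_t"] * s["y_t"] for s in strata) / n_t
            - sum(s["n_c"] * s["y_c"] for s in strata) / n_c)
    # stratum-specific effects, weighted by different group sizes
    att = sum(s["n_t"] * (s["y_t"] - s["y_c"]) for s in strata) / n_t
    atc = sum(s["n_c"] * (s["y_t"] - s["y_c"]) for s in strata) / n_c
    ate = sum((s["n_t"] + s["n_c"]) * (s["y_t"] - s["y_c"])
              for s in strata) / (n_t + n_c)
    return nate, ate, att, atc


# hypothetical strata: effect -5 in the first, -2 in the second
strata = [
    {"n_t": 10, "n_c": 90, "y_t": 40.0, "y_c": 45.0},
    {"n_t": 30, "n_c": 70, "y_t": 48.0, "y_c": 50.0},
]
nate, ate, att, atc = effects_from_strata(strata)
print(f"NATE = {nate:.4f}, ATE = {ate:.4f}, ATT = {att:.4f}, ATC = {atc:.4f}")
```

Because the treated are concentrated in the stratum with the smaller effect, the ATT is closer to zero than the ATC, and the naive difference mixes the effect with the difference in outcome levels between strata.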
Results: Variance
[Figure, two panels (N = 500 and N = 5000): variance of the estimators. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each without and with bias correction.]
In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.
For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.
Results: Bias reduction (in percent)
[Figure, two panels (N = 500 and N = 5000): bias reduction in percent. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each without and with bias correction.]
Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.
The bias also vanishes once post-matching regression adjustment is applied.
Results: Mean squared error
[Figure, two panels (N = 500 and N = 5000): mean squared error of the estimators. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each without and with bias correction.]
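The mean squared error combines the two preceding criteria: across simulation replications, MSE = variance + bias squared. The sketch below illustrates the decomposition with four made-up estimates and the population ATT of -3.96 from the simulation setup as the target. The estimates are hypothetical and are not simulation results from the slides.

```python
from statistics import fmean, pvariance


def summarize_estimates(estimates, truth):
    """Bias, variance and mean squared error of a list of estimates."""
    bias = fmean(estimates) - truth
    variance = pvariance(estimates)  # divides by the number of replications
    mse = fmean([(e - truth) ** 2 for e in estimates])
    return bias, variance, mse


estimates = [-4.2, -3.9, -4.0, -3.7]  # hypothetical ATT estimates
bias, variance, mse = summarize_estimates(estimates, truth=-3.96)
print(f"bias = {bias:.4f}, variance = {variance:.4f}, MSE = {mse:.4f}")
print(f"variance + bias^2 = {variance + bias ** 2:.4f}")
```

An estimator with a small remaining bias can therefore still have the lower MSE if its variance is sufficiently smaller.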
Results: Relative standard error
[Figure, two panels (N = 500 and N = 5000): relative standard error. Rows: nearest-neighbor matching with teffects standard errors (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap standard errors (1 neighbor, 5 neighbors), and kernel matching with bootstrap standard errors (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each without and with bias correction.]
Results: Coverage of 95% CIs
[Figure, two panels (N = 500 and N = 5000): coverage of 95% confidence intervals. Rows: nearest-neighbor matching with teffects standard errors (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap standard errors (1 neighbor, 5 neighbors), and kernel matching with bootstrap standard errors (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each without and with bias correction.]
Conclusions
- The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).
- Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.
Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research the choice may not be that clear.

Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.

Disadvantages of MDM:
- The choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching! (But don't use pair matching anyhow.)
Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly OK for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.
References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2): 295–313.
Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1): 77–90.
Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91: 279–290.
Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.
King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw
Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.
Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4): 465–472.
Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70: 41–55.
Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5): 688–701.
Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4): 472–480.
Best Case: Propensity Score Matching
[Figure: scatter plot of control units (C) and treated units (T). x-axis: Education (years), 12 to 28; y-axis: Age, 20 to 80; an additional axis shows the propensity score from 0 to 1.]
(Slide 15/23 of the slides by King and Nielsen)
Best Case: Propensity Score Matching is Suboptimal
[Figure: scatter plot of control units (C) and treated units (T), showing fewer control units than the preceding plot. x-axis: Education (years), 12 to 28; y-axis: Age, 20 to 80.]
(Slide 15/23 of the slides by King and Nielsen)
King and Nielsen

Argument 3
- Random pruning (deleting observations at random) increases imbalance. This is because the sample size decreases, so that variance increases (large differences become more likely).
- More imbalance/variance means more model dependence and researcher discretion.
- Because PSM approximates complete randomization, it engages in random pruning.
- PSM Paradox ("when you do 'better', you do worse"):
    - When matching is made more strict (e.g., by decreasing the size of the caliper), PSM, like other matching methods, typically reduces imbalance. But soon the PSM Paradox kicks in, such that further pruning quickly increases imbalance.
    - If the data is such that there are no big differences between treated and untreated to begin with, the PSM Paradox kicks in almost immediately.
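The first point of Argument 3 (random pruning increases imbalance because the sample shrinks) can be illustrated with a small simulation. The following is a minimal Python sketch, not taken from King and Nielsen or from the slides: the function name, the standard-normal covariate, and the group sizes are all assumptions chosen for illustration.

```python
import random
import statistics


def imbalance_after_random_pruning(n_start, n_keep, reps=500, seed=1):
    """Mean absolute difference in covariate means after random pruning.

    Both groups are drawn from the same N(0, 1) distribution, so any
    imbalance is due to chance alone. Each group starts with n_start
    units and is pruned at random down to n_keep units.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        treated = [rng.gauss(0, 1) for _ in range(n_start)]
        control = [rng.gauss(0, 1) for _ in range(n_start)]
        # Random pruning: keep a random subset of each group.
        treated = rng.sample(treated, n_keep)
        control = rng.sample(control, n_keep)
        total += abs(statistics.fmean(treated) - statistics.fmean(control))
    return total / reps


if __name__ == "__main__":
    for keep in (200, 100, 50, 20):
        print(keep, round(imbalance_after_random_pruning(200, keep), 3))
```

The average absolute mean difference grows as fewer units are kept, even though nothing about the covariate distribution changes.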
PSM Increases Model Dependence & Bias
[Figure, two panels, each comparing MDM and PSM against the number of units pruned (0 to 160). Left panel, "Model Dependence": y-axis Variance. Right panel, "Bias": y-axis Maximum Coefficient across 512 Specifications (2.0 to 4.0); true effect = 2.]
Data-generating process: $Y_i = 2T_i + X_{1i} + X_{2i} + \epsilon_i$, $\epsilon_i \sim N(0, 1)$
(Slide 20/23 of the slides by King and Nielsen)
The Propensity Score Paradox in Real Data
[Figure, two panels: Finkel et al. (JOP 2012), number of units pruned 0 to 3000, imbalance 0 to 10; Nielsen et al. (AJPS 2011), number of units pruned 0 to 2500, imbalance 0 to 30. x-axis: Number of units pruned; y-axis: Imbalance. Series: CEM, MDM, PSM, Random; marked points: Raw, 1/4 SD caliper.]
Similar pattern for > 20 other real data sets we checked.
(Slide 21/23 of the slides by King and Nielsen)
Are King and Nielsen right?

Argument 1
- Model dependence (i.e., dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions, even if they try not to be).
- Matching is good because it reduces model dependence.

I fully agree.

My view, however, may be somewhat less pessimistic. I believe that research results can be credible if researchers are well educated, so that they know what they are doing, and if modeling decisions are made transparent and robustness of results is evaluated (and documented).
Are King and Nielsen right?

Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

That fully blocked randomization is more efficient than complete randomization (given the sample size) is of course true (how large the efficiency gains are depends on the strength of the relation between X and Y).

However, if blocking reduces the sample size, it is not a priori clear whether estimates from the blocked sample are more efficient than estimates from the full sample (although often they will be).
Are King and Nielsen right?

Argument 2 (continued)

That PSM approximates complete randomization is only partially true: PSM approximates complete randomization within observations with the same propensity score. Hence, PSM is somewhere between complete randomization and fully blocked randomization.
- If the X variables have no relation to T (treatment), then all observations have the same propensity score. Hence, we end up with complete randomization.
- If the X variables have a strong effect on T, there is lots of blocking.
Are King and Nielsen right?

Argument 3
- Random pruning ⇒ imbalance ⇒ more model dependence
- PSM ⇒ complete randomization ⇒ lots of random pruning
- PSM Paradox: "when you do 'better', you do worse"

That random pruning makes things worse is of course true, because it unnecessarily reduces the sample size (without changing anything else).

As argued above, that PSM applies random pruning is only true for X variables unrelated to T (so that we are in a "local" complete randomization situation, although something similar can probably also happen if effects from several X's cancel each other out).

Furthermore, it is only true if you employ a matching algorithm that throws away good matches. King and Nielsen's results seem to be based on the worst possible algorithm: one-to-one matching without replacement.
Are King and Nielsen right?

Argument 3 (continued)

If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or also nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
- Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
- Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.
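The idea of averaging over all good matches instead of discarding them can be sketched in a few lines. The following Python sketch of kernel matching on a one-dimensional score is an illustration only, not kmatch's implementation: the function names, the data layout (lists of `(score, outcome)` tuples), and the toy numbers are assumptions.

```python
def epanechnikov(u):
    """Epanechnikov kernel weight for a scaled distance u."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def kernel_matching_att(treated, controls, bandwidth):
    """Kernel-matching estimate of the ATT on a one-dimensional score.

    treated, controls: lists of (score, outcome) tuples.
    Returns (att, number_of_matched_treated_units).
    """
    effects = []
    for score_t, y_t in treated:
        weights = [epanechnikov((score_c - score_t) / bandwidth)
                   for score_c, _ in controls]
        weight_sum = sum(weights)
        if weight_sum == 0.0:
            continue  # no control within the bandwidth: unit stays unmatched
        # Counterfactual outcome: weighted average over ALL nearby controls.
        y0_hat = sum(w * y_c for w, (_, y_c) in zip(weights, controls)) / weight_sum
        effects.append(y_t - y0_hat)
    if not effects:
        raise ValueError("no treated unit has a control within the bandwidth")
    return sum(effects) / len(effects), len(effects)


if __name__ == "__main__":
    treated = [(0.5, 10.0)]
    controls = [(0.5, 6.0), (0.5, 8.0), (0.9, 100.0)]
    print(kernel_matching_att(treated, controls, bandwidth=0.1))
```

In the toy data, two controls are equally good matches for the treated unit. Pair matching without replacement would use only one of them; the kernel estimator averages both and ignores the distant third control.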
Are King and Nielsen right?

Argument 3 (continued)

It is true that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more "free" variance in X that can be exploited by modeling decisions).

In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you'd need to apply PSM stratified by subgroups in this case).
Illustration using kmatch
The Command

kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).

Some key features:
- Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
- Optional exact matching
- Optional regression-adjustment bias-correction
- Kernel matching, ridge matching, or nearest-neighbor matching
- Automatic bandwidth selection for kernel/ridge matching
- Flexible specification of scaling matrix for MDM
- Joint analysis of multiple subgroups and multiple outcome variables
- Various post-estimation commands for balancing and common-support diagnostics
- Computationally efficient implementation
Some Examples

Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.

    . sysuse nlsw88, clear
    (NLSW, 1988 extract)

    . drop if industry==2
    (4 observations deleted)

Mahalanobis-distance kernel matching:

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south
    >     (wage), nate att
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1853
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

               |       Matched        |       Controls       |  Band-
               |   Yes     No  Total  |  Used  Unused  Total |  width
       Treated |   432     25    457  |  1105     291   1396 | 1.3394

    Treatment-effects estimation

          wage |     Coef.
           ATT |  .6059013
          NATE |  1.432913
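The "mahalanobis" metric scales covariate differences by the inverse covariance matrix. As a minimal illustration (not kmatch's code), the following Python sketch computes the distance for two covariates with an explicit 2x2 inverse; the function name and the example covariance matrices are assumptions.

```python
import math


def mahalanobis_2d(x, y, cov):
    """Mahalanobis distance between two points with two covariates each.

    cov is the 2x2 covariance matrix ((s11, s12), (s21, s22)).
    """
    (s11, s12), (s21, s22) = cov
    det = s11 * s22 - s12 * s21
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    d1, d2 = x[0] - y[0], x[1] - y[1]
    # Quadratic form d' S^{-1} d, written out with the explicit 2x2 inverse.
    q = (s22 * d1 * d1 - (s12 + s21) * d1 * d2 + s11 * d2 * d2) / det
    return math.sqrt(q)


if __name__ == "__main__":
    identity = ((1.0, 0.0), (0.0, 1.0))
    print(mahalanobis_2d((3.0, 4.0), (0.0, 0.0), identity))  # Euclidean case
    scaled = ((4.0, 0.0), (0.0, 1.0))
    print(mahalanobis_2d((2.0, 0.0), (0.0, 0.0), scaled))
```

With an identity covariance matrix the distance reduces to the Euclidean distance; with unequal variances, a difference of two units on a covariate with variance 4 counts as one standard deviation.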
Some Examples: some balancing statistics

    . kmatch summarize
    (refitting the model using the generate() option)

                 |             Raw              |        Matched (ATT)
           Means |  Treated  Untrea~d    StdDif |  Treated  Untrea~d    StdDif
        collgrad |  .321663   .224212   .219912 |  .319444   .319444         0
         ttl_exp |  13.2685   12.7323   .117584 |  13.3205   13.1425   .039036
          tenure |  7.89205   6.17658    .29735 |  7.91744   7.58347   .057888
      3.industry |  .006565   .012178  -.058246 |   .00463    .00463         0
      4.industry |  .183807   .166905   .044425 |  .185185   .185185         0
      5.industry |  .105033   .027937   .312944 |  .085648   .085648         0
      6.industry |  .045952   .169771  -.407129 |  .048611   .048611         0
      7.industry |  .019694   .102436  -.350657 |  .020833   .020833         0
      8.industry |  .017505   .035817  -.113785 |  .009259   .009259         0
      9.industry |  .010941   .040115  -.185669 |  .011574   .011574         0
     10.industry |  .004376   .008596  -.052551 |  .002315   .002315         0
     11.industry |  .479212   .356734   .250073 |  .506944   .506944         0
     12.industry |  .122538    .07235   .169707 |   .12037    .12037         0
          2.race |  .330416   .244986   .189418 |    .3125     .3125         0
          3.race |  .017505   .011461   .050566 |  .006944   .006944         0
           south |  .297593   .466332  -.352408 |  .291667   .291667         0

                 |             Raw              |        Matched (ATT)
       Variances |  Treated  Untrea~d     Ratio |  Treated  Untrea~d     Ratio
        collgrad |  .218674   .174066   1.25628 |  .217904   .217904         1
         ttl_exp |  20.5898   21.0001   .980459 |  19.8177   18.2323   1.08696
          tenure |  37.2044   29.3629   1.26706 |  37.0399   34.9543   1.05966
      3.industry |  .006536   .012038   .542928 |  .004619   .004619         1
      4.industry |  .150351   .139148   1.08052 |  .151242   .151242         1
      5.industry |  .094207   .027176   3.46656 |  .078494   .078494         1
      6.industry |  .043936    .14105   .311496 |  .046355   .046355         1
      7.industry |  .019348   .092008   .210287 |  .020447   .020447         1
      8.industry |  .017237   .034559   .498769 |  .009195   .009195         1
      9.industry |  .010845   .038533   .281445 |  .011467   .011467         1
     10.industry |  .004367   .008528   .512039 |  .002315   .002315         1
     11.industry |  .250115   .229639   1.08917 |  .250532   .250532         1
     12.industry |  .107758   .067163   1.60443 |  .106127   .106127         1
          2.race |  .221726     .1851   1.19787 |  .215342   .215342         1
          3.race |  .017237   .011338   1.52025 |  .006912   .006912         1
           south |   .20949   .249045   .841173 |  .207077   .207077         1
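The two balancing statistics in the table can be reproduced by hand. The sketch below is written in Python for illustration and assumes that the standardized difference divides the mean difference by the square root of the average of the two group variances; this assumption is not stated on the slide, but it reproduces the raw collgrad row (means .321663 and .224212, variances .218674 and .174066) up to rounding. The function names are hypothetical.

```python
import math


def standardized_difference(mean_t, mean_c, var_t, var_c):
    """Mean difference divided by the square root of the average variance."""
    return (mean_t - mean_c) / math.sqrt((var_t + var_c) / 2.0)


def variance_ratio(var_t, var_c):
    """Ratio of the treated-group variance to the untreated-group variance."""
    return var_t / var_c


if __name__ == "__main__":
    # Raw (unmatched) values for collgrad, taken from the table above.
    print(round(standardized_difference(0.321663, 0.224212, 0.218674, 0.174066), 4))
    print(round(variance_ratio(0.218674, 0.174066), 4))
```

A standardized difference near 0 and a variance ratio near 1 indicate good balance, which is what the Matched (ATT) columns show for most covariates.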
Some Examples: make a graph of the balancing stats

    . mat M = r(M)
    . mat V = r(V)
    . coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || ,
    >     bylabels("Std. mean difference" "Variance ratio")
    >     noci nolabels byopts(xrescale)
    . addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))
    . addplot 2: , xline(1) norescaling

[Figure: coefficient plot with one row per covariate (collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, south). Panels: Std. mean difference (x-axis about -.4 to .4) and Variance ratio (x-axis 0 to 4). Legend: Raw, Matched.]
Some Examples

Propensity-score kernel matching:

    . kmatch ps union collgrad ttl_exp tenure i.industry i.race south
    >     (wage), nate att
    (computing bandwidth ... done)

    Propensity-score kernel matching           Number of obs = 1853
                                               Kernel        = epan
    Treatment   : union = 1
    Covariates  : collgrad ttl_exp tenure i.industry i.race south
    PS model    : logit (pr)

    Matching statistics

               |       Matched        |       Controls       |  Band-
               |   Yes     No  Total  |  Used  Unused  Total |  width
       Treated |   431     26    457  |  1214     182   1396 | .00188

    Treatment-effects estimation

          wage |     Coef.
           ATT |  .3887224
          NATE |  1.432913
Some Examples: Kernel density balancing plot

    . kmatch density, lw(*6 *2) lc(*.5 *1)
    (refitting the model using the generate() option)
    (applying 0-1 boundary correction to density estimation of propensity score)
    (bandwidth for propensity score = .06803989)

[Figure: kernel densities of the propensity score for untreated and treated units. Panels: Raw, Matched (ATT). x-axis: Propensity score (0 to .8); y-axis: Density (0 to 3).]
Some Examples: Cumulative distribution balancing plot

    . kmatch cumul, lw(*6 *2) lc(*.5 *1)
    (refitting the model using the generate() option)

[Figure: cumulative distributions of the propensity score for untreated and treated units. Panels: Raw, Matched (ATT). x-axis: Propensity score (0 to .8); y-axis: Cumulative probability (0 to 1).]
Some Examples: Balancing box plot

    . kmatch box
    (refitting the model using the generate() option)

[Figure: box plots of the propensity score for untreated and treated units. Panels: Raw, Matched (ATT). y-axis: Propensity score (0 to .8).]
Some Examples: Standard errors

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south
    >     (wage), nate ate att atc vce(bootstrap)
    (computing bandwidth for treated ... done)
    (computing bandwidth for untreated ... done)

    (running kmatch on estimation sample)

    Bootstrap replications (50)

    Multivariate-distance kernel matching      Number of obs = 1853
                                               Replications  = 50
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

               |       Matched        |       Controls       |  Band-
               |   Yes     No  Total  |  Used  Unused  Total |  width
       Treated |   432     25    457  |  1105     291   1396 | 1.3394
     Untreated |  1386     10   1396  |   455       2    457 | 3.3975
      Combined |  1818     35   1853  |  1560     293   1853 |

    Treatment-effects estimation

               |  Observed  Bootstrap                      Normal-based
          wage |     Coef.  Std. Err.      z   P>|z|   [95% Conf. Interval]
           ATE |  .4095729   .1920853   2.13   0.033   .0330928    .7860531
           ATT |  .6059013   .2472069   2.45   0.014   .1213846    1.090418
           ATC |  .3483797   .1893653   1.84   0.066  -.0227695    .7195289
          NATE |  1.432913   .2333282   6.14   0.000   .9755981    1.890228
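The logic behind bootstrap standard errors (resample units with replacement, re-estimate, take the standard deviation of the estimates) can be sketched as follows. This Python sketch is an illustration under stated assumptions, not a description of how vce(bootstrap) is implemented in Stata: the function names, the seed, and the simple difference-in-means estimator standing in for a matching estimator are all hypothetical.

```python
import random
import statistics


def bootstrap_se(data, estimator, reps=50, seed=12345):
    """Bootstrap standard error of estimator(data).

    data: list of observations; estimator: function mapping a list of
    observations to a number. Resampling is done at the unit level.
    """
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(reps):
        resample = [data[rng.randrange(n)] for _ in range(n)]
        estimates.append(estimator(resample))
    return statistics.stdev(estimates)


def difference_in_means(sample):
    """Stand-in estimator: mean outcome of treated minus mean of controls.

    sample: list of (treatment_indicator, outcome) tuples.
    """
    treated = [y for d, y in sample if d == 1]
    control = [y for d, y in sample if d == 0]
    return statistics.fmean(treated) - statistics.fmean(control)


if __name__ == "__main__":
    rng = random.Random(7)
    data = ([(1, rng.gauss(1, 1)) for _ in range(100)]
            + [(0, rng.gauss(0, 1)) for _ in range(100)])
    print(round(bootstrap_se(data, difference_in_means), 3))
```

In a real application the whole matching procedure, including bandwidth selection where applicable, would be repeated inside each replication. The simulation results in these slides suggest that this works reasonably well for kernel/ridge matching but not for nearest-neighbor matching.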
Some Examples

Do some tests:

    . lincom ATT-NATE

     ( 1)  ATT - NATE = 0

          wage |     Coef.  Std. Err.      z   P>|z|   [95% Conf. Interval]
           (1) | -.8270117   .1810415  -4.57   0.000  -1.181847   -.4721768

    . test ATT = ATC

     ( 1)  ATT - ATC = 0

               chi2(  1) =    2.42
             Prob > chi2 =    0.1200
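The z statistic, p-value, and normal-based confidence interval in the lincom output follow directly from the coefficient and its standard error. As a minimal Python sketch (the function name is hypothetical; the inputs are the ATT - NATE row above):

```python
import math


def normal_inference(coef, se, z_crit=1.959964):
    """Return (z, two-sided p-value, lower, upper) under normality.

    z_crit defaults to the 97.5% quantile of the standard normal,
    which gives a 95% confidence interval.
    """
    z = coef / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p_value, coef - z_crit * se, coef + z_crit * se


if __name__ == "__main__":
    z, p, lower, upper = normal_inference(-0.8270117, 0.1810415)
    print(round(z, 2), round(p, 3), round(lower, 6), round(upper, 6))
```

Note that the standard error of the difference ATT - NATE cannot be derived from the two separate standard errors alone, because the estimates are correlated; lincom uses the full (here: bootstrap) covariance matrix.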
Some Examples: Nearest-neighbor matching (1 neighbor)

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

    Multivariate-distance nearest-neighbor matching

                                               Number of obs  = 1853
                                               Neighbors: min = 1
    Treatment   : union = 1                               max = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

               |       Matched        |       Controls       |  Band-
               |   Yes     No  Total  |  Used  Unused  Total |  width
       Treated |   457      0    457  |   328    1068   1396 |

    Treatment-effects estimation

          wage |     Coef.
           ATT |  .7246969

    . teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

    Treatment-effects estimation               Number of obs      = 1853
    Estimator      : nearest-neighbor matching Matches: requested = 1
    Outcome model  : matching                                 min = 1
    Distance metric: Mahalanobis                              max = 1

                         |             AI Robust
                    wage |     Coef.  Std. Err.      z   P>|z|   [95% Conf. Interval]
    ATET                 |
    union                |
     (union vs nonunion) |  .7246969   .2942952   2.46   0.014    .147889    1.301505
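Nearest-neighbor matching with replacement, keeping all ties, can be sketched as follows. This is an illustrative Python sketch on a single scalar covariate, not the algorithm used by kmatch or teffects; the function name, the absolute-difference distance, and the toy data are assumptions.

```python
def nn_matching_att(treated, controls, k=1):
    """ATT from k-nearest-neighbor matching with replacement, ties kept.

    treated, controls: lists of (x, outcome) tuples with a scalar covariate x.
    Every control whose distance does not exceed the k-th smallest distance
    is used, so more than k controls are used when there are ties.
    """
    if k < 1 or k > len(controls):
        raise ValueError("k must be between 1 and the number of controls")
    effects = []
    for x_t, y_t in treated:
        distances = sorted((abs(x_c - x_t), y_c) for x_c, y_c in controls)
        cutoff = distances[k - 1][0]
        matched = [y_c for dist, y_c in distances if dist <= cutoff]
        effects.append(y_t - sum(matched) / len(matched))
    return sum(effects) / len(effects)


if __name__ == "__main__":
    treated = [(1.0, 5.0), (2.0, 7.0)]
    controls = [(0.5, 3.0), (1.5, 4.0), (2.0, 6.0)]
    print(nn_matching_att(treated, controls, k=1))
```

In the toy data, the first treated unit has two equally close controls, and both are used. Because matching is with replacement, a control can serve as a match for several treated units.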
Some Examples: Nearest-neighbor matching (5 neighbors)

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

    Multivariate-distance nearest-neighbor matching

                                               Number of obs  = 1853
                                               Neighbors: min = 5
    Treatment   : union = 1                               max = 5
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

               |       Matched        |       Controls       |  Band-
               |   Yes     No  Total  |  Used  Unused  Total |  width
       Treated |   457      0    457  |   870     526   1396 |

    Treatment-effects estimation

          wage |     Coef.
           ATT |  .5590823

    . teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

    Treatment-effects estimation               Number of obs      = 1853
    Estimator      : nearest-neighbor matching Matches: requested = 5
    Outcome model  : matching                                 min = 5
    Distance metric: Mahalanobis                              max = 6

                         |             AI Robust
                    wage |     Coef.  Std. Err.      z   P>|z|   [95% Conf. Interval]
    ATET                 |
    union                |
     (union vs nonunion) |  .5590823   .2381752   2.35   0.019   .0922675    1.025897
Some Examples: Bias-correction / regression adjustment

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south
    >     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

    Multivariate-distance nearest-neighbor matching

                                               Number of obs  = 1853
                                               Neighbors: min = 5
    Treatment   : union = 1                               max = 5
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

               |       Matched        |       Controls       |  Band-
               |   Yes     No  Total  |  Used  Unused  Total |  width
       Treated |   457      0    457  |   870     526   1396 |

    Treatment-effects estimation

          wage |     Coef.
           ATT |  .5288023
    adjusted for collgrad ttl_exp tenure i.industry i.race south

    . teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south)
    >     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

    Treatment-effects estimation               Number of obs      = 1853
    Estimator      : nearest-neighbor matching Matches: requested = 5
    Outcome model  : matching                                 min = 5
    Distance metric: Mahalanobis                              max = 6

                         |             AI Robust
                    wage |     Coef.  Std. Err.      z   P>|z|   [95% Conf. Interval]
    ATET                 |
    union                |
     (union vs nonunion) |  .5288023   .2420635   2.18   0.029   .0543666    1.003238
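The idea of regression-adjustment bias correction is to remove the part of a matched outcome difference that is due to remaining covariate differences between a treated unit and its match. The following Python sketch illustrates the principle with one covariate and a linear outcome model fitted to the controls. It is a simplified illustration in the spirit of such corrections, not the procedure implemented in kmatch or teffects; all names and data are hypothetical.

```python
import statistics


def ols_slope(xs, ys):
    """Slope of a simple linear regression of ys on xs."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def bias_corrected_att(pairs, controls):
    """ATT from matched pairs with a linear regression adjustment.

    pairs: list of ((x_t, y_t), (x_c, y_c)) matched treated/control pairs.
    controls: list of (x, y) for all controls, used to fit the outcome model.
    """
    slope = ols_slope([x for x, _ in controls], [y for _, y in controls])
    effects = []
    for (x_t, y_t), (x_c, y_c) in pairs:
        raw_difference = y_t - y_c
        # Subtract the outcome difference predicted from the covariate gap.
        effects.append(raw_difference - slope * (x_t - x_c))
    return sum(effects) / len(effects)


if __name__ == "__main__":
    controls = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0)]  # untreated outcome = 2 * x
    pairs = [((1.5, 6.0), (1.0, 2.0))]               # imperfect match on x
    print(bias_corrected_att(pairs, controls))
```

In the toy data the raw matched difference is 4, but one unit of it is explained by the covariate gap of 0.5, so the adjusted effect is 3.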
Some Examples

Mahalanobis-distance and propensity-score matching combined:

    . kmatch md union collgrad ttl_exp tenure (wage), att
    >     psvars(i.industry i.race south) psweight(3)
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1853
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis (modified)
    Covariates  : collgrad ttl_exp tenure
    PS model    : logit (pr)
    PS covars   : i.industry i.race south

    Matching statistics

               |       Matched        |       Controls       |  Band-
               |   Yes     No  Total  |  Used  Unused  Total |  width
       Treated |   439     18    457  |  1258     138   1396 | .83886

    Treatment-effects estimation

          wage |     Coef.
           ATT |  .6408443
Some Examples

Exact matching:

    . kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1853
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure
    Exact       : industry race south

    Matching statistics

               |       Matched        |       Controls       |  Band-
               |   Yes     No  Total  |  Used  Unused  Total |  width
       Treated |   432     25    457  |  1103     293   1396 | 1.3013

    Treatment-effects estimation

          wage |     Coef.
           ATT |  .6047374
Some Examples

Bandwidth selection: the default (based on distribution of distances in one-nearest-neighbor matching)

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1853
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

               |       Matched        |       Controls       |  Band-
               |   Yes     No  Total  |  Used  Unused  Total |  width
       Treated |   432     25    457  |  1105     291   1396 | 1.3394

    Treatment-effects estimation

          wage |     Coef.
           ATT |  .6059013
Some Examples: Bandwidth selection, cross validation with respect to X

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
    >     att bwidth(cv)
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1853
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

               |       Matched        |       Controls       |  Band-
               |   Yes     No  Total  |  Used  Unused  Total |  width
       Treated |   448      9    457  |  1184     212   1396 | 1.8888

    Treatment-effects estimation

          wage |     Coef.
           ATT |  .6651578

    . kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot. x-axis: Bandwidth (1.5 to 3); y-axis: MSE; points are labeled with their index in the search sequence (1 to 15).]
Some Examples: Bandwidth selection, cross validation with respect to Y

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
    >     att bwidth(cv wage)
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1853
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

               |       Matched        |       Controls       |  Band-
               |   Yes     No  Total  |  Used  Unused  Total |  width
       Treated |   453      4    457  |  1289     107   1396 |  2.433

    Treatment-effects estimation

          wage |     Coef.
           ATT |  .6928956

    . kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot. x-axis: Bandwidth (1.5 to 3); y-axis: MISE; points are labeled with their index in the search sequence (1 to 15).]
Some Examples: Bandwidth selection, weighted cross validation with respect to Y

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
    >     att bwidth(cv wage, weighted)
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1853
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

               |       Matched        |       Controls       |  Band-
               |   Yes     No  Total  |  Used  Unused  Total |  width
       Treated |   455      2    457  |  1356      40   1396 | 2.7626

    Treatment-effects estimation

          wage |     Coef.
           ATT |  .7308166

    . kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot. x-axis: Bandwidth (1 to 5); y-axis: Weighted MISE; points are labeled with their index in the search sequence (1 to 15).]
Some Examples: Common-support diagnostics

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
    >     att bwidth(0.5)

    Multivariate-distance kernel matching      Number of obs = 1853
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

               |       Matched        |       Controls       |  Band-
               |   Yes     No  Total  |  Used  Unused  Total |  width
       Treated |   366     91    457  |   701     695   1396 |     .5

    Treatment-effects estimation

          wage |     Coef.
           ATT |  .3303161

    . kmatch csummarize
    (refitting the model using the generate() option)

    Common support   |                              |
    (treated)        |                              |   Standardized difference
               Means |  Matched  Unmatc~d     Total |  (1)-(3)   (2)-(3)   (1)-(2)
            collgrad |  .322404   .318681   .321663 |  .001585  -.006376   .007962
             ttl_exp |  13.3929   12.7682   13.2685 |  .027413  -.110253   .137666
              tenure |  8.12614   6.95055   7.89205 |  .038378  -.154356   .192734
          3.industry |  .002732   .021978   .006565 | -.047404   .190657  -.238061
          4.industry |  .191257   .153846   .183807 |  .019212  -.077269   .096481
          5.industry |  .062842   .274725   .105033 | -.137462   .552867  -.690329
          6.industry |  .057377         0   .045952 |  .054507  -.219225   .273732
          7.industry |  .019126   .021978   .019694 | -.004083   .016423  -.020506
          8.industry |  .005464   .065934   .017505 | -.091714   .368871  -.460585
          9.industry |  .010929   .010989   .010941 | -.000115   .000462  -.000577
         10.industry |        0   .021978   .004376 | -.066227   .266363  -.332589
         11.industry |  .554645   .175824   .479212 |   .15083  -.606636   .757467
         12.industry |  .092896   .241758   .122538 | -.090299   .363181   -.45348
              2.race |  .243169   .681319   .330416 | -.185284   .745209  -.930494
              3.race |  .002732   .076923   .017505 | -.112525   .452572  -.565097
               south |   .29235   .318681   .297593 | -.011456   .046074   -.05753

    Common support   |                              |
    (treated)        |                              |            Ratio
           Variances |  Matched  Unmatc~d     Total |  (1)/(3)   (2)/(3)   (1)/(2)
            collgrad |  .219058   .219536   .218674 |  1.00176   1.00394   .997824
             ttl_exp |  19.4198   25.2474   20.5898 |  .943177   1.22621    .76918
              tenure |  38.3324   31.9242   37.2044 |  1.03032   .858076   1.20073
          3.industry |  .002732   .021734   .006536 |  .418045   3.32537   .125714
          4.industry |  .155101   .131624   .150351 |  1.03159   .875443   1.17837
          5.industry |  .059054   .201465   .094207 |  .626851   2.13854   .293122
          6.industry |  .054233         0   .043936 |  1.23435         0         .
          7.industry |  .018811   .021734   .019348 |  .972252    1.1233   .865531
          8.industry |   .00545   .062271   .017237 |  .316157   3.61269   .087513
          9.industry |  .010839   .010989   .010845 |  .999464   1.01328   .986361
         10.industry |        0   .021734   .004367 |        0   4.97709         0
         11.industry |  .247691    .14652   .250115 |  .990307   .585811   1.69049
         12.industry |  .084497   .185348   .107758 |  .784137   1.72003   .455885
              2.race |  .184542   .219536   .221726 |  .832297   .990121   .840601
              3.race |  .002732   .071795   .017237 |  .158513   4.16522   .038056
               south |  .207448   .219536    .20949 |  .990254   1.04796   .944939

    (1) matched, (2) unmatched, (3) total
Some Examples: make a graph of the common-support stats

    . mat M = r(M)
    . coefplot matrix(M[,4]), title("Std. difference") noci nolabels xline(0)

[Figure: coefficient plot titled "Std. difference" with one row per covariate (collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, south); x-axis about -.2 to .2.]
Some Examples

Multiple outcome variables:

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south
    >     (wage hours), nate att
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1852
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

               |       Matched        |       Controls       |  Band-
               |   Yes     No  Total  |  Used  Unused  Total |  width
       Treated |   432     25    457  |  1104     291   1395 | 1.3392

    Treatment-effects estimation

               |     Coef.
    wage       |
           ATT |  .6021049
          NATE |  1.430823
    hours      |
           ATT |  1.263759
          NATE |  1.450303
Some Examples: Multiple outcome variables with different regression-adjustment equations

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south
    >     (wage = collgrad ttl_exp tenure)
    >     (hours = i.industry i.race), nate att
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1852
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics

               |       Matched        |       Controls       |  Band-
               |   Yes     No  Total  |  Used  Unused  Total |  width
       Treated |   432     25    457  |  1104     291   1395 | 1.3392

    Treatment-effects estimation

               |     Coef.
    wage       |
           ATT |  .5152752
          NATE |  1.430823
    hours      |
           ATT |  1.263759
          NATE |  1.450303
    wage: adjusted for collgrad ttl_exp tenure
    hours: adjusted for i.industry i.race
Some Examples: Treatment effects by subpopulation

    . kmatch md union collgrad ttl_exp tenure i.industry i.race (wage),
    >     att vce(boot) over(south)
    (south=0: computing bandwidth ... done)
    (south=1: computing bandwidth ... done)

    (running kmatch on estimation sample)

    Bootstrap replications (50)

    Multivariate-distance kernel matching      Number of obs = 1853
                                               Replications  = 50
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race

    0: south = 0
    1: south = 1

    Matching statistics

               |       Matched        |       Controls       |  Band-
               |   Yes     No  Total  |  Used  Unused  Total |  width
    0          |
       Treated |   306     15    321  |   625     120    745 | 1.3199
    1          |
       Treated |   126     10    136  |   473     178    651 | 1.3398

    Treatment-effects estimation

               |  Observed  Bootstrap                      Normal-based
          wage |     Coef.  Std. Err.      z   P>|z|   [95% Conf. Interval]
    0          |
           ATT |  .4586332   .2763358   1.66   0.097   -.082975    1.000241
    1          |
           ATT |  .9518705    .406903   2.34   0.019   .1543553    1.749386

    . test [0]ATT = [1]ATT

     ( 1)  [0]ATT - [1]ATT = 0

               chi2(  1) =    1.23
             Prob > chi2 =    0.2679

    . lincom [1]ATT - [0]ATT

     ( 1)  - [0]ATT + [1]ATT = 0

          wage |     Coef.  Std. Err.      z   P>|z|   [95% Conf. Interval]
           (1) |  .4932373   .4452343   1.11   0.268   -.379406    1.365881
Simulation
- Population data from the Swiss census of 2000.
- Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using the command iskotrei by Hendrickx 2002; values from 6 to 78, mean 44).
- Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.
- Control variables: gender, age, and highest educational degree.
- Population restricted to people between 24 and 60 years old who are working.
- 2'308'006 individuals, of which 17.5% belong to the treatment group.
- Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.
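The results of this simulation are reported as variance, bias reduction, and mean squared error across the random samples. The following Python sketch shows one way such summaries can be computed from a list of estimates. It is an illustration only: the function name `performance`, the four toy estimates, and the definition of bias reduction as the share of the raw (NATE) bias removed are assumptions, not taken from the slides. The population values -3.96 (ATT) and -4.79 (NATE) are the ones reported for this simulation.

```python
import statistics

def performance(estimates, true_att, raw_bias):
    """Summarize Monte Carlo estimates of the ATT."""
    mean_est = statistics.fmean(estimates)
    bias = mean_est - true_att
    variance = statistics.pvariance(estimates)
    return {
        "bias": bias,
        "variance": variance,
        # MSE decomposes into squared bias plus variance.
        "mse": bias ** 2 + variance,
        # Assumed definition: percentage of the raw bias that is removed.
        "bias_reduction_pct": 100 * (1 - bias / raw_bias),
    }

# Hypothetical estimates from four samples (toy numbers, not real results).
toy_estimates = [-4.0, -3.9, -4.1, -3.8]
true_att = -3.96                 # population ATT
raw_bias = -4.79 - true_att      # bias of the unadjusted comparison (NATE)
res = performance(toy_estimates, true_att, raw_bias)
```

Under this definition a value above 100 means the estimator overshoots, that is, the remaining bias has the opposite sign of the raw bias.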
Simulation
Substantial differences between resident aliens and Swiss nationals on all three covariates.
Propensity score in population (computed from fully stratified data):
[Figure: density of the propensity score for Untreated and Treated; x-axis "Propensity score", y-axis "Density". McFadden R2 = 0.121]
Simulation
- Raw mean difference in occupational prestige (NATE): -4.79
- Population ATT (computed from fully stratified data): -3.96
- There is some treatment effect heterogeneity (ATE = -3.51, ATC = -3.41).
[Figure, left panel: "Outcome" by "Propensity score" for Untreated and Treated. Right panel: "Treatment effect" by "Propensity score".]
Results: Variance
[Figure: variance of the estimates for N = 500 and N = 5000. Series: MDM, MDM with bias correction, PSM, PSM with bias correction. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y).]
In this slide we can see that, for the same algorithm, PSM is typically somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.
For kernel matching, the efficiency differences between PSM and MDM are small; additional post-matching regression adjustment further reduces the differences.
Results: Bias reduction (in percent)
[Figure: bias reduction in percent for N = 500 and N = 5000. Series: MDM, MDM with bias correction, PSM, PSM with bias correction. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y).]
Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.
The bias also vanishes once post-matching regression adjustment is applied.
Results: Mean squared error
[Figure: mean squared error for N = 500 and N = 5000. Series: MDM, MDM with bias correction, PSM, PSM with bias correction. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y).]
Results: Relative standard error
[Figure: relative standard error for N = 500 and N = 5000. Series: MDM, MDM with bias correction, PSM, PSM with bias correction. Rows: nearest-neighbor matching with teffects (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap (1 neighbor, 5 neighbors), and kernel matching with bootstrap (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y).]
Results: Coverage of 95% CIs
[Figure: coverage of 95% confidence intervals for N = 500 and N = 5000. Series: MDM, MDM with bias correction, PSM, PSM with bias correction. Rows: nearest-neighbor matching with teffects (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap (1 neighbor, 5 neighbors), and kernel matching with bootstrap (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y).]
Conclusions
- The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).
- Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.
Conclusions
Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research, the choice may not be that clear.
Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.
Disadvantages of MDM:
- The choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:
Do not use propensity scores for pair matching! (But don't use pair matching anyhow.)
Conclusions
Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly ok for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.
To do:
- Run some simulations comparable to the ones by King and Nielsen, using various matching algorithms.
References I
Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2):295–313.
Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1):77–90.
Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91:279–290.
Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.
King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.
Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.
References II
Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4):465–472.
Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70:41–55.
Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5):688–701.
Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4):472–480.
Best Case: Propensity Score Matching
[Figure: scatter plot of control units (C) and treated units (T) by "Education (years)" (12 to 28) and "Age" (20 to 80), with a "Propensity Score" axis running from 0 to 1. Slide 15/23 of the slides by King and Nielsen.]
Best Case: Propensity Score Matching is Suboptimal
[Figure: scatter plot of the remaining control units (C) and treated units (T) by "Education (years)" (12 to 28) and "Age" (20 to 80). Slide 15/23 of the slides by King and Nielsen.]
King and Nielsen
Argument 3
- Random pruning (deleting observations at random) increases imbalance. This is because the sample size decreases, so that variance increases (large differences become more likely).
- More imbalance/variance means more model dependence and researcher discretion.
- Because PSM approximates complete randomization, it engages in random pruning.
- PSM Paradox ("when you do 'better', you do worse"):
  - When matching is made more strict (e.g., by decreasing the size of the caliper), PSM, like other matching methods, typically reduces imbalance. But soon the PSM Paradox kicks in, such that further pruning quickly increases imbalance.
  - If the data are such that there are no big differences between treated and untreated to begin with, the PSM Paradox kicks in almost immediately.
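The first point of the argument, that deleting observations at random increases imbalance, can be illustrated with a small Python simulation. This is a sketch under simple assumptions (one standard normal covariate, treated and controls drawn from the same distribution); the function name `mean_abs_imbalance` and all parameter values are hypothetical and not part of King and Nielsen's analysis.

```python
import random
import statistics

def mean_abs_imbalance(n_keep, n_total=200, reps=500, seed=1):
    """Average absolute difference in covariate means between treated and
    control units after randomly pruning each group down to n_keep units."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        # Both groups come from the SAME distribution, so the data are
        # balanced in expectation (as under complete randomization).
        treated = [rng.gauss(0, 1) for _ in range(n_total)]
        control = [rng.gauss(0, 1) for _ in range(n_total)]
        # Random pruning: keep a random subset of each group.
        t_kept = rng.sample(treated, n_keep)
        c_kept = rng.sample(control, n_keep)
        diffs.append(abs(statistics.fmean(t_kept) - statistics.fmean(c_kept)))
    return statistics.fmean(diffs)

# Imbalance grows as more units are pruned at random.
for n in (200, 100, 50, 20):
    print(n, round(mean_abs_imbalance(n), 3))
```

Because nothing but the sample size changes, the growing imbalance is pure sampling noise, which is the mechanism the argument relies on.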
PSM Increases Model Dependence & Bias
[Figure, left panel "Model Dependence": "Variance" by "Number of Units Pruned" (0 to 160) for MDM and PSM. Right panel "Bias": "Maximum Coefficient across 512 Specifications" by "Number of Units Pruned" (0 to 160) for MDM and PSM, with a reference line at the true effect of 2.]
Data-generating process: $Y_i = 2T_i + X_{1i} + X_{2i} + \epsilon_i$, with $\epsilon_i \sim N(0, 1)$.
(Slide 20/23 of the slides by King and Nielsen)
The Propensity Score Paradox in Real Data
[Figure, left panel "Finkel et al. (JOP 2012)": "Imbalance" by "Number of units pruned" (0 to 3000) for CEM, MDM, PSM, and Random pruning, with markers for the raw data and the 1/4 SD caliper. Right panel "Nielsen et al. (AJPS 2011)": "Imbalance" by "Number of units pruned" (0 to 2500) for the same series.]
Similar pattern for > 20 other real data sets we checked.
(Slide 21/23 of the slides by King and Nielsen)
Are King and Nielsen right?
Argument 1
- Model dependence (i.e., dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions even if they try not to be).
- Matching is good because it reduces model dependence.

I fully agree.

My view, however, may be somewhat less pessimistic: I believe that research results can be credible if researchers are well educated, so that they know what they are doing, and if modeling decisions are made transparent and the robustness of results is evaluated (and documented).
Are King and Nielsen right?
Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

That fully blocked randomization is more efficient than complete randomization, given the sample size, is of course true (how large the efficiency gains are depends on the strength of the relation between X and Y).

However, if blocking reduces the sample size, it is not a priori clear whether estimates from the blocked sample are more efficient than estimates from the full sample (although often they will be).
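The efficiency claim for a fixed sample size can be checked with a small Python simulation. This is a sketch under assumed conditions: one binary covariate X with two equally sized blocks, an outcome that depends on X through a hypothetical coefficient `beta`, and a constant treatment effect `tau`. None of these names or values come from the slides.

```python
import random
import statistics

def simulate(blocked, n=40, beta=2.0, tau=1.0, reps=2000, seed=1):
    """Variance of the difference-in-means estimator across repeated
    randomizations, with or without blocking on a binary covariate."""
    rng = random.Random(seed)
    x = [i % 2 for i in range(n)]            # two blocks of equal size
    estimates = []
    for _ in range(reps):
        if blocked:
            # Fully blocked: treat exactly half of the units in each block.
            treated = set()
            for level in (0, 1):
                block = [i for i in range(n) if x[i] == level]
                treated.update(rng.sample(block, len(block) // 2))
        else:
            # Complete randomization: treat half of all units, ignoring X.
            treated = set(rng.sample(range(n), n // 2))
        y = [tau * (i in treated) + beta * x[i] + rng.gauss(0, 1)
             for i in range(n)]
        y1 = [y[i] for i in range(n) if i in treated]
        y0 = [y[i] for i in range(n) if i not in treated]
        estimates.append(statistics.fmean(y1) - statistics.fmean(y0))
    return statistics.pvariance(estimates)

var_blocked = simulate(blocked=True)
var_complete = simulate(blocked=False)
```

With `beta=0` the two designs should be about equally efficient, which mirrors the remark that the gain depends on the strength of the relation between X and Y.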
Are King and Nielsen right?
Argument 2 (continued)
That PSM approximates complete randomization is only partially true: PSM approximates complete randomization among observations with the same propensity score. Hence, PSM is somewhere between complete randomization and fully blocked randomization.
- If the X variables have no relation to T (treatment), then all observations have the same propensity score. Hence, we end up with complete randomization.
- If the X variables have a strong effect on T, there is lots of blocking.
Are King and Nielsen right?
Argument 3
- Random pruning ⇒ imbalance ⇒ more model dependence
- PSM ⇒ complete randomization ⇒ lots of random pruning
- PSM Paradox: "when you do 'better', you do worse"

That random pruning makes things worse is of course true, because it unnecessarily reduces the sample size (without changing anything else).

As argued above, that PSM applies random pruning is only true for X variables unrelated to T (so that we are in a "local" complete randomization situation, although something similar can probably also happen if effects from several X's cancel each other out).

Furthermore, it is only true if you employ a matching algorithm that throws away good matches. King and Nielsen's results seem to be based on the worst possible algorithm: one-to-one matching without replacement.
Are King and Nielsen right?
Argument 3 (continued)
If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or also nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
- Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
- Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.
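The difference between an algorithm that averages over all good matches and one that throws them away can be sketched in Python. This is a toy illustration, not kmatch's implementation: the functions `kernel_att` and `pair_att`, the propensity scores, the outcomes, and the bandwidth are all hypothetical, and the pair matching shown is a simple greedy variant that assumes there are at least as many controls as treated units.

```python
def epan(u):
    """Epanechnikov kernel; zero outside |u| < 1."""
    return 0.75 * (1 - u * u) if abs(u) < 1 else 0.0

def kernel_att(ps_t, y_t, ps_c, y_c, bw):
    """ATT by kernel matching on the propensity score: each treated unit is
    compared with a kernel-weighted average of ALL controls within bw."""
    effects, used = [], set()
    for p, y in zip(ps_t, y_t):
        w = [epan((p - q) / bw) for q in ps_c]
        if sum(w) == 0:
            continue                         # no control in range: off support
        used.update(j for j, wj in enumerate(w) if wj > 0)
        y0_hat = sum(wj * yj for wj, yj in zip(w, y_c)) / sum(w)
        effects.append(y - y0_hat)
    return sum(effects) / len(effects), len(used)

def pair_att(ps_t, y_t, ps_c, y_c):
    """ATT by greedy one-to-one matching without replacement: every other
    control is discarded, even if it is an equally good match."""
    free = set(range(len(ps_c)))
    effects = []
    for p, y in zip(ps_t, y_t):
        j = min(free, key=lambda k: abs(p - ps_c[k]))
        free.remove(j)
        effects.append(y - y_c[j])
    return sum(effects) / len(effects), len(ps_c) - len(free)

# Toy data: two treated units, six controls (the last one is far away).
ps_t, y_t = [0.30, 0.50], [5.0, 7.0]
ps_c = [0.28, 0.31, 0.33, 0.49, 0.52, 0.90]
y_c = [4.0, 4.5, 3.5, 6.0, 6.5, 1.0]

att_kernel, used_kernel = kernel_att(ps_t, y_t, ps_c, y_c, bw=0.05)
att_pair, used_pair = pair_att(ps_t, y_t, ps_c, y_c)
```

In this toy setup kernel matching uses the five nearby controls and prunes only the distant one, while pair matching keeps just two controls and discards three close matches.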
Are King and Nielsen right?
Argument 3 (continued)
It is true that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more "free" variance in X that can be exploited by modeling decisions).

In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you'd need to apply PSM stratified by subgroups in this case).
The Command
kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).
Some key features:
- Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
- Optional exact matching
- Optional regression-adjustment bias correction
- Kernel matching, ridge matching, or nearest-neighbor matching
- Automatic bandwidth selection for kernel/ridge matching
- Flexible specification of scaling matrix for MDM
- Joint analysis of multiple subgroups and multiple outcome variables
- Various post-estimation commands for balancing and common-support diagnostics
- Computationally efficient implementation
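To make the role of the scaling matrix in MDM concrete, here is a minimal Python sketch of the Mahalanobis distance for two covariates. It only illustrates the textbook formula; the function name `mahalanobis_2d` and the example matrices are hypothetical, and kmatch's own implementation may differ.

```python
def mahalanobis_2d(a, b, cov):
    """Mahalanobis distance between points a and b (two covariates) given a
    2x2 scaling matrix cov = [[s11, s12], [s12, s22]]."""
    (s11, s12), (_, s22) = cov
    det = s11 * s22 - s12 * s12
    d1, d2 = a[0] - b[0], a[1] - b[1]
    # d' * inverse(cov) * d, using the closed-form inverse of a 2x2 matrix
    q = (s22 * d1 * d1 - 2 * s12 * d1 * d2 + s11 * d2 * d2) / det
    return q ** 0.5

# With the identity matrix the distance is Euclidean.
d_identity = mahalanobis_2d((0, 0), (3, 4), [[1, 0], [0, 1]])
# A larger variance on the first covariate shrinks differences on it.
d_scaled = mahalanobis_2d((0, 0), (2, 0), [[4, 0], [0, 1]])
# Positive correlation makes a joint move in both covariates count less.
d_corr = mahalanobis_2d((0, 0), (1, 1), [[1, 0.5], [0.5, 1]])
```

Changing the scaling matrix changes which controls count as close, which is why its choice matters for MDM.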
Some Examples
Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.

    . sysuse nlsw88, clear
    (NLSW, 1988 extract)
    . drop if industry==2
    (4 observations deleted)

Mahalanobis-distance kernel matching

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
    >     (wage), nate att
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1,853
                                               Kernel        = epan
    Treatment  : union = 1
    Metric     : mahalanobis
    Covariates : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
                  Matched               Controls           Band-
                Yes    No  Total    Used  Unused  Total    width
    Treated     432    25    457    1105     291   1396   1.3394

    Treatment-effects estimation
    wage          Coef.
    ATT        .6059013
    NATE       1.432913
Some Examples
Some balancing statistics

    . kmatch summarize
    (refitting the model using the generate() option)

                              Raw                         Matched (ATT)
    Means         Treated  Untrea~d    StdDif    Treated  Untrea~d    StdDif
    collgrad      .321663   .224212   .219912    .319444   .319444         0
    ttl_exp       13.2685   12.7323   .117584    13.3205   13.1425   .039036
    tenure        7.89205   6.17658    .29735    7.91744   7.58347   .057888
    3.industry    .006565   .012178  -.058246     .00463    .00463         0
    4.industry    .183807   .166905   .044425    .185185   .185185         0
    5.industry    .105033   .027937   .312944    .085648   .085648         0
    6.industry    .045952   .169771  -.407129    .048611   .048611         0
    7.industry    .019694   .102436  -.350657    .020833   .020833         0
    8.industry    .017505   .035817  -.113785    .009259   .009259         0
    9.industry    .010941   .040115  -.185669    .011574   .011574         0
    10.industry   .004376   .008596  -.052551    .002315   .002315         0
    11.industry   .479212   .356734   .250073    .506944   .506944         0
    12.industry   .122538    .07235   .169707     .12037    .12037         0
    2.race        .330416   .244986   .189418      .3125     .3125         0
    3.race        .017505   .011461   .050566    .006944   .006944         0
    south         .297593   .466332  -.352408    .291667   .291667         0

                              Raw                         Matched (ATT)
    Variances     Treated  Untrea~d     Ratio    Treated  Untrea~d     Ratio
    collgrad      .218674   .174066   1.25628    .217904   .217904         1
    ttl_exp       20.5898   21.0001   .980459    19.8177   18.2323   1.08696
    tenure        37.2044   29.3629   1.26706    37.0399   34.9543   1.05966
    3.industry    .006536   .012038   .542928    .004619   .004619         1
    4.industry    .150351   .139148   1.08052    .151242   .151242         1
    5.industry    .094207   .027176   3.46656    .078494   .078494         1
    6.industry    .043936    .14105   .311496    .046355   .046355         1
    7.industry    .019348   .092008   .210287    .020447   .020447         1
    8.industry    .017237   .034559   .498769    .009195   .009195         1
    9.industry    .010845   .038533   .281445    .011467   .011467         1
    10.industry   .004367   .008528   .512039    .002315   .002315         1
    11.industry   .250115   .229639   1.08917    .250532   .250532         1
    12.industry   .107758   .067163   1.60443    .106127   .106127         1
    2.race        .221726     .1851   1.19787    .215342   .215342         1
    3.race        .017237   .011338   1.52025    .006912   .006912         1
    south          .20949   .249045   .841173    .207077   .207077         1
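The two balancing statistics in these tables can be sketched in Python. The standardized difference below divides the difference in means by the square root of the average of the two group variances. That this is the definition used here is an assumption, although it reproduces the raw collgrad row up to rounding. The function names `std_dif` and `balance` are hypothetical.

```python
import statistics

def std_dif(m1, m0, v1, v0):
    """Standardized mean difference from group means and variances
    (assumed definition: pooled SD = sqrt of the average variance)."""
    return (m1 - m0) / ((v1 + v0) / 2) ** 0.5

def balance(x_treated, x_control):
    """Standardized mean difference and variance ratio for one covariate."""
    m1, m0 = statistics.fmean(x_treated), statistics.fmean(x_control)
    v1, v0 = statistics.variance(x_treated), statistics.variance(x_control)
    return std_dif(m1, m0, v1, v0), v1 / v0

# Check against the raw collgrad row (means and variances from the tables).
collgrad_stddif = std_dif(0.321663, 0.224212, 0.218674, 0.174066)
collgrad_ratio = 0.218674 / 0.174066

# Toy samples: same spread, means one unit apart.
toy_stddif, toy_ratio = balance([1, 2, 3, 4], [2, 3, 4, 5])
```

A standardized difference near 0 and a variance ratio near 1 indicate good balance, which is the pattern in the matched columns.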
Some Examples
Make a graph of the balancing statistics

    . mat M = r(M)
    . mat V = r(V)
    . coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || , ///
    >     bylabels("Std. mean difference" "Variance ratio") ///
    >     noci nolabels byopts(xrescale)
    . addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))
    . addplot 2: , xline(1) norescaling

[Figure, left panel "Std. mean difference" and right panel "Variance ratio": Raw and Matched values for collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, and south, with reference lines at 0 and 1, respectively.]
Some Examples
Propensity-score kernel matching

    . kmatch ps union collgrad ttl_exp tenure i.industry i.race south ///
    >     (wage), nate att
    (computing bandwidth ... done)

    Propensity-score kernel matching           Number of obs = 1,853
                                               Kernel        = epan
    Treatment  : union = 1
    Covariates : collgrad ttl_exp tenure i.industry i.race south
    PS model   : logit (pr)

    Matching statistics
                  Matched               Controls           Band-
                Yes    No  Total    Used  Unused  Total    width
    Treated     431    26    457    1214     182   1396   .00188

    Treatment-effects estimation
    wage          Coef.
    ATT        .3887224
    NATE       1.432913
Some Examples
Kernel density balancing plot

    . kmatch density, lw(*6 *2) lc(*.5 *1)
    (refitting the model using the generate() option)
    (applying 0-1 boundary correction to density estimation of propensity score)
    (bandwidth for propensity score = .06803989)

[Figure, panels "Raw" and "Matched (ATT)": "Density" by "Propensity score" for Untreated and Treated.]
Some Examples
Cumulative distribution balancing plot

    . kmatch cumul, lw(*6 *2) lc(*.5 *1)
    (refitting the model using the generate() option)

[Figure, panels "Raw" and "Matched (ATT)": "Cumulative probability" by "Propensity score" for Untreated and Treated.]
Some Examples
Balancing box plot

    . kmatch box
    (refitting the model using the generate() option)

[Figure, panels "Raw" and "Matched (ATT)": box plots of the "Propensity score" for Untreated and Treated.]
Some Examples
Standard errors

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
    >     (wage), nate ate att atc vce(bootstrap)
    (computing bandwidth for treated ... done)
    (computing bandwidth for untreated ... done)
    (running kmatch on estimation sample)
    Bootstrap replications (50)

    Multivariate-distance kernel matching      Number of obs = 1,853
                                               Replications  = 50
                                               Kernel        = epan
    Treatment  : union = 1
    Metric     : mahalanobis
    Covariates : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
                  Matched               Controls           Band-
                Yes    No  Total    Used  Unused  Total    width
    Treated     432    25    457    1105     291   1396   1.3394
    Untreated  1386    10   1396     455       2    457   3.3975
    Combined   1818    35   1853    1560     293   1853

    Treatment-effects estimation
             Observed   Bootstrap                    Normal-based
    wage        Coef.   Std. Err.     z   P>|z|   [95% Conf. Interval]
    ATE      .4095729    .1920853  2.13   0.033    .0330928   .7860531
    ATT      .6059013    .2472069  2.45   0.014    .1213846   1.090418
    ATC      .3483797    .1893653  1.84   0.066   -.0227695   .7195289
    NATE     1.432913    .2333282  6.14   0.000    .9755981   1.890228
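The "Normal-based" columns of this table follow from the coefficient and its bootstrap standard error alone. The Python sketch below recomputes z, the two-sided p-value, and the 95% confidence interval for the ATE row. The function name `normal_inference` is hypothetical; the inputs are the rounded values from the table, so results agree only up to rounding.

```python
from statistics import NormalDist

def normal_inference(coef, se, level=0.95):
    """z statistic, two-sided p-value, and normal-based confidence interval."""
    z = coef / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    crit = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for 95%
    return z, p, (coef - crit * se, coef + crit * se)

# ATE row: coefficient and bootstrap standard error from the table.
z, p, ci = normal_inference(0.4095729, 0.1920853)
```

The same function applied to the other rows should reproduce their z, P>|z|, and interval columns in the same way.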
Some Examples
Do some tests

    . lincom ATT-NATE
     ( 1)  ATT - NATE = 0
    wage        Coef.   Std. Err.      z   P>|z|   [95% Conf. Interval]
    (1)     -.8270117    .1810415  -4.57   0.000   -1.181847  -.4721768

    . test ATT = ATC
     ( 1)  ATT - ATC = 0
               chi2(  1) =    2.42
             Prob > chi2 =    0.1200
Some Examples
Nearest-neighbor matching (1 neighbor)

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

    Multivariate-distance nearest-neighbor matching
                                               Number of obs  = 1,853
                                               Neighbors: min = 1
                                                          max = 1
    Treatment  : union = 1
    Metric     : mahalanobis
    Covariates : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
                  Matched               Controls           Band-
                Yes    No  Total    Used  Unused  Total    width
    Treated     457     0    457     328    1068   1396

    Treatment-effects estimation
    wage          Coef.
    ATT        .7246969

    . teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

    Treatment-effects estimation               Number of obs      = 1,853
    Estimator      : nearest-neighbor matching Matches: requested = 1
    Outcome model  : matching                                 min = 1
    Distance metric: Mahalanobis                              max = 1

                                       AI Robust
    wage                      Coef.   Std. Err.     z   P>|z|   [95% Conf. Interval]
    ATET
    union
    (union vs nonunion)    .7246969    .2942952  2.46   0.014     .147889   1.301505
Some Examples
Nearest-neighbor matching (5 neighbors)

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

    Multivariate-distance nearest-neighbor matching
                                               Number of obs  = 1,853
                                               Neighbors: min = 5
                                                          max = 5
    Treatment  : union = 1
    Metric     : mahalanobis
    Covariates : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
                  Matched               Controls           Band-
                Yes    No  Total    Used  Unused  Total    width
    Treated     457     0    457     870     526   1396

    Treatment-effects estimation
    wage          Coef.
    ATT        .5590823

    . teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

    Treatment-effects estimation               Number of obs      = 1,853
    Estimator      : nearest-neighbor matching Matches: requested = 5
    Outcome model  : matching                                 min = 5
    Distance metric: Mahalanobis                              max = 6

                                       AI Robust
    wage                      Coef.   Std. Err.     z   P>|z|   [95% Conf. Interval]
    ATET
    union
    (union vs nonunion)    .5590823    .2381752  2.35   0.019    .0922675   1.025897
Some Examples
Bias correction / regression adjustment

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
    >     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

    Multivariate-distance nearest-neighbor matching
                                               Number of obs  = 1,853
                                               Neighbors: min = 5
                                                          max = 5
    Treatment  : union = 1
    Metric     : mahalanobis
    Covariates : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
                  Matched               Controls           Band-
                Yes    No  Total    Used  Unused  Total    width
    Treated     457     0    457     870     526   1396

    Treatment-effects estimation
    wage          Coef.
    ATT        .5288023
    adjusted for collgrad ttl_exp tenure i.industry i.race south

    . teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) ///
    >     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

    Treatment-effects estimation               Number of obs      = 1,853
    Estimator      : nearest-neighbor matching Matches: requested = 5
    Outcome model  : matching                                 min = 5
    Distance metric: Mahalanobis                              max = 6

                                       AI Robust
    wage                      Coef.   Std. Err.     z   P>|z|   [95% Conf. Interval]
    ATET
    union
    (union vs nonunion)    .5288023    .2420635  2.18   0.029    .0543666   1.003238
Some Examples
Mahalanobis-distance and propensity-score matching combined

    . kmatch md union collgrad ttl_exp tenure (wage), att ///
    >     psvars(i.industry i.race south) psweight(3)
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1,853
                                               Kernel        = epan
    Treatment  : union = 1
    Metric     : mahalanobis (modified)
    Covariates : collgrad ttl_exp tenure
    PS model   : logit (pr)
    PS covars  : i.industry i.race south

    Matching statistics
                  Matched               Controls           Band-
                Yes    No  Total    Used  Unused  Total    width
    Treated     439    18    457    1258     138   1396   .83886

    Treatment-effects estimation
    wage          Coef.
    ATT        .6408443
Some Examples
Exact matching

    . kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1,853
                                               Kernel        = epan
    Treatment  : union = 1
    Metric     : mahalanobis
    Covariates : collgrad ttl_exp tenure
    Exact      : industry race south

    Matching statistics
                  Matched               Controls           Band-
                Yes    No  Total    Used  Unused  Total    width
    Treated     432    25    457    1103     293   1396   1.3013

    Treatment-effects estimation
    wage          Coef.
    ATT        .6047374
Some Examples
Bandwidth selection: the default (based on the distribution of distances in one-nearest-neighbor matching)

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1,853
                                               Kernel        = epan
    Treatment  : union = 1
    Metric     : mahalanobis
    Covariates : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
                  Matched               Controls           Band-
                Yes    No  Total    Used  Unused  Total    width
    Treated     432    25    457    1105     291   1396   1.3394

    Treatment-effects estimation
    wage          Coef.
    ATT        .6059013
Some Examples
Bandwidth selection: cross-validation with respect to X

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
    >     att bwidth(cv)
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1,853
                                               Kernel        = epan
    Treatment  : union = 1
    Metric     : mahalanobis
    Covariates : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
                  Matched               Controls           Band-
                Yes    No  Total    Used  Unused  Total    width
    Treated     448     9    457    1184     212   1396   1.8888

    Treatment-effects estimation
    wage          Coef.
    ATT        .6651578

    . kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of "MSE" by "Bandwidth"; markers are labeled with the index of the search step.]
Some Examples
Bandwidth selection: cross-validation with respect to Y

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
    >     att bwidth(cv wage)
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1,853
                                               Kernel        = epan
    Treatment  : union = 1
    Metric     : mahalanobis
    Covariates : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
                  Matched               Controls           Band-
                Yes    No  Total    Used  Unused  Total    width
    Treated     453     4    457    1289     107   1396    2.433

    Treatment-effects estimation
    wage          Coef.
    ATT        .6928956

    . kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of "MISE" by "Bandwidth"; markers are labeled with the index of the search step.]
Some Examples
Bandwidth selection: weighted cross-validation with respect to Y

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
    >     att bwidth(cv wage, weighted)
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1,853
                                               Kernel        = epan
    Treatment  : union = 1
    Metric     : mahalanobis
    Covariates : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
                  Matched               Controls           Band-
                Yes    No  Total    Used  Unused  Total    width
    Treated     455     2    457    1356      40   1396   2.7626

    Treatment-effects estimation
    wage          Coef.
    ATT        .7308166

    . kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of "Weighted MISE" by "Bandwidth"; markers are labeled with the index of the search step.]
Some Examples

Common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(0.5)

Multivariate-distance kernel matching          Number of obs = 1,853
                                               Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |      Matched        |      Controls       | Band-
           |  Yes    No   Total  |  Used Unused  Total | width
   Treated |  366    91     457  |   701    695   1396 |     .5

Treatment-effects estimation
  wage |    Coef.
   ATT | .3303161

. kmatch csummarize
(refitting the model using the generate() option)

Common support (treated): means and standardized differences
       Means |  Matched  Unmatc~d     Total |   (1)-(3)   (2)-(3)   (1)-(2)
    collgrad |  .322404   .318681   .321663 |   .001585  -.006376   .007962
     ttl_exp |  13.3929   12.7682   13.2685 |   .027413  -.110253   .137666
      tenure |  8.12614   6.95055   7.89205 |   .038378  -.154356   .192734
  3.industry |  .002732   .021978   .006565 |  -.047404   .190657  -.238061
  4.industry |  .191257   .153846   .183807 |   .019212  -.077269   .096481
  5.industry |  .062842   .274725   .105033 |  -.137462   .552867  -.690329
  6.industry |  .057377         0   .045952 |   .054507  -.219225   .273732
  7.industry |  .019126   .021978   .019694 |  -.004083   .016423  -.020506
  8.industry |  .005464   .065934   .017505 |  -.091714   .368871  -.460585
  9.industry |  .010929   .010989   .010941 |  -.000115   .000462  -.000577
 10.industry |        0   .021978   .004376 |  -.066227   .266363  -.332589
 11.industry |  .554645   .175824   .479212 |    .15083  -.606636   .757467
 12.industry |  .092896   .241758   .122538 |  -.090299   .363181   -.45348
      2.race |  .243169   .681319   .330416 |  -.185284   .745209  -.930494
      3.race |  .002732   .076923   .017505 |  -.112525   .452572  -.565097
       south |   .29235   .318681   .297593 |  -.011456   .046074   -.05753

Common support (treated): variances and variance ratios
   Variances |  Matched  Unmatc~d     Total |   (1)/(3)   (2)/(3)   (1)/(2)
    collgrad |  .219058   .219536   .218674 |   1.00176   1.00394   .997824
     ttl_exp |  19.4198   25.2474   20.5898 |   .943177   1.22621    .76918
      tenure |  38.3324   31.9242   37.2044 |   1.03032   .858076   1.20073
  3.industry |  .002732   .021734   .006536 |   .418045   3.32537   .125714
  4.industry |  .155101   .131624   .150351 |   1.03159   .875443   1.17837
  5.industry |  .059054   .201465   .094207 |   .626851   2.13854   .293122
  6.industry |  .054233         0   .043936 |   1.23435         0         .
  7.industry |  .018811   .021734   .019348 |   .972252    1.1233   .865531
  8.industry |   .00545   .062271   .017237 |   .316157   3.61269   .087513
  9.industry |  .010839   .010989   .010845 |   .999464   1.01328   .986361
 10.industry |        0   .021734   .004367 |         0   4.97709         0
 11.industry |  .247691    .14652   .250115 |   .990307   .585811   1.69049
 12.industry |  .084497   .185348   .107758 |   .784137   1.72003   .455885
      2.race |  .184542   .219536   .221726 |   .832297   .990121   .840601
      3.race |  .002732   .071795   .017237 |   .158513   4.16522   .038056
       south |  .207448   .219536    .20949 |   .990254   1.04796   .944939

(1) matched, (2) unmatched, (3) total
Some Examples

Make a graph of the common-support stats

. mat M = r(M)

. coefplot matrix(M[,4]), title("Std. difference") noci nolabels xline(0)

[Figure: dot plot of the standardized difference (about -.2 to .2) for collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, and south.]
Some Examples

Multiple outcome variables

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs = 1,852
                                               Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |      Matched        |      Controls       | Band-
           |  Yes    No   Total  |  Used Unused  Total | width
   Treated |  432    25     457  |  1104    291   1395 | 1.3392

Treatment-effects estimation
       |    Coef.
 wage  |
   ATT | .6021049
  NATE | 1.430823
 hours |
   ATT | 1.263759
  NATE | 1.450303
Some Examples

Multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure)
>     (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs = 1,852
                                               Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |      Matched        |      Controls       | Band-
           |  Yes    No   Total  |  Used Unused  Total | width
   Treated |  432    25     457  |  1104    291   1395 | 1.3392

Treatment-effects estimation
       |    Coef.
 wage  |
   ATT | .5152752
  NATE | 1.430823
 hours |
   ATT | 1.263759
  NATE | 1.450303
wage: adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race
Some Examples

Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage),
>     att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching          Number of obs = 1,853
                                               Replications  = 50
                                               Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics
           |      Matched        |      Controls       | Band-
           |  Yes    No   Total  |  Used Unused  Total | width
0: Treated |  306    15     321  |   625    120    745 | 1.3199
1: Treated |  126    10     136  |   473    178    651 | 1.3398

Treatment-effects estimation
       |  Observed  Bootstrap                        Normal-based
  wage |     Coef.  Std. Err.      z   P>|z|   [95% Conf. Interval]
0: ATT |  .4586332   .2763358   1.66   0.097    -.082975   1.000241
1: ATT |  .9518705    .406903   2.34   0.019    .1543553   1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

           chi2(  1) =    1.23
         Prob > chi2 =    0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

  wage |     Coef.  Std. Err.      z   P>|z|   [95% Conf. Interval]
   (1) |  .4932373   .4452343   1.11   0.268    -.379406   1.365881
Simulation

- Population data from the Swiss census of 2000.
- Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using the command iskotrei by Hendrickx 2002; values from 6 to 78, mean 44).
- Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.
- Control variables: gender, age, and highest educational degree.
- Population restricted to people between 24 and 60 years old who are working.
- 2'308'006 individuals, of which 17.5% belong to the treatment group.
- Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.
Simulation

Substantial differences between resident aliens and Swiss nationals on all three covariates.

Propensity score in population (computed from fully stratified data)

[Figure: density of the propensity score (0 to 1) for untreated and treated.]

McFadden R2 = 0.121
Simulation

- Raw mean difference in occupational prestige (NATE): −4.79
- Population ATT (computed from fully stratified data): −3.96
- There is some treatment effect heterogeneity (ATE = −3.51, ATC = −3.41).

[Figure, left panel: mean outcome (about 30 to 55) by propensity score for untreated and treated. Right panel: treatment effect (about −6 to −1) by propensity score.]
Results: Variance

[Figure: variance of the estimates by matching algorithm, in panels for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]
In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.
For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.
Results: Bias reduction (in percent)

[Figure: bias reduction in percent by matching algorithm, in panels for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]
Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions) and, due to the specific pattern of the data (in particular the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.
The bias also vanishes once post-matching regression adjustment is applied.
Results: Mean squared error

[Figure: mean squared error by matching algorithm, in panels for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]
Results: Relative standard error

[Figure: relative standard error by matching algorithm, in panels for N = 500 and N = 5000. Rows: nearest-neighbor matching with teffects standard errors (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap standard errors (1 neighbor, 5 neighbors), and kernel matching with bootstrap standard errors (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]
Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals by matching algorithm, in panels for N = 500 and N = 5000. Rows: nearest-neighbor matching with teffects standard errors (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap standard errors (1 neighbor, 5 neighbors), and kernel matching with bootstrap standard errors (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]
Conclusions

- The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).
- Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.
Conclusions

Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research, the choice may not be that clear.

Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.

Disadvantages of MDM:
- The choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching. (But don't use pair matching anyhow.)
Conclusions

Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly OK for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:
- Run some simulations comparable to the ones by King and Nielsen, using various matching algorithms.
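The bootstrap procedure referred to in the simulation conclusions can be illustrated in a few lines. The following is a minimal Python sketch under stated assumptions, not kmatch's implementation: it resamples observations with replacement and re-estimates a simple Epanechnikov kernel-matching ATT on one scalar covariate in each replication. The data are synthetic, and the function names `kernel_att` and `bootstrap_se`, the bandwidth of 0.1, and the true effect of 1 are all hypothetical choices made for illustration.

```python
import random
import statistics


def epan(u):
    """Epanechnikov kernel weight; zero outside |u| < 1."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def kernel_att(data, bandwidth):
    """Kernel-matching ATT for rows (d, x, y); None if no treated unit is matched."""
    treated = [(x, y) for d, x, y in data if d == 1]
    controls = [(x, y) for d, x, y in data if d == 0]
    effects = []
    for x_t, y_t in treated:
        weights = [epan((x_c - x_t) / bandwidth) for x_c, _ in controls]
        w_sum = sum(weights)
        if w_sum > 0:
            # weighted average of control outcomes as the counterfactual
            y0_hat = sum(w * y_c for w, (_, y_c) in zip(weights, controls)) / w_sum
            effects.append(y_t - y0_hat)
    return statistics.fmean(effects) if effects else None


def bootstrap_se(data, bandwidth, reps=100, seed=1):
    """Standard deviation of the estimator across bootstrap resamples."""
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        sample = rng.choices(data, k=len(data))  # resample rows with replacement
        estimate = kernel_att(sample, bandwidth)
        if estimate is not None:
            draws.append(estimate)
    return statistics.stdev(draws)


# Synthetic data (assumption): treatment probability rises with x, true effect = 1.
rng = random.Random(0)
data = []
for _ in range(200):
    x = rng.random()
    d = 1 if rng.random() < 0.2 + 0.5 * x else 0
    y = 2.0 * x + 1.0 * d + rng.gauss(0, 1)
    data.append((d, x, y))

att_hat = kernel_att(data, bandwidth=0.1)
se_hat = bootstrap_se(data, bandwidth=0.1)
print(att_hat, se_hat)
```

The sketch says nothing about whether such standard errors are valid; that is what the coverage results above address.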
References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2): 295–313.
Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1): 77–90.
Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91: 279–290.
Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.
King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw
Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.
Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4): 465–472.
Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70: 41–55.
Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5): 688–701.
Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4): 472–480.
Best Case: Propensity Score Matching is Suboptimal

[Figure: scatter plot of treated (T) and control (C) units by education (years, 12 to 28) and age (20 to 80).]

(Slide 15/23 of the slides by King and Nielsen)
King and Nielsen

Argument 3
- Random pruning (deleting observations at random) increases imbalance. This is because the sample size decreases, so that variance increases (large differences become more likely).
- More imbalance/variance means more model dependence and researcher discretion.
- Because PSM approximates complete randomization, it engages in random pruning.
- PSM Paradox ("when you do 'better', you do worse"):
  - When matching is made more strict (e.g., by decreasing the size of the caliper), PSM, like other matching methods, typically reduces imbalance. But soon the PSM Paradox kicks in, such that further pruning quickly increases imbalance.
  - If the data are such that there are no big differences between treated and untreated to begin with, the PSM Paradox kicks in almost immediately.
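The first point of Argument 3, that random pruning increases imbalance, can be checked with a small simulation. The Python sketch below is an illustrative assumption and not King and Nielsen's simulation design: X is drawn from the same N(0, 1) distribution in both groups (so any imbalance is pure sampling noise), and the function name, group size of 200, and pruned size of 50 are hypothetical.

```python
import random
import statistics


def mean_abs_imbalance(n_keep, n_total=200, reps=500, seed=1):
    """Average |mean(X | treated) - mean(X | control)| after random pruning.

    In each replication both groups start with n_total units, and a random
    subset of n_keep units per group is kept.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        treated = [rng.gauss(0, 1) for _ in range(n_total)]
        control = [rng.gauss(0, 1) for _ in range(n_total)]
        # random pruning: which units survive is unrelated to X
        kept_t = rng.sample(treated, n_keep)
        kept_c = rng.sample(control, n_keep)
        total += abs(statistics.fmean(kept_t) - statistics.fmean(kept_c))
    return total / reps


full = mean_abs_imbalance(200)   # no pruning
pruned = mean_abs_imbalance(50)  # three quarters of each group deleted at random
print(full, pruned)
```

Because the expected absolute mean difference scales with the inverse square root of the group size, cutting each group to a quarter should roughly double the average imbalance in this setup.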
PSM Increases Model Dependence & Bias

[Figure, left panel "Model Dependence": variance against number of units pruned (0 to 160) for MDM and PSM. Right panel "Bias": maximum coefficient across 512 specifications against number of units pruned (0 to 160) for MDM and PSM; true effect = 2.]

$Y_i = 2T_i + X_{1i} + X_{2i} + \epsilon_i$, $\epsilon_i \sim N(0, 1)$

(Slide 20/23 of the slides by King and Nielsen)
The Propensity Score Paradox in Real Data

[Figure: imbalance against number of units pruned for CEM, MDM, and PSM, with reference markers for the raw data, random pruning, and a 1/4 SD caliper. Left panel: Finkel et al. (JOP, 2012). Right panel: Nielsen et al. (AJPS, 2011).]

Similar pattern for > 20 other real data sets we checked.

(Slide 21/23 of the slides by King and Nielsen)
Are King and Nielsen right?

Argument 1
- Model dependence (i.e., dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions even if they try not to be).
- Matching is good because it reduces model dependence.

I fully agree.

My view, however, may be somewhat less pessimistic. I believe that research results can be credible if researchers are well educated, so that they know what they are doing, and if modeling decisions are made transparent and the robustness of results is evaluated (and documented).
Are King and Nielsen right?

Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

That fully blocked randomization is more efficient than complete randomization, given the sample size, is of course true (how large the efficiency gains are depends on the strength of the relation between X and Y).

However, if blocking reduces the sample size, it is not a priori clear whether estimates from the blocked sample are more efficient than estimates from the full sample (although often they will be).
That PSM approximates complete randomization is only partially true. PSM approximates complete randomization within observations with the same propensity score. Hence, PSM is somewhere between complete randomization and fully blocked randomization.
- If the X variables have no relation to T (treatment), then all observations have the same propensity score. Hence, we end up with complete randomization.
- If the X variables have a strong effect on T, there is lots of blocking.
Are King and Nielsen right?

Argument 3
- Random pruning ⇒ imbalance ⇒ more model dependence
- PSM ⇒ complete randomization ⇒ lots of random pruning
- PSM Paradox: "when you do 'better', you do worse"

That random pruning makes things worse is of course true, because it unnecessarily reduces the sample size (without changing anything else).

As argued above, that PSM applies random pruning is only true for X variables unrelated to T (so that we are in a "local" complete randomization situation, although something similar can probably also happen if effects from several X's cancel each other out).

Furthermore, it is only true if you employ a matching algorithm that throws away good matches. King and Nielsen's results seem to be based on the worst possible algorithm: one-to-one matching without replacement.
If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or also nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
- Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
- Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.
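How a kernel-matching algorithm "averages where pruning is not necessary" can be sketched as follows. This is a minimal Python illustration under stated assumptions, not kmatch's code: matching is on a single scalar score with an Epanechnikov kernel, and the function name `kernel_att`, the toy data, and the bandwidth of 0.1 are hypothetical.

```python
def epan(u):
    """Epanechnikov kernel weight; zero outside |u| < 1."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def kernel_att(treated, controls, bandwidth):
    """ATT from kernel matching on a scalar score.

    treated, controls: lists of (score, outcome) pairs.
    Returns (att, n_matched). Treated units without any control inside the
    bandwidth are dropped; all controls inside the bandwidth are averaged.
    """
    effects = []
    for s_t, y_t in treated:
        weights = [epan((s_c - s_t) / bandwidth) for s_c, _ in controls]
        w_sum = sum(weights)
        if w_sum == 0.0:
            continue  # off common support: pruned because no good match exists
        y0_hat = sum(w * y_c for w, (_, y_c) in zip(weights, controls)) / w_sum
        effects.append(y_t - y0_hat)
    if not effects:
        raise ValueError("no treated unit has a control within the bandwidth")
    return sum(effects) / len(effects), len(effects)


# Toy data (assumption): (score, outcome) pairs.
treated = [(0.30, 5.0), (0.60, 7.0), (0.95, 9.0)]
controls = [(0.28, 4.0), (0.32, 4.4), (0.55, 6.0), (0.62, 6.5)]

att, n_matched = kernel_att(treated, controls, bandwidth=0.1)
print(att, n_matched)
```

In the toy data, the first two treated units each use two nearby controls, whereas the third has no control within the bandwidth and is excluded. No control that is a good match is discarded, which is the contrast with one-to-one matching without replacement.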
It is true that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more "free" variance in X that can be exploited by modeling decisions).

In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you'd need to apply PSM stratified by subgroups in this case).
The Command

kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).

Some key features:
- Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
- Optional exact matching
- Optional regression-adjustment bias correction
- Kernel matching, ridge matching, or nearest-neighbor matching
- Automatic bandwidth selection for kernel/ridge matching
- Flexible specification of scaling matrix for MDM
- Joint analysis of multiple subgroups and multiple outcome variables
- Various post-estimation commands for balancing and common-support diagnostics
- Computationally efficient implementation
Some Examples

Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.

. sysuse nlsw88, clear
(NLSW, 1988 extract)

. drop if industry==2
(4 observations deleted)

Mahalanobis-distance kernel matching

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs = 1,853
                                               Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |      Matched        |      Controls       | Band-
           |  Yes    No   Total  |  Used Unused  Total | width
   Treated |  432    25     457  |  1105    291   1396 | 1.3394

Treatment-effects estimation
  wage |    Coef.
   ATT | .6059013
  NATE | 1.432913
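For readers unfamiliar with the metric named in the output above, the following Python sketch computes a Mahalanobis distance between two units on two covariates, using the sample covariance matrix as scaling matrix. It is a textbook illustration under stated assumptions: the function names and data points are made up, and kmatch's own scaling options are not reproduced here.

```python
import math


def covariance_2d(points):
    """Sample variances and covariance (sxx, sxy, syy) of 2-D points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
    return sxx, sxy, syy


def mahalanobis_2d(a, b, cov):
    """sqrt((a - b)' S^-1 (a - b)) for the 2x2 covariance S = [[sxx, sxy], [sxy, syy]]."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = a[0] - b[0], a[1] - b[1]
    # S^-1 = (1 / det) * [[syy, -sxy], [-sxy, sxx]]
    quad = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(quad)


# Made-up (experience, tenure) pairs, for illustration only.
sample = [(10.0, 2.0), (14.0, 6.0), (12.0, 9.0), (18.0, 5.0), (16.0, 12.0)]
cov = covariance_2d(sample)
d = mahalanobis_2d(sample[0], sample[1], cov)
print(d)
```

With an identity scaling matrix, the distance reduces to the Euclidean distance; rescaling by the covariance matrix makes covariates with different units comparable.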
Some Examples

Some balancing statistics

. kmatch summarize
(refitting the model using the generate() option)

             |             Raw              |        Matched (ATT)
       Means |  Treated  Untrea~d    StdDif |  Treated  Untrea~d    StdDif
    collgrad |  .321663   .224212   .219912 |  .319444   .319444         0
     ttl_exp |  13.2685   12.7323   .117584 |  13.3205   13.1425   .039036
      tenure |  7.89205   6.17658    .29735 |  7.91744   7.58347   .057888
  3.industry |  .006565   .012178  -.058246 |   .00463    .00463         0
  4.industry |  .183807   .166905   .044425 |  .185185   .185185         0
  5.industry |  .105033   .027937   .312944 |  .085648   .085648         0
  6.industry |  .045952   .169771  -.407129 |  .048611   .048611         0
  7.industry |  .019694   .102436  -.350657 |  .020833   .020833         0
  8.industry |  .017505   .035817  -.113785 |  .009259   .009259         0
  9.industry |  .010941   .040115  -.185669 |  .011574   .011574         0
 10.industry |  .004376   .008596  -.052551 |  .002315   .002315         0
 11.industry |  .479212   .356734   .250073 |  .506944   .506944         0
 12.industry |  .122538    .07235   .169707 |   .12037    .12037         0
      2.race |  .330416   .244986   .189418 |    .3125     .3125         0
      3.race |  .017505   .011461   .050566 |  .006944   .006944         0
       south |  .297593   .466332  -.352408 |  .291667   .291667         0

             |             Raw              |        Matched (ATT)
   Variances |  Treated  Untrea~d     Ratio |  Treated  Untrea~d     Ratio
    collgrad |  .218674   .174066   1.25628 |  .217904   .217904         1
     ttl_exp |  20.5898   21.0001   .980459 |  19.8177   18.2323   1.08696
      tenure |  37.2044   29.3629   1.26706 |  37.0399   34.9543   1.05966
  3.industry |  .006536   .012038   .542928 |  .004619   .004619         1
  4.industry |  .150351   .139148   1.08052 |  .151242   .151242         1
  5.industry |  .094207   .027176   3.46656 |  .078494   .078494         1
  6.industry |  .043936    .14105   .311496 |  .046355   .046355         1
  7.industry |  .019348   .092008   .210287 |  .020447   .020447         1
  8.industry |  .017237   .034559   .498769 |  .009195   .009195         1
  9.industry |  .010845   .038533   .281445 |  .011467   .011467         1
 10.industry |  .004367   .008528   .512039 |  .002315   .002315         1
 11.industry |  .250115   .229639   1.08917 |  .250532   .250532         1
 12.industry |  .107758   .067163   1.60443 |  .106127   .106127         1
      2.race |  .221726     .1851   1.19787 |  .215342   .215342         1
      3.race |  .017237   .011338   1.52025 |  .006912   .006912         1
       south |   .20949   .249045   .841173 |  .207077   .207077         1
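The StdDif and Ratio columns can be reproduced from the reported means and variances. The Python sketch below assumes the common definition of the standardized difference, the mean difference divided by the square root of the average of the two raw group variances. That this is the formula behind the table is an inference from the reported numbers (they agree to the displayed precision), not something stated on the slide; the function names are hypothetical.

```python
import math


def std_diff(mean_t, mean_c, var_t, var_c):
    """Mean difference scaled by the root of the average of the two variances."""
    return (mean_t - mean_c) / math.sqrt((var_t + var_c) / 2.0)


def var_ratio(var_t, var_c):
    """Variance of the treated divided by variance of the untreated."""
    return var_t / var_c


# collgrad, raw sample (means and variances taken from the tables above)
d_raw = std_diff(0.321663, 0.224212, 0.218674, 0.174066)
r_raw = var_ratio(0.218674, 0.174066)

# ttl_exp, matched sample: matched means, scaled by the raw variances
d_matched = std_diff(13.3205, 13.1425, 20.5898, 21.0001)

print(round(d_raw, 6), round(r_raw, 5), round(d_matched, 6))
```

The matched standardized difference for ttl_exp agrees with the table only when the raw variances are used in the denominator, which suggests that the scaling is held fixed between the raw and matched columns.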
Some Examples

Make a graph of the balancing stats

. mat M = r(M)

. mat V = r(V)

. coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || ,
>     bylabels("Std. mean difference" "Variance ratio")
>     noci nolabels byopts(xrescale)

. addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))

. addplot 2: , xline(1) norescaling

[Figure: dot plots for collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, and south, raw versus matched. Left panel: standardized mean difference (about -.4 to .4). Right panel: variance ratio (0 to 4).]
Some Examples

Propensity-score kernel matching

. kmatch ps union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate att
(computing bandwidth ... done)

Propensity-score kernel matching               Number of obs = 1,853
                                               Kernel        = epan
Treatment  : union = 1
Covariates : collgrad ttl_exp tenure i.industry i.race south
PS model   : logit (pr)

Matching statistics
           |      Matched        |      Controls       | Band-
           |  Yes    No   Total  |  Used Unused  Total | width
   Treated |  431    26     457  |  1214    182   1396 | .00188

Treatment-effects estimation
  wage |    Coef.
   ATT | .3887224
  NATE | 1.432913
Some Examples

Kernel density balancing plot

. kmatch density, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)
(applying 0-1 boundary correction to density estimation of propensity score)
(bandwidth for propensity score = .06803989)

[Figure: kernel density of the propensity score (0 to .8) for untreated and treated, in panels "Raw" and "Matched (ATT)".]
Some Examples

Cumulative distribution balancing plot

. kmatch cumul, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)

[Figure: cumulative probability (0 to 1) of the propensity score (0 to .8) for untreated and treated, in panels "Raw" and "Matched (ATT)".]
Some Examples

Balancing box plot

. kmatch box
(refitting the model using the generate() option)

[Figure: box plots of the propensity score (0 to .8) for untreated and treated, in panels "Raw" and "Matched (ATT)".]
Some Examples

Standard errors

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching          Number of obs = 1,853
                                               Replications  = 50
                                               Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |      Matched        |      Controls       | Band-
           |  Yes    No   Total  |  Used Unused  Total | width
   Treated |  432    25     457  |  1105    291   1396 | 1.3394
 Untreated | 1386    10    1396  |   455      2    457 | 3.3975
  Combined | 1818    35    1853  |  1560    293   1853 |

Treatment-effects estimation
       |  Observed  Bootstrap                        Normal-based
  wage |     Coef.  Std. Err.      z   P>|z|   [95% Conf. Interval]
   ATE |  .4095729   .1920853   2.13   0.033    .0330928   .7860531
   ATT |  .6059013   .2472069   2.45   0.014    .1213846   1.090418
   ATC |  .3483797   .1893653   1.84   0.066   -.0227695   .7195289
  NATE |  1.432913   .2333282   6.14   0.000    .9755981   1.890228
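The three estimates in the table are linked: the reported ATE equals the average of ATT and ATC weighted by the numbers of matched treated and matched untreated observations. The Python check below uses only numbers from the output above. That this weighting is how the estimate is formed is an inference from the arithmetic, not a statement taken from the slide.

```python
# Matched counts and estimates taken from the output above.
n_treated_matched = 432
n_untreated_matched = 1386
att = 0.6059013
atc = 0.3483797

# Share of matched treated among all matched observations (432 / 1818).
share_treated = n_treated_matched / (n_treated_matched + n_untreated_matched)

# ATE as the matched-count-weighted average of ATT and ATC.
ate = share_treated * att + (1.0 - share_treated) * atc
print(round(ate, 7))
```

Weighting by the full group sizes (457 and 1396) instead would give about .4119, which does not match the reported ATE; the difference comes from the 35 observations without a match.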
Some Examples

Do some tests

. lincom ATT-NATE

 ( 1)  ATT - NATE = 0

  wage |     Coef.  Std. Err.      z   P>|z|   [95% Conf. Interval]
   (1) | -.8270117   .1810415  -4.57   0.000   -1.181847  -.4721768

. test ATT = ATC

 ( 1)  ATT - ATC = 0

           chi2(  1) =    2.42
         Prob > chi2 =    0.1200
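The columns of the lincom output follow from the coefficient and its bootstrap standard error via the usual normal approximation. The short Python check below uses the two reported numbers; only the variable names are new.

```python
from statistics import NormalDist

# Reported by lincom for ATT - NATE
coef = -0.8270117
se = 0.1810415

z = coef / se                              # test statistic
p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))  # two-sided p-value
crit = NormalDist().inv_cdf(0.975)         # about 1.96
ci_low = coef - crit * se
ci_high = coef + crit * se

print(round(z, 2), round(p, 3), round(ci_low, 6), round(ci_high, 6))
```

The same arithmetic applies to every row of the "Normal-based" tables in these examples.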
Some Examples

Nearest-neighbor matching (1 neighbor)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

Multivariate-distance nearest-neighbor matching
                                               Number of obs  = 1,853
                                               Neighbors: min = 1
Treatment  : union = 1                                    max = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |      Matched        |      Controls       | Band-
           |  Yes    No   Total  |  Used Unused  Total | width
   Treated |  457     0     457  |   328   1068   1396 |

Treatment-effects estimation
  wage |    Coef.
   ATT | .7246969

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

Treatment-effects estimation                   Number of obs      = 1,853
Estimator      : nearest-neighbor matching     Matches: requested = 1
Outcome model  : matching                                     min = 1
Distance metric: Mahalanobis                                  max = 1

                     |            AI Robust
                wage |     Coef.  Std. Err.      z   P>|z|   [95% Conf. Interval]
ATET                 |
               union |
 (union vs nonunion) |  .7246969   .2942952   2.46   0.014     .147889   1.301505
Some Examples

Nearest-neighbor matching (5 neighbors)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching
                                               Number of obs  = 1,853
                                               Neighbors: min = 5
Treatment  : union = 1                                    max = 5
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |      Matched        |      Controls       | Band-
           |  Yes    No   Total  |  Used Unused  Total | width
   Treated |  457     0     457  |   870    526   1396 |

Treatment-effects estimation
  wage |    Coef.
   ATT | .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation                   Number of obs      = 1,853
Estimator      : nearest-neighbor matching     Matches: requested = 5
Outcome model  : matching                                     min = 5
Distance metric: Mahalanobis                                  max = 6

                     |            AI Robust
                wage |     Coef.  Std. Err.      z   P>|z|   [95% Conf. Interval]
ATET                 |
               union |
 (union vs nonunion) |  .5590823   .2381752   2.35   0.019    .0922675   1.025897
Some Examples

Bias-correction (regression adjustment)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs  = 1853
Neighbors: min = 5
           max = 5
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
             Matched                Controls              Band-
             Yes     No   Total     Used  Unused   Total  width
  Treated    457      0     457      870     526    1396

Treatment-effects estimation
  wage       Coef.
  ATT     .5288023
  adjusted for: collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south)
>     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation                  Number of obs      = 1853
Estimator      : nearest-neighbor matching    Matches: requested = 5
Outcome model  : matching                                    min = 5
Distance metric: Mahalanobis                                 max = 6

                                     AI Robust
  wage                      Coef.    Std. Err.     z    P>|z|   [95% Conf. Interval]
  ATET
    union
    (union vs nonunion)  .5288023    .2420635   2.18   0.029    .0543666   1.003238
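The idea behind the bias correction can be illustrated outside Stata. The Python sketch below is a toy version under strong assumptions (a single covariate, one nearest neighbor with replacement, and a linear outcome model fitted on the controls); the function names are hypothetical, and it is not the kmatch or teffects implementation. Each matched control outcome is shifted by the predicted outcome difference between the treated unit's covariate value and the control's.

```python
def ols_line(x, y):
    """Least-squares intercept and slope for a single regressor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def att_nn(xt, yt, xc, yc, bias_adjust=False):
    """ATT from 1-nearest-neighbor matching with replacement on one covariate.

    With bias_adjust=True, the matched control outcome is corrected by
    slope * (x_treated - x_control), where the slope comes from a linear
    regression of the outcome on x among the controls (toy assumption).
    """
    slope = ols_line(xc, yc)[1] if bias_adjust else 0.0
    effects = []
    for x_i, y_i in zip(xt, yt):
        # index of the closest control unit
        j = min(range(len(xc)), key=lambda k: abs(xc[k] - x_i))
        y0_hat = yc[j] + slope * (x_i - xc[j])
        effects.append(y_i - y0_hat)
    return sum(effects) / len(effects)


# Toy data: Y0 = 2*x exactly, true effect = 1, treated units have larger x.
xc = [0.0, 1.0, 2.0, 3.0]
yc = [2 * x for x in xc]
xt = [2.4, 3.5, 4.0]
yt = [2 * x + 1 for x in xt]

print(att_nn(xt, yt, xc, yc))                    # inexact matches leave bias
print(att_nn(xt, yt, xc, yc, bias_adjust=True))  # adjustment removes it here
```

In this constructed example the outcome model is exactly linear, so the adjustment removes the matching discrepancy completely; with real data it only reduces it to the extent that the regression model is adequate.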
Some Examples

Mahalanobis-distance and propensity-score matching combined

. kmatch md union collgrad ttl_exp tenure (wage), att
>     psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1853
Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis (modified)
Covariates : collgrad ttl_exp tenure
PS model   : logit (pr)
PS covars  : i.industry i.race south

Matching statistics
             Matched                Controls              Band-
             Yes     No   Total     Used  Unused   Total  width
  Treated    439     18     457     1258     138    1396  .83886

Treatment-effects estimation
  wage       Coef.
  ATT     .6408443
Some Examples

Exact matching

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1853
Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure
Exact      : industry race south

Matching statistics
             Matched                Controls              Band-
             Yes     No   Total     Used  Unused   Total  width
  Treated    432     25     457     1103     293    1396  1.3013

Treatment-effects estimation
  wage       Coef.
  ATT     .6047374
Some Examples

Bandwidth selection: the default (based on the distribution of distances in one-nearest-neighbor matching)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1853
Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
             Matched                Controls              Band-
             Yes     No   Total     Used  Unused   Total  width
  Treated    432     25     457     1105     291    1396  1.3394

Treatment-effects estimation
  wage       Coef.
  ATT     .6059013
Some Examples

Bandwidth selection: cross-validation with respect to X

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1853
Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
             Matched                Controls              Band-
             Yes     No   Total     Used  Unused   Total  width
  Treated    448      9     457     1184     212    1396  1.8888

Treatment-effects estimation
  wage       Coef.
  ATT     .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MSE against bandwidth; markers are labeled with the index of the evaluated grid point (1 to 15).]
Some Examples

Bandwidth selection: cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1853
Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
             Matched                Controls              Band-
             Yes     No   Total     Used  Unused   Total  width
  Treated    453      4     457     1289     107    1396   2.433

Treatment-effects estimation
  wage       Coef.
  ATT     .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MISE against bandwidth; markers are labeled with the index of the evaluated grid point (1 to 15).]
Some Examples

Bandwidth selection: weighted cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1853
Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
             Matched                Controls              Band-
             Yes     No   Total     Used  Unused   Total  width
  Treated    455      2     457     1356      40    1396  2.7626

Treatment-effects estimation
  wage       Coef.
  ATT     .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of weighted MISE against bandwidth; markers are labeled with the index of the evaluated grid point (1 to 15).]
Some Examples

Common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(0.5)

Multivariate-distance kernel matching

Number of obs = 1853
Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
             Matched                Controls              Band-
             Yes     No   Total     Used  Unused   Total  width
  Treated    366     91     457      701     695    1396      .5

Treatment-effects estimation
  wage       Coef.
  ATT     .3303161

. kmatch csummarize
(refitting the model using the generate() option)

                 Common support (treated)          Standardized difference
  Means          Matched   Unmatc~d     Total     (1)-(3)    (2)-(3)    (1)-(2)
  collgrad       .322404    .318681   .321663     .001585   -.006376    .007962
  ttl_exp        13.3929    12.7682   13.2685     .027413   -.110253    .137666
  tenure         8.12614    6.95055   7.89205     .038378   -.154356    .192734
  3.industry     .002732    .021978   .006565    -.047404    .190657   -.238061
  4.industry     .191257    .153846   .183807     .019212   -.077269    .096481
  5.industry     .062842    .274725   .105033    -.137462    .552867   -.690329
  6.industry     .057377          0   .045952     .054507   -.219225    .273732
  7.industry     .019126    .021978   .019694    -.004083    .016423   -.020506
  8.industry     .005464    .065934   .017505    -.091714    .368871   -.460585
  9.industry     .010929    .010989   .010941    -.000115    .000462   -.000577
  10.industry          0    .021978   .004376    -.066227    .266363   -.332589
  11.industry    .554645    .175824   .479212      .15083   -.606636    .757467
  12.industry    .092896    .241758   .122538    -.090299    .363181    -.45348
  2.race         .243169    .681319   .330416    -.185284    .745209   -.930494
  3.race         .002732    .076923   .017505    -.112525    .452572   -.565097
  south           .29235    .318681   .297593    -.011456    .046074    -.05753

                 Common support (treated)          Ratio
  Variances      Matched   Unmatc~d     Total     (1)/(3)    (2)/(3)    (1)/(2)
  collgrad       .219058    .219536   .218674     1.00176    1.00394    .997824
  ttl_exp        19.4198    25.2474   20.5898     .943177    1.22621     .76918
  tenure         38.3324    31.9242   37.2044     1.03032    .858076    1.20073
  3.industry     .002732    .021734   .006536     .418045    3.32537    .125714
  4.industry     .155101    .131624   .150351     1.03159    .875443    1.17837
  5.industry     .059054    .201465   .094207     .626851    2.13854    .293122
  6.industry     .054233          0   .043936     1.23435          0          .
  7.industry     .018811    .021734   .019348     .972252     1.1233    .865531
  8.industry      .00545    .062271   .017237     .316157    3.61269    .087513
  9.industry     .010839    .010989   .010845     .999464    1.01328    .986361
  10.industry          0    .021734   .004367           0    4.97709          0
  11.industry    .247691     .14652   .250115     .990307    .585811    1.69049
  12.industry    .084497    .185348   .107758     .784137    1.72003    .455885
  2.race         .184542    .219536   .221726     .832297    .990121    .840601
  3.race         .002732    .071795   .017237     .158513    4.16522    .038056
  south          .207448    .219536    .20949     .990254    1.04796    .944939

  (1) matched, (2) unmatched, (3) total
Some Examples

Make a graph of the common-support statistics:

. mat M = r(M)
. coefplot matrix(M[,4]), title("Std. difference") noci nolabels xline(0)

[Figure: coefficient plot of the standardized differences between matched and all treated observations, one row per covariate (collgrad, ttl_exp, tenure, industry dummies, race dummies, south), with a reference line at 0.]
Some Examples

Multiple outcome variables

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1852
Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
             Matched                Controls              Band-
             Yes     No   Total     Used  Unused   Total  width
  Treated    432     25     457     1104     291    1395  1.3392
Treatment-effects estimation
Coef
wageATT 6021049
NATE 1430823
hoursATT 1263759
NATE 1450303
Some Examples

Multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure)
>     (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1852
Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
             Matched                Controls              Band-
             Yes     No   Total     Used  Unused   Total  width
  Treated    432     25     457     1104     291    1395  1.3392
Treatment-effects estimation
Coef
wageATT 5152752
NATE 1430823
hoursATT 1263759
NATE 1450303
wage: adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race
Some Examples

Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage),
>     att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching

Number of obs = 1853
Replications  = 50
Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics
                Matched                Controls              Band-
                Yes     No   Total     Used  Unused   Total  width
  0: Treated    306     15     321      625     120     745  1.3199
  1: Treated    126     10     136      473     178     651  1.3398

Treatment-effects estimation
             Observed   Bootstrap                          Normal-based
  wage          Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
  0: ATT     .4586332    .2763358   1.66   0.097     -.082975   1.000241
  1: ATT     .9518705     .406903   2.34   0.019     .1543553   1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

           chi2(  1) =    1.23
         Prob > chi2 =    0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

  wage          Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
  (1)        .4932373    .4452343   1.11   0.268     -.379406   1.365881
Simulation

- Population data from the Swiss census of 2000.
- Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using the command iskotrei by Hendrickx 2002; values from 6 to 78, mean 44).
- Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.
- Control variables: gender, age, and highest educational degree.
- Population restricted to people between 24 and 60 years old who are working.
- 2'308'006 individuals, of which 17.5% belong to the treatment group.
- Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.
Simulation

Substantial differences between resident aliens and Swiss nationals on all three covariates.

Propensity score in population (computed from fully stratified data):

[Figure: density of the propensity score (0 to 1) for untreated and treated. McFadden R2 = 0.121.]
Simulation

- Raw mean difference in occupational prestige (NATE): −4.79
- Population ATT (computed from fully stratified data): −3.96
- There is some treatment effect heterogeneity (ATE = −3.51, ATC = −3.41).

[Figure, left panel: outcome (occupational prestige) by propensity score for untreated and treated. Right panel: treatment effect by propensity score.]
Results: Variance

[Figure: variance of the estimators in two panels (N = 500 and N = 5000). Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each without and with bias correction.]
In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are small; additional post-matching regression adjustment further reduces the differences.
Results: Bias reduction (in percent)

[Figure: bias reduction in percent in two panels (N = 500 and N = 5000). Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each without and with bias correction.]
Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.
Results: Mean squared error

[Figure: mean squared error of the estimators in two panels (N = 500 and N = 5000). Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each without and with bias correction.]
Results: Relative standard error

[Figure: relative standard error in two panels (N = 500 and N = 5000). Rows: nearest-neighbor matching with teffects standard errors (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap standard errors (1 neighbor, 5 neighbors), and kernel matching with bootstrap standard errors (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each without and with bias correction.]
Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals in two panels (N = 500 and N = 5000). Rows: nearest-neighbor matching with teffects standard errors (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap standard errors (1 neighbor, 5 neighbors), and kernel matching with bootstrap standard errors (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each without and with bias correction.]
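The five performance measures shown in the result slides (variance, bias reduction, mean squared error, relative standard error, coverage) can be computed from simulation replications roughly as in the Python sketch below. The exact definitions used for the slides are not stated, so every formula here is an assumption: bias reduction is taken relative to the raw difference (NATE), and the relative standard error is the average estimated standard error divided by the standard deviation of the estimates. The function name and the four replications are hypothetical; −3.96 and −4.79 are the population ATT and NATE as read from the simulation slides.

```python
from statistics import mean, pvariance


def performance(estimates, std_errors, true_att, nate, z=1.96):
    """Summarize simulation replications of one estimator (assumed definitions)."""
    avg = mean(estimates)
    bias = avg - true_att
    variance = pvariance(estimates)
    # a replication "covers" if the true ATT lies inside estimate +/- z * SE
    covered = [abs(e - true_att) <= z * s for e, s in zip(estimates, std_errors)]
    return {
        "bias_reduction_pct": 100 * (nate - avg) / (nate - true_att),
        "variance": variance,
        "mse": variance + bias ** 2,
        "relative_se": mean(std_errors) / variance ** 0.5,
        "coverage": sum(covered) / len(covered),
    }


# Hypothetical replications of one estimator, for illustration only.
res = performance(
    estimates=[-4.2, -3.8, -4.0, -3.6],
    std_errors=[0.2, 0.2, 0.2, 0.2],
    true_att=-3.96,
    nate=-4.79,
)
for name, value in res.items():
    print(name, round(value, 4))
```

Under these definitions a bias reduction of 100 percent means the estimator is unbiased on average, values above 100 indicate overcorrection, and a relative standard error below 1 means the estimated standard errors understate the true sampling variability.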
Conclusions

The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).

Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.
Conclusions

Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research, the choice may not be that clear.

Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.

Disadvantages of MDM:
- The choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching! (But don't use pair matching anyhow.)
Conclusions

Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly ok for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.
References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2): 295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1): 77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91: 279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4): 465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70: 41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5): 688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4): 472–480.
King and Nielsen

Argument 3:
- Random pruning (deleting observations at random) increases imbalance. This is because the sample size decreases, so that variance increases (large differences become more likely).
- More imbalance/variance means more model dependence and researcher discretion.
- Because PSM approximates complete randomization, it engages in random pruning.
- PSM Paradox ("when you do 'better,' you do worse"):
  - When matching is made more strict (e.g., by decreasing the size of the caliper), PSM, like other matching methods, typically reduces imbalance. But soon the PSM Paradox kicks in, such that further pruning quickly increases imbalance.
  - If the data is such that there are no big differences between treated and untreated to begin with, the PSM Paradox kicks in almost immediately.
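The first point of Argument 3, that random pruning increases imbalance, can be checked with a small simulation. The Python sketch below is an illustration under simple assumptions (one standard-normal covariate, treatment assigned completely at random, equal group sizes); the function name is hypothetical. Because pruning at random from a completely randomized sample leaves a smaller completely randomized sample, it is enough to compare the expected imbalance across sample sizes.

```python
import random


def mean_imbalance(n_per_group, reps=500, seed=1):
    """Average absolute difference in covariate means between two randomly
    formed groups of size n_per_group (covariate ~ N(0, 1))."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        treated = [rng.gauss(0, 1) for _ in range(n_per_group)]
        control = [rng.gauss(0, 1) for _ in range(n_per_group)]
        total += abs(sum(treated) / n_per_group - sum(control) / n_per_group)
    return total / reps


# Fewer remaining units (more pruning) -> larger expected imbalance.
for n in (200, 100, 50, 25):
    print(n, round(mean_imbalance(n), 3))
```

The printed imbalance grows as the group size shrinks, roughly in proportion to one over the square root of the group size, which is the variance argument made on the slide.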
PSM Increases Model Dependence & Bias

[Figure (slide 20/23 of the slides by King and Nielsen). Left panel, "Model Dependence": variance against number of units pruned, for MDM and PSM. Right panel, "Bias": maximum coefficient across 512 specifications against number of units pruned, for MDM and PSM, with the true effect of 2 marked.]

Data-generating model: $Y_i = 2T_i + X_{1i} + X_{2i} + \epsilon_i$, $\epsilon_i \sim N(0, 1)$
The Propensity Score Paradox in Real Data

[Figure (slide 21/23 of the slides by King and Nielsen). Two panels, Finkel et al. (JOP 2012) and Nielsen et al. (AJPS 2011): imbalance against number of units pruned for CEM, MDM, and PSM, with reference markers for the raw data, random pruning, and a 1/4 SD caliper.]

Similar pattern for > 20 other real data sets we checked.
Are King and Nielsen right?

Argument 1:
- Model dependence (i.e., dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions even if they try not to be).
- Matching is good because it reduces model dependence.

I fully agree!

My view, however, may be somewhat less pessimistic. I believe that research results can be credible if researchers are well educated, so that they know what they are doing, and if modeling decisions are made transparent and robustness of results is evaluated (and documented).
Are King and Nielsen right?

Argument 2:
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

That fully blocked randomization is more efficient than complete randomization, given the sample size, is of course true (how large the efficiency gains are depends on the strength of the relation between X and Y).

However, if blocking reduces the sample size, it is not a priori clear whether estimates from the blocked sample are more efficient than estimates from the full sample (although often they will be).
That PSM approximates complete randomization is only partially true. PSM approximates complete randomization within observations with the same propensity score. Hence, PSM is somewhere between complete randomization and fully blocked randomization.
- If the X variables have no relation to T (treatment), then all observations have the same propensity score. Hence, we end up with complete randomization.
- If the X variables have a strong effect on T, there is lots of blocking.
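The two limiting cases can be made concrete with a logit propensity score. The Python sketch below uses made-up covariate profiles and coefficients (all values and the function name are hypothetical) to show that the score cannot distinguish units when X is unrelated to T, while it separates them when X has strong effects on T.

```python
import math


def propensity(x, coefs, intercept=0.0):
    """Logit propensity score for a covariate vector x."""
    z = intercept + sum(b * xi for b, xi in zip(coefs, x))
    return 1 / (1 + math.exp(-z))


# Three units with very different covariate profiles (hypothetical values).
units = [(0.0, 0.0), (2.0, -1.0), (-3.0, 4.0)]

# Case 1: X unrelated to T. All scores are identical, so matching on the
# score treats the units as exchangeable (complete randomization).
flat = [propensity(x, coefs=(0.0, 0.0)) for x in units]

# Case 2: X strongly related to T. The scores differ, so matching on the
# score blocks on X.
steep = [propensity(x, coefs=(2.0, 1.0)) for x in units]

print(flat)
print([round(p, 3) for p in steep])
```

Units with equal scores can still differ in X whenever the effects of several covariates offset each other in the linear index, which is the case in which matching on the score behaves like randomization within a stratum.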
Are King and Nielsen right?

Argument 3:
- Random pruning ⇒ imbalance ⇒ more model dependence
- PSM ⇒ complete randomization ⇒ lots of random pruning
- PSM Paradox: "when you do 'better,' you do worse"

That random pruning makes things worse is of course true, because it unnecessarily reduces the sample size (without changing anything else).

As argued above, that PSM applies random pruning is only true for X variables unrelated to T (so that we are in a "local" complete randomization situation, although something similar can probably also happen if effects from several X's cancel each other out).

Furthermore, it is only true if you employ a matching algorithm that throws away good matches. King and Nielsen's results seem to be based on the worst possible algorithm: one-to-one matching without replacement.
If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or also nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
- Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
- Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.
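The difference between averaging and pruning can be seen for a single treated unit. The Python sketch below contrasts a kernel-weighted counterfactual, using an Epanechnikov kernel as in the kmatch output, with the single closest control that pair matching would keep. It is a simplified illustration with made-up distances and outcomes and hypothetical function names; the exact weighting and bandwidth handling in kmatch may differ.

```python
def epan_weights(distances, bandwidth):
    """Normalized Epanechnikov kernel weights for the controls of one
    treated unit; controls at or beyond the bandwidth get weight zero."""
    raw = []
    for d in distances:
        u = d / bandwidth
        raw.append(0.75 * (1 - u * u) if abs(u) < 1 else 0.0)
    total = sum(raw)
    if total == 0:
        return None  # no control within the bandwidth: unit is off support
    return [w / total for w in raw]


def kernel_counterfactual(distances, outcomes, bandwidth):
    """Weighted average of control outcomes within the bandwidth."""
    weights = epan_weights(distances, bandwidth)
    if weights is None:
        return None
    return sum(w * y for w, y in zip(weights, outcomes))


def pair_counterfactual(distances, outcomes):
    """Outcome of the single closest control (all other controls are pruned)."""
    j = min(range(len(distances)), key=distances.__getitem__)
    return outcomes[j]


# Three nearly equally good controls and one poor one (hypothetical data).
distances = [0.00, 0.01, 0.01, 0.50]
outcomes = [10.0, 12.0, 14.0, 30.0]

print(pair_counterfactual(distances, outcomes))
print(kernel_counterfactual(distances, outcomes, bandwidth=0.1))
```

Pair matching discards two controls that are practically as close as the selected one, whereas the kernel estimate averages over all three good matches and drops only the distant control.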
It is true that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more "free" variance in X that can be exploited by modeling decisions).

In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you'd need to apply PSM stratified by subgroups in this case).
The Command

kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).

Some key features:
- Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
- Optional exact matching
- Optional regression-adjustment bias correction
- Kernel matching, ridge matching, or nearest-neighbor matching
- Automatic bandwidth selection for kernel/ridge matching
- Flexible specification of scaling matrix for MDM
- Joint analysis of multiple subgroups and multiple outcome variables
- Various post-estimation commands for balancing and common-support diagnostics
- Computationally efficient implementation
Some Examples

Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.

. sysuse nlsw88, clear
(NLSW, 1988 extract)

. drop if industry==2
(4 observations deleted)

Mahalanobis-distance kernel matching

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1853
Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
             Matched                Controls              Band-
             Yes     No   Total     Used  Unused   Total  width
  Treated    432     25     457     1105     291    1396  1.3394

Treatment-effects estimation
  wage       Coef.
  ATT     .6059013
  NATE    1.432913
Some Examples

Some balancing statistics:

. kmatch summarize
(refitting the model using the generate() option)

                          Raw                          Matched (ATT)
  Means          Treated  Untrea~d    StdDif     Treated  Untrea~d    StdDif
  collgrad       .321663   .224212   .219912     .319444   .319444         0
  ttl_exp        13.2685   12.7323   .117584     13.3205   13.1425   .039036
  tenure         7.89205   6.17658    .29735     7.91744   7.58347   .057888
  3.industry     .006565   .012178  -.058246      .00463    .00463         0
  4.industry     .183807   .166905   .044425     .185185   .185185         0
  5.industry     .105033   .027937   .312944     .085648   .085648         0
  6.industry     .045952   .169771  -.407129     .048611   .048611         0
  7.industry     .019694   .102436  -.350657     .020833   .020833         0
  8.industry     .017505   .035817  -.113785     .009259   .009259         0
  9.industry     .010941   .040115  -.185669     .011574   .011574         0
  10.industry    .004376   .008596  -.052551     .002315   .002315         0
  11.industry    .479212   .356734   .250073     .506944   .506944         0
  12.industry    .122538    .07235   .169707      .12037    .12037         0
  2.race         .330416   .244986   .189418       .3125     .3125         0
  3.race         .017505   .011461   .050566     .006944   .006944         0
  south          .297593   .466332  -.352408     .291667   .291667         0

                          Raw                          Matched (ATT)
  Variances      Treated  Untrea~d     Ratio     Treated  Untrea~d     Ratio
  collgrad       .218674   .174066   1.25628     .217904   .217904         1
  ttl_exp        20.5898   21.0001   .980459     19.8177   18.2323   1.08696
  tenure         37.2044   29.3629   1.26706     37.0399   34.9543   1.05966
  3.industry     .006536   .012038   .542928     .004619   .004619         1
  4.industry     .150351   .139148   1.08052     .151242   .151242         1
  5.industry     .094207   .027176   3.46656     .078494   .078494         1
  6.industry     .043936    .14105   .311496     .046355   .046355         1
  7.industry     .019348   .092008   .210287     .020447   .020447         1
  8.industry     .017237   .034559   .498769     .009195   .009195         1
  9.industry     .010845   .038533   .281445     .011467   .011467         1
  10.industry    .004367   .008528   .512039     .002315   .002315         1
  11.industry    .250115   .229639   1.08917     .250532   .250532         1
  12.industry    .107758   .067163   1.60443     .106127   .106127         1
  2.race         .221726     .1851   1.19787     .215342   .215342         1
  3.race         .017237   .011338   1.52025     .006912   .006912         1
  south           .20949   .249045   .841173     .207077   .207077         1
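The StdDif and Ratio columns can be reproduced from the reported means and variances. The Python sketch below assumes the common definition of the standardized difference (difference in means divided by the square root of the average of the two raw variances) and a plain ratio of variances. This definition matches the collgrad row of the table, which suggests, but does not prove, that it is the one kmatch uses; the function names are hypothetical, and the inputs are the values as read from the table (collgrad mean .321663 vs .224212, variance .218674 vs .174066).

```python
import math


def std_diff(mean_t, mean_c, var_t, var_c):
    """Standardized mean difference, scaled by the pooled standard deviation
    sqrt((var_t + var_c) / 2)."""
    return (mean_t - mean_c) / math.sqrt((var_t + var_c) / 2)


def var_ratio(var_t, var_c):
    """Ratio of the treated to the untreated variance."""
    return var_t / var_c


# collgrad, raw sample
print(round(std_diff(0.321663, 0.224212, 0.218674, 0.174066), 4))  # 0.2199
print(round(var_ratio(0.218674, 0.174066), 4))                     # 1.2563

# ttl_exp, matched sample: the reported value is reproduced when the RAW
# variances are kept in the denominator (an observation, not documentation).
print(round(std_diff(13.3205, 13.1425, 20.5898, 21.0001), 4))
```

Keeping the raw variances in the denominator makes raw and matched standardized differences comparable, because a change in the statistic then reflects only a change in the mean difference.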
Some Examples

Make a graph of the balancing statistics:

. mat M = r(M)
. mat V = r(V)
. coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || ,
>     bylabels("Std. mean difference" "Variance ratio")
>     noci nolabels byopts(xrescale)
. addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))
. addplot 2: , xline(1) norescaling

[Figure: two-panel coefficient plot with one row per covariate (collgrad, ttl_exp, tenure, industry dummies, race dummies, south). Left panel: standardized mean difference, reference line at 0. Right panel: variance ratio, reference line at 1. Series: Raw and Matched.]
Some Examples

Propensity-score kernel matching

. kmatch ps union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate att
(computing bandwidth ... done)

Propensity-score kernel matching

Number of obs = 1853
Kernel        = epan
Treatment  : union = 1
Covariates : collgrad ttl_exp tenure i.industry i.race south
PS model   : logit (pr)

Matching statistics
             Matched                Controls              Band-
             Yes     No   Total     Used  Unused   Total  width
  Treated    431     26     457     1214     182    1396  .00188

Treatment-effects estimation
  wage       Coef.
  ATT     .3887224
  NATE    1.432913
Some Examples

Kernel density balancing plot

. kmatch density, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)
(applying 0-1 boundary correction to density estimation of propensity score)
(bandwidth for propensity score = .06803989)

[Figure: kernel density of the propensity score for untreated and treated, in two panels: Raw and Matched (ATT).]
Some Examples

Cumulative distribution balancing plot

. kmatch cumul, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)

[Figure: cumulative probability of the propensity score for untreated and treated, in two panels: Raw and Matched (ATT).]
Some Examples

Balancing box plot

. kmatch box
(refitting the model using the generate() option)

[Figure: box plots of the propensity score for untreated and treated, in two panels: Raw and Matched (ATT).]
Some Examples

Standard errors

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching

Number of obs = 1853
Replications  = 50
Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
               Matched                Controls              Band-
               Yes     No   Total     Used  Unused   Total  width
  Treated      432     25     457     1105     291    1396  1.3394
  Untreated   1386     10    1396      455       2     457  3.3975
  Combined    1818     35    1853     1560     293    1853

Treatment-effects estimation
             Observed   Bootstrap                          Normal-based
  wage          Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
  ATE        .4095729    .1920853   2.13   0.033     .0330928   .7860531
  ATT        .6059013    .2472069   2.45   0.014     .1213846   1.090418
  ATC        .3483797    .1893653   1.84   0.066    -.0227695   .7195289
  NATE       1.432913    .2333282   6.14   0.000     .9755981   1.890228
Some Examples

Do some tests

. lincom ATT-NATE

 ( 1)  ATT - NATE = 0

wage         Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
(1)      -.8270117   .1810415   -4.57   0.000    -1.181847  -.4721768

. test ATT = ATC

 ( 1)  ATT - ATC = 0

           chi2(  1) =    2.42
         Prob > chi2 =  0.1200
Some Examples

Nearest-neighbor matching (1 neighbor)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

Multivariate-distance nearest-neighbor matching

                                           Number of obs   =      1,853
                                           Neighbors:  min =          1
Treatment   : union = 1                                max =          1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                  Matched                  Controls            Band-
              Yes     No   Total     Used  Unused   Total      width
Treated       457      0     457      328    1068    1396          .

Treatment-effects estimation

wage           Coef.
ATT         .7246969

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

Treatment-effects estimation               Number of obs      =      1,853
Estimator      : nearest-neighbor matching Matches: requested =          1
Outcome model  : matching                                 min =          1
Distance metric: Mahalanobis                              max =          1

                                  AI Robust
wage                     Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
ATET
union
(union vs nonunion)   .7246969   .2942952    2.46   0.014      .147889   1.301505
Some Examples

Nearest-neighbor matching (5 neighbors)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching

                                           Number of obs   =      1,853
                                           Neighbors:  min =          5
Treatment   : union = 1                                max =          5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                  Matched                  Controls            Band-
              Yes     No   Total     Used  Unused   Total      width
Treated       457      0     457      870     526    1396          .

Treatment-effects estimation

wage           Coef.
ATT         .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation               Number of obs      =      1,853
Estimator      : nearest-neighbor matching Matches: requested =          5
Outcome model  : matching                                 min =          5
Distance metric: Mahalanobis                              max =          6

                                  AI Robust
wage                     Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
ATET
union
(union vs nonunion)   .5590823   .2381752    2.35   0.019     .0922675   1.025897
Some Examples

Bias-correction (regression adjustment)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching

                                           Number of obs   =      1,853
                                           Neighbors:  min =          5
Treatment   : union = 1                                max =          5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                  Matched                  Controls            Band-
              Yes     No   Total     Used  Unused   Total      width
Treated       457      0     457      870     526    1396          .

Treatment-effects estimation

wage           Coef.
ATT         .5288023

adjusted for: collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south)
>     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation               Number of obs      =      1,853
Estimator      : nearest-neighbor matching Matches: requested =          5
Outcome model  : matching                                 min =          5
Distance metric: Mahalanobis                              max =          6

                                  AI Robust
wage                     Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
ATET
union
(union vs nonunion)   .5288023   .2420635    2.18   0.029     .0543666   1.003238
Some Examples

Mahalanobis-distance and propensity-score matching combined

. kmatch md union collgrad ttl_exp tenure (wage), att
>     psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs   =      1,853
                                           Kernel          =       epan
Treatment   : union = 1
Metric      : mahalanobis (modified)
Covariates  : collgrad ttl_exp tenure
PS model    : logit (pr)
PS covars   : i.industry i.race south

Matching statistics

                  Matched                  Controls            Band-
              Yes     No   Total     Used  Unused   Total      width
Treated       439     18     457     1258     138    1396     .83886

Treatment-effects estimation

wage           Coef.
ATT         .6408443
Some Examples

Exact matching

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs   =      1,853
                                           Kernel          =       epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure
Exact       : industry race south

Matching statistics

                  Matched                  Controls            Band-
              Yes     No   Total     Used  Unused   Total      width
Treated       432     25     457     1103     293    1396     1.3013

Treatment-effects estimation

wage           Coef.
ATT         .6047374
Some Examples

Bandwidth selection: the default (based on the distribution of distances in one-nearest-neighbor matching)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs   =      1,853
                                           Kernel          =       epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                  Matched                  Controls            Band-
              Yes     No   Total     Used  Unused   Total      width
Treated       432     25     457     1105     291    1396     1.3394

Treatment-effects estimation

wage           Coef.
ATT         .6059013
Some Examples

Bandwidth selection: cross validation with respect to X

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs   =      1,853
                                           Kernel          =       epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                  Matched                  Controls            Band-
              Yes     No   Total     Used  Unused   Total      width
Treated       448      9     457     1184     212    1396     1.8888

Treatment-effects estimation

wage           Coef.
ATT         .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot, markers labeled by evaluation index; x-axis: bandwidth; y-axis: MSE]
Some Examples

Bandwidth selection: cross validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs   =      1,853
                                           Kernel          =       epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                  Matched                  Controls            Band-
              Yes     No   Total     Used  Unused   Total      width
Treated       453      4     457     1289     107    1396      2.433

Treatment-effects estimation

wage           Coef.
ATT         .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot, markers labeled by evaluation index; x-axis: bandwidth; y-axis: MISE]
Some Examples

Bandwidth selection: weighted cross validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs   =      1,853
                                           Kernel          =       epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                  Matched                  Controls            Band-
              Yes     No   Total     Used  Unused   Total      width
Treated       455      2     457     1356      40    1396     2.7626

Treatment-effects estimation

wage           Coef.
ATT         .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot, markers labeled by evaluation index; x-axis: bandwidth; y-axis: weighted MISE]
Some Examples

Common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(0.5)

Multivariate-distance kernel matching      Number of obs   =      1,853
                                           Kernel          =       epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                  Matched                  Controls            Band-
              Yes     No   Total     Used  Unused   Total      width
Treated       366     91     457      701     695    1396         .5

Treatment-effects estimation

wage           Coef.
ATT         .3303161

. kmatch csummarize
(refitting the model using the generate() option)

                   Common support (treated)         Standardized difference
Means           Matched  Unmatc~d     Total     (1)-(3)   (2)-(3)   (1)-(2)
collgrad        .322404   .318681   .321663     .001585  -.006376   .007962
ttl_exp         13.3929   12.7682   13.2685     .027413  -.110253   .137666
tenure          8.12614   6.95055   7.89205     .038378  -.154356   .192734
3.industry      .002732   .021978   .006565    -.047404   .190657  -.238061
4.industry      .191257   .153846   .183807     .019212  -.077269   .096481
5.industry      .062842   .274725   .105033    -.137462   .552867  -.690329
6.industry      .057377         0   .045952     .054507  -.219225   .273732
7.industry      .019126   .021978   .019694    -.004083   .016423  -.020506
8.industry      .005464   .065934   .017505    -.091714   .368871  -.460585
9.industry      .010929   .010989   .010941    -.000115   .000462  -.000577
10.industry           0   .021978   .004376    -.066227   .266363  -.332589
11.industry     .554645   .175824   .479212      .15083  -.606636   .757467
12.industry     .092896   .241758   .122538    -.090299   .363181   -.45348
2.race          .243169   .681319   .330416    -.185284   .745209  -.930494
3.race          .002732   .076923   .017505    -.112525   .452572  -.565097
south            .29235   .318681   .297593    -.011456   .046074   -.05753

                   Common support (treated)                  Ratio
Variances       Matched  Unmatc~d     Total     (1)/(3)   (2)/(3)   (1)/(2)
collgrad        .219058   .219536   .218674     1.00176   1.00394   .997824
ttl_exp         19.4198   25.2474   20.5898     .943177   1.22621    .76918
tenure          38.3324   31.9242   37.2044     1.03032   .858076   1.20073
3.industry      .002732   .021734   .006536     .418045   3.32537   .125714
4.industry      .155101   .131624   .150351     1.03159   .875443   1.17837
5.industry      .059054   .201465   .094207     .626851   2.13854   .293122
6.industry      .054233         0   .043936     1.23435         0         .
7.industry      .018811   .021734   .019348     .972252    1.1233   .865531
8.industry       .00545   .062271   .017237     .316157   3.61269   .087513
9.industry      .010839   .010989   .010845     .999464   1.01328   .986361
10.industry           0   .021734   .004367           0   4.97709         0
11.industry     .247691    .14652   .250115     .990307   .585811   1.69049
12.industry     .084497   .185348   .107758     .784137   1.72003   .455885
2.race          .184542   .219536   .221726     .832297   .990121   .840601
3.race          .002732   .071795   .017237     .158513   4.16522   .038056
south           .207448   .219536    .20949     .990254   1.04796   .944939

(1) matched, (2) unmatched, (3) total
Some Examples

Make a graph of the common-support stats

. mat M = r(M)
. coefplot matrix(M[,4]), title("Std. difference") noci nolabels xline(0)

[Figure: coefficient plot of the standardized difference between matched and total treated for each covariate (collgrad, ttl_exp, tenure, industry and race indicators, south); x-axis: standardized difference, with a reference line at 0]
Some Examples

Multiple outcome variables

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs   =      1,852
                                           Kernel          =       epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                  Matched                  Controls            Band-
              Yes     No   Total     Used  Unused   Total      width
Treated       432     25     457     1104     291    1395     1.3392

Treatment-effects estimation

               Coef.
wage
  ATT       .6021049
  NATE      1.430823
hours
  ATT       1.263759
  NATE      1.450303
Some Examples

Multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure)
>     (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs   =      1,852
                                           Kernel          =       epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                  Matched                  Controls            Band-
              Yes     No   Total     Used  Unused   Total      width
Treated       432     25     457     1104     291    1395     1.3392

Treatment-effects estimation

               Coef.
wage
  ATT       .5152752
  NATE      1.430823
hours
  ATT       1.263759
  NATE      1.450303

wage:  adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race
Some Examples

Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage),
>     att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching      Number of obs   =      1,853
                                           Replications    =         50
                                           Kernel          =       epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics

                  Matched                  Controls            Band-
              Yes     No   Total     Used  Unused   Total      width
0
  Treated     306     15     321      625     120     745     1.3199
1
  Treated     126     10     136      473     178     651     1.3398

Treatment-effects estimation

          Observed   Bootstrap                        Normal-based
wage         Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
0
  ATT     .4586332   .2763358    1.66   0.097     -.082975   1.000241
1
  ATT     .9518705    .406903    2.34   0.019     .1543553   1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

           chi2(  1) =    1.23
         Prob > chi2 =  0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

wage         Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
(1)       .4932373   .4452343    1.11   0.268     -.379406   1.365881
Simulation

- Population data from the Swiss census of 2000.
- Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using command iskotrei by Hendrickx 2002); values from 6 to 78, mean 44.
- Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.
- Control variables: gender, age, and highest educational degree.
- Population restricted to people between 24 and 60 years old who are working.
- 2'308'006 individuals, of which 17.5% belong to the treatment group.
- Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.
Simulation

Substantial differences between resident aliens and Swiss nationals on all three covariates.

Propensity score in population (computed from fully stratified data):

[Figure: density of the propensity score for untreated and treated; x-axis: propensity score (0 to 1); y-axis: density; annotation: McFadden R2 = 0.121]
Simulation

- Raw mean difference in occupational prestige (NATE): −4.79
- Population ATT (computed from fully stratified data): −3.96
- There is some treatment effect heterogeneity (ATE = −3.51, ATC = −3.41).

[Figure, left panel: mean outcome by propensity score for untreated and treated; x-axis: propensity score; y-axis: outcome]
[Figure, right panel: treatment effect by propensity score; x-axis: propensity score; y-axis: treatment effect]
Results: Variance

[Figure: variance of the ATT estimates by matching algorithm. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Panels: N = 500, N = 5000. Series: MDM and PSM, each without and with bias correction]

Notes: In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.
Results: Bias reduction (in percent)

[Figure: bias reduction in percent by matching algorithm. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Panels: N = 500, N = 5000. Series: MDM and PSM, each without and with bias correction]

Notes: Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular the sharp drop in the outcome variable after propensity score 0.3) small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.
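The variance, bias, and mean-squared-error slides are linked by the usual decomposition. The block below is a sketch under assumptions: $\hat\tau$ denotes any of the ATT estimators, and the bias-reduction formula is one common definition relative to the raw difference (NATE); the slides do not spell out the exact definition used in the plots.

```latex
\mathrm{Bias}(\hat\tau) = E[\hat\tau] - ATT,
\qquad
\mathrm{MSE}(\hat\tau) = \mathrm{Var}(\hat\tau) + \mathrm{Bias}(\hat\tau)^2

% Assumed definition, not confirmed by the slides:
\text{bias reduction (in \%)} = 100 \times \frac{NATE - E[\hat\tau]}{NATE - ATT}
```

Under this reading, and taking the simulation values as NATE = −4.79 and ATT = −3.96, the raw bias is −0.83; a value of 100 means the estimator is centered on the population ATT, and values above 100 mean it overshoots.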
Results: Mean squared error

[Figure: mean squared error of the ATT estimates by matching algorithm. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Panels: N = 500, N = 5000. Series: MDM and PSM, each without and with bias correction]
Results: Relative standard error

[Figure: relative standard error by matching algorithm. Rows: nearest-neighbor matching with teffects standard errors (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap standard errors (1 neighbor, 5 neighbors), and kernel matching with bootstrap standard errors (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Panels: N = 500, N = 5000. Series: MDM and PSM, each without and with bias correction]
Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals by matching algorithm. Rows: nearest-neighbor matching with teffects standard errors (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap standard errors (1 neighbor, 5 neighbors), and kernel matching with bootstrap standard errors (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Panels: N = 500, N = 5000. Series: MDM and PSM, each without and with bias correction]
7 Conclusions
Conclusions

- The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).
- Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.
Conclusions

Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research the choice may not be that clear.

Advantages of MDM:
+ MDM leaves less scope for post-matching modeling decision biases.
+ Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
+ Fewer restrictions in terms of possible post-matching analyses.

Disadvantages of MDM:
- Choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching! (But don't use pair matching anyhow.)
Conclusions

Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly OK for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:
- Run some simulations comparable to the ones by King and Nielsen, using various matching algorithms.
References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2): 295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1): 77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91: 279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4): 465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70: 41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5): 688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4): 472–480.
PSM Increases Model Dependence & Bias

[Figure, panel "Model Dependence": variance of the estimates against the number of units pruned (0 to 160), for MDM and PSM]
[Figure, panel "Bias": maximum coefficient across 512 specifications against the number of units pruned (0 to 160), for MDM and PSM; reference line: true effect = 2]

Data-generating process: $Y_i = 2T_i + X_{1i} + X_{2i} + \epsilon_i$, $\epsilon_i \sim N(0, 1)$

(Slide 20/23 from the slides by King and Nielsen)
The Propensity Score Paradox in Real Data

[Figure, panel "Finkel et al. (JOP, 2012)": imbalance against the number of units pruned (0 to 3000) for CEM, MDM, and PSM, with Raw and Random as reference points and the 1/4 SD caliper marked]
[Figure, panel "Nielsen et al. (AJPS, 2011)": imbalance against the number of units pruned (0 to 2500) for CEM, MDM, and PSM, with Raw and Random as reference points and the 1/4 SD caliper marked]

Similar pattern for > 20 other real data sets we checked.

(Slide 21/23 from the slides by King and Nielsen)
5 Are King and Nielsen right?
Are King and Nielsen right?

Argument 1
- Model dependence (i.e., dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions, even if they try not to be).
- Matching is good because it reduces model dependence.

I fully agree!

My view, however, may be somewhat less pessimistic. I believe that research results can be credible if researchers are well educated, so that they know what they are doing, and if modeling decisions are made transparent and robustness of results is evaluated (and documented).
Are King and Nielsen right?

Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

That fully blocked randomization is more efficient than complete randomization, given the sample size, is of course true (how large the efficiency gains are depends on the strength of the relation between X and Y).

However, if blocking reduces the sample size, it is not a priori clear whether estimates from the blocked sample are more efficient than estimates from the full sample (although often they will be).

That PSM approximates complete randomization is only partially true. PSM approximates complete randomization within observations with the same propensity score. Hence, PSM is somewhere between complete randomization and fully blocked randomization.
- If the X variables have no relation to T (treatment), then all observations have the same propensity score. Hence, we end up with complete randomization.
- If the X variables have a strong effect on T, there is lots of blocking.
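The statement that PSM randomizes only within levels of the propensity score rests on the balancing property of the propensity score (Rosenbaum and Rubin 1983). A short sketch in the notation of the opening slides, where $D$ is the treatment indicator (written $T$ on this slide) and $e(X)$ is introduced here as shorthand for the propensity score:

```latex
e(x) = \Pr(D = 1 \mid X = x),
\qquad
X \perp D \mid e(X)
```

Conditional on $e(X)$, treatment status carries no further information about $X$, so units sharing a score are comparable in expectation, as in a completely randomized experiment. Units with different scores are kept apart, which is the blocking component. If $X$ is unrelated to $D$, $e(X)$ is constant and only the complete-randomization part remains.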
Are King and Nielsen right?

Argument 3
- Random pruning ⇒ imbalance ⇒ more model dependence
- PSM ⇒ complete randomization ⇒ lots of random pruning
- PSM Paradox: "when you do 'better', you do worse"

That random pruning makes things worse is of course true, because it unnecessarily reduces the sample size (without changing anything else).

As argued above, that PSM applies random pruning is only true for X variables unrelated to T (so that we are in a "local" complete randomization situation, although something similar can probably also happen if effects from several X's cancel each other out).

Furthermore, it is only true if you employ a matching algorithm that throws away good matches. King and Nielsen's results seem to be based on the worst possible algorithm: one-to-one matching without replacement.

If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or also nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
- Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
- Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.

It is true that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more "free" variance in X that can be exploited by modeling decisions).

In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you'd need to apply PSM stratified by subgroups in this case).
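How much the algorithm matters can be checked by running the same propensity-score model through two algorithms and comparing the "Controls Used/Unused" columns of the matching statistics. The sketch below reuses the NLSW setup and kmatch syntax from the examples in this deck. It is an illustration under assumptions, not a replication of King and Nielsen: combining `kmatch ps` with `nn(1)` is assumed to work the same way as the `kmatch md ... nn` example, and it is not one-to-one matching without replacement.

```stata
* Same data preparation as in the examples of this deck
sysuse nlsw88, clear
drop if industry == 2

* (a) Propensity-score matching with a single nearest neighbor
kmatch ps union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(1)

* (b) Propensity-score kernel matching (the kmatch default)
kmatch ps union collgrad ttl_exp tenure i.industry i.race south (wage), att
```

The fewer controls an algorithm leaves unused among those that are good matches, the less random pruning it performs.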
6 Illustration using kmatch
The Command

kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).

Some key features:
- Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
- Optional exact matching
- Optional regression-adjustment bias correction
- Kernel matching, ridge matching, or nearest-neighbor matching
- Automatic bandwidth selection for kernel/ridge matching
- Flexible specification of scaling matrix for MDM
- Joint analysis of multiple subgroups and multiple outcome variables
- Various post-estimation commands for balancing and common-support diagnostics
- Computationally efficient implementation
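A minimal setup sketch for following the examples. The `ssc install kmatch` command is taken from the slide; installing `moremata` alongside it is an assumption about kmatch's dependencies that should be checked against the help file.

```stata
* Install kmatch from SSC (replace updates an existing copy)
ssc install kmatch, replace

* Assumed dependency: Mata function library used by kmatch
ssc install moremata, replace

* Check syntax and available options
help kmatch
```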
Some Examples

Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.

. sysuse nlsw88, clear
(NLSW, 1988 extract)

. drop if industry==2
(4 observations deleted)

Mahalanobis-distance kernel matching

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs   =      1,853
                                           Kernel          =       epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                  Matched                  Controls            Band-
              Yes     No   Total     Used  Unused   Total      width
Treated       432     25     457     1105     291    1396     1.3394

Treatment-effects estimation

wage           Coef.
ATT         .6059013
NATE        1.432913
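For orientation, the textbook form of a Mahalanobis-distance kernel matching estimator of the ATT is sketched below. This is the generic estimator, not a statement of kmatch's exact implementation. The symbols $w_{ij}$, $d_{ij}$, $h$ (bandwidth), $\Sigma$ (covariance matrix of the covariates), and $N_1$ (number of matched treated observations) are notation introduced here.

```latex
\widehat{ATT} = \frac{1}{N_1} \sum_{i:\, D_i = 1}
  \Bigl( Y_i - \sum_{j:\, D_j = 0} w_{ij} Y_j \Bigr),
\qquad
w_{ij} = \frac{K(d_{ij}/h)}{\sum_{k:\, D_k = 0} K(d_{ik}/h)}

d_{ij} = \sqrt{(x_i - x_j)' \Sigma^{-1} (x_i - x_j)},
\qquad
K(u) = \tfrac{3}{4}\,(1 - u^2)\,\mathbf{1}\{|u| \le 1\}
\quad \text{(Epanechnikov)}
```

A treated observation with no control within distance $h$ has no positive weights and drops out, which is what the "Matched: No" column counts. A control that receives zero weight from every treated observation is counted as unused.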
Some Examples

Some balancing statistics

. kmatch summarize
(refitting the model using the generate() option)

                           Raw                         Matched(ATT)
Means          Treated  Untrea~d    StdDif     Treated  Untrea~d    StdDif
collgrad       .321663   .224212   .219912     .319444   .319444         0
ttl_exp        13.2685   12.7323   .117584     13.3205   13.1425   .039036
tenure         7.89205   6.17658    .29735     7.91744   7.58347   .057888
3.industry     .006565   .012178  -.058246      .00463    .00463         0
4.industry     .183807   .166905   .044425     .185185   .185185         0
5.industry     .105033   .027937   .312944     .085648   .085648         0
6.industry     .045952   .169771  -.407129     .048611   .048611         0
7.industry     .019694   .102436  -.350657     .020833   .020833         0
8.industry     .017505   .035817  -.113785     .009259   .009259         0
9.industry     .010941   .040115  -.185669     .011574   .011574         0
10.industry    .004376   .008596  -.052551     .002315   .002315         0
11.industry    .479212   .356734   .250073     .506944   .506944         0
12.industry    .122538    .07235   .169707      .12037    .12037         0
2.race         .330416   .244986   .189418       .3125     .3125         0
3.race         .017505   .011461   .050566     .006944   .006944         0
south          .297593   .466332  -.352408     .291667   .291667         0

                           Raw                         Matched(ATT)
Variances      Treated  Untrea~d     Ratio     Treated  Untrea~d     Ratio
collgrad       .218674   .174066   1.25628     .217904   .217904         1
ttl_exp        20.5898   21.0001   .980459     19.8177   18.2323   1.08696
tenure         37.2044   29.3629   1.26706     37.0399   34.9543   1.05966
3.industry     .006536   .012038   .542928     .004619   .004619         1
4.industry     .150351   .139148   1.08052     .151242   .151242         1
5.industry     .094207   .027176   3.46656     .078494   .078494         1
6.industry     .043936    .14105   .311496     .046355   .046355         1
7.industry     .019348   .092008   .210287     .020447   .020447         1
8.industry     .017237   .034559   .498769     .009195   .009195         1
9.industry     .010845   .038533   .281445     .011467   .011467         1
10.industry    .004367   .008528   .512039     .002315   .002315         1
11.industry    .250115   .229639   1.08917     .250532   .250532         1
12.industry    .107758   .067163   1.60443     .106127   .106127         1
2.race         .221726     .1851   1.19787     .215342   .215342         1
3.race         .017237   .011338   1.52025     .006912   .006912         1
south           .20949   .249045   .841173     .207077   .207077         1
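The StdDif and Ratio columns can be reproduced by hand. The formulas below are inferred from the reported numbers rather than taken from the kmatch documentation, so they should be read as a consistency check; $\bar{x}$ and $s^2$ denote group means and variances.

```latex
\text{StdDif} = \frac{\bar{x}_T - \bar{x}_C}{\sqrt{(s_T^2 + s_C^2)/2}},
\qquad
\text{Ratio} = \frac{s_T^2}{s_C^2}

% Worked check, ttl_exp in the raw sample:
\frac{13.2685 - 12.7323}{\sqrt{(20.5898 + 21.0001)/2}}
  = \frac{0.5362}{4.5602} \approx 0.1176
```

This matches the reported raw value of .117584. For the matched sample, the reported .039036 is reproduced only when the raw variances are kept in the denominator, (13.3205 − 13.1425)/4.5602 ≈ 0.039, which suggests that the standardization is held fixed across the raw and matched columns.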
Some Examples

Make a graph of the balancing stats

. mat M = r(M)
. mat V = r(V)
. coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || ,
>     bylabels("Std. mean difference" "Variance ratio")
>     noci nolabels byopts(xrescale)
. addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))
. addplot 2: , xline(1) norescaling

[Figure: coefficient plots for each covariate (collgrad, ttl_exp, tenure, industry and race indicators, south), Raw vs. Matched; left panel: standardized mean difference, reference line at 0; right panel: variance ratio, reference line at 1]
Some Examples

Propensity-score kernel matching

. kmatch ps union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage), nate att
(computing bandwidth ... done)

Propensity-score kernel matching             Number of obs = 1,853
                                             Kernel        =  epan
Treatment   : union = 1
Covariates  : collgrad ttl_exp tenure i.industry i.race south
PS model    : logit (pr)

Matching statistics
           |        Matched        |       Controls        |  Band-
           |   Yes     No   Total  |  Used  Unused  Total  |  width
-----------+-----------------------+-----------------------+--------
   Treated |   431     26     457  |  1214    182    1396  | .00188

Treatment-effects estimation
        wage |     Coef.
-------------+-----------
         ATT |  .3887224
        NATE |  1.432913

Some Examples

Kernel density balancing plot

. kmatch density, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)
(applying 0-1 boundary correction to density estimation of propensity score)
(bandwidth for propensity score = .06803989)

[Figure: kernel density of the propensity score (x-axis: propensity score; y-axis: density) for untreated and treated, in two panels: Raw and Matched (ATT).]

Some Examples

Cumulative distribution balancing plot

. kmatch cumul, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)

[Figure: cumulative distribution of the propensity score (x-axis: propensity score; y-axis: cumulative probability) for untreated and treated, in two panels: Raw and Matched (ATT).]

Some Examples

Balancing box plot

. kmatch box
(refitting the model using the generate() option)

[Figure: box plots of the propensity score for untreated and treated, in two panels: Raw and Matched (ATT).]

Some Examples

Standard errors

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching        Number of obs = 1,853
                                             Replications  =    50
                                             Kernel        =  epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |        Matched        |       Controls        |  Band-
           |   Yes     No   Total  |  Used  Unused  Total  |  width
-----------+-----------------------+-----------------------+--------
   Treated |   432     25     457  |  1105    291    1396  | 1.3394
 Untreated |  1386     10    1396  |   455      2     457  | 3.3975
  Combined |  1818     35    1853  |  1560    293    1853  |

Treatment-effects estimation
             |  Observed   Bootstrap                        Normal-based
        wage |     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
-------------+--------------------------------------------------------------
         ATE |  .4095729   .1920853    2.13   0.033    .0330928    .7860531
         ATT |  .6059013   .2472069    2.45   0.014    .1213846    1.090418
         ATC |  .3483797   .1893653    1.84   0.066   -.0227695    .7195289
        NATE |  1.432913   .2333282    6.14   0.000    .9755981    1.890228

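The "Normal-based" columns follow from the coefficient and its bootstrap standard error alone. A small Python sketch (not part of kmatch or Stata; the function name is an assumption for illustration) that reproduces the ATT row under the usual normal approximation:

```python
from statistics import NormalDist

def normal_based_inference(coef, se, level=0.95):
    """Return z statistic, two-sided p-value, and confidence limits
    under a normal approximation."""
    std_normal = NormalDist()
    z = coef / se
    p = 2 * (1 - std_normal.cdf(abs(z)))
    crit = std_normal.inv_cdf(0.5 + level / 2)   # about 1.96 for level = 0.95
    return z, p, coef - crit * se, coef + crit * se

# ATT and its bootstrap standard error from the output above
z, p, lo, hi = normal_based_inference(0.6059013, 0.2472069)
print(f"z = {z:.2f}, p = {p:.3f}, CI = [{lo:.7f}, {hi:.6f}]")
```

With only 50 bootstrap replications, as in the example, the standard error itself is still fairly noisy, so the interval should be read as approximate.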
Some Examples

Do some tests

. lincom ATT-NATE

 ( 1)  ATT - NATE = 0

        wage |     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
-------------+--------------------------------------------------------------
         (1) | -.8270117   .1810415   -4.57   0.000   -1.181847   -.4721768

. test ATT = ATC

 ( 1)  ATT - ATC = 0

           chi2(  1) =    2.42
         Prob > chi2 =    0.1200

Some Examples

Nearest-neighbor matching (1 neighbor)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

Multivariate-distance nearest-neighbor matching
                                             Number of obs  = 1,853
                                             Neighbors: min =     1
Treatment   : union = 1                                 max =     1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |        Matched        |       Controls        |  Band-
           |   Yes     No   Total  |  Used  Unused  Total  |  width
-----------+-----------------------+-----------------------+--------
   Treated |   457      0     457  |   328   1068    1396  |

Treatment-effects estimation
        wage |     Coef.
-------------+-----------
         ATT |  .7246969

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

Treatment-effects estimation                 Number of obs      = 1,853
Estimator      : nearest-neighbor matching   Matches: requested =     1
Outcome model  : matching                                   min =     1
Distance metric: Mahalanobis                                max =     1

             |              AI Robust
        wage |     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
-------------+--------------------------------------------------------------
ATET         |
       union |
  (union vs  |
   nonunion) |  .7246969   .2942952    2.46   0.014     .147889    1.301505

Some Examples

Nearest-neighbor matching (5 neighbors)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching
                                             Number of obs  = 1,853
                                             Neighbors: min =     5
Treatment   : union = 1                                 max =     5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |        Matched        |       Controls        |  Band-
           |   Yes     No   Total  |  Used  Unused  Total  |  width
-----------+-----------------------+-----------------------+--------
   Treated |   457      0     457  |   870    526    1396  |

Treatment-effects estimation
        wage |     Coef.
-------------+-----------
         ATT |  .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation                 Number of obs      = 1,853
Estimator      : nearest-neighbor matching   Matches: requested =     5
Outcome model  : matching                                   min =     5
Distance metric: Mahalanobis                                max =     6

             |              AI Robust
        wage |     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
-------------+--------------------------------------------------------------
ATET         |
       union |
  (union vs  |
   nonunion) |  .5590823   .2381752    2.35   0.019    .0922675    1.025897

Some Examples

Bias-correction / regression adjustment

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching
                                             Number of obs  = 1,853
                                             Neighbors: min =     5
Treatment   : union = 1                                 max =     5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |        Matched        |       Controls        |  Band-
           |   Yes     No   Total  |  Used  Unused  Total  |  width
-----------+-----------------------+-----------------------+--------
   Treated |   457      0     457  |   870    526    1396  |

Treatment-effects estimation
        wage |     Coef.
-------------+-----------
         ATT |  .5288023
adjusted for: collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) ///
>     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation                 Number of obs      = 1,853
Estimator      : nearest-neighbor matching   Matches: requested =     5
Outcome model  : matching                                   min =     5
Distance metric: Mahalanobis                                max =     6

             |              AI Robust
        wage |     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
-------------+--------------------------------------------------------------
ATET         |
       union |
  (union vs  |
   nonunion) |  .5288023   .2420635    2.18   0.029    .0543666    1.003238

Some Examples

Mahalanobis-distance and propensity-score matching combined

. kmatch md union collgrad ttl_exp tenure (wage), att ///
>     psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1,853
                                             Kernel        =  epan
Treatment   : union = 1
Metric      : mahalanobis (modified)
Covariates  : collgrad ttl_exp tenure
PS model    : logit (pr)
PS covars   : i.industry i.race south

Matching statistics
           |        Matched        |       Controls        |  Band-
           |   Yes     No   Total  |  Used  Unused  Total  |  width
-----------+-----------------------+-----------------------+--------
   Treated |   439     18     457  |  1258    138    1396  | .83886

Treatment-effects estimation
        wage |     Coef.
-------------+-----------
         ATT |  .6408443

Some Examples

Exact matching

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1,853
                                             Kernel        =  epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure
Exact       : industry race south

Matching statistics
           |        Matched        |       Controls        |  Band-
           |   Yes     No   Total  |  Used  Unused  Total  |  width
-----------+-----------------------+-----------------------+--------
   Treated |   432     25     457  |  1103    293    1396  | 1.3013

Treatment-effects estimation
        wage |     Coef.
-------------+-----------
         ATT |  .6047374

Some Examples

Bandwidth selection: the default (based on the distribution of distances in one-nearest-neighbor matching)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1,853
                                             Kernel        =  epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |        Matched        |       Controls        |  Band-
           |   Yes     No   Total  |  Used  Unused  Total  |  width
-----------+-----------------------+-----------------------+--------
   Treated |   432     25     457  |  1105    291    1396  | 1.3394

Treatment-effects estimation
        wage |     Coef.
-------------+-----------
         ATT |  .6059013

Some Examples

Bandwidth selection: cross validation with respect to X

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1,853
                                             Kernel        =  epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |        Matched        |       Controls        |  Band-
           |   Yes     No   Total  |  Used  Unused  Total  |  width
-----------+-----------------------+-----------------------+--------
   Treated |   448      9     457  |  1184    212    1396  | 1.8888

Treatment-effects estimation
        wage |     Coef.
-------------+-----------
         ATT |  .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MSE (y-axis) against bandwidth (x-axis), with the evaluation points labeled by their index.]

Some Examples

Bandwidth selection: cross validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1,853
                                             Kernel        =  epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |        Matched        |       Controls        |  Band-
           |   Yes     No   Total  |  Used  Unused  Total  |  width
-----------+-----------------------+-----------------------+--------
   Treated |   453      4     457  |  1289    107    1396  |  2.433

Treatment-effects estimation
        wage |     Coef.
-------------+-----------
         ATT |  .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MISE (y-axis) against bandwidth (x-axis), with the evaluation points labeled by their index.]

Some Examples

Bandwidth selection: weighted cross validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1,853
                                             Kernel        =  epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |        Matched        |       Controls        |  Band-
           |   Yes     No   Total  |  Used  Unused  Total  |  width
-----------+-----------------------+-----------------------+--------
   Treated |   455      2     457  |  1356     40    1396  | 2.7626

Treatment-effects estimation
        wage |     Coef.
-------------+-----------
         ATT |  .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of weighted MISE (y-axis) against bandwidth (x-axis), with the evaluation points labeled by their index.]

Some Examples

Common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(0.5)

Multivariate-distance kernel matching        Number of obs = 1,853
                                             Kernel        =  epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |        Matched        |       Controls        |  Band-
           |   Yes     No   Total  |  Used  Unused  Total  |  width
-----------+-----------------------+-----------------------+--------
   Treated |   366     91     457  |   701    695    1396  |     .5

Treatment-effects estimation
        wage |     Coef.
-------------+-----------
         ATT |  .3303161

. kmatch csummarize
(refitting the model using the generate() option)

                |   Common support (treated)    |    Standardized difference
          Means |  Matched  Unmatc~d     Total  |  (1)-(3)   (2)-(3)   (1)-(2)
----------------+-------------------------------+-------------------------------
       collgrad |  .322404   .318681   .321663  |  .001585  -.006376   .007962
        ttl_exp |  13.3929   12.7682   13.2685  |  .027413  -.110253   .137666
         tenure |  8.12614   6.95055   7.89205  |  .038378  -.154356   .192734
     3.industry |  .002732   .021978   .006565  | -.047404   .190657  -.238061
     4.industry |  .191257   .153846   .183807  |  .019212  -.077269   .096481
     5.industry |  .062842   .274725   .105033  | -.137462   .552867  -.690329
     6.industry |  .057377         0   .045952  |  .054507  -.219225   .273732
     7.industry |  .019126   .021978   .019694  | -.004083   .016423  -.020506
     8.industry |  .005464   .065934   .017505  | -.091714   .368871  -.460585
     9.industry |  .010929   .010989   .010941  | -.000115   .000462  -.000577
    10.industry |        0   .021978   .004376  | -.066227   .266363  -.332589
    11.industry |  .554645   .175824   .479212  |   .15083  -.606636   .757467
    12.industry |  .092896   .241758   .122538  | -.090299   .363181   -.45348
         2.race |  .243169   .681319   .330416  | -.185284   .745209  -.930494
         3.race |  .002732   .076923   .017505  | -.112525   .452572  -.565097
          south |   .29235   .318681   .297593  | -.011456   .046074   -.05753

                |   Common support (treated)    |             Ratio
      Variances |  Matched  Unmatc~d     Total  |  (1)/(3)   (2)/(3)   (1)/(2)
----------------+-------------------------------+-------------------------------
       collgrad |  .219058   .219536   .218674  |  1.00176   1.00394   .997824
        ttl_exp |  19.4198   25.2474   20.5898  |  .943177   1.22621    .76918
         tenure |  38.3324   31.9242   37.2044  |  1.03032   .858076   1.20073
     3.industry |  .002732   .021734   .006536  |  .418045   3.32537   .125714
     4.industry |  .155101   .131624   .150351  |  1.03159   .875443   1.17837
     5.industry |  .059054   .201465   .094207  |  .626851   2.13854   .293122
     6.industry |  .054233         0   .043936  |  1.23435         0         .
     7.industry |  .018811   .021734   .019348  |  .972252    1.1233   .865531
     8.industry |   .00545   .062271   .017237  |  .316157   3.61269   .087513
     9.industry |  .010839   .010989   .010845  |  .999464   1.01328   .986361
    10.industry |        0   .021734   .004367  |        0   4.97709         0
    11.industry |  .247691    .14652   .250115  |  .990307   .585811   1.69049
    12.industry |  .084497   .185348   .107758  |  .784137   1.72003   .455885
         2.race |  .184542   .219536   .221726  |  .832297   .990121   .840601
         3.race |  .002732   .071795   .017237  |  .158513   4.16522   .038056
          south |  .207448   .219536    .20949  |  .990254   1.04796   .944939

(1) matched, (2) unmatched, (3) total

Some Examples

Make a graph of the common-support stats

. mat M = r(M)
. coefplot matrix(M[,4]), title("Std. difference") noci nolabels xline(0)

[Figure: coefficient plot titled "Std. difference", showing the standardized difference between matched treated and all treated, (1)-(3), for each covariate.]

Some Examples

Multiple outcome variables

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1,852
                                             Kernel        =  epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |        Matched        |       Controls        |  Band-
           |   Yes     No   Total  |  Used  Unused  Total  |  width
-----------+-----------------------+-----------------------+--------
   Treated |   432     25     457  |  1104    291    1395  | 1.3392

Treatment-effects estimation
             |     Coef.
-------------+-----------
wage         |
         ATT |  .6021049
        NATE |  1.430823
-------------+-----------
hours        |
         ATT |  1.263759
        NATE |  1.450303

Some Examples

Multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage = collgrad ttl_exp tenure) ///
>     (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1,852
                                             Kernel        =  epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |        Matched        |       Controls        |  Band-
           |   Yes     No   Total  |  Used  Unused  Total  |  width
-----------+-----------------------+-----------------------+--------
   Treated |   432     25     457  |  1104    291    1395  | 1.3392

Treatment-effects estimation
             |     Coef.
-------------+-----------
wage         |
         ATT |  .5152752
        NATE |  1.430823
-------------+-----------
hours        |
         ATT |  1.263759
        NATE |  1.450303
wage: adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race

Some Examples

Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage), ///
>     att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching        Number of obs = 1,853
                                             Replications  =    50
                                             Kernel        =  epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics
           |        Matched        |       Controls        |  Band-
           |   Yes     No   Total  |  Used  Unused  Total  |  width
-----------+-----------------------+-----------------------+--------
0          |
   Treated |   306     15     321  |   625    120     745  | 1.3199
1          |
   Treated |   126     10     136  |   473    178     651  | 1.3398

Treatment-effects estimation
             |  Observed   Bootstrap                        Normal-based
        wage |     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
-------------+--------------------------------------------------------------
0            |
         ATT |  .4586332   .2763358    1.66   0.097    -.082975    1.000241
1            |
         ATT |  .9518705    .406903    2.34   0.019    .1543553    1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

           chi2(  1) =    1.23
         Prob > chi2 =    0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

        wage |     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
-------------+--------------------------------------------------------------
         (1) |  .4932373   .4452343    1.11   0.268    -.379406    1.365881

Simulation

Population data from the Swiss census of 2000.

Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using the command iskotrei by Hendrickx 2002); values from 6 to 78, mean 44.

Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.

Control variables: gender, age, and highest educational degree.

Population restricted to people between 24 and 60 years old who are working.

2'308'006 individuals, of which 17.5% belong to the treatment group.

Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.

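The design described above (a fixed population with a known ATT, repeated random samples, and an estimator evaluated by bias, variance, and mean squared error) can be sketched in a few lines of Python. Everything in the sketch is a stand-in: the population is synthetic rather than the Swiss census, there is a single discrete covariate, and exact stratification replaces the matching estimators compared in the talk.

```python
import random
import statistics

random.seed(12345)

# Hypothetical population: one discrete covariate x drives both treatment
# assignment and the outcome, so the naive comparison is confounded.
P_TREAT = {0: 0.1, 1: 0.3, 2: 0.6}
population = []
for _ in range(20000):
    x = random.choice([0, 1, 2])
    d = 1 if random.random() < P_TREAT[x] else 0
    y0 = 40 + 5 * x + random.gauss(0, 5)
    y1 = y0 - 4                          # constant treatment effect of -4
    population.append((x, d, y1 if d else y0, y1 - y0))

# Population ATT, computed from the (normally unobserved) unit-level effects
true_att = statistics.fmean(eff for _, d, _, eff in population if d)

def naive(sample):
    """Raw mean difference between treated and untreated (NATE)."""
    t = [y for _, d, y, _ in sample if d]
    c = [y for _, d, y, _ in sample if not d]
    return statistics.fmean(t) - statistics.fmean(c)

def stratified_att(sample):
    """Exact matching on x: within-stratum differences, weighted by the
    number of treated units in each stratum."""
    total, n_treated = 0.0, 0
    for x in (0, 1, 2):
        t = [y for xi, d, y, _ in sample if xi == x and d]
        c = [y for xi, d, y, _ in sample if xi == x and not d]
        if t and c:                      # strata without common support are dropped
            total += len(t) * (statistics.fmean(t) - statistics.fmean(c))
            n_treated += len(t)
    return total / n_treated

def simulate(estimator, n=500, reps=200):
    estimates = [estimator(random.sample(population, n)) for _ in range(reps)]
    bias = statistics.fmean(estimates) - true_att
    variance = statistics.pvariance(estimates)
    mse = statistics.fmean((e - true_att) ** 2 for e in estimates)
    return bias, variance, mse

results = {name: simulate(f) for name, f in
           [("naive", naive), ("stratified", stratified_att)]}
for name, (bias, variance, mse) in results.items():
    print(f"{name:10s} bias={bias:6.3f} variance={variance:6.3f} mse={mse:6.3f}")
```

By construction the mean squared error equals the variance plus the squared bias, which is why the result slides report variance, bias reduction, and MSE side by side.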
Simulation

Substantial differences between resident aliens and Swiss nationals on all three covariates.

Propensity score in population (computed from fully stratified data); McFadden R2 = 0.121.

[Figure: density of the propensity score (x-axis: propensity score from 0 to 1; y-axis: density) for untreated and treated.]

Simulation

Raw mean difference in occupational prestige (NATE): -4.79
Population ATT (computed from fully stratified data): -3.96
There is some treatment effect heterogeneity (ATE = -3.51, ATC = -3.41).

[Figure, left panel: outcome (y-axis) by propensity score (x-axis) for untreated and treated. Right panel: treatment effect (y-axis) by propensity score (x-axis).]

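The three population effects are tied together by the share of treated units: the ATE is the average of the ATT and the ATC, weighted by that share. A short Python check, using the 17.5% treatment share from the simulation description (the function name is illustrative only):

```python
def ate_from_att_atc(att, atc, share_treated):
    """ATE as the treatment-share-weighted average of ATT and ATC."""
    return share_treated * att + (1 - share_treated) * atc

ate = ate_from_att_atc(att=-3.96, atc=-3.41, share_treated=0.175)
print(round(ate, 2))
```

The result agrees with the reported ATE of -3.51 up to rounding of the inputs.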
Results: Variance

[Figure: variance of the estimates for N = 500 (left panel) and N = 5000 (right panel). Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each without and with bias correction.]

Notes: In this slide we can see that, for the same algorithm, PSM is typically somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are small; additional post-matching regression adjustment further reduces the differences.

Results: Bias reduction (in percent)

[Figure: bias reduction in percent for N = 500 (left panel) and N = 5000 (right panel). Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each without and with bias correction.]

Notes: Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.

Results: Mean squared error

[Figure: mean squared error for N = 500 (left panel) and N = 5000 (right panel). Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each without and with bias correction.]

Results: Relative standard error

[Figure: relative standard error for N = 500 (left panel) and N = 5000 (right panel). Rows: nearest-neighbor matching (teffects; 1 neighbor, 5 neighbors), nearest-neighbor matching (bootstrap; 1 neighbor, 5 neighbors), and kernel matching (bootstrap; fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each without and with bias correction.]

Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals for N = 500 (left panel) and N = 5000 (right panel). Rows: nearest-neighbor matching (teffects; 1 neighbor, 5 neighbors), nearest-neighbor matching (bootstrap; 1 neighbor, 5 neighbors), and kernel matching (bootstrap; fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each without and with bias correction.]

Conclusions

The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).

Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.

Conclusions

Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research, the choice may not be that clear.

Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.

Disadvantages of MDM:
- Choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching!
(But don't use pair matching anyhow.)

Conclusions

Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly OK for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:
- Run some simulations comparable to the ones by King and Nielsen, using various matching algorithms.

References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2):295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1):77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91:279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4):465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70:41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5):688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4):472–480.

The Propensity Score Paradox in Real Data

[Figure, left panel: Finkel et al. (JOP, 2012). Imbalance (y-axis) against number of units pruned (x-axis) for CEM, MDM, PSM, and random pruning, with markers for the raw data and the 1/4 SD caliper.]

[Figure, right panel: Nielsen et al. (AJPS, 2011). Imbalance (y-axis) against number of units pruned (x-axis) for CEM, MDM, PSM, and random pruning, with markers for the raw data and the 1/4 SD caliper.]

Similar pattern for > 20 other real data sets we checked.

21/23 (slides by King and Nielsen)

Are King and Nielsen right?

Argument 1
- Model dependence (i.e., dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions, even if they try not to be).
- Matching is good because it reduces model dependence.

I fully agree.

My view, however, may be somewhat less pessimistic. I believe that research results can be credible if researchers are well educated, so that they know what they are doing, and if modeling decisions are made transparent and the robustness of results is evaluated (and documented).

Are King and Nielsen right?

Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

That fully blocked randomization is more efficient than complete randomization, given the sample size, is of course true (how large the efficiency gains are depends on the strength of the relation between X and Y).

However, if blocking reduces the sample size, it is not a priori clear whether estimates from the blocked sample are more efficient than estimates from the full sample (although often they will be).

That PSM approximates complete randomization is only partially true. PSM approximates complete randomization within observations with the same propensity score. Hence, PSM is somewhere between complete randomization and fully blocked randomization.
- If the X variables have no relation to T (treatment), then all observations have the same propensity score. Hence, we end up with complete randomization.
- If the X variables have a strong effect on T, there is lots of blocking.

Are King and Nielsen right?

Argument 3
- Random pruning ⇒ imbalance ⇒ more model dependence
- PSM ⇒ complete randomization ⇒ lots of random pruning
- PSM Paradox: "when you do 'better', you do worse"

That random pruning makes things worse is of course true, because it unnecessarily reduces the sample size (without changing anything else).

As argued above, that PSM applies random pruning is only true for X variables unrelated to T (so that we are in a "local" complete randomization situation, although something similar can probably also happen if effects from several X's cancel each other out).

Furthermore, it is only true if you employ a matching algorithm that throws away good matches. King and Nielsen's results seem to be based on the worst possible algorithm: one-to-one matching without replacement.

If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or also nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
- Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
- Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.

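How kernel matching "averages where pruning is not necessary" can be sketched in Python. This is a bare-bones illustration, not the kmatch implementation: the function names and the toy data are made up, the bandwidth is fixed by hand, and the Epanechnikov kernel is used because it is the kernel shown in the kmatch output ("epan"). Every control within the bandwidth of a treated unit contributes to its counterfactual, and a treated unit is dropped only if no control falls inside the bandwidth.

```python
def epanechnikov(u):
    """Epanechnikov kernel weight; zero outside the bandwidth."""
    return 0.75 * (1 - u * u) if abs(u) < 1 else 0.0

def kernel_matching_att(treated, controls, bandwidth):
    """treated, controls: lists of (propensity_score, outcome) pairs.
    Returns the ATT estimate and the number of treated units that have
    no control within the bandwidth (off common support)."""
    effects, unmatched = [], 0
    for ps_t, y_t in treated:
        weights = [epanechnikov((ps_c - ps_t) / bandwidth) for ps_c, _ in controls]
        total = sum(weights)
        if total == 0:
            unmatched += 1       # pruned: no comparable control available
            continue
        # Counterfactual outcome: kernel-weighted mean over ALL nearby controls
        y0_hat = sum(w * y_c for w, (_, y_c) in zip(weights, controls)) / total
        effects.append(y_t - y0_hat)
    return sum(effects) / len(effects), unmatched

# Toy data (hypothetical): the third treated unit has no comparable control
treated = [(0.30, 10.0), (0.32, 11.0), (0.90, 20.0)]
controls = [(0.29, 8.0), (0.31, 9.0), (0.33, 9.5), (0.36, 10.0)]
att, n_unmatched = kernel_matching_att(treated, controls, bandwidth=0.05)
print(round(att, 4), n_unmatched)
```

In contrast, one-to-one matching without replacement would keep a single control per treated unit and discard the remaining close controls, which is the pruning of good matches criticized above.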
It is true that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more "free" variance in X that can be exploited by modeling decisions).

In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you'd need to apply PSM stratified by subgroups in this case).

The Command

kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).

Some key features:
- Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
- Optional exact matching
- Optional regression-adjustment bias-correction
- Kernel matching, ridge matching, or nearest-neighbor matching
- Automatic bandwidth selection for kernel/ridge matching
- Flexible specification of scaling matrix for MDM
- Joint analysis of multiple subgroups and multiple outcome variables
- Various post-estimation commands for balancing and common-support diagnostics
- Computationally efficient implementation

Some Examples

Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.

. sysuse nlsw88, clear
(NLSW, 1988 extract)

. drop if industry==2
(4 observations deleted)

Mahalanobis-distance kernel matching

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1,853
                                             Kernel        =  epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |        Matched        |       Controls        |  Band-
           |   Yes     No   Total  |  Used  Unused  Total  |  width
-----------+-----------------------+-----------------------+--------
   Treated |   432     25     457  |  1105    291    1396  | 1.3394

Treatment-effects estimation
        wage |     Coef.
-------------+-----------
         ATT |  .6059013
        NATE |  1.432913

Some Examples

Some balancing statistics

. kmatch summarize
(refitting the model using the generate() option)

             |              Raw               |          Matched(ATT)
       Means |   Treated  Untrea~d    StdDif  |   Treated  Untrea~d    StdDif
   ----------+--------------------------------+-------------------------------
    collgrad |   .321663   .224212   .219912  |   .319444   .319444         0
     ttl_exp |   13.2685   12.7323   .117584  |   13.3205   13.1425   .039036
      tenure |   7.89205   6.17658    .29735  |   7.91744   7.58347   .057888
  3.industry |   .006565   .012178  -.058246  |    .00463    .00463         0
  4.industry |   .183807   .166905   .044425  |   .185185   .185185         0
  5.industry |   .105033   .027937   .312944  |   .085648   .085648         0
  6.industry |   .045952   .169771  -.407129  |   .048611   .048611         0
  7.industry |   .019694   .102436  -.350657  |   .020833   .020833         0
  8.industry |   .017505   .035817  -.113785  |   .009259   .009259         0
  9.industry |   .010941   .040115  -.185669  |   .011574   .011574         0
 10.industry |   .004376   .008596  -.052551  |   .002315   .002315         0
 11.industry |   .479212   .356734   .250073  |   .506944   .506944         0
 12.industry |   .122538    .07235   .169707  |    .12037    .12037         0
      2.race |   .330416   .244986   .189418  |     .3125     .3125         0
      3.race |   .017505   .011461   .050566  |   .006944   .006944         0
       south |   .297593   .466332  -.352408  |   .291667   .291667         0

             |              Raw               |          Matched(ATT)
   Variances |   Treated  Untrea~d     Ratio  |   Treated  Untrea~d     Ratio
   ----------+--------------------------------+-------------------------------
    collgrad |   .218674   .174066   1.25628  |   .217904   .217904         1
     ttl_exp |   20.5898   21.0001   .980459  |   19.8177   18.2323   1.08696
      tenure |   37.2044   29.3629   1.26706  |   37.0399   34.9543   1.05966
  3.industry |   .006536   .012038   .542928  |   .004619   .004619         1
  4.industry |   .150351   .139148   1.08052  |   .151242   .151242         1
  5.industry |   .094207   .027176   3.46656  |   .078494   .078494         1
  6.industry |   .043936    .14105   .311496  |   .046355   .046355         1
  7.industry |   .019348   .092008   .210287  |   .020447   .020447         1
  8.industry |   .017237   .034559   .498769  |   .009195   .009195         1
  9.industry |   .010845   .038533   .281445  |   .011467   .011467         1
 10.industry |   .004367   .008528   .512039  |   .002315   .002315         1
 11.industry |   .250115   .229639   1.08917  |   .250532   .250532         1
 12.industry |   .107758   .067163   1.60443  |   .106127   .106127         1
      2.race |   .221726     .1851   1.19787  |   .215342   .215342         1
      3.race |   .017237   .011338   1.52025  |   .006912   .006912         1
       south |    .20949   .249045   .841173  |   .207077   .207077         1
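The StdDif and Ratio columns of the balancing output can be checked by hand. The block below is a sketch that assumes the common definition of the standardized difference (mean difference divided by the square root of the average of the two raw group variances); the symbols are introduced here for illustration, and the kmatch help file remains the reference for the exact definition. The worked numbers use the tenure row, read with decimal points as 7.89205, 6.17658, 37.2044, and 29.3629.

```latex
\[
\mathrm{StdDif} = \frac{\bar{x}_{T} - \bar{x}_{C}}
                       {\sqrt{\bigl(s^{2}_{T} + s^{2}_{C}\bigr)/2}},
\qquad
\mathrm{Ratio} = \frac{s^{2}_{T}}{s^{2}_{C}}
\]
% Raw tenure row:
\[
\frac{7.89205 - 6.17658}{\sqrt{(37.2044 + 29.3629)/2}}
  = \frac{1.71547}{\sqrt{33.28365}} \approx 0.297,
\qquad
\frac{37.2044}{29.3629} \approx 1.267
\]
% Matched tenure row, using the same raw-sample denominator:
\[
\frac{7.91744 - 7.58347}{\sqrt{33.28365}} \approx 0.058
\]
```

Both results agree with the reported values (.29735, 1.26706, and .057888), which suggests that the matched standardized difference is scaled by the raw-sample variances rather than the matched-sample ones.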
Some Examples

Make a graph of the balancing statistics

. mat M = r(M)

. mat V = r(V)

. coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || , ///
>     bylabels("Std. mean difference" "Variance ratio") ///
>     noci nolabels byopts(xrescale)

. addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))

. addplot 2: , xline(1) norescaling

[Figure: coefficient plot with one row per covariate (collgrad, ttl_exp, tenure, industry and race indicators, south); left panel: Std. mean difference; right panel: Variance ratio; series: Raw, Matched]
Some Examples

Propensity-score kernel matching

. kmatch ps union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage), nate att
(computing bandwidth ... done)

Propensity-score kernel matching               Number of obs = 1,853
                                               Kernel        = epan
Treatment   : union = 1
Covariates  : collgrad ttl_exp tenure i.industry i.race south
PS model    : logit (pr)

Matching statistics
           |        Matched        |        Controls       |  Band-
           |   Yes     No   Total  |  Used  Unused   Total |  width
   --------+-----------------------+-----------------------+--------
   Treated |   431     26     457  |  1214     182    1396 | .00188

Treatment-effects estimation
        wage |     Coef.
   ----------+----------
         ATT |  .3887224
        NATE |  1.432913
Some Examples

Kernel density balancing plot

. kmatch density, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)
(applying 0-1 boundary correction to density estimation of propensity score)
(bandwidth for propensity score = .06803989)

[Figure: kernel density of the propensity score for untreated and treated observations; panels: Raw, Matched (ATT); x axis: Propensity score; y axis: Density]
Some Examples

Cumulative distribution balancing plot

. kmatch cumul, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)

[Figure: cumulative distribution of the propensity score for untreated and treated observations; panels: Raw, Matched (ATT); x axis: Propensity score; y axis: Cumulative probability]
Some Examples

Balancing box plot

. kmatch box
(refitting the model using the generate() option)

[Figure: box plots of the propensity score for untreated and treated observations; panels: Raw, Matched (ATT); y axis: Propensity score]
Some Examples

Standard errors

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching          Number of obs = 1,853
                                               Replications  = 50
                                               Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |        Matched        |        Controls       |  Band-
           |   Yes     No   Total  |  Used  Unused   Total |  width
   --------+-----------------------+-----------------------+--------
   Treated |   432     25     457  |  1105     291    1396 | 1.3394
 Untreated |  1386     10    1396  |   455       2     457 | 3.3975
  Combined |  1818     35    1853  |  1560     293    1853 |      .

Treatment-effects estimation
             |  Observed   Bootstrap                        Normal-based
        wage |     Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
   ----------+--------------------------------------------------------------
         ATE |  .4095729   .1920853    2.13   0.033    .0330928    .7860531
         ATT |  .6059013   .2472069    2.45   0.014    .1213846    1.090418
         ATC |  .3483797   .1893653    1.84   0.066   -.0227695    .7195289
        NATE |  1.432913   .2333282    6.14   0.000    .9755981    1.890228
Some Examples

Do some tests

. lincom ATT-NATE

 ( 1)  ATT - NATE = 0

        wage |     Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
   ----------+--------------------------------------------------------------
         (1) | -.8270117   .1810415   -4.57   0.000   -1.181847   -.4721768

. test ATT = ATC

 ( 1)  ATT - ATC = 0

           chi2(  1) =    2.42
         Prob > chi2 =    0.1200
Some Examples

Nearest-neighbor matching (1 neighbor)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

Multivariate-distance nearest-neighbor matching
                                               Number of obs  = 1,853
                                               Neighbors: min = 1
Treatment   : union = 1                                   max = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |        Matched        |        Controls       |  Band-
           |   Yes     No   Total  |  Used  Unused   Total |  width
   --------+-----------------------+-----------------------+--------
   Treated |   457      0     457  |   328    1068    1396 |      .

Treatment-effects estimation
        wage |     Coef.
   ----------+----------
         ATT |  .7246969

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

Treatment-effects estimation                   Number of obs      =      1,853
Estimator      : nearest-neighbor matching     Matches: requested =          1
Outcome model  : matching                                     min =          1
Distance metric: Mahalanobis                                  max =          1

                     |             AI Robust
                wage |     Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
   ------------------+--------------------------------------------------------------
 ATET                |
               union |
 (union vs nonunion) |  .7246969   .2942952    2.46   0.014     .147889    1.301505
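The agreement between the two point estimates can also be checked programmatically. The following is a minimal sketch, assuming kmatch is installed from SSC and that the ATT is stored in e(b) under the name ATT (as the use of lincom ATT-NATE in the examples suggests); the scalar names are made up for illustration.

```stata
* Sketch: compare the one-nearest-neighbor ATT from kmatch and teffects.
* Assumes kmatch is installed (ssc install kmatch).
sysuse nlsw88, clear
drop if industry==2

* Mahalanobis-distance nearest-neighbor matching with kmatch
kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn
scalar att_kmatch = _b[ATT]

* Same estimator with official Stata's teffects
teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet
matrix b = e(b)
scalar att_teffects = b[1,1]      // first (and only) element of e(b) is the ATET

* Report both estimates and their difference
display "kmatch:     " att_kmatch
display "teffects:   " att_teffects
display "difference: " att_kmatch - att_teffects
```

Reading the teffects estimate from the first element of e(b) avoids relying on a particular coefficient name.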
Some Examples

Nearest-neighbor matching (5 neighbors)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching
                                               Number of obs  = 1,853
                                               Neighbors: min = 5
Treatment   : union = 1                                   max = 5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |        Matched        |        Controls       |  Band-
           |   Yes     No   Total  |  Used  Unused   Total |  width
   --------+-----------------------+-----------------------+--------
   Treated |   457      0     457  |   870     526    1396 |      .

Treatment-effects estimation
        wage |     Coef.
   ----------+----------
         ATT |  .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation                   Number of obs      =      1,853
Estimator      : nearest-neighbor matching     Matches: requested =          5
Outcome model  : matching                                     min =          5
Distance metric: Mahalanobis                                  max =          6

                     |             AI Robust
                wage |     Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
   ------------------+--------------------------------------------------------------
 ATET                |
               union |
 (union vs nonunion) |  .5590823   .2381752    2.35   0.019    .0922675    1.025897
Some Examples

Bias-correction / regression adjustment

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching
                                               Number of obs  = 1,853
                                               Neighbors: min = 5
Treatment   : union = 1                                   max = 5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |        Matched        |        Controls       |  Band-
           |   Yes     No   Total  |  Used  Unused   Total |  width
   --------+-----------------------+-----------------------+--------
   Treated |   457      0     457  |   870     526    1396 |      .

Treatment-effects estimation
        wage |     Coef.
   ----------+----------
         ATT |  .5288023
adjusted for: collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) ///
>     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation                   Number of obs      =      1,853
Estimator      : nearest-neighbor matching     Matches: requested =          5
Outcome model  : matching                                     min =          5
Distance metric: Mahalanobis                                  max =          6

                     |             AI Robust
                wage |     Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
   ------------------+--------------------------------------------------------------
 ATET                |
               union |
 (union vs nonunion) |  .5288023   .2420635    2.18   0.029    .0543666    1.003238
Some Examples

Mahalanobis-distance and propensity-score matching combined

. kmatch md union collgrad ttl_exp tenure (wage), att ///
>     psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs = 1,853
                                               Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis (modified)
Covariates  : collgrad ttl_exp tenure
PS model    : logit (pr)
PS covars   : i.industry i.race south

Matching statistics
           |        Matched        |        Controls       |  Band-
           |   Yes     No   Total  |  Used  Unused   Total |  width
   --------+-----------------------+-----------------------+--------
   Treated |   439     18     457  |  1258     138    1396 | .83886

Treatment-effects estimation
        wage |     Coef.
   ----------+----------
         ATT |  .6408443
Some Examples

Exact matching

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs = 1,853
                                               Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure
Exact       : industry race south

Matching statistics
           |        Matched        |        Controls       |  Band-
           |   Yes     No   Total  |  Used  Unused   Total |  width
   --------+-----------------------+-----------------------+--------
   Treated |   432     25     457  |  1103     293    1396 | 1.3013

Treatment-effects estimation
        wage |     Coef.
   ----------+----------
         ATT |  .6047374
Some Examples

Bandwidth selection: the default (based on the distribution of distances in one-nearest-neighbor matching)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs = 1,853
                                               Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |        Matched        |        Controls       |  Band-
           |   Yes     No   Total  |  Used  Unused   Total |  width
   --------+-----------------------+-----------------------+--------
   Treated |   432     25     457  |  1105     291    1396 | 1.3394

Treatment-effects estimation
        wage |     Coef.
   ----------+----------
         ATT |  .6059013
Some Examples

Bandwidth selection: cross validation with respect to X

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs = 1,853
                                               Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |        Matched        |        Controls       |  Band-
           |   Yes     No   Total  |  Used  Unused   Total |  width
   --------+-----------------------+-----------------------+--------
   Treated |   448      9     457  |  1184     212    1396 | 1.8888

Treatment-effects estimation
        wage |     Coef.
   ----------+----------
         ATT |  .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot; points labeled by search index; x axis: Bandwidth; y axis: MSE]
Some Examples

Bandwidth selection: cross validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs = 1,853
                                               Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |        Matched        |        Controls       |  Band-
           |   Yes     No   Total  |  Used  Unused   Total |  width
   --------+-----------------------+-----------------------+--------
   Treated |   453      4     457  |  1289     107    1396 |  2.433

Treatment-effects estimation
        wage |     Coef.
   ----------+----------
         ATT |  .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot; points labeled by search index; x axis: Bandwidth; y axis: MISE]
Some Examples

Bandwidth selection: weighted cross validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs = 1,853
                                               Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |        Matched        |        Controls       |  Band-
           |   Yes     No   Total  |  Used  Unused   Total |  width
   --------+-----------------------+-----------------------+--------
   Treated |   455      2     457  |  1356      40    1396 | 2.7626

Treatment-effects estimation
        wage |     Coef.
   ----------+----------
         ATT |  .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot; points labeled by search index; x axis: Bandwidth; y axis: Weighted MISE]
Some Examples

Common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(0.5)

Multivariate-distance kernel matching          Number of obs = 1,853
                                               Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |        Matched        |        Controls       |  Band-
           |   Yes     No   Total  |  Used  Unused   Total |  width
   --------+-----------------------+-----------------------+--------
   Treated |   366     91     457  |   701     695    1396 |     .5

Treatment-effects estimation
        wage |     Coef.
   ----------+----------
         ATT |  .3303161

. kmatch csummarize
(refitting the model using the generate() option)

             |    Common support (treated)    |    Standardized difference
       Means |   Matched  Unmatc~d     Total  |   (1)-(3)   (2)-(3)   (1)-(2)
   ----------+--------------------------------+-------------------------------
    collgrad |   .322404   .318681   .321663  |   .001585  -.006376   .007962
     ttl_exp |   13.3929   12.7682   13.2685  |   .027413  -.110253   .137666
      tenure |   8.12614   6.95055   7.89205  |   .038378  -.154356   .192734
  3.industry |   .002732   .021978   .006565  |  -.047404   .190657  -.238061
  4.industry |   .191257   .153846   .183807  |   .019212  -.077269   .096481
  5.industry |   .062842   .274725   .105033  |  -.137462   .552867  -.690329
  6.industry |   .057377         0   .045952  |   .054507  -.219225   .273732
  7.industry |   .019126   .021978   .019694  |  -.004083   .016423  -.020506
  8.industry |   .005464   .065934   .017505  |  -.091714   .368871  -.460585
  9.industry |   .010929   .010989   .010941  |  -.000115   .000462  -.000577
 10.industry |         0   .021978   .004376  |  -.066227   .266363  -.332589
 11.industry |   .554645   .175824   .479212  |    .15083  -.606636   .757467
 12.industry |   .092896   .241758   .122538  |  -.090299   .363181   -.45348
      2.race |   .243169   .681319   .330416  |  -.185284   .745209  -.930494
      3.race |   .002732   .076923   .017505  |  -.112525   .452572  -.565097
       south |    .29235   .318681   .297593  |  -.011456   .046074   -.05753

             |    Common support (treated)    |             Ratio
   Variances |   Matched  Unmatc~d     Total  |   (1)/(3)   (2)/(3)   (1)/(2)
   ----------+--------------------------------+-------------------------------
    collgrad |   .219058   .219536   .218674  |   1.00176   1.00394   .997824
     ttl_exp |   19.4198   25.2474   20.5898  |   .943177   1.22621    .76918
      tenure |   38.3324   31.9242   37.2044  |   1.03032   .858076   1.20073
  3.industry |   .002732   .021734   .006536  |   .418045   3.32537   .125714
  4.industry |   .155101   .131624   .150351  |   1.03159   .875443   1.17837
  5.industry |   .059054   .201465   .094207  |   .626851   2.13854   .293122
  6.industry |   .054233         0   .043936  |   1.23435         0         .
  7.industry |   .018811   .021734   .019348  |   .972252    1.1233   .865531
  8.industry |    .00545   .062271   .017237  |   .316157   3.61269   .087513
  9.industry |   .010839   .010989   .010845  |   .999464   1.01328   .986361
 10.industry |         0   .021734   .004367  |         0   4.97709         0
 11.industry |   .247691    .14652   .250115  |   .990307   .585811   1.69049
 12.industry |   .084497   .185348   .107758  |   .784137   1.72003   .455885
      2.race |   .184542   .219536   .221726  |   .832297   .990121   .840601
      3.race |   .002732   .071795   .017237  |   .158513   4.16522   .038056
       south |   .207448   .219536    .20949  |   .990254   1.04796   .944939
(1) matched, (2) unmatched, (3) total
Some Examples

Make a graph of the common-support statistics

. mat M = r(M)

. coefplot matrix(M[,4]), title("Std. difference") noci nolabels xline(0)

[Figure: coefficient plot with one row per covariate (collgrad, ttl_exp, tenure, industry and race indicators, south); x axis: Std. difference]
Some Examples

Multiple outcome variables

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs = 1,852
                                               Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |        Matched        |        Controls       |  Band-
           |   Yes     No   Total  |  Used  Unused   Total |  width
   --------+-----------------------+-----------------------+--------
   Treated |   432     25     457  |  1104     291    1395 | 1.3392

Treatment-effects estimation
             |     Coef.
   ----------+----------
 wage        |
         ATT |  .6021049
        NATE |  1.430823
 hours       |
         ATT |  1.263759
        NATE |  1.450303
Some Examples

Multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage = collgrad ttl_exp tenure) ///
>     (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs = 1,852
                                               Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |        Matched        |        Controls       |  Band-
           |   Yes     No   Total  |  Used  Unused   Total |  width
   --------+-----------------------+-----------------------+--------
   Treated |   432     25     457  |  1104     291    1395 | 1.3392

Treatment-effects estimation
             |     Coef.
   ----------+----------
 wage        |
         ATT |  .5152752
        NATE |  1.430823
 hours       |
         ATT |  1.263759
        NATE |  1.450303
wage: adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race
Some Examples

Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage), ///
>     att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching          Number of obs = 1,853
                                               Replications  = 50
                                               Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics
           |        Matched        |        Controls       |  Band-
           |   Yes     No   Total  |  Used  Unused   Total |  width
   --------+-----------------------+-----------------------+--------
 0         |                       |                       |
   Treated |   306     15     321  |   625     120     745 | 1.3199
 1         |                       |                       |
   Treated |   126     10     136  |   473     178     651 | 1.3398

Treatment-effects estimation
             |  Observed   Bootstrap                        Normal-based
        wage |     Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
   ----------+--------------------------------------------------------------
 0           |
         ATT |  .4586332   .2763358    1.66   0.097    -.082975    1.000241
 1           |
         ATT |  .9518705    .406903    2.34   0.019    .1543553    1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

           chi2(  1) =    1.23
         Prob > chi2 =    0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

        wage |     Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
   ----------+--------------------------------------------------------------
         (1) |  .4932373   .4452343    1.11   0.268    -.379406    1.365881
Simulation

- Population data from the Swiss census of 2000.
- Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using the command iskotrei by Hendrickx 2002; values from 6 to 78, mean 44).
- Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.
- Control variables: gender, age, and highest educational degree.
- Population restricted to people between 24 and 60 years old who are working.
- 2'308'006 individuals, of which 17.5% belong to the treatment group.
- Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.
Simulation

- Substantial differences between resident aliens and Swiss nationals on all three covariates.
- Propensity score in population (computed from fully stratified data):

[Figure: density of the propensity score for untreated and treated; x axis: Propensity score; y axis: Density; McFadden R2 = 0.121]
Simulation

- Raw mean difference in occupational prestige (NATE): −4.79
- Population ATT (computed from fully stratified data): −3.96
- There is some treatment effect heterogeneity (ATE = −3.51, ATC = −3.41).

[Figure: left panel, outcome by propensity score for untreated and treated; right panel, treatment effect by propensity score]
Results: Variance

[Figure: variance of the estimators; panels: N = 500, N = 5000; rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y); series: MDM and PSM, each with and without bias correction]

In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.
Results: Bias reduction (in percent)

[Figure: bias reduction in percent; panels: N = 500, N = 5000; rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y); series: MDM and PSM, each with and without bias correction]

Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.
Results: Mean squared error

[Figure: mean squared error of the estimators; panels: N = 500, N = 5000; rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y); series: MDM and PSM, each with and without bias correction]
Results: Relative standard error

[Figure: relative standard error; panels: N = 500, N = 5000; rows: nearest-neighbor matching with teffects (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap (1 neighbor, 5 neighbors), and kernel matching with bootstrap (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y); series: MDM and PSM, each with and without bias correction]
Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals; panels: N = 500, N = 5000; rows: nearest-neighbor matching with teffects (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap (1 neighbor, 5 neighbors), and kernel matching with bootstrap (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y); series: MDM and PSM, each with and without bias correction]
7 Conclusions
Conclusions

- The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).
- Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.
Conclusions

Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research the choice may not be that clear.

Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.

Disadvantages of MDM:
- Choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching! (But don’t use pair matching anyhow.)
Conclusions

Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly ok for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.
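The first conclusion can be tried out on the NLSW example data. The following is a minimal sketch, assuming that the (outcome = covariates) regression-adjustment syntax shown in the examples for kmatch md is accepted in the same way by kmatch ps; the choice of adjustment covariates simply mirrors the matching covariates and is an illustration, not a recommendation from the slides.

```stata
* Sketch: propensity-score kernel matching with and without
* post-matching regression adjustment.
* Assumes kmatch is installed (ssc install kmatch).
sysuse nlsw88, clear
drop if industry==2

* PSM without regression adjustment
kmatch ps union collgrad ttl_exp tenure i.industry i.race south (wage), att

* PSM with regression adjustment on the matching covariates
kmatch ps union collgrad ttl_exp tenure i.industry i.race south ///
    (wage = collgrad ttl_exp tenure i.industry i.race south), att
```

Comparing the two ATT estimates on a single sample says nothing about bias or variance by itself; the simulation results are what support the conclusion.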
References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2):295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1):77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91:279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4):465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70:41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5):688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4):472–480.
5 Are King and Nielsen right?
Are King and Nielsen right?

Argument 1
- Model dependence (i.e., dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions even if they try not to be).
- Matching is good because it reduces model dependence.

I fully agree!

My view, however, may be somewhat less pessimistic. I believe that research results can be credible if researchers are well educated, so that they know what they are doing, and if modeling decisions are made transparent and robustness of results is evaluated (and documented).
Are King and Nielsen right?

Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.

That fully blocked randomization is more efficient than complete randomization, given the sample size, is of course true (how large the efficiency gains are depends on the strength of the relation between X and Y).

However, if blocking reduces the sample size, it is not a priori clear whether estimates from the blocked sample are more efficient than estimates from the full sample (although often they will be).

That PSM approximates complete randomization is only partially true. PSM approximates complete randomization within observations with the same propensity score. Hence, PSM is somewhere between complete randomization and fully blocked randomization.
- If the X variables have no relation to T (treatment), then all observations have the same propensity score. Hence, we end up with complete randomization.
- If the X variables have a strong effect on T, there is lots of blocking.
Are King and Nielsen right?

Argument 3
- Random pruning ⇒ imbalance ⇒ more model dependence
- PSM ⇒ complete randomization ⇒ lots of random pruning
- PSM Paradox: “when you do ‘better’, you do worse”

That random pruning makes things worse is of course true, because it unnecessarily reduces the sample size (without changing anything else).

As argued above, that PSM applies random pruning is only true for X variables unrelated to T (so that we are in a “local” complete randomization situation, although something similar can probably also happen if effects from several X’s cancel each other out).

Furthermore, it is only true if you employ a matching algorithm that throws away good matches. King and Nielsen’s results seem to be based on the worst possible algorithm: one-to-one matching without replacement.

If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or also nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
- Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
- Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.

It is true that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more “free” variance in X that can be exploited by modeling decisions).

In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you’d need to apply PSM stratified by subgroups in this case).
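The point about subgroup analyses can be made concrete with the NLSW example. The following is a minimal sketch, assuming that the over() option shown in the examples for kmatch md works the same way with kmatch ps, so that the propensity score is estimated and the matching is carried out separately within each subgroup; south is used as the subgroup variable purely for illustration.

```stata
* Sketch: propensity-score matching stratified by subgroup, instead of
* splitting a single pooled PSM analysis after the fact.
* Assumes kmatch is installed (ssc install kmatch).
sysuse nlsw88, clear
drop if industry==2

* Kernel PSM carried out separately for south == 0 and south == 1;
* bootstrap standard errors so that the subgroup ATTs can be compared
kmatch ps union collgrad ttl_exp tenure i.industry i.race (wage), ///
    att vce(boot) over(south)

* Difference between the two subgroup ATTs
lincom [1]ATT - [0]ATT
```

The lincom call follows the equation-name pattern used in the subpopulation example for kmatch md.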
1 Potential Outcomes and Causal Inference
2 Matching
3 Propensity Score Matching
4 King and Nielsenrsquos ldquoWhy Propensity Scores Should Not Be Used forMatchingrdquo
5 Are King and Nielsen right
6 Illustration using kmatch
7 Conclusions
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 31
The Command
kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).
Some key features:
- Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
- Optional exact matching
- Optional regression-adjustment bias-correction
- Kernel matching, ridge matching, or nearest-neighbor matching
- Automatic bandwidth selection for kernel/ridge matching
- Flexible specification of scaling matrix for MDM
- Joint analysis of multiple subgroups and multiple outcome variables
- Various post-estimation commands for balancing and common-support diagnostics
- Computationally efficient implementation
Some Examples
Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.
. sysuse nlsw88, clear
(NLSW, 1988 extract)
. drop if industry==2
(4 observations deleted)

Mahalanobis-distance kernel matching
. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                 Matched                  Controls           Band-
             Yes     No   Total     Used  Unused   Total     width
Treated      432     25     457     1105     291    1396    1.3394

Treatment-effects estimation
wage         Coef.
ATT       .6059013
NATE      1.432913
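The NATE reported next to the ATT is the naive (unadjusted) difference in mean wages between union members and non-members. The following sketch computes such a raw difference by hand. It assumes that the estimation sample consists of the observations with complete data on all variables in the matching command, and the scalar names m1 and m0 are chosen here for illustration.

```stata
* Sketch only: raw mean difference in wages, treated minus untreated
sysuse nlsw88, clear
drop if industry==2

* Assumed estimation sample: complete cases on all variables used above
keep if !missing(wage, union, collgrad, ttl_exp, tenure, industry, race, south)

summarize wage if union==1, meanonly
scalar m1 = r(mean)
summarize wage if union==0, meanonly
scalar m0 = r(mean)

display "Raw mean difference (NATE) = " m1 - m0
```

If the sample definition matches, the displayed difference should agree with the NATE in the output above; the ATT differs from it because matching removes the part of the wage gap that is due to differences in the covariates.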
Some Examples
Some balancing statistics
. kmatch summarize
(refitting the model using the generate() option)

                         Raw                         Matched (ATT)
Means         Treated  Untrea~d    StdDif     Treated  Untrea~d    StdDif
collgrad      .321663   .224212   .219912     .319444   .319444         0
ttl_exp       13.2685   12.7323   .117584     13.3205   13.1425   .039036
tenure        7.89205   6.17658    .29735     7.91744   7.58347   .057888
3.industry    .006565   .012178  -.058246      .00463    .00463         0
4.industry    .183807   .166905   .044425     .185185   .185185         0
5.industry    .105033   .027937   .312944     .085648   .085648         0
6.industry    .045952   .169771  -.407129     .048611   .048611         0
7.industry    .019694   .102436  -.350657     .020833   .020833         0
8.industry    .017505   .035817  -.113785     .009259   .009259         0
9.industry    .010941   .040115  -.185669     .011574   .011574         0
10.industry   .004376   .008596  -.052551     .002315   .002315         0
11.industry   .479212   .356734   .250073     .506944   .506944         0
12.industry   .122538    .07235   .169707      .12037    .12037         0
2.race        .330416   .244986   .189418       .3125     .3125         0
3.race        .017505   .011461   .050566     .006944   .006944         0
south         .297593   .466332  -.352408     .291667   .291667         0

                         Raw                         Matched (ATT)
Variances     Treated  Untrea~d     Ratio     Treated  Untrea~d     Ratio
collgrad      .218674   .174066   1.25628     .217904   .217904         1
ttl_exp       20.5898   21.0001   .980459     19.8177   18.2323   1.08696
tenure        37.2044   29.3629   1.26706     37.0399   34.9543   1.05966
3.industry    .006536   .012038   .542928     .004619   .004619         1
4.industry    .150351   .139148   1.08052     .151242   .151242         1
5.industry    .094207   .027176   3.46656     .078494   .078494         1
6.industry    .043936    .14105   .311496     .046355   .046355         1
7.industry    .019348   .092008   .210287     .020447   .020447         1
8.industry    .017237   .034559   .498769     .009195   .009195         1
9.industry    .010845   .038533   .281445     .011467   .011467         1
10.industry   .004367   .008528   .512039     .002315   .002315         1
11.industry   .250115   .229639   1.08917     .250532   .250532         1
12.industry   .107758   .067163   1.60443     .106127   .106127         1
2.race        .221726     .1851   1.19787     .215342   .215342         1
3.race        .017237   .011338   1.52025     .006912   .006912         1
south          .20949   .249045   .841173     .207077   .207077         1
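The two balance measures in these tables can be checked by hand. The definition below (mean difference divided by the square root of the average of the two group variances) is inferred from the reported raw-sample numbers, which it reproduces; whether exactly the same variances are used for the matched columns is not stated in the slides. The check uses collgrad in the raw sample, with treated and untreated means .321663 and .224212 and variances .218674 and .174066.

```latex
\text{StdDif} = \frac{\bar{x}_1 - \bar{x}_0}{\sqrt{(s_1^2 + s_0^2)/2}},
\qquad
\text{Ratio} = \frac{s_1^2}{s_0^2}

% collgrad, raw sample
\frac{0.321663 - 0.224212}{\sqrt{(0.218674 + 0.174066)/2}}
  = \frac{0.097451}{0.443137} \approx 0.2199,
\qquad
\frac{0.218674}{0.174066} \approx 1.2563
```

Good balance corresponds to a standardized difference close to 0 and a variance ratio close to 1, which is the pattern in the matched columns.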
Some Examples
Make a graph of the balancing stats
. mat M = r(M)
. mat V = r(V)
. coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || , ///
>     bylabels("Std. mean difference" "Variance ratio") ///
>     noci nolabels byopts(xrescale)
. addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))
. addplot 2: , xline(1) norescaling
[Figure: dot plot of the balancing statistics for each covariate (collgrad, ttl_exp, tenure, industry, race, and south indicators), Raw vs. Matched. Left panel: Std. mean difference; right panel: Variance ratio.]
Some Examples
Propensity-score kernel matching
. kmatch ps union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage), nate att
(computing bandwidth ... done)

Propensity-score kernel matching           Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Covariates  : collgrad ttl_exp tenure i.industry i.race south
PS model    : logit (pr)

Matching statistics
                 Matched                  Controls           Band-
             Yes     No   Total     Used  Unused   Total     width
Treated      431     26     457     1214     182    1396    .00188

Treatment-effects estimation
wage         Coef.
ATT       .3887224
NATE      1.432913
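The output reports a logit model with predicted probabilities as the propensity-score model. As a rough illustration of that first step, the score can be estimated by hand with official Stata commands. This is a sketch of the idea, not a reproduction of what the matching command does internally, and the variable name pscore is chosen here.

```stata
* Sketch only: estimate the propensity score with a logit model
sysuse nlsw88, clear
drop if industry==2

* Logit of treatment (union membership) on the matching covariates
logit union collgrad ttl_exp tenure i.industry i.race south

* Predicted probability of treatment = estimated propensity score
predict double pscore, pr

* Compare the distribution of the score between the two groups
tabstat pscore, by(union) statistics(n mean sd min max)
```

Matching then proceeds on the one-dimensional distance in pscore instead of the multivariate distance in the covariates.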
Some Examples
Kernel density balancing plot
. kmatch density, lw(6 2) lc(5 1)
(refitting the model using the generate() option)
(applying 0-1 boundary correction to density estimation of propensity score)
(bandwidth for propensity score = .06803989)
[Figure: kernel density of the propensity score for Untreated and Treated. Panels: Raw, Matched (ATT). x-axis: Propensity score; y-axis: Density.]
Some Examples
Cumulative distribution balancing plot
. kmatch cumul, lw(6 2) lc(5 1)
(refitting the model using the generate() option)
[Figure: cumulative distribution of the propensity score for Untreated and Treated. Panels: Raw, Matched (ATT). x-axis: Propensity score; y-axis: Cumulative probability.]
Some Examples
Balancing box plot
. kmatch box
(refitting the model using the generate() option)
[Figure: box plots of the propensity score for Untreated and Treated. Panels: Raw, Matched (ATT). y-axis: Propensity score.]
Some Examples
Standard errors
. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Replications  = 50
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                 Matched                  Controls           Band-
             Yes     No   Total     Used  Unused   Total     width
Treated      432     25     457     1105     291    1396    1.3394
Untreated   1386     10    1396      455       2     457    3.3975
Combined    1818     35    1853     1560     293    1853

Treatment-effects estimation
          Observed   Bootstrap                         Normal-based
wage         Coef.   Std. Err.       z    P>|z|    [95% Conf. Interval]
ATE       .4095729   .1920853     2.13    0.033    .0330928   .7860531
ATT       .6059013   .2472069     2.45    0.014    .1213846   1.090418
ATC       .3483797   .1893653     1.84    0.066   -.0227695   .7195289
NATE      1.432913   .2333282     6.14    0.000    .9755981   1.890228
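The z statistics and confidence limits in this table are normal-based, that is, they follow from the point estimate and the bootstrap standard error. As a worked check for the ATT row (coefficient .6059013, standard error .2472069):

```latex
z = \frac{0.6059013}{0.2472069} \approx 2.45,
\qquad
0.6059013 \pm 1.96 \times 0.2472069 \approx [\,0.1214,\; 1.0904\,]
```

Both values agree with the reported row. Since only the standard error comes from the bootstrap, 50 replications affect the precision of the interval width, not its form.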
Some Examples
Do some tests
. lincom ATT-NATE
 ( 1)  ATT - NATE = 0

wage         Coef.   Std. Err.       z    P>|z|    [95% Conf. Interval]
(1)      -.8270117   .1810415    -4.57    0.000   -1.181847  -.4721768

. test ATT = ATC
 ( 1)  ATT - ATC = 0
           chi2(  1) =    2.42
         Prob > chi2 =  0.1200
Some Examples
Nearest-neighbor matching (1 neighbor)
. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

Multivariate-distance nearest-neighbor matching
                                           Number of obs  = 1,853
                                           Neighbors: min = 1
Treatment   : union = 1                               max = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                 Matched                  Controls           Band-
             Yes     No   Total     Used  Unused   Total     width
Treated      457      0     457      328    1068    1396

Treatment-effects estimation
wage         Coef.
ATT       .7246969

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

Treatment-effects estimation                   Number of obs      = 1,853
Estimator      : nearest-neighbor matching     Matches: requested = 1
Outcome model  : matching                                     min = 1
Distance metric: Mahalanobis                                  max = 1

                                 AI Robust
wage                     Coef.   Std. Err.       z    P>|z|    [95% Conf. Interval]
ATET
union
(union vs nonunion)   .7246969   .2942952     2.46    0.014     .147889   1.301505
Some Examples
Nearest-neighbor matching (5 neighbors)
. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching
                                           Number of obs  = 1,853
                                           Neighbors: min = 5
Treatment   : union = 1                               max = 5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                 Matched                  Controls           Band-
             Yes     No   Total     Used  Unused   Total     width
Treated      457      0     457      870     526    1396

Treatment-effects estimation
wage         Coef.
ATT       .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation                   Number of obs      = 1,853
Estimator      : nearest-neighbor matching     Matches: requested = 5
Outcome model  : matching                                     min = 5
Distance metric: Mahalanobis                                  max = 6

                                 AI Robust
wage                     Coef.   Std. Err.       z    P>|z|    [95% Conf. Interval]
ATET
union
(union vs nonunion)   .5590823   .2381752     2.35    0.019    .0922675   1.025897
Some Examples
Bias-correction / regression adjustment
. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching
                                           Number of obs  = 1,853
                                           Neighbors: min = 5
Treatment   : union = 1                               max = 5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                 Matched                  Controls           Band-
             Yes     No   Total     Used  Unused   Total     width
Treated      457      0     457      870     526    1396

Treatment-effects estimation
wage         Coef.
ATT       .5288023
adjusted for: collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) ///
>     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation                   Number of obs      = 1,853
Estimator      : nearest-neighbor matching     Matches: requested = 5
Outcome model  : matching                                     min = 5
Distance metric: Mahalanobis                                  max = 6

                                 AI Robust
wage                     Coef.   Std. Err.       z    P>|z|    [95% Conf. Interval]
ATET
union
(union vs nonunion)   .5288023   .2420635     2.18    0.029    .0543666   1.003238
Some Examples
Mahalanobis-distance and propensity-score matching combined
. kmatch md union collgrad ttl_exp tenure (wage), att ///
>     psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis (modified)
Covariates  : collgrad ttl_exp tenure
PS model    : logit (pr)
PS covars   : i.industry i.race south

Matching statistics
                 Matched                  Controls           Band-
             Yes     No   Total     Used  Unused   Total     width
Treated      439     18     457     1258     138    1396    .83886

Treatment-effects estimation
wage         Coef.
ATT       .6408443
Some Examples
Exact matching
. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure
Exact       : industry race south

Matching statistics
                 Matched                  Controls           Band-
             Yes     No   Total     Used  Unused   Total     width
Treated      432     25     457     1103     293    1396    1.3013

Treatment-effects estimation
wage         Coef.
ATT       .6047374
Some Examples
Bandwidth selection: the default (based on the distribution of distances in one-nearest-neighbor matching)
. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                 Matched                  Controls           Band-
             Yes     No   Total     Used  Unused   Total     width
Treated      432     25     457     1105     291    1396    1.3394

Treatment-effects estimation
wage         Coef.
ATT       .6059013
Some Examples
Bandwidth selection: cross validation with respect to X
. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                 Matched                  Controls           Band-
             Yes     No   Total     Used  Unused   Total     width
Treated      448      9     457     1184     212    1396    1.8888

Treatment-effects estimation
wage         Coef.
ATT       .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort
[Figure: cross-validation plot with the evaluation points labeled by index. x-axis: Bandwidth (about 1.5 to 3); y-axis: MSE.]
Some Examples
Bandwidth selection: cross validation with respect to Y
. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                 Matched                  Controls           Band-
             Yes     No   Total     Used  Unused   Total     width
Treated      453      4     457     1289     107    1396     2.433

Treatment-effects estimation
wage         Coef.
ATT       .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort
[Figure: cross-validation plot with the evaluation points labeled by index. x-axis: Bandwidth (about 1.5 to 3); y-axis: MISE.]
Some Examples
Bandwidth selection: weighted cross validation with respect to Y
. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                 Matched                  Controls           Band-
             Yes     No   Total     Used  Unused   Total     width
Treated      455      2     457     1356      40    1396    2.7626

Treatment-effects estimation
wage         Coef.
ATT       .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort
[Figure: cross-validation plot with the evaluation points labeled by index. x-axis: Bandwidth (about 1 to 5); y-axis: Weighted MISE.]
Some Examples
Common-support diagnostics
. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(0.5)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                 Matched                  Controls           Band-
             Yes     No   Total     Used  Unused   Total     width
Treated      366     91     457      701     695    1396        .5

Treatment-effects estimation
wage         Coef.
ATT       .3303161

. kmatch csummarize
(refitting the model using the generate() option)

                Common support (treated)          Standardized difference
Means         Matched  Unmatc~d     Total     (1)-(3)   (2)-(3)   (1)-(2)
collgrad      .322404   .318681   .321663     .001585  -.006376   .007962
ttl_exp       13.3929   12.7682   13.2685     .027413  -.110253   .137666
tenure        8.12614   6.95055   7.89205     .038378  -.154356   .192734
3.industry    .002732   .021978   .006565    -.047404   .190657  -.238061
4.industry    .191257   .153846   .183807     .019212  -.077269   .096481
5.industry    .062842   .274725   .105033    -.137462   .552867  -.690329
6.industry    .057377         0   .045952     .054507  -.219225   .273732
7.industry    .019126   .021978   .019694    -.004083   .016423  -.020506
8.industry    .005464   .065934   .017505    -.091714   .368871  -.460585
9.industry    .010929   .010989   .010941    -.000115   .000462  -.000577
10.industry         0   .021978   .004376    -.066227   .266363  -.332589
11.industry   .554645   .175824   .479212      .15083  -.606636   .757467
12.industry   .092896   .241758   .122538    -.090299   .363181   -.45348
2.race        .243169   .681319   .330416    -.185284   .745209  -.930494
3.race        .002732   .076923   .017505    -.112525   .452572  -.565097
south          .29235   .318681   .297593    -.011456   .046074   -.05753

                Common support (treated)                   Ratio
Variances     Matched  Unmatc~d     Total     (1)/(3)   (2)/(3)   (1)/(2)
collgrad      .219058   .219536   .218674     1.00176   1.00394   .997824
ttl_exp       19.4198   25.2474   20.5898     .943177   1.22621    .76918
tenure        38.3324   31.9242   37.2044     1.03032   .858076   1.20073
3.industry    .002732   .021734   .006536     .418045   3.32537   .125714
4.industry    .155101   .131624   .150351     1.03159   .875443   1.17837
5.industry    .059054   .201465   .094207     .626851   2.13854   .293122
6.industry    .054233         0   .043936     1.23435         0         .
7.industry    .018811   .021734   .019348     .972252    1.1233   .865531
8.industry     .00545   .062271   .017237     .316157   3.61269   .087513
9.industry    .010839   .010989   .010845     .999464   1.01328   .986361
10.industry         0   .021734   .004367           0   4.97709         0
11.industry   .247691    .14652   .250115     .990307   .585811   1.69049
12.industry   .084497   .185348   .107758     .784137   1.72003   .455885
2.race        .184542   .219536   .221726     .832297   .990121   .840601
3.race        .002732   .071795   .017237     .158513   4.16522   .038056
south         .207448   .219536    .20949     .990254   1.04796   .944939
(1) matched, (2) unmatched, (3) total
Some Examples
Make a graph of the common-support stats
. mat M = r(M)
. coefplot matrix(M[,4]), title("Std. difference") noci nolabels xline(0)
[Figure: dot plot of the standardized difference between matched and all treated observations for each covariate. x-axis: Std. difference (about -.2 to .2).]
Some Examples
Multiple outcome variables
. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,852
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                 Matched                  Controls           Band-
             Yes     No   Total     Used  Unused   Total     width
Treated      432     25     457     1104     291    1395    1.3392

Treatment-effects estimation
             Coef.
wage
  ATT     .6021049
  NATE    1.430823
hours
  ATT     1.263759
  NATE    1.450303
Some Examples
Multiple outcome variables with different regression-adjustment equations
. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage = collgrad ttl_exp tenure) ///
>     (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,852
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                 Matched                  Controls           Band-
             Yes     No   Total     Used  Unused   Total     width
Treated      432     25     457     1104     291    1395    1.3392

Treatment-effects estimation
             Coef.
wage
  ATT     .5152752
  NATE    1.430823
hours
  ATT     1.263759
  NATE    1.450303
wage: adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race
Some Examples
Treatment effects by subpopulation
. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage), ///
>     att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Replications  = 50
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race
0: south = 0
1: south = 1

Matching statistics
                 Matched                  Controls           Band-
             Yes     No   Total     Used  Unused   Total     width
0 Treated    306     15     321      625     120     745    1.3199
1 Treated    126     10     136      473     178     651    1.3398

Treatment-effects estimation
          Observed   Bootstrap                         Normal-based
wage         Coef.   Std. Err.       z    P>|z|    [95% Conf. Interval]
0 ATT     .4586332   .2763358     1.66    0.097    -.082975   1.000241
1 ATT     .9518705    .406903     2.34    0.019    .1543553   1.749386

. test [0]ATT = [1]ATT
 ( 1)  [0]ATT - [1]ATT = 0
           chi2(  1) =    1.23
         Prob > chi2 =  0.2679

. lincom [1]ATT - [0]ATT
 ( 1)  - [0]ATT + [1]ATT = 0

wage         Coef.   Std. Err.       z    P>|z|    [95% Conf. Interval]
(1)       .4932373   .4452343     1.11    0.268    -.379406   1.365881
Simulation
- Population: data from the Swiss census of 2000
- Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using the command iskotrei by Hendrickx 2002; values from 6 to 78, mean 44)
- Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group
- Control variables: gender, age, and highest educational degree
- Population restricted to people between 24 and 60 years old who are working
- 2'308'006 individuals, of which 17.5% belong to the treatment group
- Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators
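The logic of such a simulation (draw a random sample from a fixed population, compute the estimator, repeat, then summarize) can be sketched as follows. This is an illustration only and not the author's simulation code: the census data are not available here, so the NLSW extract stands in as the "population", the covariate set is reduced, and the sample size, number of replications, and seed are arbitrary choices.

```stata
* Sketch only: Monte Carlo loop over random subsamples of a fixed "population"
sysuse nlsw88, clear
keep if !missing(wage, union, collgrad, ttl_exp, tenure)

tempname sim
tempfile results
postfile `sim' double att using `results'

set seed 12345
forvalues r = 1/200 {
    preserve
    sample 500, count                 // simple random sample of N = 500
    quietly kmatch md union collgrad ttl_exp tenure (wage), att
    post `sim' (_b[ATT])              // store the ATT estimate of this draw
    restore
}
postclose `sim'

* Mean and variance of the estimator across replications
use `results', clear
summarize att, detail
```

Comparing the mean of att with the population value gives the bias, and the variance across replications gives the sampling variance; these are the quantities shown in the results slides that follow.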
Simulation
Substantial differences between resident aliens and Swiss nationals on all three covariates.
Propensity score in population (computed from fully stratified data):
[Figure: density of the propensity score for Untreated and Treated. x-axis: Propensity score (0 to 1); y-axis: Density (0 to 7).]
McFadden R2 = 0.121
Simulation
- Raw mean difference in occupational prestige (NATE): −4.79
- Population ATT (computed from fully stratified data): −3.96
- There is some treatment effect heterogeneity (ATE = −3.51, ATC = −3.41)
[Figure, left panel: mean outcome (30 to 55) by propensity score (0 to .8) for Untreated and Treated. Right panel: treatment effect (−6 to −1) by propensity score (0 to .8).]
Results: Variance
[Figure: variance of the estimators, panels N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each without and with bias correction.]
Notes: In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.
For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.
Results: Bias reduction (in percent)
[Figure: bias reduction in percent, panels N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each without and with bias correction.]
Notes: Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.
The bias also vanishes once post-matching regression adjustment is applied.
Results: Mean squared error
[Figure: mean squared error of the estimators, panels N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each without and with bias correction.]
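The mean squared error ties the variance and bias results together through the standard decomposition below, where the estimator and the population ATT are written with symbols introduced here for illustration.

```latex
\mathrm{MSE}(\hat{\tau})
  = \mathrm{E}\big[(\hat{\tau} - \tau)^2\big]
  = \mathrm{Var}(\hat{\tau}) + \big(\mathrm{E}[\hat{\tau}] - \tau\big)^2
```

An estimator with a small remaining bias can therefore still have the lower MSE if its variance is sufficiently smaller, which is why the ranking of algorithms can differ between the bias and the MSE figures.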
Results: Relative standard error
[Figure: relative standard error, panels N = 500 and N = 5000. Rows: nearest-neighbor matching with teffects (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap (1 neighbor, 5 neighbors), and kernel matching with bootstrap (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each without and with bias correction.]
Results: Coverage of 95% CIs
[Figure: coverage of 95% confidence intervals, panels N = 500 and N = 5000. Same rows and series as in the relative standard error figure.]
Conclusions
The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).
Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.
Conclusions
Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research the choice may not be that clear.
Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.
Disadvantages of MDM:
- Choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.
One clear conclusion we can draw, however, is:
Do not use propensity scores for pair matching! (But don't use pair matching anyhow.)
Conclusions
Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly ok for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.
To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.
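For nearest-neighbor matching, the two ways of obtaining standard errors compared in the simulation can be placed side by side as follows. This is a sketch that recombines commands and options shown in the examples above; combining nn(5) with vce(bootstrap) in a single call is an assumption here, and "AI Robust" in the teffects output is taken to refer to Abadie-Imbens robust standard errors.

```stata
* Sketch only: bootstrap vs. analytic standard errors for 5-nearest-neighbor matching
sysuse nlsw88, clear
drop if industry==2

* Nearest-neighbor MDM with bootstrap standard errors (kmatch)
kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
    att nn(5) vce(bootstrap)

* Nearest-neighbor MDM with analytic (AI robust) standard errors (teffects)
teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) ///
    (union), atet nn(5)
```

The point estimates should coincide, as in the nearest-neighbor examples above, so any difference between the two outputs lies in the standard errors.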
References
Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2):295–313.
Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1):77–90.
Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91:279–290.
Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.
King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.
Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.
Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4):465–472.
Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70:41–55.
Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5):688–701.
Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4):472–480.
Are King and Nielsen right?
Argument 1
- Model dependence (i.e., dependence of results on modeling decisions made by the researcher) is bad because it leads to bias (people are selective in their decisions, even if they try not to be).
- Matching is good because it reduces model dependence.
I fully agree.
My view, however, may be somewhat less pessimistic. I believe that research results can be credible if researchers are well educated, so that they know what they are doing, and if modeling decisions are made transparent and robustness of results is evaluated (and documented).
Are King and Nielsen right?
Argument 2
- PSM approximates complete randomization.
- Better are matching approaches that approximate fully blocked randomization, such as Mahalanobis matching, because complete randomization is less efficient than fully blocked randomization.
That fully blocked randomization is more efficient than complete randomization, given the sample size, is of course true (how large the efficiency gains are depends on the strength of the relation between X and Y).
However, if blocking reduces the sample size, it is not a priori clear whether estimates from the blocked sample are more efficient than estimates from the full sample (although often they will be).
That PSM approximates complete randomization is only partially true: PSM approximates complete randomization within observations with the same propensity score. Hence, PSM is somewhere between complete randomization and fully blocked randomization.
- If the X variables have no relation to T (treatment), then all observations have the same propensity score. Hence, we end up with complete randomization.
- If the X variables have a strong effect on T, there is lots of blocking.
Are King and Nielsen right?
Argument 3
- Random pruning ⇒ imbalance ⇒ more model dependence
- PSM ⇒ complete randomization ⇒ lots of random pruning
- PSM Paradox: "when you do 'better,' you do worse"
That random pruning makes things worse is of course true, because it unnecessarily reduces the sample size (without changing anything else).
As argued above, that PSM applies random pruning is only true for X variables unrelated to T (so that we are in a "local" complete randomization situation; although something similar can probably also happen if effects from several X's cancel each other out).
Furthermore, it is only true if you employ a matching algorithm that throws away good matches. King and Nielsen's results seem to be based on the worst possible algorithm: one-to-one matching without replacement.
If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or also nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
- Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
- Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.
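The difference between an algorithm that throws away good matches and one that averages over them can be illustrated with a toy sketch. The Python code below is only an illustration under stated assumptions, not the implementation used by kmatch or by King and Nielsen: the function names, the greedy matching order, the propensity scores, and the bandwidth of 0.05 are all made up for the example.

```python
def epanechnikov(u):
    """Epanechnikov kernel; zero outside |u| < 1."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def pair_match_without_replacement(treated_ps, control_ps):
    """Greedy one-to-one matching: every control is used at most once."""
    available = set(range(len(control_ps)))
    pairs = []
    for i, p in enumerate(treated_ps):
        if not available:
            break
        # closest remaining control; ties broken by index
        j = min(available, key=lambda k: (abs(control_ps[k] - p), k))
        pairs.append((i, j))
        available.remove(j)
    return pairs


def kernel_match_weights(treated_ps, control_ps, bandwidth):
    """Per treated unit: normalized kernel weights over all controls
    (None if no control lies within the bandwidth)."""
    weights = []
    for p in treated_ps:
        raw = [epanechnikov((c - p) / bandwidth) for c in control_ps]
        total = sum(raw)
        weights.append([w / total for w in raw] if total > 0 else None)
    return weights


def controls_used(weights):
    """Indices of controls that receive a positive weight for some treated unit."""
    return {j for row in weights if row is not None
            for j, w in enumerate(row) if w > 0}


# Hypothetical propensity scores (illustrative values only)
treated = [0.30, 0.50]
controls = [0.28, 0.31, 0.33, 0.49, 0.52, 0.90]

pairs = pair_match_without_replacement(treated, controls)
weights = kernel_match_weights(treated, controls, bandwidth=0.05)

print(sorted(j for _, j in pairs))      # controls kept by pair matching
print(sorted(controls_used(weights)))   # controls kept by kernel matching
```

In this toy data, pair matching keeps two controls and discards three others that are about as close, whereas kernel matching averages over all five nearby controls and drops only the distant one (0.90).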
It is true that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more "free" variance in X that can be exploited by modeling decisions).

In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you'd need to apply PSM stratified by subgroups in this case).
6 Illustration using kmatch
The Command

kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).

Some key features:
- Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
- Optional exact matching
- Optional regression-adjustment bias-correction
- Kernel matching, ridge matching, or nearest-neighbor matching
- Automatic bandwidth selection for kernel/ridge matching
- Flexible specification of scaling matrix for MDM
- Joint analysis of multiple subgroups and multiple outcome variables
- Various post-estimation commands for balancing and common-support diagnostics
- Computationally efficient implementation
Some Examples

Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.

    . sysuse nlsw88, clear
    (NLSW, 1988 extract)

    . drop if industry==2
    (4 observations deleted)

Mahalanobis-distance kernel matching

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), nate att
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1853
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
                     Matched                Controls          Band-
                 Yes    No  Total     Used  Unused  Total     width
    Treated      432    25    457     1105     291   1396    1.3394

    Treatment-effects estimation
    wage         Coef.
    ATT       .6059013
    NATE      1.432913
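For orientation, the two quantities reported in the output can be written in the generic textbook form of a kernel matching estimator. This is a sketch, not a statement of what kmatch computes internally: the notation ($N_1$, $N_0$, $N_1^{m}$ for the number of matched treated, $d_{ij}$ for the distance between observations $i$ and $j$, kernel $K$, bandwidth $h$) is introduced here for illustration, and implementation details may differ.

```latex
\widehat{\mathrm{NATE}}
  = \frac{1}{N_1}\sum_{i:\,D_i=1} Y_i \;-\; \frac{1}{N_0}\sum_{j:\,D_j=0} Y_j

\widehat{\mathrm{ATT}}
  = \frac{1}{N_1^{m}} \sum_{\substack{i:\,D_i=1\\ i\ \text{matched}}}
    \Bigl( Y_i - \sum_{j:\,D_j=0} w_{ij}\,Y_j \Bigr),
\qquad
w_{ij} = \frac{K(d_{ij}/h)}{\sum_{k:\,D_k=0} K(d_{ik}/h)}
```

The NATE is the raw difference in mean outcomes. The ATT replaces the raw control mean by a locally weighted average of control outcomes for each matched treated observation.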
Some balancing statistics

    . kmatch summarize
    (refitting the model using the generate() option)

                          Raw                           Matched (ATT)
    Means          Treated  Untrea~d    StdDif     Treated  Untrea~d    StdDif
    collgrad       .321663   .224212   .219912     .319444   .319444         0
    ttl_exp        13.2685   12.7323   .117584     13.3205   13.1425   .039036
    tenure         7.89205   6.17658    .29735     7.91744   7.58347   .057888
    3.industry     .006565   .012178  -.058246      .00463    .00463         0
    4.industry     .183807   .166905   .044425     .185185   .185185         0
    5.industry     .105033   .027937   .312944     .085648   .085648         0
    6.industry     .045952   .169771  -.407129     .048611   .048611         0
    7.industry     .019694   .102436  -.350657     .020833   .020833         0
    8.industry     .017505   .035817  -.113785     .009259   .009259         0
    9.industry     .010941   .040115  -.185669     .011574   .011574         0
    10.industry    .004376   .008596  -.052551     .002315   .002315         0
    11.industry    .479212   .356734   .250073     .506944   .506944         0
    12.industry    .122538    .07235   .169707      .12037    .12037         0
    2.race         .330416   .244986   .189418       .3125     .3125         0
    3.race         .017505   .011461   .050566     .006944   .006944         0
    south          .297593   .466332  -.352408     .291667   .291667         0

                          Raw                           Matched (ATT)
    Variances      Treated  Untrea~d     Ratio     Treated  Untrea~d     Ratio
    collgrad       .218674   .174066   1.25628     .217904   .217904         1
    ttl_exp        20.5898   21.0001   .980459     19.8177   18.2323   1.08696
    tenure         37.2044   29.3629   1.26706     37.0399   34.9543   1.05966
    3.industry     .006536   .012038   .542928     .004619   .004619         1
    4.industry     .150351   .139148   1.08052     .151242   .151242         1
    5.industry     .094207   .027176   3.46656     .078494   .078494         1
    6.industry     .043936    .14105   .311496     .046355   .046355         1
    7.industry     .019348   .092008   .210287     .020447   .020447         1
    8.industry     .017237   .034559   .498769     .009195   .009195         1
    9.industry     .010845   .038533   .281445     .011467   .011467         1
    10.industry    .004367   .008528   .512039     .002315   .002315         1
    11.industry    .250115   .229639   1.08917     .250532   .250532         1
    12.industry    .107758   .067163   1.60443     .106127   .106127         1
    2.race         .221726     .1851   1.19787     .215342   .215342         1
    3.race         .017237   .011338   1.52025     .006912   .006912         1
    south           .20949   .249045   .841173     .207077   .207077         1
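The balancing statistics can be checked by hand. The sketch below assumes that the standardized difference divides the mean difference by the square root of the average of the two raw-sample variances, also for the matched sample. That definition is an assumption, although it reproduces the values in the collgrad and ttl_exp rows of the table. The function names are hypothetical.

```python
import math


def standardized_difference(mean_t, mean_c, raw_var_t, raw_var_c):
    """Mean difference scaled by the pooled raw standard deviation (assumed definition)."""
    return (mean_t - mean_c) / math.sqrt((raw_var_t + raw_var_c) / 2.0)


def variance_ratio(var_t, var_c):
    """Variance among the treated relative to variance among the untreated."""
    return var_t / var_c


# collgrad, raw sample: means and variances taken from the table
raw_collgrad = standardized_difference(0.321663, 0.224212, 0.218674, 0.174066)

# ttl_exp, matched sample: matched means, but raw variances in the denominator
matched_ttl_exp = standardized_difference(13.3205, 13.1425, 20.5898, 21.0001)

# collgrad, raw sample: variance ratio
ratio_collgrad = variance_ratio(0.218674, 0.174066)

print(round(raw_collgrad, 4), round(matched_ttl_exp, 4), round(ratio_collgrad, 4))
```

Values close to 0 for the standardized difference and close to 1 for the variance ratio indicate good balance.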
Make a graph of the balancing stats

    . mat M = r(M)
    . mat V = r(V)
    . coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || , bylabels("Std. mean difference" "Variance ratio") noci nolabels byopts(xrescale)
    . addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))
    . addplot 2: , xline(1) norescaling

[Figure: balancing statistics by covariate (collgrad, ttl_exp, tenure, industry and race indicators, south), Raw vs. Matched; left panel: Std. mean difference, right panel: Variance ratio]
Propensity-score kernel matching

    . kmatch ps union collgrad ttl_exp tenure i.industry i.race south (wage), nate att
    (computing bandwidth ... done)

    Propensity-score kernel matching           Number of obs = 1853
                                               Kernel        = epan
    Treatment   : union = 1
    Covariates  : collgrad ttl_exp tenure i.industry i.race south
    PS model    : logit (pr)

    Matching statistics
                     Matched                Controls          Band-
                 Yes    No  Total     Used  Unused  Total     width
    Treated      431    26    457     1214     182   1396    .00188

    Treatment-effects estimation
    wage         Coef.
    ATT       .3887224
    NATE      1.432913
Kernel density balancing plot

    . kmatch density, lw(*6 *2) lc(*.5 *1)
    (refitting the model using the generate() option)
    (applying 0-1 boundary correction to density estimation of propensity score)
    (bandwidth for propensity score = .06803989)

[Figure: kernel density of the propensity score, Untreated vs. Treated; panels: Raw, Matched (ATT)]

Cumulative distribution balancing plot

    . kmatch cumul, lw(*6 *2) lc(*.5 *1)
    (refitting the model using the generate() option)

[Figure: cumulative probability of the propensity score, Untreated vs. Treated; panels: Raw, Matched (ATT)]

Balancing box plot

    . kmatch box
    (refitting the model using the generate() option)

[Figure: box plots of the propensity score, Untreated vs. Treated; panels: Raw, Matched (ATT)]
Standard errors

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), nate ate att atc vce(bootstrap)
    (computing bandwidth for treated ... done)
    (computing bandwidth for untreated ... done)
    (running kmatch on estimation sample)

    Bootstrap replications (50)
    ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
    ..................................................    50

    Multivariate-distance kernel matching      Number of obs = 1853
                                               Replications  = 50
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
                     Matched                Controls          Band-
                 Yes    No  Total     Used  Unused  Total     width
    Treated      432    25    457     1105     291   1396    1.3394
    Untreated   1386    10   1396      455       2    457    3.3975
    Combined    1818    35   1853     1560     293   1853

    Treatment-effects estimation
               Observed   Bootstrap                        Normal-based
    wage          Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
    ATE        .4095729    .1920853   2.13    0.033    .0330928    .7860531
    ATT        .6059013    .2472069   2.45    0.014    .1213846    1.090418
    ATC        .3483797    .1893653   1.84    0.066   -.0227695    .7195289
    NATE       1.432913    .2333282   6.14    0.000    .9755981    1.890228
Do some tests

    . lincom ATT-NATE

     ( 1)  ATT - NATE = 0

    wage          Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
    (1)       -.8270117    .1810415  -4.57    0.000   -1.181847   -.4721768

    . test ATT = ATC

     ( 1)  ATT - ATC = 0

           chi2(  1) =    2.42
         Prob > chi2 =    0.1200
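The z statistic, p-value, and normal-based confidence interval in the lincom output follow directly from the point estimate and its standard error. The sketch below reproduces them for ATT - NATE. It takes the bootstrap standard error of the difference as given, because computing it from scratch would require the covariance between the two estimates, which the output does not show. The function name wald_summary is hypothetical.

```python
from statistics import NormalDist


def wald_summary(estimate, std_err, level=0.95):
    """Return z statistic, two-sided p-value, and normal-based confidence interval."""
    std_normal = NormalDist()
    z = estimate / std_err
    p_value = 2.0 * (1.0 - std_normal.cdf(abs(z)))
    crit = std_normal.inv_cdf(0.5 + level / 2.0)   # about 1.96 for level = 0.95
    return z, p_value, (estimate - crit * std_err, estimate + crit * std_err)


# ATT - NATE and its bootstrap standard error, taken from the lincom output
z, p, (lower, upper) = wald_summary(-0.8270117, 0.1810415)
print(round(z, 2), round(lower, 4), round(upper, 4))
```

The same arithmetic applies to every row of the coefficient tables that report a standard error.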
Nearest-neighbor matching (1 neighbor)

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

    Multivariate-distance nearest-neighbor matching
                                               Number of obs  = 1853
                                               Neighbors: min = 1
    Treatment   : union = 1                               max = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
                     Matched                Controls          Band-
                 Yes    No  Total     Used  Unused  Total     width
    Treated      457     0    457      328    1068   1396

    Treatment-effects estimation
    wage         Coef.
    ATT       .7246969

    . teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

    Treatment-effects estimation               Number of obs      = 1853
    Estimator      : nearest-neighbor matching Matches: requested = 1
    Outcome model  : matching                               min   = 1
    Distance metric: Mahalanobis                            max   = 1

                                        AI Robust
    wage                      Coef.     Std. Err.      z    P>|z|    [95% Conf. Interval]
    ATET
      union
      (union vs nonunion)  .7246969      .2942952   2.46    0.014     .147889    1.301505
Nearest-neighbor matching (5 neighbors)

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

    Multivariate-distance nearest-neighbor matching
                                               Number of obs  = 1853
                                               Neighbors: min = 5
    Treatment   : union = 1                               max = 5
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
                     Matched                Controls          Band-
                 Yes    No  Total     Used  Unused  Total     width
    Treated      457     0    457      870     526   1396

    Treatment-effects estimation
    wage         Coef.
    ATT       .5590823

    . teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

    Treatment-effects estimation               Number of obs      = 1853
    Estimator      : nearest-neighbor matching Matches: requested = 5
    Outcome model  : matching                               min   = 5
    Distance metric: Mahalanobis                            max   = 6

                                        AI Robust
    wage                      Coef.     Std. Err.      z    P>|z|    [95% Conf. Interval]
    ATET
      union
      (union vs nonunion)  .5590823      .2381752   2.35    0.019    .0922675    1.025897
Bias-correction / regression adjustment

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

    Multivariate-distance nearest-neighbor matching
                                               Number of obs  = 1853
                                               Neighbors: min = 5
    Treatment   : union = 1                               max = 5
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
                     Matched                Controls          Band-
                 Yes    No  Total     Used  Unused  Total     width
    Treated      457     0    457      870     526   1396

    Treatment-effects estimation
    wage         Coef.
    ATT       .5288023
    adjusted for collgrad ttl_exp tenure i.industry i.race south

    . teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

    Treatment-effects estimation               Number of obs      = 1853
    Estimator      : nearest-neighbor matching Matches: requested = 5
    Outcome model  : matching                               min   = 5
    Distance metric: Mahalanobis                            max   = 6

                                        AI Robust
    wage                      Coef.     Std. Err.      z    P>|z|    [95% Conf. Interval]
    ATET
      union
      (union vs nonunion)  .5288023      .2420635   2.18    0.029    .0543666    1.003238
Mahalanobis-distance and propensity-score matching combined

    . kmatch md union collgrad ttl_exp tenure (wage), att psvars(i.industry i.race south) psweight(3)
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1853
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis (modified)
    Covariates  : collgrad ttl_exp tenure
    PS model    : logit (pr)
    PS covars   : i.industry i.race south

    Matching statistics
                     Matched                Controls          Band-
                 Yes    No  Total     Used  Unused  Total     width
    Treated      439    18    457     1258     138   1396    .83886

    Treatment-effects estimation
    wage         Coef.
    ATT       .6408443
Exact matching

    . kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1853
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure
    Exact       : industry race south

    Matching statistics
                     Matched                Controls          Band-
                 Yes    No  Total     Used  Unused  Total     width
    Treated      432    25    457     1103     293   1396    1.3013

    Treatment-effects estimation
    wage         Coef.
    ATT       .6047374
Bandwidth selection: the default (based on distribution of distances in one-nearest-neighbor matching)

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1853
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
                     Matched                Controls          Band-
                 Yes    No  Total     Used  Unused  Total     width
    Treated      432    25    457     1105     291   1396    1.3394

    Treatment-effects estimation
    wage         Coef.
    ATT       .6059013
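One way to read "based on distribution of distances in one-nearest-neighbor matching" is sketched below: compute each treated observation's distance to its closest control, then scale a quantile of those distances. This is only an illustration of the idea. The quantile q, the scaling factor, the exclusion of zero distances, and the function names are hypothetical choices, not values stated on the slide. The sketch uses scalar distances, whereas MDM would use a multivariate metric such as the Mahalanobis distance.

```python
import math


def nearest_neighbor_distances(treated, controls):
    """Distance from each treated value to its closest control value."""
    return [min(abs(t - c) for c in controls) for t in treated]


def quantile(values, q):
    """Empirical quantile with linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def pair_matching_bandwidth(treated, controls, q=0.9, factor=1.5):
    """Scaled quantile of the non-zero nearest-neighbor distances.

    q and factor are illustrative placeholders, not documented defaults.
    """
    distances = [d for d in nearest_neighbor_distances(treated, controls) if d > 0]
    if not distances:
        raise ValueError("all nearest-neighbor distances are zero")
    return factor * quantile(distances, q)


# Toy data: nearest-neighbor distances are 1, 2, and 0
print(pair_matching_bandwidth([0, 10, 20], [1, 12, 25, 20], q=0.5, factor=2.0))
```

A bandwidth chosen this way adapts to how densely the controls surround the treated observations, so that most treated observations find at least one match.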
Bandwidth selection: cross validation with respect to X

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(cv)
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1853
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
                     Matched                Controls          Band-
                 Yes    No  Total     Used  Unused  Total     width
    Treated      448     9    457     1184     212   1396    1.8888

    Treatment-effects estimation
    wage         Coef.
    ATT       .6651578

    . kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation results, MSE by Bandwidth (x-axis from 1.5 to 3), evaluation points labeled by index]
Bandwidth selection: cross validation with respect to Y

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(cv wage)
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1853
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
                     Matched                Controls          Band-
                 Yes    No  Total     Used  Unused  Total     width
    Treated      453     4    457     1289     107   1396     2.433

    Treatment-effects estimation
    wage         Coef.
    ATT       .6928956

    . kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation results, MISE by Bandwidth (x-axis from 1.5 to 3), evaluation points labeled by index]
Bandwidth selection: weighted cross validation with respect to Y

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(cv wage, weighted)
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1853
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
                     Matched                Controls          Band-
                 Yes    No  Total     Used  Unused  Total     width
    Treated      455     2    457     1356      40   1396    2.7626

    Treatment-effects estimation
    wage         Coef.
    ATT       .7308166

    . kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation results, Weighted MISE by Bandwidth (x-axis from 1 to 5), evaluation points labeled by index]
Common-support diagnostics

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(0.5)

    Multivariate-distance kernel matching      Number of obs = 1853
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
                     Matched                Controls          Band-
                 Yes    No  Total     Used  Unused  Total     width
    Treated      366    91    457      701     695   1396        .5

    Treatment-effects estimation
    wage         Coef.
    ATT       .3303161

    . kmatch csummarize
    (refitting the model using the generate() option)

                     Common support (treated)        Standardized difference
    Means          Matched  Unmatc~d     Total     (1)-(3)   (2)-(3)   (1)-(2)
    collgrad       .322404   .318681   .321663     .001585  -.006376   .007962
    ttl_exp        13.3929   12.7682   13.2685     .027413  -.110253   .137666
    tenure         8.12614   6.95055   7.89205     .038378  -.154356   .192734
    3.industry     .002732   .021978   .006565    -.047404   .190657  -.238061
    4.industry     .191257   .153846   .183807     .019212  -.077269   .096481
    5.industry     .062842   .274725   .105033    -.137462   .552867  -.690329
    6.industry     .057377         0   .045952     .054507  -.219225   .273732
    7.industry     .019126   .021978   .019694    -.004083   .016423  -.020506
    8.industry     .005464   .065934   .017505    -.091714   .368871  -.460585
    9.industry     .010929   .010989   .010941    -.000115   .000462  -.000577
    10.industry          0   .021978   .004376    -.066227   .266363  -.332589
    11.industry    .554645   .175824   .479212      .15083  -.606636   .757467
    12.industry    .092896   .241758   .122538    -.090299   .363181   -.45348
    2.race         .243169   .681319   .330416    -.185284   .745209  -.930494
    3.race         .002732   .076923   .017505    -.112525   .452572  -.565097
    south           .29235   .318681   .297593    -.011456   .046074   -.05753

                     Common support (treated)                Ratio
    Variances      Matched  Unmatc~d     Total     (1)/(3)   (2)/(3)   (1)/(2)
    collgrad       .219058   .219536   .218674     1.00176   1.00394   .997824
    ttl_exp        19.4198   25.2474   20.5898     .943177   1.22621    .76918
    tenure         38.3324   31.9242   37.2044     1.03032   .858076   1.20073
    3.industry     .002732   .021734   .006536     .418045   3.32537   .125714
    4.industry     .155101   .131624   .150351     1.03159   .875443   1.17837
    5.industry     .059054   .201465   .094207     .626851   2.13854   .293122
    6.industry     .054233         0   .043936     1.23435         0         .
    7.industry     .018811   .021734   .019348     .972252    1.1233   .865531
    8.industry      .00545   .062271   .017237     .316157   3.61269   .087513
    9.industry     .010839   .010989   .010845     .999464   1.01328   .986361
    10.industry          0   .021734   .004367           0   4.97709         0
    11.industry    .247691    .14652   .250115     .990307   .585811   1.69049
    12.industry    .084497   .185348   .107758     .784137   1.72003   .455885
    2.race         .184542   .219536   .221726     .832297   .990121   .840601
    3.race         .002732   .071795   .017237     .158513   4.16522   .038056
    south          .207448   .219536    .20949     .990254   1.04796   .944939

    (1) matched, (2) unmatched, (3) total
Make a graph of the common-support stats

    . mat M = r(M)
    . coefplot matrix(M[,4]), title("Std. difference") noci nolabels xline(0)

[Figure: standardized difference by covariate (collgrad, ttl_exp, tenure, industry and race indicators, south); title: Std. difference]
Multiple outcome variables

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage hours), nate att
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1852
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
                     Matched                Controls          Band-
                 Yes    No  Total     Used  Unused  Total     width
    Treated      432    25    457     1104     291   1395    1.3392

    Treatment-effects estimation
                 Coef.
    wage
      ATT     .6021049
      NATE    1.430823
    hours
      ATT     1.263759
      NATE    1.450303
Multiple outcome variables with different regression-adjustment equations

    . kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage = collgrad ttl_exp tenure) (hours = i.industry i.race), nate att
    (computing bandwidth ... done)

    Multivariate-distance kernel matching      Number of obs = 1852
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race south

    Matching statistics
                     Matched                Controls          Band-
                 Yes    No  Total     Used  Unused  Total     width
    Treated      432    25    457     1104     291   1395    1.3392

    Treatment-effects estimation
                 Coef.
    wage
      ATT     .5152752
      NATE    1.430823
    hours
      ATT     1.263759
      NATE    1.450303
    wage: adjusted for collgrad ttl_exp tenure
    hours: adjusted for i.industry i.race
Treatment effects by subpopulation

    . kmatch md union collgrad ttl_exp tenure i.industry i.race (wage), att vce(boot) over(south)
    (south=0: computing bandwidth ... done)
    (south=1: computing bandwidth ... done)
    (running kmatch on estimation sample)

    Bootstrap replications (50)
    ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
    ..................................................    50

    Multivariate-distance kernel matching      Number of obs = 1853
                                               Replications  = 50
                                               Kernel        = epan
    Treatment   : union = 1
    Metric      : mahalanobis
    Covariates  : collgrad ttl_exp tenure i.industry i.race

    0: south = 0
    1: south = 1

    Matching statistics
                     Matched                Controls          Band-
                 Yes    No  Total     Used  Unused  Total     width
    0
      Treated    306    15    321      625     120    745    1.3199
    1
      Treated    126    10    136      473     178    651    1.3398

    Treatment-effects estimation
               Observed   Bootstrap                        Normal-based
    wage          Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
    0
      ATT      .4586332    .2763358   1.66    0.097    -.082975    1.000241
    1
      ATT      .9518705     .406903   2.34    0.019    .1543553    1.749386

    . test [0]ATT = [1]ATT

     ( 1)  [0]ATT - [1]ATT = 0

           chi2(  1) =    1.23
         Prob > chi2 =    0.2679

    . lincom [1]ATT - [0]ATT

     ( 1)  - [0]ATT + [1]ATT = 0

    wage          Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
    (1)        .4932373    .4452343   1.11    0.268    -.379406    1.365881
Simulation

- Population data from Swiss census of 2000.
- Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using command iskotrei by Hendrickx 2002) (values from 6 to 78; mean 44).
- Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.
- Control variables: gender, age, and highest educational degree.
- Population restricted to people between 24 and 60 years old who are working.
- 2'308'006 individuals, of which 17.5% belong to the treatment group.
- Draw random samples (N = 500, 1000, or 5000) from population and compute various matching estimators.
- Substantial differences between resident aliens and Swiss nationals on all three covariates.
- Propensity score in population (computed from fully stratified data):

[Figure: density of the propensity score (x-axis from 0 to 1), Untreated vs. Treated; McFadden R2 = 0.121]
- Raw mean difference in occupational prestige (NATE): −4.79
- Population ATT (computed from fully stratified data): −3.96
- There is some treatment effect heterogeneity (ATE = −3.51, ATC = −3.41).

[Figure: left panel, outcome by propensity score, Untreated vs. Treated; right panel, treatment effect by propensity score]
Results: Variance

[Figure: variance of the estimators for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction]
In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that, across algorithms, PSM can also be much more efficient than MDM. For example, kernel matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.
Results: Bias reduction (in percent)

[Figure: bias reduction in percent for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction]
Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.
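The summary measures shown in the results figures can be computed from a vector of simulated estimates as sketched below. The definition of bias reduction used here, the share of the raw bias (NATE minus ATT) that the estimator removes, is an assumption and is not spelled out on the slides. The function name and the four example estimates are made up. Only the population NATE (−4.79) and ATT (−3.96) come from the simulation setup.

```python
from statistics import mean, pvariance


def simulation_summary(estimates, true_att, nate):
    """Bias, variance, MSE, and percent bias reduction across simulation replications."""
    average = mean(estimates)
    bias = average - true_att
    variance = pvariance(estimates)
    mse = mean([(e - true_att) ** 2 for e in estimates])
    # Assumed definition: 100 means the raw bias is fully removed,
    # values above 100 indicate overcorrection.
    bias_reduction = 100.0 * (1.0 - bias / (nate - true_att))
    return {"bias": bias, "variance": variance, "mse": mse,
            "bias_reduction": bias_reduction}


# Made-up estimates from four hypothetical replications
summary = simulation_summary([-4.0, -3.8, -4.2, -4.0], true_att=-3.96, nate=-4.79)
print({key: round(value, 4) for key, value in summary.items()})
```

Because MSE equals variance plus squared bias, an estimator with a small remaining bias can still win on MSE if its variance is low enough.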
Results: Mean squared error

[Figure: mean squared error for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction]

Results: Relative standard error

[Figure: relative standard error for N = 500 and N = 5000. Rows: nearest-neighbor matching (teffects) with 1 and 5 neighbors, nearest-neighbor matching (bootstrap) with 1 and 5 neighbors, and kernel matching (bootstrap) with fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, and weighted CV with respect to Y. Series: MDM and PSM, each with and without bias correction]

Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals for N = 500 and N = 5000. Same rows and series as in the relative standard error figure]
7 Conclusions
Conclusions

The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).

Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.
Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research the choice may not be that clear.

Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.

Disadvantages of MDM:
- Choice of scaling matrix largely arbitrary; various suggestions in the literature (somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching. (But don't use pair matching anyhow.)
Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly ok for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.
References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2):295–313.
Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1):77–90.
Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91:279–290.
Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802. Boston College Department of Economics.
King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.
Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.
Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4):465–472.
Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70:41–55.
Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5):688–701.
Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4):472–480.
Are King and Nielsen right
Argument 2I PSM approximates complete randomizationI Better are matching approaches that approximate fully blockedrandomization such as Mahalanobis matching because completerandomization is less efficient than fully blocked randomization
That PSM approximates complete randomization is only partiallytrue PSM approximates complete randomization withinobservations with the same propensity score Hence PSM issomewhere between complete randomization and fully blockedrandomizationI If the X variables have no relation to T (treatment) then allobservations have the same propensity score Hence we end up withcomplete randomization
I If the X variables have a strong effect on T there is lots of blocking
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 29
Are King and Nielsen right
Argument 3I Random pruning rArr imbalance rArr more model dependenceI PSM rArr complete randomization rArr lots of random pruningI PSM Paradox ldquowhen you do lsquobetterrsquo you do worserdquo
That random pruning makes things worse is of course true becauseit unnecessarily reduces the sample size (without changing anythingelse)
As argued above that PSM applies random pruning is only true forX variables unrelated to T (so that we are in a ldquolocalrdquo completerandomization situation although something similar can probablyalso happen if effects from several X rsquos cancel each other out)
Furthermore it is only true if you employ a matching algorithm thatthrows away good matches King and Nielsenrsquos results seem to bebased on the worst possible algorithm one-to-one matching withoutreplacement
If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or also nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
- Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
- Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.
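A small Python sketch may make the pruning difference concrete. It is an illustration only: the greedy pairing rule, the radius of 0.02, and the toy propensity scores are assumptions, not the algorithms used by King and Nielsen or by kmatch.

```python
def pair_match_without_replacement(treated, controls):
    """Greedy one-to-one matching on the propensity score; each control is used at most once."""
    free = list(controls)
    pairs = []
    for p in treated:
        if not free:
            break
        best = min(free, key=lambda c: abs(c - p))
        free.remove(best)
        pairs.append((p, best))
    return pairs

def radius_match(treated, controls, radius):
    """Radius matching with replacement: every control within the radius is kept."""
    return {p: [c for c in controls if abs(c - p) <= radius] for p in treated}

treated = [0.30, 0.50]
controls = [0.29, 0.30, 0.31, 0.49, 0.50, 0.51]   # all are good matches

used_pair = {c for _, c in pair_match_without_replacement(treated, controls)}
used_radius = {c for cs in radius_match(treated, controls, 0.02).values() for c in cs}
print(len(used_pair), len(used_radius))
```

Pair matching keeps two controls and discards four equally good ones; radius matching averages over all six.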
It is true that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more "free" variance in X that can be exploited by modeling decisions).

In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you'd need to apply PSM stratified by subgroups in this case).
6 Illustration using kmatch
The Command

kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).

Some key features:
- Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
- Optional exact matching
- Optional regression-adjustment bias-correction
- Kernel matching, ridge matching, or nearest-neighbor matching
- Automatic bandwidth selection for kernel/ridge matching
- Flexible specification of scaling matrix for MDM
- Joint analysis of multiple subgroups and multiple outcome variables
- Various post-estimation commands for balancing and common-support diagnostics
- Computationally efficient implementation
Some Examples

Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.

. sysuse nlsw88, clear
(NLSW, 1988 extract)

. drop if industry==2
(4 observations deleted)

Mahalanobis-distance kernel matching

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching    Number of obs = 1853
                                         Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |       Matched        |       Controls       |  Band-
           |   Yes     No   Total |  Used  Unused  Total |  width
   Treated |   432     25     457 |  1105     291   1396 | 1.3394

Treatment-effects estimation
        wage |      Coef.
         ATT |   .6059013
        NATE |   1.432913
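To show what the two reported quantities mean, here is a minimal Python sketch of a naive estimate (NATE) and a kernel-matching ATT with an Epanechnikov kernel on a single covariate. It is a simplification under stated assumptions: one covariate instead of a Mahalanobis distance, a fixed bandwidth, and made-up data with a true effect of 1.

```python
def epan(u):
    """Epanechnikov kernel; zero outside |u| < 1."""
    return 0.75 * (1 - u * u) if abs(u) < 1 else 0.0

def nate(treated, controls):
    """Naive estimate: raw difference in mean outcomes."""
    return (sum(y for _, y in treated) / len(treated)
            - sum(y for _, y in controls) / len(controls))

def kernel_att(treated, controls, bw):
    """Kernel-matching ATT; treated units with no control inside the bandwidth stay unmatched."""
    effects = []
    for x_t, y_t in treated:
        w = [epan((x_c - x_t) / bw) for x_c, _ in controls]
        if sum(w) == 0:
            continue  # no common support for this treated unit
        y0 = sum(wi * y_c for wi, (_, y_c) in zip(w, controls)) / sum(w)
        effects.append(y_t - y0)
    return sum(effects) / len(effects), len(effects)

# (x, y) pairs: controls have y = x, treated have y = x + 1
controls = [(1, 1.0), (2, 2.0), (3, 3.0), (4, 4.0)]
treated = [(2, 3.0), (3, 4.0), (9, 10.0)]

att, n_matched = kernel_att(treated, controls, bw=1.5)
print(nate(treated, controls), att, n_matched)
```

The naive estimate mixes the treatment effect with the covariate imbalance, while the matched estimate recovers the effect for the two treated units that have comparable controls; the third treated unit is reported as unmatched.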
Some balancing statistics

. kmatch summarize
(refitting the model using the generate() option)

             |             Raw              |         Matched(ATT)
       Means |   Treated  Untrea~d    StdDif |   Treated  Untrea~d    StdDif
    collgrad |   .321663   .224212   .219912 |   .319444   .319444         0
     ttl_exp |   13.2685   12.7323   .117584 |   13.3205   13.1425   .039036
      tenure |   7.89205   6.17658    .29735 |   7.91744   7.58347   .057888
  3.industry |   .006565   .012178  -.058246 |    .00463    .00463         0
  4.industry |   .183807   .166905   .044425 |   .185185   .185185         0
  5.industry |   .105033   .027937   .312944 |   .085648   .085648         0
  6.industry |   .045952   .169771  -.407129 |   .048611   .048611         0
  7.industry |   .019694   .102436  -.350657 |   .020833   .020833         0
  8.industry |   .017505   .035817  -.113785 |   .009259   .009259         0
  9.industry |   .010941   .040115  -.185669 |   .011574   .011574         0
 10.industry |   .004376   .008596  -.052551 |   .002315   .002315         0
 11.industry |   .479212   .356734   .250073 |   .506944   .506944         0
 12.industry |   .122538    .07235   .169707 |    .12037    .12037         0
      2.race |   .330416   .244986   .189418 |     .3125     .3125         0
      3.race |   .017505   .011461   .050566 |   .006944   .006944         0
       south |   .297593   .466332  -.352408 |   .291667   .291667         0

             |             Raw              |         Matched(ATT)
   Variances |   Treated  Untrea~d     Ratio |   Treated  Untrea~d     Ratio
    collgrad |   .218674   .174066   1.25628 |   .217904   .217904         1
     ttl_exp |   20.5898   21.0001   .980459 |   19.8177   18.2323   1.08696
      tenure |   37.2044   29.3629   1.26706 |   37.0399   34.9543   1.05966
  3.industry |   .006536   .012038   .542928 |   .004619   .004619         1
  4.industry |   .150351   .139148   1.08052 |   .151242   .151242         1
  5.industry |   .094207   .027176   3.46656 |   .078494   .078494         1
  6.industry |   .043936    .14105   .311496 |   .046355   .046355         1
  7.industry |   .019348   .092008   .210287 |   .020447   .020447         1
  8.industry |   .017237   .034559   .498769 |   .009195   .009195         1
  9.industry |   .010845   .038533   .281445 |   .011467   .011467         1
 10.industry |   .004367   .008528   .512039 |   .002315   .002315         1
 11.industry |   .250115   .229639   1.08917 |   .250532   .250532         1
 12.industry |   .107758   .067163   1.60443 |   .106127   .106127         1
      2.race |   .221726     .1851   1.19787 |   .215342   .215342         1
      3.race |   .017237   .011338   1.52025 |   .006912   .006912         1
       south |    .20949   .249045   .841173 |   .207077   .207077         1
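The two balance measures can be recomputed by hand. The Python sketch below assumes the common definition of the standardized difference (difference in means divided by the square root of the average of the two raw variances); whether kmatch uses exactly this formula is an assumption, although it reproduces the raw collgrad values in the output above when the means are read as .321663 and .224212 and the variances as .218674 and .174066.

```python
def std_diff(mean_t, mean_c, var_t, var_c):
    """Standardized mean difference, scaled by the pooled standard deviation."""
    return (mean_t - mean_c) / ((var_t + var_c) / 2) ** 0.5

def var_ratio(var_t, var_c):
    """Variance ratio, treated over untreated."""
    return var_t / var_c

# Raw means and variances of collgrad (treated vs. untreated)
sd_collgrad = std_diff(0.321663, 0.224212, 0.218674, 0.174066)
vr_collgrad = var_ratio(0.218674, 0.174066)
print(round(sd_collgrad, 4), round(vr_collgrad, 4))
```

Good balance corresponds to a standardized difference near 0 and a variance ratio near 1.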
Make a graph of the balancing stats

. mat M = r(M)
. mat V = r(V)
. coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || ,
>     bylabels("Std. mean difference" "Variance ratio")
>     noci nolabels byopts(xrescale)
. addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))
. addplot 2: , xline(1) norescaling

[Figure: coefficient plot of the balancing statistics for each covariate, in two panels (Std. mean difference; Variance ratio). Legend: Raw, Matched]
Propensity-score kernel matching

. kmatch ps union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate att
(computing bandwidth ... done)

Propensity-score kernel matching         Number of obs = 1853
                                         Kernel        = epan
Treatment   : union = 1
Covariates  : collgrad ttl_exp tenure i.industry i.race south
PS model    : logit (pr)

Matching statistics
           |       Matched        |       Controls       |  Band-
           |   Yes     No   Total |  Used  Unused  Total |  width
   Treated |   431     26     457 |  1214     182   1396 | .00188

Treatment-effects estimation
        wage |      Coef.
         ATT |   .3887224
        NATE |   1.432913
Kernel density balancing plot

. kmatch density, lw(6 2) lc(5 1)
(refitting the model using the generate() option)
(applying 0-1 boundary correction to density estimation of propensity score)
(bandwidth for propensity score = .06803989)

[Figure: kernel density of the propensity score for untreated and treated, in two panels (Raw; Matched (ATT)). x-axis: Propensity score; y-axis: Density]
Cumulative distribution balancing plot

. kmatch cumul, lw(6 2) lc(5 1)
(refitting the model using the generate() option)

[Figure: cumulative distribution of the propensity score for untreated and treated, in two panels (Raw; Matched (ATT)). x-axis: Propensity score; y-axis: Cumulative probability]
Balancing box plot

. kmatch box
(refitting the model using the generate() option)

[Figure: box plots of the propensity score for untreated and treated, in two panels (Raw; Matched (ATT)). y-axis: Propensity score]
Standard errors

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)

(running kmatch on estimation sample)

Bootstrap replications (50)

Multivariate-distance kernel matching    Number of obs = 1853
                                         Replications  = 50
                                         Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |       Matched        |       Controls       |  Band-
           |   Yes     No   Total |  Used  Unused  Total |  width
   Treated |   432     25     457 |  1105     291   1396 | 1.3394
 Untreated |  1386     10    1396 |   455       2    457 | 3.3975
  Combined |  1818     35    1853 |  1560     293   1853 |

Treatment-effects estimation
             |   Observed   Bootstrap                       Normal-based
        wage |      Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
         ATE |   .4095729   .1920853    2.13   0.033    .0330928    .7860531
         ATT |   .6059013   .2472069    2.45   0.014    .1213846    1.090418
         ATC |   .3483797   .1893653    1.84   0.066   -.0227695    .7195289
        NATE |   1.432913   .2333282    6.14   0.000    .9755981    1.890228
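The z statistic, p-value, and normal-based confidence interval follow directly from a coefficient and its (bootstrap) standard error. A minimal Python sketch, using the ATT row with the coefficient read as .6059013 and the standard error as .2472069 (the function name is hypothetical):

```python
from statistics import NormalDist

def normal_inference(coef, se, level=0.95):
    """z statistic, two-sided p-value, and normal-based confidence interval."""
    z = coef / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return z, p, (coef - crit * se, coef + crit * se)

z, p, (lower, upper) = normal_inference(0.6059013, 0.2472069)
print(round(z, 2), round(p, 3), lower, upper)
```

The interval relies on the normal approximation; only the standard error comes from the bootstrap.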
Do some tests

. lincom ATT-NATE

 ( 1)  ATT - NATE = 0

        wage |      Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
         (1) |  -.8270117   .1810415   -4.57   0.000   -1.181847   -.4721768

. test ATT = ATC

 ( 1)  ATT - ATC = 0

           chi2(  1) =    2.42
         Prob > chi2 =    0.1200
Nearest-neighbor matching (1 neighbor)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

Multivariate-distance nearest-neighbor matching
                                         Number of obs = 1853
                                         Neighbors min = 1
Treatment   : union = 1                            max = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |       Matched        |       Controls       |  Band-
           |   Yes     No   Total |  Used  Unused  Total |  width
   Treated |   457      0     457 |   328    1068   1396 |

Treatment-effects estimation
        wage |      Coef.
         ATT |   .7246969

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

Treatment-effects estimation                Number of obs      = 1853
Estimator      : nearest-neighbor matching  Matches: requested = 1
Outcome model  : matching                                  min = 1
Distance metric: Mahalanobis                               max = 1

                       |              AI Robust
                  wage |      Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
ATET                   |
                 union |
   (union vs nonunion) |   .7246969   .2942952    2.46   0.014     .147889    1.301505
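What nearest-neighbor matching on the Mahalanobis distance does can be sketched in a few lines of Python. The sketch is restricted to two covariates so that the inverse covariance matrix can be written out; the covariance values and points are made up, and ties are broken by order rather than kept.

```python
def mahalanobis_2d(a, b, cov):
    """Mahalanobis distance between 2-D points a and b; cov = (s_xx, s_xy, s_yy)."""
    s_xx, s_xy, s_yy = cov
    det = s_xx * s_yy - s_xy ** 2
    dx, dy = a[0] - b[0], a[1] - b[1]
    # quadratic form d' S^{-1} d with the 2x2 inverse written out explicitly
    return ((s_yy * dx * dx - 2 * s_xy * dx * dy + s_xx * dy * dy) / det) ** 0.5

def nearest_neighbors(point, controls, cov, k=1):
    """Indices of the k closest controls (matching with replacement)."""
    order = sorted(range(len(controls)),
                   key=lambda j: mahalanobis_2d(point, controls[j], cov))
    return order[:k]

cov = (4.0, 0.0, 1.0)   # x varies more than y, so a gap in x counts for less
controls = [(2.0, 0.0), (0.0, 1.5), (5.0, 5.0)]

print(nearest_neighbors((0.0, 0.0), controls, cov, k=1))
```

In Euclidean terms the second control is closer to the origin, but after scaling by the covariance matrix the first control is the nearest neighbor.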
Nearest-neighbor matching (5 neighbors)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching
                                         Number of obs = 1853
                                         Neighbors min = 5
Treatment   : union = 1                            max = 5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |       Matched        |       Controls       |  Band-
           |   Yes     No   Total |  Used  Unused  Total |  width
   Treated |   457      0     457 |   870     526   1396 |

Treatment-effects estimation
        wage |      Coef.
         ATT |   .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation                Number of obs      = 1853
Estimator      : nearest-neighbor matching  Matches: requested = 5
Outcome model  : matching                                  min = 5
Distance metric: Mahalanobis                               max = 6

                       |              AI Robust
                  wage |      Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
ATET                   |
                 union |
   (union vs nonunion) |   .5590823   .2381752    2.35   0.019    .0922675    1.025897
Bias-correction (regression adjustment)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching
                                         Number of obs = 1853
                                         Neighbors min = 5
Treatment   : union = 1                            max = 5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |       Matched        |       Controls       |  Band-
           |   Yes     No   Total |  Used  Unused  Total |  width
   Treated |   457      0     457 |   870     526   1396 |

Treatment-effects estimation
        wage |      Coef.
         ATT |   .5288023
  adjusted for: collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south)
>     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation                Number of obs      = 1853
Estimator      : nearest-neighbor matching  Matches: requested = 5
Outcome model  : matching                                  min = 5
Distance metric: Mahalanobis                               max = 6

                       |              AI Robust
                  wage |      Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
ATET                   |
                 union |
   (union vs nonunion) |   .5288023   .2420635    2.18   0.029    .0543666    1.003238
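The idea behind the regression-adjustment bias correction can be sketched as follows: a regression fitted among the controls is used to correct each matched outcome for the remaining covariate gap between a treated unit and its match. This Python sketch uses one covariate, one neighbor, and made-up data; it is not the kmatch or teffects implementation.

```python
def ols_slope(xs, ys):
    """Slope of a simple linear regression of y on x."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

def nn_att(treated, controls, bias_adjust=False):
    """1-nearest-neighbor ATT on one covariate, optionally bias-corrected."""
    slope = 0.0
    if bias_adjust:
        slope = ols_slope([x for x, _ in controls], [y for _, y in controls])
    effects = []
    for x_t, y_t in treated:
        x_c, y_c = min(controls, key=lambda c: abs(c[0] - x_t))
        # shift the control outcome to the treated unit's covariate value
        effects.append(y_t - (y_c + slope * (x_t - x_c)))
    return sum(effects) / len(effects)

controls = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]   # y = 2x
treated = [(2.4, 5.8), (3.5, 8.0)]                # y = 2x + 1, true effect 1

print(nn_att(treated, controls), nn_att(treated, controls, bias_adjust=True))
```

Without the correction, inexact matches leave a bias from the covariate gap; with it, the estimate returns to the true effect in this linear toy example.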
Mahalanobis-distance and propensity-score matching combined

. kmatch md union collgrad ttl_exp tenure (wage), att
>     psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching    Number of obs = 1853
                                         Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis (modified)
Covariates  : collgrad ttl_exp tenure
PS model    : logit (pr)
PS covars   : i.industry i.race south

Matching statistics
           |       Matched        |       Controls       |  Band-
           |   Yes     No   Total |  Used  Unused  Total |  width
   Treated |   439     18     457 |  1258     138   1396 | .83886

Treatment-effects estimation
        wage |      Coef.
         ATT |   .6408443
Exact matching

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching    Number of obs = 1853
                                         Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure
Exact       : industry race south

Matching statistics
           |       Matched        |       Controls       |  Band-
           |   Yes     No   Total |  Used  Unused  Total |  width
   Treated |   432     25     457 |  1103     293   1396 | 1.3013

Treatment-effects estimation
        wage |      Coef.
         ATT |   .6047374
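Exact matching restricts the search for matches to controls that share the same values on the exact-matching variables. A minimal Python sketch of this two-step logic (the dictionary keys 'cell', 'x', 'y' and the data are hypothetical, and a single nearest neighbor stands in for kernel matching):

```python
from collections import defaultdict

def exact_then_nn(treated, controls):
    """Match within exact cells, then pick the nearest control on x inside the cell."""
    by_cell = defaultdict(list)
    for c in controls:
        by_cell[c["cell"]].append(c)
    effects, unmatched = [], 0
    for t in treated:
        pool = by_cell.get(t["cell"])
        if not pool:
            unmatched += 1      # no control shares this cell
            continue
        match = min(pool, key=lambda c: abs(c["x"] - t["x"]))
        effects.append(t["y"] - match["y"])
    att = sum(effects) / len(effects) if effects else None
    return att, unmatched

treated = [{"cell": "A", "x": 1.0, "y": 5.0},
           {"cell": "B", "x": 2.0, "y": 7.0},
           {"cell": "C", "x": 0.0, "y": 9.0}]
controls = [{"cell": "A", "x": 1.2, "y": 4.0},
            {"cell": "A", "x": 2.0, "y": 100.0},   # closest on x to the B unit, but wrong cell
            {"cell": "B", "x": 2.1, "y": 5.0}]

print(exact_then_nn(treated, controls))
```

The treated unit in cell C has no control in its cell and is counted as unmatched instead of being paired across cells.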
Bandwidth selection: the default (based on distribution of distances in one-nearest-neighbor matching)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching    Number of obs = 1853
                                         Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |       Matched        |       Controls       |  Band-
           |   Yes     No   Total |  Used  Unused  Total |  width
   Treated |   432     25     457 |  1105     291   1396 | 1.3394

Treatment-effects estimation
        wage |      Coef.
         ATT |   .6059013
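A bandwidth rule based on the distribution of one-nearest-neighbor distances might look like the Python sketch below. The quantile q and the multiplier factor are illustrative assumptions, as is the simple empirical quantile; the slides only state that the default is derived from these distances.

```python
import math

def pair_matching_bandwidth(treated, controls, q=0.90, factor=1.5):
    """factor times the q-quantile of each treated unit's distance to its nearest control."""
    d = sorted(min(abs(c - t) for c in controls) for t in treated)
    k = max(0, math.ceil(q * len(d)) - 1)   # index of the empirical q-quantile
    return factor * d[k]

treated = [0, 10, 20, 30]
controls = [1, 12, 23, 34]    # nearest-neighbor distances: 1, 2, 3, 4

print(pair_matching_bandwidth(treated, controls, q=0.5, factor=1.5))
```

A rule of this kind adapts the bandwidth to how densely the controls surround the treated units, so most treated units find at least one control inside it.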
Bandwidth selection: cross validation with respect to X

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching    Number of obs = 1853
                                         Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |       Matched        |       Controls       |  Band-
           |   Yes     No   Total |  Used  Unused  Total |  width
   Treated |   448      9     457 |  1184     212   1396 | 1.8888

Treatment-effects estimation
        wage |      Coef.
         ATT |   .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot with evaluation points labeled by index. x-axis: Bandwidth; y-axis: MSE]
Bandwidth selection: cross validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching    Number of obs = 1853
                                         Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |       Matched        |       Controls       |  Band-
           |   Yes     No   Total |  Used  Unused  Total |  width
   Treated |   453      4     457 |  1289     107   1396 |  2.433

Treatment-effects estimation
        wage |      Coef.
         ATT |   .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot with evaluation points labeled by index. x-axis: Bandwidth; y-axis: MISE]
Bandwidth selection: weighted cross validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching    Number of obs = 1853
                                         Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |       Matched        |       Controls       |  Band-
           |   Yes     No   Total |  Used  Unused  Total |  width
   Treated |   455      2     457 |  1356      40   1396 | 2.7626

Treatment-effects estimation
        wage |      Coef.
         ATT |   .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot with evaluation points labeled by index. x-axis: Bandwidth; y-axis: Weighted MISE]
Common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(0.5)

Multivariate-distance kernel matching    Number of obs = 1853
                                         Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |       Matched        |       Controls       |  Band-
           |   Yes     No   Total |  Used  Unused  Total |  width
   Treated |   366     91     457 |   701     695   1396 |     .5

Treatment-effects estimation
        wage |      Coef.
         ATT |   .3303161

. kmatch csummarize
(refitting the model using the generate() option)

             |   Common support (treated)   |   Standardized difference
       Means |   Matched  Unmatc~d     Total |   (1)-(3)   (2)-(3)   (1)-(2)
    collgrad |   .322404   .318681   .321663 |   .001585  -.006376   .007962
     ttl_exp |   13.3929   12.7682   13.2685 |   .027413  -.110253   .137666
      tenure |   8.12614   6.95055   7.89205 |   .038378  -.154356   .192734
  3.industry |   .002732   .021978   .006565 |  -.047404   .190657  -.238061
  4.industry |   .191257   .153846   .183807 |   .019212  -.077269   .096481
  5.industry |   .062842   .274725   .105033 |  -.137462   .552867  -.690329
  6.industry |   .057377         0   .045952 |   .054507  -.219225   .273732
  7.industry |   .019126   .021978   .019694 |  -.004083   .016423  -.020506
  8.industry |   .005464   .065934   .017505 |  -.091714   .368871  -.460585
  9.industry |   .010929   .010989   .010941 |  -.000115   .000462  -.000577
 10.industry |         0   .021978   .004376 |  -.066227   .266363  -.332589
 11.industry |   .554645   .175824   .479212 |    .15083  -.606636   .757467
 12.industry |   .092896   .241758   .122538 |  -.090299   .363181   -.45348
      2.race |   .243169   .681319   .330416 |  -.185284   .745209  -.930494
      3.race |   .002732   .076923   .017505 |  -.112525   .452572  -.565097
       south |    .29235   .318681   .297593 |  -.011456   .046074   -.05753

             |   Common support (treated)   |            Ratio
   Variances |   Matched  Unmatc~d     Total |   (1)/(3)   (2)/(3)   (1)/(2)
    collgrad |   .219058   .219536   .218674 |   1.00176   1.00394   .997824
     ttl_exp |   19.4198   25.2474   20.5898 |   .943177   1.22621    .76918
      tenure |   38.3324   31.9242   37.2044 |   1.03032   .858076   1.20073
  3.industry |   .002732   .021734   .006536 |   .418045   3.32537   .125714
  4.industry |   .155101   .131624   .150351 |   1.03159   .875443   1.17837
  5.industry |   .059054   .201465   .094207 |   .626851   2.13854   .293122
  6.industry |   .054233         0   .043936 |   1.23435         0         .
  7.industry |   .018811   .021734   .019348 |   .972252    1.1233   .865531
  8.industry |    .00545   .062271   .017237 |   .316157   3.61269   .087513
  9.industry |   .010839   .010989   .010845 |   .999464   1.01328   .986361
 10.industry |         0   .021734   .004367 |         0   4.97709         0
 11.industry |   .247691    .14652   .250115 |   .990307   .585811   1.69049
 12.industry |   .084497   .185348   .107758 |   .784137   1.72003   .455885
      2.race |   .184542   .219536   .221726 |   .832297   .990121   .840601
      3.race |   .002732   .071795   .017237 |   .158513   4.16522   .038056
       south |   .207448   .219536    .20949 |   .990254   1.04796   .944939

(1) matched, (2) unmatched, (3) total
Make a graph of the common-support stats

. mat M = r(M)
. coefplot matrix(M[,4]), title("Std. difference") noci nolabels xline(0)

[Figure: coefficient plot of the standardized difference for each covariate. Title: Std. difference]
Multiple outcome variables

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching    Number of obs = 1852
                                         Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |       Matched        |       Controls       |  Band-
           |   Yes     No   Total |  Used  Unused  Total |  width
   Treated |   432     25     457 |  1104     291   1395 | 1.3392

Treatment-effects estimation
             |      Coef.
wage         |
         ATT |   .6021049
        NATE |   1.430823
hours        |
         ATT |   1.263759
        NATE |   1.450303
Multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure)
>     (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching    Number of obs = 1852
                                         Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |       Matched        |       Controls       |  Band-
           |   Yes     No   Total |  Used  Unused  Total |  width
   Treated |   432     25     457 |  1104     291   1395 | 1.3392

Treatment-effects estimation
             |      Coef.
wage         |
         ATT |   .5152752
        NATE |   1.430823
hours        |
         ATT |   1.263759
        NATE |   1.450303
  wage: adjusted for collgrad ttl_exp tenure
  hours: adjusted for i.industry i.race
Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage),
>     att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)

(running kmatch on estimation sample)

Bootstrap replications (50)

Multivariate-distance kernel matching    Number of obs = 1853
                                         Replications  = 50
                                         Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics
           |       Matched        |       Controls       |  Band-
           |   Yes     No   Total |  Used  Unused  Total |  width
0          |
   Treated |   306     15     321 |   625     120    745 | 1.3199
1          |
   Treated |   126     10     136 |   473     178    651 | 1.3398

Treatment-effects estimation
             |   Observed   Bootstrap                       Normal-based
        wage |      Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
0            |
         ATT |   .4586332   .2763358    1.66   0.097    -.082975    1.000241
1            |
         ATT |   .9518705    .406903    2.34   0.019    .1543553    1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

           chi2(  1) =    1.23
         Prob > chi2 =    0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

        wage |      Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
         (1) |   .4932373   .4452343    1.11   0.268    -.379406    1.365881
Simulation

Population data from the Swiss census of 2000.

Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using command iskotrei by Hendrickx 2002) (values from 6 to 78, mean 44).

Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.

Control variables: gender, age, and highest educational degree.

Population restricted to people between 24 and 60 years old who are working.

2'308'006 individuals, of which 17.5% belong to the treatment group.

Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.
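The structure of such a simulation can be sketched in Python: draw repeated samples, apply an estimator, and summarize bias, variance, and mean squared error against the known population value. The sketch is generic; the sample mean and the normal population stand in for the matching estimators and the census data, and all names are hypothetical.

```python
import random

def simulate(estimator, draw_sample, truth, reps=200, seed=1):
    """Monte Carlo bias, variance, and MSE of an estimator over repeated samples."""
    rng = random.Random(seed)
    estimates = [estimator(draw_sample(rng)) for _ in range(reps)]
    mean_est = sum(estimates) / reps
    bias = mean_est - truth
    variance = sum((e - mean_est) ** 2 for e in estimates) / reps
    mse = sum((e - truth) ** 2 for e in estimates) / reps
    return bias, variance, mse

def draw(rng):
    """Placeholder for drawing a random sample from the population."""
    return [rng.gauss(5.0, 2.0) for _ in range(50)]

def sample_mean(xs):
    """Placeholder for a matching estimator."""
    return sum(xs) / len(xs)

bias, variance, mse = simulate(sample_mean, draw, truth=5.0)
print(bias, variance, mse)
```

With these definitions the mean squared error equals the squared bias plus the variance, which is why the results can be reported as variance, bias reduction, and MSE side by side.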
Substantial differences between resident aliens and Swiss nationals on all three covariates.

Propensity score in population (computed from fully stratified data):

[Figure: density of the propensity score for untreated and treated. x-axis: Propensity score; y-axis: Density. McFadden R2 = 0.121]
Raw mean difference in occupational prestige (NATE): -4.79
Population ATT (computed from fully stratified data): -3.96
There is some treatment effect heterogeneity (ATE = -3.51, ATC = -3.41).
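The three effects are linked: the ATE is the average of the ATT and the ATC, weighted by the shares of treated and untreated. The Python sketch below checks this with the population values, reading the effects as -3.96 and -3.41 and the treated share as 17.5% (the function name is hypothetical).

```python
def ate_from_att_atc(att, atc, share_treated):
    """ATE as the treated-share-weighted average of ATT and ATC."""
    return share_treated * att + (1 - share_treated) * atc

ate = ate_from_att_atc(att=-3.96, atc=-3.41, share_treated=0.175)
print(round(ate, 2))
```

Up to rounding of the inputs, the result agrees with the reported ATE of -3.51.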
[Figure, left panel: outcome by propensity score for untreated and treated. x-axis: Propensity score; y-axis: Outcome]
[Figure, right panel: treatment effect by propensity score. x-axis: Propensity score; y-axis: Treatment effect]
Results: Variance

[Figure: variance of the estimators for N = 500 and N = 5000, by algorithm: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Legend: MDM, PSM, each with and without bias correction]

In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.
Results: Bias reduction (in percent)

[Figure: bias reduction in percent for N = 500 and N = 5000, by algorithm: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Legend: MDM, PSM, each with and without bias correction]

Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.
Results: Mean squared error

[Figure: mean squared error for N = 500 and N = 5000, by algorithm: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Legend: MDM, PSM, each with and without bias correction]
Results: Relative standard error

[Figure: relative standard error for N = 500 and N = 5000, by algorithm: nearest-neighbor matching with teffects (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap (1 neighbor, 5 neighbors), and kernel matching with bootstrap (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Legend: MDM, PSM, each with and without bias correction]
Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals for N = 500 and N = 5000, by algorithm: nearest-neighbor matching with teffects (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap (1 neighbor, 5 neighbors), and kernel matching with bootstrap (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Legend: MDM, PSM, each with and without bias correction]
Conclusions

The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).

Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.
Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research, the choice may not be that clear.

Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.

Disadvantages of MDM:
- Choice of scaling matrix largely arbitrary; various suggestions in the literature (somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching! (But don't use pair matching anyhow.)
Some conclusions from the simulation:
- For PSM, application of regression-adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression-adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly ok for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.
References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2):295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1):77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91:279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4):465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70:41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5):688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4):472–480.
Are King and Nielsen right
Argument 2I PSM approximates complete randomizationI Better are matching approaches that approximate fully blockedrandomization such as Mahalanobis matching because completerandomization is less efficient than fully blocked randomization
That PSM approximates complete randomization is only partiallytrue PSM approximates complete randomization withinobservations with the same propensity score Hence PSM issomewhere between complete randomization and fully blockedrandomizationI If the X variables have no relation to T (treatment) then allobservations have the same propensity score Hence we end up withcomplete randomization
I If the X variables have a strong effect on T there is lots of blocking
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 29
Are King and Nielsen right
Argument 3I Random pruning rArr imbalance rArr more model dependenceI PSM rArr complete randomization rArr lots of random pruningI PSM Paradox ldquowhen you do lsquobetterrsquo you do worserdquo
That random pruning makes things worse is of course true becauseit unnecessarily reduces the sample size (without changing anythingelse)
As argued above that PSM applies random pruning is only true forX variables unrelated to T (so that we are in a ldquolocalrdquo completerandomization situation although something similar can probablyalso happen if effects from several X rsquos cancel each other out)
Furthermore it is only true if you employ a matching algorithm thatthrows away good matches King and Nielsenrsquos results seem to bebased on the worst possible algorithm one-to-one matching withoutreplacement
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 30
If you use a matching algorithm that does not throw away good matches, such as radius or kernel matching (or also nearest-neighbor matching, as long as all ties are kept and observations are matched with replacement), random pruning can be avoided.
- Such algorithms block (and hence prune) where it is necessary to prevent bias, but they average where such pruning is not necessary.
- Hence, efficiency differences between PSM and multivariate matching should only be minor for such algorithms.
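The difference between averaging and pruning can be made concrete with a small sketch. The following Python code is an illustration only, not kmatch's implementation; the function names and the toy data are hypothetical. It computes an ATT by kernel matching on a scalar score with an Epanechnikov kernel, so that all controls within the bandwidth are averaged rather than one of them being discarded.

```python
def epan(u):
    """Epanechnikov kernel; zero outside |u| < 1."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def kernel_att(treated, controls, bandwidth):
    """ATT by kernel matching on a scalar score (illustrative sketch).

    treated, controls: lists of (score, outcome) pairs.
    Treated units with no control inside the bandwidth stay unmatched.
    Returns (att, number of matched treated, number of controls used).
    """
    effects = []
    used = set()
    for score_t, y_t in treated:
        weights = [epan((score_c - score_t) / bandwidth) for score_c, _ in controls]
        total = sum(weights)
        if total == 0.0:
            continue  # no control within the bandwidth: prune this treated unit
        y0_hat = sum(w * y_c for w, (_, y_c) in zip(weights, controls)) / total
        effects.append(y_t - y0_hat)
        used.update(j for j, w in enumerate(weights) if w > 0.0)
    att = sum(effects) / len(effects) if effects else float("nan")
    return att, len(effects), len(used)


# Hypothetical toy data: two controls tie on the treated unit's score.
treated = [(0.50, 10.0)]
controls = [(0.50, 7.0), (0.50, 9.0), (0.90, 0.0)]
att, n_matched, n_used = kernel_att(treated, controls, bandwidth=0.1)
```

In this toy example the two tied controls are averaged (counterfactual 8, effect 2), whereas one-to-one matching without replacement would keep only one of them and return an effect of 3 or 1 depending on an arbitrary choice. The distant control is pruned in both cases.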
It is true that post-matching modeling can do more harm with PSM than with MDM (because PSM leaves more "free" variance in X that can be exploited by modeling decisions).
In general, post-matching analyses are more limited for PSM than for MDM. For example, results from subgroup analyses will not be valid (you'd need to apply PSM stratified by subgroups in this case).
6 Illustration using kmatch
The Command
kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).
Some key features:
- Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
- Optional exact matching
- Optional regression-adjustment bias-correction
- Kernel matching, ridge matching, or nearest-neighbor matching
- Automatic bandwidth selection for kernel/ridge matching
- Flexible specification of scaling matrix for MDM
- Joint analysis of multiple subgroups and multiple outcome variables
- Various post-estimation commands for balancing and common-support diagnostics
- Computationally efficient implementation
Some Examples
Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor-market experience, or industry:

. sysuse nlsw88, clear
(NLSW, 1988 extract)
. drop if industry==2
(4 observations deleted)

Mahalanobis-distance kernel matching:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs = 1853
                                               Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched         |       Controls        |  Band-
           |   Yes     No   Total  |  Used  Unused  Total  |  width
   Treated |   432     25     457  |  1105     291   1396  | 1.3394

Treatment-effects estimation

      wage |     Coef.
       ATT |  .6059013
      NATE |  1.432913
Some balancing statistics:

. kmatch summarize
(refitting the model using the generate() option)

             |             Raw              |        Matched (ATT)
       Means |  Treated  Untrea~d   StdDif  |  Treated  Untrea~d   StdDif
    collgrad |  .321663   .224212  .219912  |  .319444   .319444        0
     ttl_exp |  13.2685   12.7323  .117584  |  13.3205   13.1425  .039036
      tenure |  7.89205   6.17658   .29735  |  7.91744   7.58347  .057888
  3.industry |  .006565   .012178 -.058246  |   .00463    .00463        0
  4.industry |  .183807   .166905  .044425  |  .185185   .185185        0
  5.industry |  .105033   .027937  .312944  |  .085648   .085648        0
  6.industry |  .045952   .169771 -.407129  |  .048611   .048611        0
  7.industry |  .019694   .102436 -.350657  |  .020833   .020833        0
  8.industry |  .017505   .035817 -.113785  |  .009259   .009259        0
  9.industry |  .010941   .040115 -.185669  |  .011574   .011574        0
 10.industry |  .004376   .008596 -.052551  |  .002315   .002315        0
 11.industry |  .479212   .356734  .250073  |  .506944   .506944        0
 12.industry |  .122538    .07235  .169707  |   .12037    .12037        0
      2.race |  .330416   .244986  .189418  |    .3125     .3125        0
      3.race |  .017505   .011461  .050566  |  .006944   .006944        0
       south |  .297593   .466332 -.352408  |  .291667   .291667        0

             |             Raw              |        Matched (ATT)
   Variances |  Treated  Untrea~d    Ratio  |  Treated  Untrea~d    Ratio
    collgrad |  .218674   .174066  1.25628  |  .217904   .217904        1
     ttl_exp |  20.5898   21.0001  .980459  |  19.8177   18.2323  1.08696
      tenure |  37.2044   29.3629  1.26706  |  37.0399   34.9543  1.05966
  3.industry |  .006536   .012038  .542928  |  .004619   .004619        1
  4.industry |  .150351   .139148  1.08052  |  .151242   .151242        1
  5.industry |  .094207   .027176  3.46656  |  .078494   .078494        1
  6.industry |  .043936    .14105  .311496  |  .046355   .046355        1
  7.industry |  .019348   .092008  .210287  |  .020447   .020447        1
  8.industry |  .017237   .034559  .498769  |  .009195   .009195        1
  9.industry |  .010845   .038533  .281445  |  .011467   .011467        1
 10.industry |  .004367   .008528  .512039  |  .002315   .002315        1
 11.industry |  .250115   .229639  1.08917  |  .250532   .250532        1
 12.industry |  .107758   .067163  1.60443  |  .106127   .106127        1
      2.race |  .221726     .1851  1.19787  |  .215342   .215342        1
      3.race |  .017237   .011338  1.52025  |  .006912   .006912        1
       south |   .20949   .249045  .841173  |  .207077   .207077        1
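The two balance measures in the table can be reproduced by hand. The Python sketch below is not part of kmatch, and its function names are hypothetical. It assumes that the standardized difference divides the mean difference by the square root of the average of the two group variances. That assumption reproduces the raw collgrad row of the table, but the slides do not state the formula.

```python
import math


def std_diff(mean_t, mean_c, var_t, var_c):
    """Standardized mean difference, assuming the average of the two
    group variances as the scaling factor (an assumption, see lead-in)."""
    return (mean_t - mean_c) / math.sqrt((var_t + var_c) / 2.0)


def var_ratio(var_t, var_c):
    """Variance ratio of treated to untreated."""
    return var_t / var_c


# Raw collgrad row from the balancing table above.
d = std_diff(0.321663, 0.224212, 0.218674, 0.174066)
r = var_ratio(0.218674, 0.174066)
```

Values of the standardized difference near 0 and of the variance ratio near 1 indicate good balance, which is what the "Matched (ATT)" columns show for most covariates.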
Make a graph of the balancing stats:

. mat M = r(M)
. mat V = r(V)
. coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || ,
>     bylabels("Std. mean difference" "Variance ratio")
>     noci nolabels byopts(xrescale)
. addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))
. addplot 2: , xline(1) norescaling

[Figure: balancing statistics by covariate (collgrad, ttl_exp, tenure, industry, race, and south indicators); left panel "Std. mean difference", right panel "Variance ratio"; series: Raw, Matched]
Propensity-score kernel matching:

. kmatch ps union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate att
(computing bandwidth ... done)

Propensity-score kernel matching               Number of obs = 1853
                                               Kernel        = epan
Treatment   : union = 1
Covariates  : collgrad ttl_exp tenure i.industry i.race south
PS model    : logit (pr)

Matching statistics

           |       Matched         |       Controls        |  Band-
           |   Yes     No   Total  |  Used  Unused  Total  |  width
   Treated |   431     26     457  |  1214     182   1396  | .00188

Treatment-effects estimation

      wage |     Coef.
       ATT |  .3887224
      NATE |  1.432913
Kernel density balancing plot:

. kmatch density, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)
(applying 0-1 boundary correction to density estimation of propensity score)
(bandwidth for propensity score = .06803989)

[Figure: kernel density of the propensity score for Untreated and Treated; panels "Raw" and "Matched (ATT)"; x-axis: Propensity score; y-axis: Density]
Cumulative distribution balancing plot:

. kmatch cumul, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)

[Figure: cumulative distribution of the propensity score for Untreated and Treated; panels "Raw" and "Matched (ATT)"; x-axis: Propensity score; y-axis: Cumulative probability]
Balancing box plot:

. kmatch box
(refitting the model using the generate() option)

[Figure: box plots of the propensity score for Untreated and Treated; panels "Raw" and "Matched (ATT)"; y-axis: Propensity score]
Standard errors:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching          Number of obs = 1853
                                               Replications  = 50
                                               Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched         |       Controls        |  Band-
           |   Yes     No   Total  |  Used  Unused  Total  |  width
   Treated |   432     25     457  |  1105     291   1396  | 1.3394
 Untreated |  1386     10    1396  |   455       2    457  | 3.3975
  Combined |  1818     35    1853  |  1560     293   1853  |

Treatment-effects estimation

           |  Observed   Bootstrap                      Normal-based
      wage |     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
       ATE |  .4095729   .1920853    2.13   0.033    .0330928   .7860531
       ATT |  .6059013   .2472069    2.45   0.014    .1213846   1.090418
       ATC |  .3483797   .1893653    1.84   0.066   -.0227695   .7195289
      NATE |  1.432913   .2333282    6.14   0.000    .9755981   1.890228
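The z statistics, p-values, and normal-based confidence intervals in the table follow directly from each coefficient and its bootstrap standard error. A minimal Python sketch, with a hypothetical function name, reproduces the ATE row:

```python
from statistics import NormalDist


def normal_inference(coef, se, level=0.95):
    """z statistic, two-sided p-value, and normal-based confidence interval."""
    z = coef / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    return z, p, (coef - crit * se, coef + crit * se)


# ATE row of the table above: coefficient and bootstrap standard error.
z, p, (lo, hi) = normal_inference(0.4095729, 0.1920853)
```

The sketch only covers the normal approximation. The standard error itself comes from the 50 bootstrap replications and cannot be recomputed from the table.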
Do some tests:

. lincom ATT-NATE

 ( 1)  ATT - NATE = 0

      wage |     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
       (1) | -.8270117   .1810415   -4.57   0.000   -1.181847  -.4721768

. test ATT = ATC

 ( 1)  ATT - ATC = 0

           chi2(  1) =    2.42
         Prob > chi2 =    0.1200
Nearest-neighbor matching (1 neighbor):

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

Multivariate-distance nearest-neighbor matching
                                               Number of obs  = 1853
                                               Neighbors: min = 1
Treatment   : union = 1                                   max = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched         |       Controls        |  Band-
           |   Yes     No   Total  |  Used  Unused  Total  |  width
   Treated |   457      0     457  |   328    1068   1396  |

Treatment-effects estimation

      wage |     Coef.
       ATT |  .7246969

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

Treatment-effects estimation                   Number of obs      = 1853
Estimator      : nearest-neighbor matching     Matches: requested = 1
Outcome model  : matching                                     min = 1
Distance metric: Mahalanobis                                  max = 1

                     |             AI Robust
                wage |     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET                 |
               union |
 (union vs nonunion) |  .7246969   .2942952    2.46   0.014     .147889   1.301505
Nearest-neighbor matching (5 neighbors):

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching
                                               Number of obs  = 1853
                                               Neighbors: min = 5
Treatment   : union = 1                                   max = 5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched         |       Controls        |  Band-
           |   Yes     No   Total  |  Used  Unused  Total  |  width
   Treated |   457      0     457  |   870     526   1396  |

Treatment-effects estimation

      wage |     Coef.
       ATT |  .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation                   Number of obs      = 1853
Estimator      : nearest-neighbor matching     Matches: requested = 5
Outcome model  : matching                                     min = 5
Distance metric: Mahalanobis                                  max = 6

                     |             AI Robust
                wage |     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET                 |
               union |
 (union vs nonunion) |  .5590823   .2381752    2.35   0.019    .0922675   1.025897
Bias-correction / regression adjustment:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching
                                               Number of obs  = 1853
                                               Neighbors: min = 5
Treatment   : union = 1                                   max = 5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched         |       Controls        |  Band-
           |   Yes     No   Total  |  Used  Unused  Total  |  width
   Treated |   457      0     457  |   870     526   1396  |

Treatment-effects estimation

      wage |     Coef.
       ATT |  .5288023

adjusted for: collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south)
>     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation                   Number of obs      = 1853
Estimator      : nearest-neighbor matching     Matches: requested = 5
Outcome model  : matching                                     min = 5
Distance metric: Mahalanobis                                  max = 6

                     |             AI Robust
                wage |     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET                 |
               union |
 (union vs nonunion) |  .5288023   .2420635    2.18   0.029    .0543666   1.003238
Mahalanobis-distance and propensity-score matching combined:

. kmatch md union collgrad ttl_exp tenure (wage), att
>     psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs = 1853
                                               Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis (modified)
Covariates  : collgrad ttl_exp tenure
PS model    : logit (pr)
PS covars   : i.industry i.race south

Matching statistics

           |       Matched         |       Controls        |  Band-
           |   Yes     No   Total  |  Used  Unused  Total  |  width
   Treated |   439     18     457  |  1258     138   1396  | .83886

Treatment-effects estimation

      wage |     Coef.
       ATT |  .6408443
Exact matching:

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs = 1853
                                               Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure
Exact       : industry race south

Matching statistics

           |       Matched         |       Controls        |  Band-
           |   Yes     No   Total  |  Used  Unused  Total  |  width
   Treated |   432     25     457  |  1103     293   1396  | 1.3013

Treatment-effects estimation

      wage |     Coef.
       ATT |  .6047374
Bandwidth selection: the default (based on the distribution of distances in one-nearest-neighbor matching):

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs = 1853
                                               Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched         |       Controls        |  Band-
           |   Yes     No   Total  |  Used  Unused  Total  |  width
   Treated |   432     25     457  |  1105     291   1396  | 1.3394

Treatment-effects estimation

      wage |     Coef.
       ATT |  .6059013
Bandwidth selection: cross-validation with respect to X:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs = 1853
                                               Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched         |       Controls        |  Band-
           |   Yes     No   Total  |  Used  Unused  Total  |  width
   Treated |   448      9     457  |  1184     212   1396  | 1.8888

Treatment-effects estimation

      wage |     Coef.
       ATT |  .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot; y-axis: MSE; x-axis: Bandwidth; points labeled by evaluation index]
Bandwidth selection: cross-validation with respect to Y:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs = 1853
                                               Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched         |       Controls        |  Band-
           |   Yes     No   Total  |  Used  Unused  Total  |  width
   Treated |   453      4     457  |  1289     107   1396  |  2.433

Treatment-effects estimation

      wage |     Coef.
       ATT |  .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot; y-axis: MISE; x-axis: Bandwidth; points labeled by evaluation index]
Bandwidth selection: weighted cross-validation with respect to Y:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs = 1853
                                               Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched         |       Controls        |  Band-
           |   Yes     No   Total  |  Used  Unused  Total  |  width
   Treated |   455      2     457  |  1356      40   1396  | 2.7626

Treatment-effects estimation

      wage |     Coef.
       ATT |  .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot; y-axis: Weighted MISE; x-axis: Bandwidth; points labeled by evaluation index]
Common-support diagnostics:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(0.5)

Multivariate-distance kernel matching          Number of obs = 1853
                                               Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched         |       Controls        |  Band-
           |   Yes     No   Total  |  Used  Unused  Total  |  width
   Treated |   366     91     457  |   701     695   1396  |     .5

Treatment-effects estimation

      wage |     Coef.
       ATT |  .3303161

. kmatch csummarize
(refitting the model using the generate() option)

             |   Common support (treated)   |   Standardized difference
       Means |  Matched  Unmatc~d    Total  |  (1)-(3)   (2)-(3)   (1)-(2)
    collgrad |  .322404   .318681  .321663  |  .001585  -.006376   .007962
     ttl_exp |  13.3929   12.7682  13.2685  |  .027413  -.110253   .137666
      tenure |  8.12614   6.95055  7.89205  |  .038378  -.154356   .192734
  3.industry |  .002732   .021978  .006565  | -.047404   .190657  -.238061
  4.industry |  .191257   .153846  .183807  |  .019212  -.077269   .096481
  5.industry |  .062842   .274725  .105033  | -.137462   .552867  -.690329
  6.industry |  .057377         0  .045952  |  .054507  -.219225   .273732
  7.industry |  .019126   .021978  .019694  | -.004083   .016423  -.020506
  8.industry |  .005464   .065934  .017505  | -.091714   .368871  -.460585
  9.industry |  .010929   .010989  .010941  | -.000115   .000462  -.000577
 10.industry |        0   .021978  .004376  | -.066227   .266363  -.332589
 11.industry |  .554645   .175824  .479212  |   .15083  -.606636   .757467
 12.industry |  .092896   .241758  .122538  | -.090299   .363181   -.45348
      2.race |  .243169   .681319  .330416  | -.185284   .745209  -.930494
      3.race |  .002732   .076923  .017505  | -.112525   .452572  -.565097
       south |   .29235   .318681  .297593  | -.011456   .046074   -.05753

             |   Common support (treated)   |            Ratio
   Variances |  Matched  Unmatc~d    Total  |  (1)/(3)   (2)/(3)   (1)/(2)
    collgrad |  .219058   .219536  .218674  |  1.00176   1.00394   .997824
     ttl_exp |  19.4198   25.2474  20.5898  |  .943177   1.22621    .76918
      tenure |  38.3324   31.9242  37.2044  |  1.03032   .858076   1.20073
  3.industry |  .002732   .021734  .006536  |  .418045   3.32537   .125714
  4.industry |  .155101   .131624  .150351  |  1.03159   .875443   1.17837
  5.industry |  .059054   .201465  .094207  |  .626851   2.13854   .293122
  6.industry |  .054233         0  .043936  |  1.23435         0         .
  7.industry |  .018811   .021734  .019348  |  .972252    1.1233   .865531
  8.industry |   .00545   .062271  .017237  |  .316157   3.61269   .087513
  9.industry |  .010839   .010989  .010845  |  .999464   1.01328   .986361
 10.industry |        0   .021734  .004367  |        0   4.97709         0
 11.industry |  .247691    .14652  .250115  |  .990307   .585811   1.69049
 12.industry |  .084497   .185348  .107758  |  .784137   1.72003   .455885
      2.race |  .184542   .219536  .221726  |  .832297   .990121   .840601
      3.race |  .002732   .071795  .017237  |  .158513   4.16522   .038056
       south |  .207448   .219536   .20949  |  .990254   1.04796   .944939

(1) matched, (2) unmatched, (3) total
Make a graph of the common-support stats:

. mat M = r(M)
. coefplot matrix(M[,4]), title("Std. difference") noci nolabels xline(0)

[Figure: standardized difference between matched treated and all treated, by covariate; title "Std. difference"]
Multiple outcome variables:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs = 1852
                                               Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched         |       Controls        |  Band-
           |   Yes     No   Total  |  Used  Unused  Total  |  width
   Treated |   432     25     457  |  1104     291   1395  | 1.3392

Treatment-effects estimation

           |     Coef.
wage       |
       ATT |  .6021049
      NATE |  1.430823
hours      |
       ATT |  1.263759
      NATE |  1.450303
Multiple outcome variables with different regression-adjustment equations:

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure)
>     (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs = 1852
                                               Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched         |       Controls        |  Band-
           |   Yes     No   Total  |  Used  Unused  Total  |  width
   Treated |   432     25     457  |  1104     291   1395  | 1.3392

Treatment-effects estimation

           |     Coef.
wage       |
       ATT |  .5152752
      NATE |  1.430823
hours      |
       ATT |  1.263759
      NATE |  1.450303

wage adjusted for: collgrad ttl_exp tenure
hours adjusted for: i.industry i.race
Treatment effects by subpopulation:

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage),
>     att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching          Number of obs = 1853
                                               Replications  = 50
                                               Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics

           |       Matched         |       Controls        |  Band-
           |   Yes     No   Total  |  Used  Unused  Total  |  width
0          |
   Treated |   306     15     321  |   625     120    745  | 1.3199
1          |
   Treated |   126     10     136  |   473     178    651  | 1.3398

Treatment-effects estimation

           |  Observed   Bootstrap                      Normal-based
      wage |     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
0          |
       ATT |  .4586332   .2763358    1.66   0.097    -.082975   1.000241
1          |
       ATT |  .9518705    .406903    2.34   0.019    .1543553   1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

           chi2(  1) =    1.23
         Prob > chi2 =    0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

      wage |     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
       (1) |  .4932373   .4452343    1.11   0.268    -.379406   1.365881
Simulation
Population data from the Swiss census of 2000.
Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using the command iskotrei by Hendrickx 2002) (values from 6 to 78; mean 44).
Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.
Control variables: gender, age, and highest educational degree.
Population restricted to people between 24 and 60 years old who are working.
2'308'006 individuals, of which 17.5% belong to the treatment group.
Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.
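The logic of such a simulation can be sketched in a few lines. The Python code below is an illustration under loud assumptions: the population is a synthetic toy population, not the census data, the estimator is the naive raw mean difference rather than a matching estimator, and all function names are hypothetical. It shows only how bias, variance, and mean squared error are computed from repeated random samples.

```python
import random
from statistics import mean, pvariance


def raw_difference(sample):
    """Naive estimator (NATE): mean outcome of treated minus untreated."""
    treated = [y for d, y in sample if d == 1]
    untreated = [y for d, y in sample if d == 0]
    return mean(treated) - mean(untreated)


def simulate(estimator, population, truth, n, reps, seed=1):
    """Draw `reps` random samples of size n and summarize estimation error."""
    rng = random.Random(seed)
    estimates = [estimator(rng.sample(population, n)) for _ in range(reps)]
    bias = mean(estimates) - truth
    variance = pvariance(estimates)
    return {"bias": bias, "variance": variance, "mse": bias ** 2 + variance}


# Hypothetical toy population of (treatment, outcome) pairs, without confounding.
toy_rng = random.Random(42)
population = []
for _ in range(10_000):
    d = 1 if toy_rng.random() < 0.2 else 0
    population.append((d, 44.0 - 4.0 * d + toy_rng.gauss(0.0, 10.0)))

result = simulate(raw_difference, population, truth=-4.0, n=500, reps=200)
```

Replacing raw_difference with a matching estimator, and the toy population with real population data, gives the kind of comparison reported in the results slides.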
Substantial differences between resident aliens and Swiss nationals on all three covariates.
Propensity score in population (computed from fully stratified data):
[Figure: density of the propensity score for Untreated and Treated; x-axis: Propensity score (0 to 1); y-axis: Density; McFadden R2 = 0.121]
Raw mean difference in occupational prestige (NATE): −4.79
Population ATT (computed from fully stratified data): −3.96
There is some treatment effect heterogeneity (ATE = −3.51, ATC = −3.41).
[Figure: left panel, outcome by propensity score for Untreated and Treated; right panel, treatment effect by propensity score]
Results: Variance
[Figure: variance of the ATT estimators by algorithm; rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y); panels: N = 500 and N = 5000; series: MDM and PSM, each without and with bias correction]
In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.
For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.
Results: Bias reduction (in percent)
[Figure: bias reduction in percent by algorithm; rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y); panels: N = 500 and N = 5000; series: MDM and PSM, each without and with bias correction]
Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3) small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.
The bias also vanishes once post-matching regression adjustment is applied.
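The slides do not spell out how "bias reduction (in percent)" is defined. A common convention, assumed here, expresses the distance an estimator moves away from the raw difference (NATE) as a share of the distance between the NATE and the true ATT. The Python sketch below uses that assumed definition with the population values reported for the simulation; the function name is hypothetical.

```python
def bias_reduction(nate, estimate, att):
    """Percent of the raw bias (nate - att) removed by an estimate.

    Assumed definition: 100 means the estimate equals the true ATT,
    0 means no improvement over the raw difference, and values above
    100 mean the estimate overshoots the ATT.
    """
    return 100.0 * (nate - estimate) / (nate - att)


# Population values from the simulation: NATE = -4.79, ATT = -3.96.
perfect = bias_reduction(-4.79, -3.96, -3.96)
none = bias_reduction(-4.79, -4.79, -3.96)
overshoot = bias_reduction(-4.79, -3.80, -3.96)
```

Under this definition, a bias that "does not vanish" shows up as a value that stays away from 100 in both the N = 500 and the N = 5000 panel.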
Results: Mean squared error
[Figure: mean squared error by algorithm; rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y); panels: N = 500 and N = 5000; series: MDM and PSM, each without and with bias correction]
Results: Relative standard error
[Figure: relative standard error by algorithm; rows: nearest-neighbor matching with teffects (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap (1 neighbor, 5 neighbors), and kernel matching with bootstrap (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y); panels: N = 500 and N = 5000; series: MDM and PSM, each without and with bias correction]
Results: Coverage of 95% CIs
[Figure: coverage of 95% confidence intervals by algorithm; rows: nearest-neighbor matching with teffects (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap (1 neighbor, 5 neighbors), and kernel matching with bootstrap (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y); panels: N = 500 and N = 5000; series: MDM and PSM, each without and with bias correction]
7 Conclusions
Conclusions
The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).
Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.
Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research the choice may not be that clear.
Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.
Disadvantages of MDM:
- Choice of scaling matrix is largely arbitrary; various suggestions in the literature (somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.
One clear conclusion we can draw, however, is:
Do not use propensity scores for pair matching! (But don't use pair matching anyhow.)
Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly OK for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.
To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.
References
Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2):295–313.
Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1):77–90.
Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91:279–290.
Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.
King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.
Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.
Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4):465–472.
Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70:41–55.
Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5):688–701.
Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4):472–480.
Are King and Nielsen right
Argument 3I Random pruning rArr imbalance rArr more model dependenceI PSM rArr complete randomization rArr lots of random pruningI PSM Paradox ldquowhen you do lsquobetterrsquo you do worserdquo
That random pruning makes things worse is of course true becauseit unnecessarily reduces the sample size (without changing anythingelse)
As argued above that PSM applies random pruning is only true forX variables unrelated to T (so that we are in a ldquolocalrdquo completerandomization situation although something similar can probablyalso happen if effects from several X rsquos cancel each other out)
Furthermore it is only true if you employ a matching algorithm thatthrows away good matches King and Nielsenrsquos results seem to bebased on the worst possible algorithm one-to-one matching withoutreplacement
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 30
Are King and Nielsen right
Argument 3I Random pruning rArr imbalance rArr more model dependenceI PSM rArr complete randomization rArr lots of random pruningI PSM Paradox ldquowhen you do lsquobetterrsquo you do worserdquo
If you use a matching algorithm that does not throw away goodmatches such as radius or kernel matching (or also nearest-neighbormatching as long as all ties are kept and observations are matchedwith replacement) random pruning can be avoidedI Such algorithms block (and hence prune) where it is necessary toprevent bias but they average where such pruning is not necessary
I Hence efficiency differences between PSM and multivariate matchingshould only be minor for such algorithms
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 30
Are King and Nielsen right
Argument 3I Random pruning rArr imbalance rArr more model dependenceI PSM rArr complete randomization rArr lots of random pruningI PSM Paradox ldquowhen you do lsquobetterrsquo you do worserdquo
True is that post-matching modeling can do more harm with PSMthan with MDM (because PSM leaves more ldquofreerdquo variance in X thatcan exploited by modeling decisions)
In general post-matching analyses are more limited for PSM thanfor MDM For example results from subgroup analyses will not bevalid (yoursquod need to apply PSM stratified by subgroups in this case)
Illustration using kmatch
The Command

kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).

Some key features:
- Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
- Optional exact matching
- Optional regression-adjustment bias-correction
- Kernel matching, ridge matching, or nearest-neighbor matching
- Automatic bandwidth selection for kernel/ridge matching
- Flexible specification of scaling matrix for MDM
- Joint analysis of multiple subgroups and multiple outcome variables
- Various post-estimation commands for balancing and common-support diagnostics
- Computationally efficient implementation
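A possible setup sequence for following the examples. Only ssc install kmatch is stated in the talk; the additional packages are assumptions: moremata and kdens are, to my knowledge, required by kmatch, and coefplot is used for the balancing graphs in the examples. Check help kmatch for the authoritative list.

```stata
* Install kmatch from SSC (as stated in the talk)
ssc install kmatch, replace

* Assumed dependencies and helper package for the graphs
ssc install moremata, replace
ssc install kdens, replace
ssc install coefplot, replace

* Browse the documentation
help kmatch
```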
Some Examples

Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.

. sysuse nlsw88, clear
(NLSW, 1988 extract)

. drop if industry==2
(4 observations deleted)

Mahalanobis-distance kernel matching

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1,853
                                             Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

            |      Matched        |       Controls        | Band-
            |  Yes    No   Total  |  Used  Unused  Total  | width
    Treated |  432    25     457  |  1105     291   1396  | 1.3394

Treatment-effects estimation

  wage |     Coef.
   ATT |  .6059013
  NATE |  1.432913
Some balancing statistics

. kmatch summarize
(refitting the model using the generate() option)

             |              Raw               |         Matched (ATT)
       Means |  Treated  Untrea~d    StdDif   |  Treated  Untrea~d    StdDif
    collgrad |  .321663   .224212   .219912   |  .319444   .319444         0
     ttl_exp |  13.2685   12.7323   .117584   |  13.3205   13.1425   .039036
      tenure |  7.89205   6.17658    .29735   |  7.91744   7.58347   .057888
  3.industry |  .006565   .012178  -.058246   |   .00463    .00463         0
  4.industry |  .183807   .166905   .044425   |  .185185   .185185         0
  5.industry |  .105033   .027937   .312944   |  .085648   .085648         0
  6.industry |  .045952   .169771  -.407129   |  .048611   .048611         0
  7.industry |  .019694   .102436  -.350657   |  .020833   .020833         0
  8.industry |  .017505   .035817  -.113785   |  .009259   .009259         0
  9.industry |  .010941   .040115  -.185669   |  .011574   .011574         0
 10.industry |  .004376   .008596  -.052551   |  .002315   .002315         0
 11.industry |  .479212   .356734   .250073   |  .506944   .506944         0
 12.industry |  .122538    .07235   .169707   |   .12037    .12037         0
      2.race |  .330416   .244986   .189418   |    .3125     .3125         0
      3.race |  .017505   .011461   .050566   |  .006944   .006944         0
       south |  .297593   .466332  -.352408   |  .291667   .291667         0

             |              Raw               |         Matched (ATT)
   Variances |  Treated  Untrea~d     Ratio   |  Treated  Untrea~d     Ratio
    collgrad |  .218674   .174066   1.25628   |  .217904   .217904         1
     ttl_exp |  20.5898   21.0001   .980459   |  19.8177   18.2323   1.08696
      tenure |  37.2044   29.3629   1.26706   |  37.0399   34.9543   1.05966
  3.industry |  .006536   .012038   .542928   |  .004619   .004619         1
  4.industry |  .150351   .139148   1.08052   |  .151242   .151242         1
  5.industry |  .094207   .027176   3.46656   |  .078494   .078494         1
  6.industry |  .043936    .14105   .311496   |  .046355   .046355         1
  7.industry |  .019348   .092008   .210287   |  .020447   .020447         1
  8.industry |  .017237   .034559   .498769   |  .009195   .009195         1
  9.industry |  .010845   .038533   .281445   |  .011467   .011467         1
 10.industry |  .004367   .008528   .512039   |  .002315   .002315         1
 11.industry |  .250115   .229639   1.08917   |  .250532   .250532         1
 12.industry |  .107758   .067163   1.60443   |  .106127   .106127         1
      2.race |  .221726     .1851   1.19787   |  .215342   .215342         1
      3.race |  .017237   .011338   1.52025   |  .006912   .006912         1
       south |   .20949   .249045   .841173   |  .207077   .207077         1
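For readers who want to check the StdDif column by hand: the reported raw values are consistent with the usual standardized mean difference, that is, the difference in means divided by the square root of the average of the two raw group variances. The formula below is inferred from the numbers in the table rather than quoted from the kmatch documentation, and the worked line reads the collgrad row as means .321663 and .224212 with variances .218674 and .174066.

```latex
d = \frac{\bar{x}_{1} - \bar{x}_{0}}{\sqrt{\left(s_{1}^{2} + s_{0}^{2}\right)/2}}
\qquad\text{e.g. collgrad (raw):}\quad
\frac{0.321663 - 0.224212}{\sqrt{(0.218674 + 0.174066)/2}}
  = \frac{0.097451}{0.443137} \approx 0.2199
```

The matched-sample values appear to use the same raw-sample denominator: for ttl_exp, (13.3205 - 13.1425) divided by the raw pooled standard deviation of about 4.560 gives roughly 0.039, in line with the table. The variance ratio is simply the treated variance divided by the untreated variance.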
Make a graph of the balancing stats

. mat M = r(M)

. mat V = r(V)

. coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || , ///
>     bylabels("Std. mean difference" "Variance ratio") ///
>     noci nolabels byopts(xrescale)

. addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))

. addplot 2: , xline(1) norescaling

[Figure: coefficient plot with two panels, "Std. mean difference" and "Variance ratio", showing Raw and Matched values for each covariate (collgrad, ttl_exp, tenure, industry and race indicators, south)]
Propensity-score kernel matching

. kmatch ps union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage), nate att
(computing bandwidth ... done)

Propensity-score kernel matching             Number of obs = 1,853
                                             Kernel        = epan
Treatment  : union = 1
Covariates : collgrad ttl_exp tenure i.industry i.race south
PS model   : logit (pr)

Matching statistics

            |      Matched        |       Controls        | Band-
            |  Yes    No   Total  |  Used  Unused  Total  | width
    Treated |  431    26     457  |  1214     182   1396  | .00188

Treatment-effects estimation

  wage |     Coef.
   ATT |  .3887224
  NATE |  1.432913
Kernel density balancing plot

. kmatch density, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)
(applying 0-1 boundary correction to density estimation of propensity score)
(bandwidth for propensity score = .06803989)

[Figure: kernel density of the propensity score for Untreated and Treated, in two panels, "Raw" and "Matched (ATT)"; x-axis: Propensity score, y-axis: Density]
Cumulative distribution balancing plot

. kmatch cumul, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)

[Figure: cumulative distribution of the propensity score for Untreated and Treated, in two panels, "Raw" and "Matched (ATT)"; x-axis: Propensity score, y-axis: Cumulative probability]
Balancing box plot

. kmatch box
(refitting the model using the generate() option)

[Figure: box plots of the propensity score for Untreated and Treated, in two panels, "Raw" and "Matched (ATT)"; y-axis: Propensity score]
Standard errors

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching        Number of obs = 1,853
                                             Replications  =    50
                                             Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

            |      Matched        |       Controls        | Band-
            |  Yes    No   Total  |  Used  Unused  Total  | width
    Treated |  432    25     457  |  1105     291   1396  | 1.3394
  Untreated | 1386    10    1396  |   455       2    457  | 3.3975
   Combined | 1818    35    1853  |  1560     293   1853  |      .

Treatment-effects estimation

        |   Observed   Bootstrap                        Normal-based
   wage |      Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
    ATE |   .4095729   .1920853    2.13   0.033    .0330928    .7860531
    ATT |   .6059013   .2472069    2.45   0.014    .1213846    1.090418
    ATC |   .3483797   .1893653    1.84   0.066   -.0227695    .7195289
   NATE |   1.432913   .2333282    6.14   0.000    .9755981    1.890228
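As a small sanity check on how to read the bootstrap table, the z statistic and the normal-based confidence interval can be recomputed from the coefficient and standard error. The sketch below does this for the ATT row, reading the values as .6059013 and .2472069; the scalar names are arbitrary.

```stata
* Recompute z and the normal-based 95% CI for the ATT row of the table
scalar b  = .6059013          // observed ATT
scalar se = .2472069          // bootstrap standard error

scalar z  = scalar(b)/scalar(se)
scalar lo = scalar(b) - invnormal(.975)*scalar(se)
scalar hi = scalar(b) + invnormal(.975)*scalar(se)

display "z      = " %5.2f scalar(z)
display "95% CI = [" %9.7f scalar(lo) ", " %9.7f scalar(hi) "]"
```

Up to rounding, this should reproduce the z value and interval limits shown in the ATT row.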
Do some tests

. lincom ATT-NATE

 ( 1)  ATT - NATE = 0

   wage |      Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
    (1) |  -.8270117   .1810415   -4.57   0.000   -1.181847   -.4721768

. test ATT = ATC

 ( 1)  ATT - ATC = 0

           chi2(  1) =    2.42
         Prob > chi2 =    0.1200
Nearest-neighbor matching (1 neighbor)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

Multivariate-distance nearest-neighbor matching

                                             Number of obs  = 1,853
                                             Neighbors: min =     1
Treatment  : union = 1                                  max =     1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

            |      Matched        |       Controls        | Band-
            |  Yes    No   Total  |  Used  Unused  Total  | width
    Treated |  457     0     457  |   328    1068   1396  |     .

Treatment-effects estimation

  wage |     Coef.
   ATT |  .7246969

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

Treatment-effects estimation                 Number of obs      = 1,853
Estimator      : nearest-neighbor matching   Matches: requested =     1
Outcome model  : matching                                   min =     1
Distance metric: Mahalanobis                                max =     1

                      |             AI Robust
                 wage |     Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
 ATET                 |
                union |
  (union vs nonunion) |  .7246969   .2942952    2.46   0.014     .147889    1.301505
Nearest-neighbor matching (5 neighbors)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching

                                             Number of obs  = 1,853
                                             Neighbors: min =     5
Treatment  : union = 1                                  max =     5
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

            |      Matched        |       Controls        | Band-
            |  Yes    No   Total  |  Used  Unused  Total  | width
    Treated |  457     0     457  |   870     526   1396  |     .

Treatment-effects estimation

  wage |     Coef.
   ATT |  .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation                 Number of obs      = 1,853
Estimator      : nearest-neighbor matching   Matches: requested =     5
Outcome model  : matching                                   min =     5
Distance metric: Mahalanobis                                max =     6

                      |             AI Robust
                 wage |     Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
 ATET                 |
                union |
  (union vs nonunion) |  .5590823   .2381752    2.35   0.019    .0922675    1.025897
Bias-correction / regression adjustment

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching

                                             Number of obs  = 1,853
                                             Neighbors: min =     5
Treatment  : union = 1                                  max =     5
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

            |      Matched        |       Controls        | Band-
            |  Yes    No   Total  |  Used  Unused  Total  | width
    Treated |  457     0     457  |   870     526   1396  |     .

Treatment-effects estimation

  wage |     Coef.
   ATT |  .5288023

  adjusted for: collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) ///
>     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation                 Number of obs      = 1,853
Estimator      : nearest-neighbor matching   Matches: requested =     5
Outcome model  : matching                                   min =     5
Distance metric: Mahalanobis                                max =     6

                      |             AI Robust
                 wage |     Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
 ATET                 |
                union |
  (union vs nonunion) |  .5288023   .2420635    2.18   0.029    .0543666    1.003238
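The idea behind regression-adjustment bias correction can be written down generically. The expression below is the standard bias-corrected matching form from the literature (in the spirit of Abadie and Imbens), stated here as an assumption about what the adjustment does conceptually; the exact implementation in kmatch or teffects may differ in detail. The symbols are illustrative: w_ij are the matching weights of controls j for treated unit i, and mu_0-hat is a regression of the outcome on the adjustment variables among the controls.

```latex
\hat{Y}_i^{0} = \sum_{j:\,D_j = 0} w_{ij}
  \Bigl[\, Y_j + \hat{\mu}_0(X_i) - \hat{\mu}_0(X_j) \Bigr],
\qquad
\widehat{\mathrm{ATT}} = \frac{1}{N_1} \sum_{i:\,D_i = 1}
  \bigl( Y_i - \hat{Y}_i^{0} \bigr)
```

The correction term shifts each matched control outcome by the predicted outcome difference that is due to the remaining covariate distance between i and j. It is zero when matches are exact on the adjustment variables.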
Mahalanobis-distance and propensity-score matching combined

. kmatch md union collgrad ttl_exp tenure (wage), att ///
>     psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1,853
                                             Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis (modified)
Covariates : collgrad ttl_exp tenure
PS model   : logit (pr)
PS covars  : i.industry i.race south

Matching statistics

            |      Matched        |       Controls        | Band-
            |  Yes    No   Total  |  Used  Unused  Total  | width
    Treated |  439    18     457  |  1258     138   1396  | .83886

Treatment-effects estimation

  wage |     Coef.
   ATT |  .6408443
Exact matching

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1,853
                                             Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure
Exact      : industry race south

Matching statistics

            |      Matched        |       Controls        | Band-
            |  Yes    No   Total  |  Used  Unused  Total  | width
    Treated |  432    25     457  |  1103     293   1396  | 1.3013

Treatment-effects estimation

  wage |     Coef.
   ATT |  .6047374
Bandwidth selection: the default (based on the distribution of distances in one-nearest-neighbor matching)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1,853
                                             Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

            |      Matched        |       Controls        | Band-
            |  Yes    No   Total  |  Used  Unused  Total  | width
    Treated |  432    25     457  |  1105     291   1396  | 1.3394

Treatment-effects estimation

  wage |     Coef.
   ATT |  .6059013
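Because the bandwidth drives how many treated observations find matches and how much averaging takes place, it can be instructive to see how the ATT moves across a few fixed bandwidths. The sketch below uses only the bwidth(#) form that appears in the common-support example of this talk; the particular grid of values is an arbitrary illustration, not a recommendation.

```stata
* Sketch: sensitivity of the kernel-matching ATT to a fixed bandwidth
sysuse nlsw88, clear
drop if industry==2

foreach h in 0.5 1 1.5 2.5 {
    quietly kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
        (wage), att bwidth(`h')
    display "bandwidth = `h'    ATT = " %9.7f _b[ATT]
}
```

Rerunning a single specification without quietly shows the matching statistics, so the number of unmatched treated observations can be compared across bandwidths as well.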
Bandwidth selection: cross-validation with respect to X

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1,853
                                             Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

            |      Matched        |       Controls        | Band-
            |  Yes    No   Total  |  Used  Unused  Total  | width
    Treated |  448     9     457  |  1184     212   1396  | 1.8888

Treatment-effects estimation

  wage |     Coef.
   ATT |  .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot with evaluation points labeled by search index; x-axis: Bandwidth, y-axis: MSE]
Bandwidth selection: cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1,853
                                             Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

            |      Matched        |       Controls        | Band-
            |  Yes    No   Total  |  Used  Unused  Total  | width
    Treated |  453     4     457  |  1289     107   1396  |  2.433

Treatment-effects estimation

  wage |     Coef.
   ATT |  .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot with evaluation points labeled by search index; x-axis: Bandwidth, y-axis: MISE]
Bandwidth selection: weighted cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1,853
                                             Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

            |      Matched        |       Controls        | Band-
            |  Yes    No   Total  |  Used  Unused  Total  | width
    Treated |  455     2     457  |  1356      40   1396  | 2.7626

Treatment-effects estimation

  wage |     Coef.
   ATT |  .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot with evaluation points labeled by search index; x-axis: Bandwidth, y-axis: Weighted MISE]
Common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(0.5)

Multivariate-distance kernel matching        Number of obs = 1,853
                                             Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

            |      Matched        |       Controls        | Band-
            |  Yes    No   Total  |  Used  Unused  Total  | width
    Treated |  366    91     457  |   701     695   1396  |     .5

Treatment-effects estimation

  wage |     Coef.
   ATT |  .3303161

. kmatch csummarize
(refitting the model using the generate() option)

             |   Common support (treated)   |    Standardized difference
       Means |  Matched  Unmatc~d    Total  |  (1)-(3)   (2)-(3)   (1)-(2)
    collgrad |  .322404   .318681  .321663  |  .001585  -.006376   .007962
     ttl_exp |  13.3929   12.7682  13.2685  |  .027413  -.110253   .137666
      tenure |  8.12614   6.95055  7.89205  |  .038378  -.154356   .192734
  3.industry |  .002732   .021978  .006565  | -.047404   .190657  -.238061
  4.industry |  .191257   .153846  .183807  |  .019212  -.077269   .096481
  5.industry |  .062842   .274725  .105033  | -.137462   .552867  -.690329
  6.industry |  .057377         0  .045952  |  .054507  -.219225   .273732
  7.industry |  .019126   .021978  .019694  | -.004083   .016423  -.020506
  8.industry |  .005464   .065934  .017505  | -.091714   .368871  -.460585
  9.industry |  .010929   .010989  .010941  | -.000115   .000462  -.000577
 10.industry |        0   .021978  .004376  | -.066227   .266363  -.332589
 11.industry |  .554645   .175824  .479212  |   .15083  -.606636   .757467
 12.industry |  .092896   .241758  .122538  | -.090299   .363181   -.45348
      2.race |  .243169   .681319  .330416  | -.185284   .745209  -.930494
      3.race |  .002732   .076923  .017505  | -.112525   .452572  -.565097
       south |   .29235   .318681  .297593  | -.011456   .046074   -.05753

             |   Common support (treated)   |             Ratio
   Variances |  Matched  Unmatc~d    Total  |  (1)/(3)   (2)/(3)   (1)/(2)
    collgrad |  .219058   .219536  .218674  |  1.00176   1.00394   .997824
     ttl_exp |  19.4198   25.2474  20.5898  |  .943177   1.22621    .76918
      tenure |  38.3324   31.9242  37.2044  |  1.03032   .858076   1.20073
  3.industry |  .002732   .021734  .006536  |  .418045   3.32537   .125714
  4.industry |  .155101   .131624  .150351  |  1.03159   .875443   1.17837
  5.industry |  .059054   .201465  .094207  |  .626851   2.13854   .293122
  6.industry |  .054233         0  .043936  |  1.23435         0         .
  7.industry |  .018811   .021734  .019348  |  .972252    1.1233   .865531
  8.industry |   .00545   .062271  .017237  |  .316157   3.61269   .087513
  9.industry |  .010839   .010989  .010845  |  .999464   1.01328   .986361
 10.industry |        0   .021734  .004367  |        0   4.97709         0
 11.industry |  .247691    .14652  .250115  |  .990307   .585811   1.69049
 12.industry |  .084497   .185348  .107758  |  .784137   1.72003   .455885
      2.race |  .184542   .219536  .221726  |  .832297   .990121   .840601
      3.race |  .002732   .071795  .017237  |  .158513   4.16522   .038056
       south |  .207448   .219536   .20949  |  .990254   1.04796   .944939

(1) matched, (2) unmatched, (3) total
Make a graph of the common-support stats

. mat M = r(M)

. coefplot matrix(M[,4]), title("Std. difference") noci nolabels xline(0)

[Figure: coefficient plot titled "Std. difference" showing the standardized difference between matched and all treated observations for each covariate (collgrad, ttl_exp, tenure, industry and race indicators, south)]
Multiple outcome variables

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1,852
                                             Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

            |      Matched        |       Controls        | Band-
            |  Yes    No   Total  |  Used  Unused  Total  | width
    Treated |  432    25     457  |  1104     291   1395  | 1.3392

Treatment-effects estimation

        |     Coef.
 wage   |
    ATT |  .6021049
   NATE |  1.430823
 hours  |
    ATT |  1.263759
   NATE |  1.450303
Multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage = collgrad ttl_exp tenure) ///
>     (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1,852
                                             Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

            |      Matched        |       Controls        | Band-
            |  Yes    No   Total  |  Used  Unused  Total  | width
    Treated |  432    25     457  |  1104     291   1395  | 1.3392

Treatment-effects estimation

        |     Coef.
 wage   |
    ATT |  .5152752
   NATE |  1.430823
 hours  |
    ATT |  1.263759
   NATE |  1.450303

  wage:  adjusted for collgrad ttl_exp tenure
  hours: adjusted for i.industry i.race
Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage), ///
>     att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching        Number of obs = 1,853
                                             Replications  =    50
                                             Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics

            |      Matched        |       Controls        | Band-
            |  Yes    No   Total  |  Used  Unused  Total  | width
 0  Treated |  306    15     321  |   625     120    745  | 1.3199
 1  Treated |  126    10     136  |   473     178    651  | 1.3398

Treatment-effects estimation

        |   Observed   Bootstrap                        Normal-based
   wage |      Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
 0  ATT |   .4586332   .2763358    1.66   0.097    -.082975    1.000241
 1  ATT |   .9518705    .406903    2.34   0.019    .1543553    1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

           chi2(  1) =    1.23
         Prob > chi2 =    0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

   wage |      Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
    (1) |   .4932373   .4452343    1.11   0.268    -.379406    1.365881
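The two commands at the end of this example test the same single restriction, so their results should agree. For one linear restriction, the Wald chi-squared statistic reported by test is the square of the z statistic reported by lincom. Reading the lincom row as coefficient .4932373 and standard error .4452343:

```latex
z = \frac{0.4932373}{0.4452343} \approx 1.108,
\qquad
\chi^2(1) = z^2 \approx 1.227 \approx 1.23
```

This matches the chi2(1) value from test, and the p-values (0.268 and 0.2679) agree accordingly.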
Simulation

Population data from the Swiss census of 2000.
Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using command iskotrei by Hendrickx 2002); values from 6 to 78, mean 44.
Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.
Control variables: gender, age, and highest educational degree.
Population restricted to people between 24 and 60 years old who are working.
2'308'006 individuals, of which 17.5% belong to the treatment group.
Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.
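The general shape of such a simulation can be sketched in a few lines of Stata. This is not the code used for the results in this talk: the census data are not available here, so the NLSW extract stands in as a pseudo-population, and the program name onedraw, the covariate list, the number of replications, and the seed are all arbitrary choices made for illustration.

```stata
* Sketch only: nlsw88 stands in for the census population (an assumption)
capture program drop onedraw
program define onedraw, rclass
    sysuse nlsw88, clear
    drop if industry==2
    sample 500, count                  // draw a random sample of N = 500
    quietly kmatch ps union collgrad ttl_exp tenure (wage), att
    return scalar att = _b[ATT]        // store the estimate of this draw
end

* Repeat the draw-and-estimate step and collect the estimates
simulate att = r(att), reps(100) seed(12345): onedraw

summarize att                          // mean and SD across replications
display "Variance of the estimator across draws: " r(sd)^2
```

Comparing the mean of the estimates with a known population value gives the bias, and the variance across draws gives the efficiency measure shown in the results slides; swapping in other estimators (kmatch md, nn(), regression adjustment) inside the program yields the comparison across algorithms.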
Simulation

Substantial differences between resident aliens and Swiss nationals on all three covariates.

Propensity score in population (computed from fully stratified data):

[Figure: density of the propensity score in the population for Untreated and Treated; x-axis: Propensity score (0 to 1), y-axis: Density; McFadden R2 = 0.121]
Simulation

Raw mean difference in occupational prestige (NATE): -4.79
Population ATT (computed from fully stratified data): -3.96
There is some treatment effect heterogeneity (ATE = -3.51, ATC = -3.41).

[Figure, left panel: mean outcome by propensity score for Untreated and Treated; x-axis: Propensity score, y-axis: Outcome]
[Figure, right panel: treatment effect by propensity score; x-axis: Propensity score, y-axis: Treatment effect]
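The three population effects are linked by a simple identity: the ATE is the average of the ATT and the ATC, weighted by the shares of treated and untreated units. Plugging in the treatment share of 17.5% from the simulation setup (reading the figures above as ATT = -3.96 and ATC = -3.41) reproduces the reported ATE:

```latex
\mathrm{ATE} = \pi \,\mathrm{ATT} + (1 - \pi)\,\mathrm{ATC},
\qquad \pi = P(D = 1)

0.175 \times (-3.96) + 0.825 \times (-3.41)
  = -0.693 - 2.813 \approx -3.51
```

Because the treated are a small minority, the ATE lies much closer to the ATC than to the ATT.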
Results: Variance

[Figure: variance of the estimators in two panels, N = 500 and N = 5000. Series: MDM, MDM with bias correction, PSM, PSM with bias correction. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y)]
In this slide we can see that, for the same algorithm, PSM is typically somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.
For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.
Results: Bias reduction (in percent)

[Figure: bias reduction in percent in two panels, N = 500 and N = 5000. Series: MDM, MDM with bias correction, PSM, PSM with bias correction. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y)]
Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3) small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.
The bias also vanishes once post-matching regression adjustment is applied.
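To illustrate the two remedies mentioned here (a richer propensity-score specification, or post-matching regression adjustment), the following sketch uses the NLSW example data rather than the census data. It assumes that kmatch ps accepts the usual factor-variable notation for interactions and squared terms, as it does for i.industry in the earlier examples; the particular terms added are arbitrary and only meant to show the mechanics.

```stata
sysuse nlsw88, clear
drop if industry==2

* (a) simple propensity-score model: main effects only
kmatch ps union collgrad ttl_exp tenure i.industry i.race south (wage), att

* (b) richer model: squared terms and one interaction (illustrative choice)
kmatch ps union i.collgrad##c.ttl_exp c.ttl_exp#c.ttl_exp c.tenure##c.tenure ///
    i.industry i.race south (wage), att

* (c) simple model plus post-matching regression adjustment
kmatch ps union collgrad ttl_exp tenure i.industry i.race south ///
    (wage = collgrad ttl_exp tenure i.industry i.race south), att
```

Comparing the ATT across (a), (b), and (c), together with the balancing statistics from kmatch summarize, gives a feel for how sensitive the result is to the specification.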
Results: Mean squared error

[Figure: mean squared error of the estimators in two panels, N = 500 and N = 5000. Series: MDM, MDM with bias correction, PSM, PSM with bias correction. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y)]
Results: Relative standard error

[Figure: relative standard error in two panels, N = 500 and N = 5000. Series: MDM, MDM with bias correction, PSM, PSM with bias correction. Rows: nearest-neighbor matching with teffects standard errors (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap standard errors (1 neighbor, 5 neighbors), and kernel matching with bootstrap standard errors (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y)]
Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals in two panels, N = 500 and N = 5000. Series: MDM, MDM with bias correction, PSM, PSM with bias correction. Rows: nearest-neighbor matching with teffects standard errors (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap standard errors (1 neighbor, 5 neighbors), and kernel matching with bootstrap standard errors (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y)]
Conclusions

The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).
Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.
Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research, the choice may not be that clear.
Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.
Disadvantages of MDM:
- Choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:
Do not use propensity scores for pair matching. (But don't use pair matching anyhow.)
Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly ok for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.
References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2):295–313.
Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1):77–90.
Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91:279–290.
Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802. Boston College Department of Economics.
King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.
Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.
Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4):465–472.
Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70:41–55.
Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5):688–701.
Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4):472–480.
Are King and Nielsen right
Argument 3I Random pruning rArr imbalance rArr more model dependenceI PSM rArr complete randomization rArr lots of random pruningI PSM Paradox ldquowhen you do lsquobetterrsquo you do worserdquo
If you use a matching algorithm that does not throw away goodmatches such as radius or kernel matching (or also nearest-neighbormatching as long as all ties are kept and observations are matchedwith replacement) random pruning can be avoidedI Such algorithms block (and hence prune) where it is necessary toprevent bias but they average where such pruning is not necessary
I Hence efficiency differences between PSM and multivariate matchingshould only be minor for such algorithms
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 30
Are King and Nielsen right
Argument 3I Random pruning rArr imbalance rArr more model dependenceI PSM rArr complete randomization rArr lots of random pruningI PSM Paradox ldquowhen you do lsquobetterrsquo you do worserdquo
True is that post-matching modeling can do more harm with PSMthan with MDM (because PSM leaves more ldquofreerdquo variance in X thatcan exploited by modeling decisions)
In general post-matching analyses are more limited for PSM thanfor MDM For example results from subgroup analyses will not bevalid (yoursquod need to apply PSM stratified by subgroups in this case)
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 30
1 Potential Outcomes and Causal Inference
2 Matching
3 Propensity Score Matching
4 King and Nielsenrsquos ldquoWhy Propensity Scores Should Not Be Used forMatchingrdquo
5 Are King and Nielsen right
6 Illustration using kmatch
7 Conclusions
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 31
The Command
kmatch new matching software for Stata that has been written overthe last few months available from SSC (ssc install kmatch)Some key featuresI Multivariate Distance Matching (MDM) and Propensity ScoreMatching (PSM) (or MDM and PSM combined)
I Optional exact matchingI Optional regression-adjustment bias-correctionI Kernel matching ridge matching or nearest-neighbor matchingI Automatic bandwidth selection for kernelridge matchingI Flexible specification of scaling matrix for MDMI Joint analysis of multiple subgroups and multiple outcome variablesI Various post-estimation commands for balancing andcommon-support diagnostics
I Computationally efficient implementation
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 32
Some Examples Use the NLSW data to estimate the effect of union membership on wages controlling for some covariated such as education labor market experience or industry sysuse nlsw88 clear(NLSW 1988 extract)
drop if industry==2(4 observations deleted)
Mahalanobis-distance kernel matching
kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage) nate att(computing bandwidth done)
Multivariate-distance kernel matching Number of obs = 1853Kernel = epan
Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south
Matching statistics
Matched Controls Band-Yes No Total Used Unused Total width
Treated 432 25 457 1105 291 1396 13394
Treatment-effects estimation
wage Coef
ATT 6059013NATE 1432913
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 33
Some Examples: balancing statistics

. kmatch summarize
(refitting the model using the generate() option)

                           Raw                      Matched (ATT)
Means          Treated  Untrea~d    StdDif    Treated  Untrea~d    StdDif
collgrad       .321663   .224212   .219912    .319444   .319444         0
ttl_exp        13.2685   12.7323   .117584    13.3205   13.1425   .039036
tenure         7.89205   6.17658    .29735    7.91744   7.58347   .057888
3.industry     .006565   .012178  -.058246     .00463    .00463         0
4.industry     .183807   .166905   .044425    .185185   .185185         0
5.industry     .105033   .027937   .312944    .085648   .085648         0
6.industry     .045952   .169771  -.407129    .048611   .048611         0
7.industry     .019694   .102436  -.350657    .020833   .020833         0
8.industry     .017505   .035817  -.113785    .009259   .009259         0
9.industry     .010941   .040115  -.185669    .011574   .011574         0
10.industry    .004376   .008596  -.052551    .002315   .002315         0
11.industry    .479212   .356734   .250073    .506944   .506944         0
12.industry    .122538    .07235   .169707     .12037    .12037         0
2.race         .330416   .244986   .189418      .3125     .3125         0
3.race         .017505   .011461   .050566    .006944   .006944         0
south          .297593   .466332  -.352408    .291667   .291667         0

                           Raw                      Matched (ATT)
Variances      Treated  Untrea~d     Ratio    Treated  Untrea~d     Ratio
collgrad       .218674   .174066   1.25628    .217904   .217904         1
ttl_exp        20.5898   21.0001   .980459    19.8177   18.2323   1.08696
tenure         37.2044   29.3629   1.26706    37.0399   34.9543   1.05966
3.industry     .006536   .012038   .542928    .004619   .004619         1
4.industry     .150351   .139148   1.08052    .151242   .151242         1
5.industry     .094207   .027176   3.46656    .078494   .078494         1
6.industry     .043936    .14105   .311496    .046355   .046355         1
7.industry     .019348   .092008   .210287    .020447   .020447         1
8.industry     .017237   .034559   .498769    .009195   .009195         1
9.industry     .010845   .038533   .281445    .011467   .011467         1
10.industry    .004367   .008528   .512039    .002315   .002315         1
11.industry    .250115   .229639   1.08917    .250532   .250532         1
12.industry    .107758   .067163   1.60443    .106127   .106127         1
2.race         .221726     .1851   1.19787    .215342   .215342         1
3.race         .017237   .011338   1.52025    .006912   .006912         1
south           .20949   .249045   .841173    .207077   .207077         1
Some Examples: make a graph of the balancing stats

. mat M = r(M)
. mat V = r(V)
. coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || , ///
>     bylabels("Std. mean difference" "Variance ratio") ///
>     noci nolabels byopts(xrescale)
. addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))
. addplot 2: , xline(1) norescaling

[Figure: standardized mean differences (left panel) and variance ratios (right panel) for each covariate, raw vs. matched]
Some Examples: propensity-score kernel matching

. kmatch ps union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage), nate att
(computing bandwidth ... done)

Propensity-score kernel matching           Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Covariates  : collgrad ttl_exp tenure i.industry i.race south
PS model    : logit (pr)

Matching statistics

                  Matched                Controls           Band-
              Yes     No   Total     Used  Unused   Total   width
  Treated     431     26     457     1214     182    1396  .00188

Treatment-effects estimation

  wage       Coef.
  ATT     .3887224
  NATE    1.432913
Some Examples: kernel density balancing plot

. kmatch density, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)
(applying 0-1 boundary correction to density estimation of propensity score)
(bandwidth for propensity score = .06803989)

[Figure: density of the propensity score for untreated and treated; panels: Raw, Matched (ATT)]
Some Examples: cumulative distribution balancing plot

. kmatch cumul, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)

[Figure: cumulative probability of the propensity score for untreated and treated; panels: Raw, Matched (ATT)]
Some Examples: balancing box plot

. kmatch box
(refitting the model using the generate() option)

[Figure: box plots of the propensity score for untreated and treated; panels: Raw, Matched (ATT)]
Some Examples: standard errors

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Replications  = 50
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                  Matched                Controls           Band-
              Yes     No   Total     Used  Unused   Total   width
  Treated     432     25     457     1105     291    1396  1.3394
  Untreated  1386     10    1396      455       2     457  3.3975
  Combined   1818     35    1853     1560     293    1853

Treatment-effects estimation

          Observed   Bootstrap                        Normal-based
  wage       Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
  ATE     .4095729   .1920853    2.13   0.033    .0330928    .7860531
  ATT     .6059013   .2472069    2.45   0.014    .1213846    1.090418
  ATC     .3483797   .1893653    1.84   0.066   -.0227695    .7195289
  NATE    1.432913   .2333282    6.14   0.000    .9755981    1.890228
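The run above uses 50 bootstrap replications. A hedged sketch of how one might request more replications with a reproducible seed, assuming kmatch passes the standard bootstrap suboptions such as reps() through vce(); the value 500 and the seed are arbitrary illustration choices.

```stata
sysuse nlsw88, clear
drop if industry==2

* Assumption: -vce(bootstrap, ...)- accepts the usual -bootstrap- suboptions
set seed 20170623
kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
    (wage), nate ate att atc vce(bootstrap, reps(500))
```

More replications mainly stabilize the standard errors; the point estimates are unaffected.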
Some Examples: do some tests

. lincom ATT-NATE

 ( 1)  ATT - NATE = 0

  wage       Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
  (1)    -.8270117   .1810415   -4.57   0.000   -1.181847   -.4721768

. test ATT = ATC

 ( 1)  ATT - ATC = 0

           chi2(  1) =    2.42
         Prob > chi2 =  0.1200
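As a quick arithmetic check of the lincom result, using the point estimates and standard error reported in the output (with decimal placement as read from that output):

```latex
\[
\widehat{\mathrm{ATT}} - \widehat{\mathrm{NATE}} = 0.6059013 - 1.432913 = -0.8270117,
\qquad
z = \frac{-0.8270117}{0.1810415} \approx -4.57 .
\]
```

Under the assumptions of the matching estimator, about 0.83 of the raw union wage gap of 1.43 is attributed to differences in the covariates.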
Some Examples: nearest-neighbor matching (1 neighbor)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

Multivariate-distance nearest-neighbor matching

                                           Number of obs  = 1,853
                                           Neighbors: min = 1
Treatment   : union = 1                               max = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                  Matched                Controls           Band-
              Yes     No   Total     Used  Unused   Total   width
  Treated     457      0     457      328    1068    1396       .

Treatment-effects estimation

  wage       Coef.
  ATT     .7246969

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

Treatment-effects estimation               Number of obs      = 1,853
Estimator      : nearest-neighbor matching Matches: requested = 1
Outcome model  : matching                               min = 1
Distance metric: Mahalanobis                            max = 1

                                    AI Robust
  wage                     Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
  ATET
  union
  (union vs nonunion)   .7246969   .2942952    2.46   0.014     .147889    1.301505
Some Examples: nearest-neighbor matching (5 neighbors)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching

                                           Number of obs  = 1,853
                                           Neighbors: min = 5
Treatment   : union = 1                               max = 5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                  Matched                Controls           Band-
              Yes     No   Total     Used  Unused   Total   width
  Treated     457      0     457      870     526    1396       .

Treatment-effects estimation

  wage       Coef.
  ATT     .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation               Number of obs      = 1,853
Estimator      : nearest-neighbor matching Matches: requested = 5
Outcome model  : matching                               min = 5
Distance metric: Mahalanobis                            max = 6

                                    AI Robust
  wage                     Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
  ATET
  union
  (union vs nonunion)   .5590823   .2381752    2.35   0.019    .0922675    1.025897
Some Examples: bias-correction / regression adjustment

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching

                                           Number of obs  = 1,853
                                           Neighbors: min = 5
Treatment   : union = 1                               max = 5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                  Matched                Controls           Band-
              Yes     No   Total     Used  Unused   Total   width
  Treated     457      0     457      870     526    1396       .

Treatment-effects estimation

  wage       Coef.
  ATT     .5288023
  wage: adjusted for collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) ///
>     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation               Number of obs      = 1,853
Estimator      : nearest-neighbor matching Matches: requested = 5
Outcome model  : matching                               min = 5
Distance metric: Mahalanobis                            max = 6

                                    AI Robust
  wage                     Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
  ATET
  union
  (union vs nonunion)   .5288023   .2420635    2.18   0.029    .0543666    1.003238
Some Examples: Mahalanobis-distance and propensity-score matching combined

. kmatch md union collgrad ttl_exp tenure (wage), att ///
>     psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis (modified)
Covariates  : collgrad ttl_exp tenure
PS model    : logit (pr)
PS covars   : i.industry i.race south

Matching statistics

                  Matched                Controls           Band-
              Yes     No   Total     Used  Unused   Total   width
  Treated     439     18     457     1258     138    1396  .83886

Treatment-effects estimation

  wage       Coef.
  ATT     .6408443
Some Examples: exact matching

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure
Exact       : industry race south

Matching statistics

                  Matched                Controls           Band-
              Yes     No   Total     Used  Unused   Total   width
  Treated     432     25     457     1103     293    1396  1.3013

Treatment-effects estimation

  wage       Coef.
  ATT     .6047374
Some Examples: bandwidth selection, the default (based on distribution of distances in one-nearest-neighbor matching)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                  Matched                Controls           Band-
              Yes     No   Total     Used  Unused   Total   width
  Treated     432     25     457     1105     291    1396  1.3394

Treatment-effects estimation

  wage       Coef.
  ATT     .6059013
Some Examples: bandwidth selection, cross-validation with respect to X

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                  Matched                Controls           Band-
              Yes     No   Total     Used  Unused   Total   width
  Treated     448      9     457     1184     212    1396  1.8888

Treatment-effects estimation

  wage       Coef.
  ATT     .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation criterion (MSE) against bandwidth; markers labeled by evaluation index]
Some Examples: bandwidth selection, cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                  Matched                Controls           Band-
              Yes     No   Total     Used  Unused   Total   width
  Treated     453      4     457     1289     107    1396   2.433

Treatment-effects estimation

  wage       Coef.
  ATT     .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation criterion (MISE) against bandwidth; markers labeled by evaluation index]
Some Examples: bandwidth selection, weighted cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                  Matched                Controls           Band-
              Yes     No   Total     Used  Unused   Total   width
  Treated     455      2     457     1356      40    1396  2.7626

Treatment-effects estimation

  wage       Coef.
  ATT     .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation criterion (weighted MISE) against bandwidth; markers labeled by evaluation index]
Some Examples: common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(0.5)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                  Matched                Controls           Band-
              Yes     No   Total     Used  Unused   Total   width
  Treated     366     91     457      701     695    1396      .5

Treatment-effects estimation

  wage       Coef.
  ATT     .3303161

. kmatch csummarize
(refitting the model using the generate() option)

                 Common support (treated)        Standardized difference
Means          Matched  Unmatc~d     Total    (1)-(3)   (2)-(3)   (1)-(2)
collgrad       .322404   .318681   .321663    .001585  -.006376   .007962
ttl_exp        13.3929   12.7682   13.2685    .027413  -.110253   .137666
tenure         8.12614   6.95055   7.89205    .038378  -.154356   .192734
3.industry     .002732   .021978   .006565   -.047404   .190657  -.238061
4.industry     .191257   .153846   .183807    .019212  -.077269   .096481
5.industry     .062842   .274725   .105033   -.137462   .552867  -.690329
6.industry     .057377         0   .045952    .054507  -.219225   .273732
7.industry     .019126   .021978   .019694   -.004083   .016423  -.020506
8.industry     .005464   .065934   .017505   -.091714   .368871  -.460585
9.industry     .010929   .010989   .010941   -.000115   .000462  -.000577
10.industry          0   .021978   .004376   -.066227   .266363  -.332589
11.industry    .554645   .175824   .479212     .15083  -.606636   .757467
12.industry    .092896   .241758   .122538   -.090299   .363181   -.45348
2.race         .243169   .681319   .330416   -.185284   .745209  -.930494
3.race         .002732   .076923   .017505   -.112525   .452572  -.565097
south           .29235   .318681   .297593   -.011456   .046074   -.05753

                 Common support (treated)                 Ratio
Variances      Matched  Unmatc~d     Total    (1)/(3)   (2)/(3)   (1)/(2)
collgrad       .219058   .219536   .218674    1.00176   1.00394   .997824
ttl_exp        19.4198   25.2474   20.5898    .943177   1.22621    .76918
tenure         38.3324   31.9242   37.2044    1.03032   .858076   1.20073
3.industry     .002732   .021734   .006536    .418045   3.32537   .125714
4.industry     .155101   .131624   .150351    1.03159   .875443   1.17837
5.industry     .059054   .201465   .094207    .626851   2.13854   .293122
6.industry     .054233         0   .043936    1.23435         0         .
7.industry     .018811   .021734   .019348    .972252    1.1233   .865531
8.industry      .00545   .062271   .017237    .316157   3.61269   .087513
9.industry     .010839   .010989   .010845    .999464   1.01328   .986361
10.industry          0   .021734   .004367          0   4.97709         0
11.industry    .247691    .14652   .250115    .990307   .585811   1.69049
12.industry    .084497   .185348   .107758    .784137   1.72003   .455885
2.race         .184542   .219536   .221726    .832297   .990121   .840601
3.race         .002732   .071795   .017237    .158513   4.16522   .038056
south          .207448   .219536    .20949    .990254   1.04796   .944939

(1) matched, (2) unmatched, (3) total
Some Examples: make a graph of the common-support stats

. mat M = r(M)
. coefplot matrix(M[,4]), title("Std. difference") noci nolabels xline(0)

[Figure: standardized difference between matched and all treated observations, (1)-(3), for each covariate]
Some Examples: multiple outcome variables

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,852
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                  Matched                Controls           Band-
              Yes     No   Total     Used  Unused   Total   width
  Treated     432     25     457     1104     291    1395  1.3392

Treatment-effects estimation

             Coef.
  wage
  ATT     .6021049
  NATE    1.430823
  hours
  ATT     1.263759
  NATE    1.450303
Some Examples: multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage = collgrad ttl_exp tenure) ///
>     (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,852
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                  Matched                Controls           Band-
              Yes     No   Total     Used  Unused   Total   width
  Treated     432     25     457     1104     291    1395  1.3392

Treatment-effects estimation

             Coef.
  wage
  ATT     .5152752
  NATE    1.430823
  hours
  ATT     1.263759
  NATE    1.450303
  wage: adjusted for collgrad ttl_exp tenure
  hours: adjusted for i.industry i.race
Some Examples: treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage), ///
>     att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Replications  = 50
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics

                  Matched                Controls           Band-
              Yes     No   Total     Used  Unused   Total   width
  0
  Treated     306     15     321      625     120     745  1.3199
  1
  Treated     126     10     136      473     178     651  1.3398

Treatment-effects estimation

          Observed   Bootstrap                        Normal-based
  wage       Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
  0
  ATT     .4586332   .2763358    1.66   0.097    -.082975    1.000241
  1
  ATT     .9518705    .406903    2.34   0.019    .1543553    1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

           chi2(  1) =    1.23
         Prob > chi2 =  0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

  wage       Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
  (1)     .4932373   .4452343    1.11   0.268    -.379406    1.365881
Simulation

- Population data from the Swiss census of 2000.
- Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using command iskotrei by Hendrickx 2002); values from 6 to 78, mean 44.
- Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.
- Control variables: gender, age, and highest educational degree.
- Population restricted to people between 24 and 60 years old who are working.
- 2'308'006 individuals, of which 17.5% belong to the treatment group.
- Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.
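The sampling design above can be sketched as a small Monte Carlo loop. The census data are not available here, so this sketch uses the NLSW extract from the earlier examples as a stand-in "population"; the program name onedraw, the covariate list, the sample size, and the number of replications are all hypothetical illustration choices, not the setup used for the slides.

```stata
* Hypothetical stand-in for the simulation: repeated random samples from a "population"
capture program drop onedraw
program define onedraw, rclass
    sysuse nlsw88, clear
    drop if industry==2
    keep if !missing(wage, union, collgrad, ttl_exp, tenure)
    sample 500, count                      // random sample of N = 500 without replacement
    kmatch md union collgrad ttl_exp tenure (wage), att
    return scalar md = _b[ATT]             // ATT from Mahalanobis-distance kernel matching
    kmatch ps union collgrad ttl_exp tenure (wage), att
    return scalar ps = _b[ATT]             // ATT from propensity-score kernel matching
end

set seed 12345
simulate md=r(md) ps=r(ps), reps(100) nodots: onedraw

* Mean and spread of the two estimators across the simulated samples
summarize md ps
```

Comparing the means with a known population ATT would give the bias, and the standard deviations give the sampling variability of each estimator.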
Simulation

Substantial differences between resident aliens and Swiss nationals on all three covariates.

Propensity score in population (computed from fully stratified data):

[Figure: density of the propensity score (0 to 1) for untreated and treated]

McFadden R2 = 0.121
Simulation

- Raw mean difference in occupational prestige (NATE): -4.79
- Population ATT (computed from fully stratified data): -3.96
- There is some treatment effect heterogeneity (ATE = -3.51, ATC = -3.41)

[Figure, left panel: outcome by propensity score for untreated and treated. Right panel: treatment effect by propensity score.]
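The heterogeneity figures on this slide can be cross-checked with the usual decomposition of the ATE into ATT and ATC, weighted by the treated share. This sketch assumes the slide's numbers read as ATT = -3.96, ATC = -3.41, and ATE = -3.51, and that the treated share is 17.5% as stated in the simulation setup; the symbol p is introduced here for illustration.

```latex
\[
\mathrm{ATE} = p \cdot \mathrm{ATT} + (1 - p) \cdot \mathrm{ATC}
             = 0.175 \times (-3.96) + 0.825 \times (-3.41)
             \approx -3.51 .
\]
```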
Results: Variance

[Figure: variance of the ATT estimates for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]
In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.
Results: Bias reduction (in percent)

[Figure: bias reduction in percent for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]
Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3) small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.
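The note suggests two remedies: a richer propensity-score specification or post-matching regression adjustment. The census data are not available here, so the following sketch uses the NLSW extract from the earlier examples as a stand-in; the particular squared and interaction terms are hypothetical illustration choices, and the sketch assumes kmatch accepts factor-variable interactions in the covariate list.

```stata
sysuse nlsw88, clear
drop if industry==2

* Simple specification: linear effects only, no interactions
kmatch ps union collgrad ttl_exp tenure south (wage), att

* Richer (hypothetical) specification: squared experience and an
* education-by-tenure interaction in the propensity-score model
kmatch ps union i.collgrad##c.tenure c.ttl_exp##c.ttl_exp south (wage), att

* Alternatively, keep the simple model and add post-matching regression adjustment
kmatch ps union collgrad ttl_exp tenure south ///
    (wage = collgrad ttl_exp tenure south), att
```

Comparing the three ATT estimates gives a rough sense of how sensitive the result is to the specification of the propensity-score model.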
Results: Mean squared error

[Figure: mean squared error of the ATT estimates for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]
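The mean squared error combines the two previous criteria. For an estimator of the ATT evaluated over repeated samples, the standard decomposition is as follows (the notation is introduced here for illustration, with \(\tau\) the population ATT and \(\hat{\tau}\) the estimate from one simulated sample):

```latex
\[
\mathrm{MSE}(\hat{\tau}) = \mathrm{E}\big[(\hat{\tau} - \tau)^2\big]
= \mathrm{Var}(\hat{\tau}) + \big(\mathrm{E}[\hat{\tau}] - \tau\big)^2 .
\]
```

An estimator with a small remaining bias can therefore still have the lower MSE if its variance is sufficiently smaller.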
Results: Relative standard error

[Figure: relative standard error for N = 500 and N = 5000. Rows: nearest-neighbor matching with teffects standard errors (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap standard errors (1 neighbor, 5 neighbors), and kernel matching with bootstrap standard errors (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]
Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals for N = 500 and N = 5000. Rows: nearest-neighbor matching with teffects standard errors (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap standard errors (1 neighbor, 5 neighbors), and kernel matching with bootstrap standard errors (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]
Conclusions

- The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).
- Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.
Conclusions

Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research the choice may not be that clear.

Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.

Disadvantages of MDM:
- Choice of scaling matrix largely arbitrary; various suggestions in the literature (somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching! (But don't use pair matching anyhow.)
Conclusions

Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly OK for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.
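Taken together, the conclusions point toward kernel matching with regression adjustment rather than pair matching. A minimal sketch on the NLSW example data, using only options that appear in the examples earlier in this section; it illustrates the recommendation and is not a replication of the simulation.

```stata
sysuse nlsw88, clear
drop if industry==2

* Kernel PSM with post-matching regression adjustment and bootstrap standard errors
kmatch ps union collgrad ttl_exp tenure i.industry i.race south ///
    (wage = collgrad ttl_exp tenure i.industry i.race south), att vce(bootstrap)

* Same estimator based on Mahalanobis distance, for comparison
kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
    (wage = collgrad ttl_exp tenure i.industry i.race south), att vce(bootstrap)
```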
References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2): 295-313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1): 77-90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91: 279-290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4): 465-472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70: 41-55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5): 688-701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4): 472-480.
Are King and Nielsen right
Argument 3I Random pruning rArr imbalance rArr more model dependenceI PSM rArr complete randomization rArr lots of random pruningI PSM Paradox ldquowhen you do lsquobetterrsquo you do worserdquo
True is that post-matching modeling can do more harm with PSMthan with MDM (because PSM leaves more ldquofreerdquo variance in X thatcan exploited by modeling decisions)
In general post-matching analyses are more limited for PSM thanfor MDM For example results from subgroup analyses will not bevalid (yoursquod need to apply PSM stratified by subgroups in this case)
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 30
1 Potential Outcomes and Causal Inference
2 Matching
3 Propensity Score Matching
4 King and Nielsenrsquos ldquoWhy Propensity Scores Should Not Be Used forMatchingrdquo
5 Are King and Nielsen right
6 Illustration using kmatch
7 Conclusions
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 31
The Command
kmatch new matching software for Stata that has been written overthe last few months available from SSC (ssc install kmatch)Some key featuresI Multivariate Distance Matching (MDM) and Propensity ScoreMatching (PSM) (or MDM and PSM combined)
I Optional exact matchingI Optional regression-adjustment bias-correctionI Kernel matching ridge matching or nearest-neighbor matchingI Automatic bandwidth selection for kernelridge matchingI Flexible specification of scaling matrix for MDMI Joint analysis of multiple subgroups and multiple outcome variablesI Various post-estimation commands for balancing andcommon-support diagnostics
I Computationally efficient implementation
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 32
Some Examples Use the NLSW data to estimate the effect of union membership on wages controlling for some covariated such as education labor market experience or industry sysuse nlsw88 clear(NLSW 1988 extract)
drop if industry==2(4 observations deleted)
Mahalanobis-distance kernel matching
kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage) nate att(computing bandwidth done)
Multivariate-distance kernel matching Number of obs = 1853Kernel = epan
Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south
Matching statistics
Matched Controls Band-Yes No Total Used Unused Total width
Treated 432 25 457 1105 291 1396 13394
Treatment-effects estimation
wage Coef
ATT 6059013NATE 1432913
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 33
Some Examples some balancing statistics kmatch summarize(refitting the model using the generate() option)
Raw Matched(ATT)Means Treated Untrea~d StdDif Treated Untrea~d StdDif
collgrad 321663 224212 219912 319444 319444 0ttl_exp 132685 127323 117584 133205 131425 039036tenure 789205 617658 29735 791744 758347 057888
3industry 006565 012178 -058246 00463 00463 04industry 183807 166905 044425 185185 185185 05industry 105033 027937 312944 085648 085648 06industry 045952 169771 -407129 048611 048611 07industry 019694 102436 -350657 020833 020833 08industry 017505 035817 -113785 009259 009259 09industry 010941 040115 -185669 011574 011574 0
10industry 004376 008596 -052551 002315 002315 011industry 479212 356734 250073 506944 506944 012industry 122538 07235 169707 12037 12037 0
2race 330416 244986 189418 3125 3125 03race 017505 011461 050566 006944 006944 0south 297593 466332 -352408 291667 291667 0
Raw Matched(ATT)Variances Treated Untrea~d Ratio Treated Untrea~d Ratio
collgrad 218674 174066 125628 217904 217904 1ttl_exp 205898 210001 980459 198177 182323 108696tenure 372044 293629 126706 370399 349543 105966
3industry 006536 012038 542928 004619 004619 14industry 150351 139148 108052 151242 151242 15industry 094207 027176 346656 078494 078494 16industry 043936 14105 311496 046355 046355 17industry 019348 092008 210287 020447 020447 18industry 017237 034559 498769 009195 009195 19industry 010845 038533 281445 011467 011467 1
10industry 004367 008528 512039 002315 002315 111industry 250115 229639 108917 250532 250532 112industry 107758 067163 160443 106127 106127 1
2race 221726 1851 119787 215342 215342 13race 017237 011338 152025 006912 006912 1south 20949 249045 841173 207077 207077 1
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 34
Some Examples make a graph of the balancing stats mat M = r(M)
mat V = r(V)
coefplot matrix(M[3]) matrix(M[6]) || matrix(V[3]) matrix(V[6]) || gt bylabels(Std mean difference Variance ratio) gt noci nolabels byopts(xrescale)
addplot 1 xline(0) norescaling legend(order(1 Raw 2 Matched))
addplot 2 xline(1) norescaling
collgradttl_exptenure
3industry4industry5industry6industry7industry8industry9industry
10industry11industry12industry
2race3racesouth
-4 -2 0 2 4 0 1 2 3 4
Std mean difference Variance ratio
Raw Matched
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 35
Some Examples
Propensity-score kernel matching

. kmatch ps union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate att
(computing bandwidth ... done)

Propensity-score kernel matching               Number of obs = 1,853
                                               Kernel        = epan
Treatment   : union = 1
Covariates  : collgrad ttl_exp tenure i.industry i.race south
PS model    : logit (pr)

Matching statistics
                 Matched                 Controls            Band-
             Yes     No  Total     Used  Unused  Total       width
  Treated    431     26    457     1214     182   1396      .00188

Treatment-effects estimation
        wage        Coef.
         ATT     .3887224
        NATE     1.432913
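What kernel matching on a scalar score does for the ATT can be sketched in a few lines of Python. This is an illustration under stated assumptions, not kmatch's implementation: it uses an Epanechnikov kernel (the output reports "Kernel = epan"), a fixed bandwidth, and it drops treated observations that have no control within the bandwidth. The function names and the toy data are hypothetical.

```python
def epan(u):
    """Epanechnikov kernel; zero outside |u| < 1."""
    return 0.75 * (1 - u * u) if abs(u) < 1 else 0.0

def kernel_att(ps_t, y_t, ps_c, y_c, bw):
    """ATT by kernel matching; returns (estimate, number of matched treated)."""
    effects = []
    for p, y in zip(ps_t, y_t):
        w = [epan((p - q) / bw) for q in ps_c]
        total = sum(w)
        if total == 0:
            continue  # no control within the bandwidth: treated unit unmatched
        # counterfactual outcome: kernel-weighted mean of control outcomes
        y0_hat = sum(wi * yi for wi, yi in zip(w, y_c)) / total
        effects.append(y - y0_hat)
    return sum(effects) / len(effects), len(effects)

# Toy data: the third treated unit has no control nearby.
ps_t, y_t = [0.30, 0.60, 0.95], [10.0, 12.0, 20.0]
ps_c, y_c = [0.28, 0.32, 0.58, 0.62], [8.0, 9.0, 10.0, 11.0]
att, n_matched = kernel_att(ps_t, y_t, ps_c, y_c, bw=0.05)
print(att, n_matched)
```

The unmatched treated unit plays the same role as the 26 treated observations reported under "Matched: No" in the output above: the estimate refers only to treated units with controls inside the bandwidth.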
Some Examples
Kernel density balancing plot

. kmatch density, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)
(applying 0-1 boundary correction to density estimation of propensity score)
(bandwidth for propensity score = .06803989)

[Figure: kernel density of the propensity score, Untreated vs. Treated; panels: Raw and Matched (ATT); x-axis: Propensity score, y-axis: Density]

Some Examples
Cumulative distribution balancing plot

. kmatch cumul, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)

[Figure: cumulative distribution of the propensity score, Untreated vs. Treated; panels: Raw and Matched (ATT); x-axis: Propensity score, y-axis: Cumulative probability]

Some Examples
Balancing box plot

. kmatch box
(refitting the model using the generate() option)

[Figure: box plots of the propensity score, Untreated vs. Treated; panels: Raw and Matched (ATT); y-axis: Propensity score]
Some Examples
Standard errors

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching          Number of obs = 1,853
                                               Replications  =    50
                                               Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                 Matched                 Controls            Band-
             Yes     No  Total     Used  Unused  Total       width
  Treated    432     25    457     1105     291   1396      1.3394
Untreated   1386     10   1396      455       2    457      3.3975
 Combined   1818     35   1853     1560     293   1853

Treatment-effects estimation
              Observed   Bootstrap                        Normal-based
    wage         Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
     ATE      .4095729    .1920853   2.13    0.033    .0330928    .7860531
     ATT      .6059013    .2472069   2.45    0.014    .1213846    1.090418
     ATC      .3483797    .1893653   1.84    0.066   -.0227695    .7195289
    NATE      1.432913    .2333282   6.14    0.000    .9755981    1.890228
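The z statistic, p-value, and confidence interval in this table follow from the coefficient and its bootstrap standard error under a normal approximation. A short Python sketch reproduces the ATT row; the function name is hypothetical, and the bootstrap itself is not redone here.

```python
from statistics import NormalDist

def normal_inference(coef, se, level=0.95):
    """z statistic, two-sided p-value, and normal-based confidence limits."""
    nd = NormalDist()
    z = coef / se
    p = 2 * (1 - nd.cdf(abs(z)))
    crit = nd.inv_cdf(1 - (1 - level) / 2)  # about 1.96 for a 95% interval
    return z, p, coef - crit * se, coef + crit * se

# ATT row of the table: coefficient and bootstrap standard error.
z, p, lo, hi = normal_inference(0.6059013, 0.2472069)
print(f"z = {z:.2f}, p = {p:.3f}, CI = [{lo:.7f}, {hi:.6f}]")
```

With only 50 replications the standard error itself is noisy, so the interval should be read as approximate.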
Some Examples
Do some tests

. lincom ATT-NATE

 ( 1)  ATT - NATE = 0

    wage         Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
     (1)     -.8270117    .1810415  -4.57    0.000   -1.181847   -.4721768

. test ATT = ATC

 ( 1)  ATT - ATC = 0

           chi2(  1) =    2.42
         Prob > chi2 =    0.1200
Some Examples
Nearest-neighbor matching (1 neighbor)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

Multivariate-distance nearest-neighbor matching
                                               Number of obs  = 1,853
                                               Neighbors: min =     1
Treatment   : union = 1                                   max =     1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                 Matched                 Controls            Band-
             Yes     No  Total     Used  Unused  Total       width
  Treated    457      0    457      328    1068   1396           .

Treatment-effects estimation
        wage        Coef.
         ATT     .7246969

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

Treatment-effects estimation                   Number of obs      = 1,853
Estimator      : nearest-neighbor matching     Matches: requested =     1
Outcome model  : matching                                     min =     1
Distance metric: Mahalanobis                                  max =     1

                                   AI Robust
                wage        Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET
               union
 (union vs nonunion)     .7246969    .2942952   2.46    0.014    .147889    1.301505
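One-nearest-neighbor matching on the Mahalanobis distance can be sketched as follows for two covariates. This is a minimal illustration, not the implementation used by kmatch or teffects: it matches with replacement, ignores ties, and scales distances by the covariance matrix of the pooled sample, which is an assumption of the sketch. All names and the toy data are hypothetical.

```python
import math

def cov2(rows):
    """Sample covariance terms (s11, s12, s22) of (x1, x2) pairs."""
    n = len(rows)
    m1 = sum(r[0] for r in rows) / n
    m2 = sum(r[1] for r in rows) / n
    s11 = sum((r[0] - m1) ** 2 for r in rows) / (n - 1)
    s22 = sum((r[1] - m2) ** 2 for r in rows) / (n - 1)
    s12 = sum((r[0] - m1) * (r[1] - m2) for r in rows) / (n - 1)
    return s11, s12, s22

def mahalanobis(x, y, cov):
    """sqrt(d' S^-1 d) with the 2x2 inverse written out explicitly."""
    s11, s12, s22 = cov
    det = s11 * s22 - s12 * s12
    d1, d2 = x[0] - y[0], x[1] - y[1]
    return math.sqrt((s22 * d1 * d1 - 2 * s12 * d1 * d2 + s11 * d2 * d2) / det)

def nn_att(x_t, y_t, x_c, y_c):
    """ATT from 1-nearest-neighbor matching with replacement."""
    cov = cov2(x_t + x_c)  # assumption: scale by pooled-sample covariance
    effects = []
    for x, y in zip(x_t, y_t):
        j = min(range(len(x_c)), key=lambda k: mahalanobis(x, x_c[k], cov))
        effects.append(y - y_c[j])
    return sum(effects) / len(effects)

x_t, y_t = [(1.0, 2.0), (4.0, 1.0)], [5.0, 7.0]
x_c, y_c = [(1.1, 2.1), (3.9, 1.2), (8.0, 5.0)], [4.0, 6.5, 9.0]
att_nn = nn_att(x_t, y_t, x_c, y_c)
print(att_nn)
```

In this toy sample the third control is never used, which mirrors the "Unused" controls column in the matching statistics above.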
Some Examples
Nearest-neighbor matching (5 neighbors)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching
                                               Number of obs  = 1,853
                                               Neighbors: min =     5
Treatment   : union = 1                                   max =     5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                 Matched                 Controls            Band-
             Yes     No  Total     Used  Unused  Total       width
  Treated    457      0    457      870     526   1396           .

Treatment-effects estimation
        wage        Coef.
         ATT     .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation                   Number of obs      = 1,853
Estimator      : nearest-neighbor matching     Matches: requested =     5
Outcome model  : matching                                     min =     5
Distance metric: Mahalanobis                                  max =     6

                                   AI Robust
                wage        Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET
               union
 (union vs nonunion)     .5590823    .2381752   2.35    0.019   .0922675    1.025897
Some Examples
Bias-correction (regression adjustment)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching
                                               Number of obs  = 1,853
                                               Neighbors: min =     5
Treatment   : union = 1                                   max =     5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                 Matched                 Controls            Band-
             Yes     No  Total     Used  Unused  Total       width
  Treated    457      0    457      870     526   1396           .

Treatment-effects estimation
        wage        Coef.
         ATT     .5288023
adjusted for: collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south)
>     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation                   Number of obs      = 1,853
Estimator      : nearest-neighbor matching     Matches: requested =     5
Outcome model  : matching                                     min =     5
Distance metric: Mahalanobis                                  max =     6

                                   AI Robust
                wage        Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
ATET
               union
 (union vs nonunion)     .5288023    .2420635   2.18    0.029   .0543666    1.003238
Some Examples
Mahalanobis-distance and propensity-score matching combined

. kmatch md union collgrad ttl_exp tenure (wage), att
>     psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs = 1,853
                                               Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis (modified)
Covariates  : collgrad ttl_exp tenure
PS model    : logit (pr)
PS covars   : i.industry i.race south

Matching statistics
                 Matched                 Controls            Band-
             Yes     No  Total     Used  Unused  Total       width
  Treated    439     18    457     1258     138   1396      .83886

Treatment-effects estimation
        wage        Coef.
         ATT     .6408443
Some Examples
Exact matching

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs = 1,853
                                               Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure
Exact       : industry race south

Matching statistics
                 Matched                 Controls            Band-
             Yes     No  Total     Used  Unused  Total       width
  Treated    432     25    457     1103     293   1396      1.3013

Treatment-effects estimation
        wage        Coef.
         ATT     .6047374
Some Examples
Bandwidth selection: the default (based on the distribution of distances in one-nearest-neighbor matching)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs = 1,853
                                               Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                 Matched                 Controls            Band-
             Yes     No  Total     Used  Unused  Total       width
  Treated    432     25    457     1105     291   1396      1.3394

Treatment-effects estimation
        wage        Coef.
         ATT     .6059013
Some Examples
Bandwidth selection: cross validation with respect to X

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs = 1,853
                                               Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                 Matched                 Controls            Band-
             Yes     No  Total     Used  Unused  Total       width
  Treated    448      9    457     1184     212   1396      1.8888

Treatment-effects estimation
        wage        Coef.
         ATT     .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot, MSE against Bandwidth (about 1.5 to 3); points labeled by search index]
Some Examples
Bandwidth selection: cross validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs = 1,853
                                               Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                 Matched                 Controls            Band-
             Yes     No  Total     Used  Unused  Total       width
  Treated    453      4    457     1289     107   1396       2.433

Treatment-effects estimation
        wage        Coef.
         ATT     .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot, MISE against Bandwidth (about 1.5 to 3); points labeled by search index]
Some Examples
Bandwidth selection: weighted cross validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs = 1,853
                                               Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                 Matched                 Controls            Band-
             Yes     No  Total     Used  Unused  Total       width
  Treated    455      2    457     1356      40   1396      2.7626

Treatment-effects estimation
        wage        Coef.
         ATT     .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot, Weighted MISE against Bandwidth (about 1 to 5); points labeled by search index]
Some Examples
Common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(0.5)

Multivariate-distance kernel matching          Number of obs = 1,853
                                               Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                 Matched                 Controls            Band-
             Yes     No  Total     Used  Unused  Total       width
  Treated    366     91    457      701     695   1396          .5

Treatment-effects estimation
        wage        Coef.
         ATT     .3303161

. kmatch csummarize
(refitting the model using the generate() option)

                 Common support (treated)        Standardized difference
        Means   Matched  Unmatc~d     Total   (1)-(3)   (2)-(3)   (1)-(2)
     collgrad   .322404   .318681   .321663   .001585  -.006376   .007962
      ttl_exp   13.3929   12.7682   13.2685   .027413  -.110253   .137666
       tenure   8.12614   6.95055   7.89205   .038378  -.154356   .192734
   3.industry   .002732   .021978   .006565  -.047404   .190657  -.238061
   4.industry   .191257   .153846   .183807   .019212  -.077269   .096481
   5.industry   .062842   .274725   .105033  -.137462   .552867  -.690329
   6.industry   .057377         0   .045952   .054507  -.219225   .273732
   7.industry   .019126   .021978   .019694  -.004083   .016423  -.020506
   8.industry   .005464   .065934   .017505  -.091714   .368871  -.460585
   9.industry   .010929   .010989   .010941  -.000115   .000462  -.000577
  10.industry         0   .021978   .004376  -.066227   .266363  -.332589
  11.industry   .554645   .175824   .479212    .15083  -.606636   .757467
  12.industry   .092896   .241758   .122538  -.090299   .363181   -.45348
       2.race   .243169   .681319   .330416  -.185284   .745209  -.930494
       3.race   .002732   .076923   .017505  -.112525   .452572  -.565097
        south    .29235   .318681   .297593  -.011456   .046074   -.05753

                 Common support (treated)                 Ratio
    Variances   Matched  Unmatc~d     Total   (1)/(3)   (2)/(3)   (1)/(2)
     collgrad   .219058   .219536   .218674   1.00176   1.00394   .997824
      ttl_exp   19.4198   25.2474   20.5898   .943177   1.22621    .76918
       tenure   38.3324   31.9242   37.2044   1.03032   .858076   1.20073
   3.industry   .002732   .021734   .006536   .418045   3.32537   .125714
   4.industry   .155101   .131624   .150351   1.03159   .875443   1.17837
   5.industry   .059054   .201465   .094207   .626851   2.13854   .293122
   6.industry   .054233         0   .043936   1.23435         0         .
   7.industry   .018811   .021734   .019348   .972252    1.1233   .865531
   8.industry    .00545   .062271   .017237   .316157   3.61269   .087513
   9.industry   .010839   .010989   .010845   .999464   1.01328   .986361
  10.industry         0   .021734   .004367         0   4.97709         0
  11.industry   .247691    .14652   .250115   .990307   .585811   1.69049
  12.industry   .084497   .185348   .107758   .784137   1.72003   .455885
       2.race   .184542   .219536   .221726   .832297   .990121   .840601
       3.race   .002732   .071795   .017237   .158513   4.16522   .038056
        south   .207448   .219536    .20949   .990254   1.04796   .944939

(1) matched, (2) unmatched, (3) total
Some Examples

. // make a graph of the common-support stats
. mat M = r(M)
. coefplot matrix(M[,4]), title("Std. difference") noci nolabels xline(0)

[Figure: coefficient plot of the standardized difference between matched treated and all treated for each covariate; x-axis: Std. difference (-.2 to .2), reference line at 0]
Some Examples
Multiple outcome variables

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs = 1,852
                                               Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                 Matched                 Controls            Band-
             Yes     No  Total     Used  Unused  Total       width
  Treated    432     25    457     1104     291   1395      1.3392

Treatment-effects estimation
                    Coef.
wage
         ATT     .6021049
        NATE     1.430823
hours
         ATT     1.263759
        NATE     1.450303
Some Examples
Multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure)
>     (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs = 1,852
                                               Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                 Matched                 Controls            Band-
             Yes     No  Total     Used  Unused  Total       width
  Treated    432     25    457     1104     291   1395      1.3392

Treatment-effects estimation
                    Coef.
wage
         ATT     .5152752
        NATE     1.430823
hours
         ATT     1.263759
        NATE     1.450303
wage: adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race
Some Examples
Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage),
>     att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching          Number of obs = 1,853
                                               Replications  =    50
                                               Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics
                 Matched                 Controls            Band-
             Yes     No  Total     Used  Unused  Total       width
0
  Treated    306     15    321      625     120    745      1.3199
1
  Treated    126     10    136      473     178    651      1.3398

Treatment-effects estimation
              Observed   Bootstrap                        Normal-based
    wage         Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
0
     ATT      .4586332    .2763358   1.66    0.097    -.082975    1.000241
1
     ATT      .9518705     .406903   2.34    0.019    .1543553    1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

           chi2(  1) =    1.23
         Prob > chi2 =    0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

    wage         Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
     (1)      .4932373    .4452343   1.11    0.268    -.379406    1.365881
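The test and lincom results on this slide address the same hypothesis, so they agree: with one restriction, the Wald chi-squared statistic is the square of the lincom z statistic, and both give the same p-value. A small Python check, using the difference and standard error reported by lincom (the function name is hypothetical):

```python
from statistics import NormalDist

def wald_chi2_1df(diff, se):
    """Wald chi2(1) statistic and p-value for H0: diff = 0."""
    z = diff / se
    chi2 = z * z
    # for one degree of freedom, P(chi2 > z^2) equals the two-sided normal p-value
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return chi2, p

# Difference between the two subgroup ATTs and its bootstrap standard error.
chi2, p_diff = wald_chi2_1df(0.4932373, 0.4452343)
print(f"chi2(1) = {chi2:.2f}, Prob > chi2 = {p_diff:.4f}")
```

The standard error of the difference is taken from the bootstrap here; it is not simply derived from the two subgroup standard errors under independence.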
Simulation
Population data from the Swiss census of 2000.
Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using command iskotrei by Hendrickx 2002; values from 6 to 78, mean 44).
Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.
Control variables: gender, age, and highest educational degree.
Population restricted to people between 24 and 60 years old who are working.
2'308'006 individuals, of which 17.5% belong to the treatment group.
Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.
Simulation
Substantial differences between resident aliens and Swiss nationals on all three covariates.
Propensity score in population (computed from fully stratified data); McFadden R2 = 0.121.

[Figure: density of the propensity score in the population, Untreated vs. Treated; x-axis: Propensity score (0 to 1), y-axis: Density]
Simulation
Raw mean difference in occupational prestige (NATE): -4.79
Population ATT (computed from fully stratified data): -3.96
There is some treatment effect heterogeneity (ATE = -3.51, ATC = -3.41).

[Figure, two panels: (1) mean Outcome (about 30 to 55) by Propensity score, Untreated vs. Treated; (2) Treatment effect (about -6 to -1) by Propensity score]
Results: Variance

[Figure: variance of the ATT estimates by matching algorithm; rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y); legend: MDM, MDM with bias correction, PSM, PSM with bias correction; panels: N = 500 and N = 5000]

In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.
Results: Bias reduction (in percent)

[Figure: bias reduction in percent by matching algorithm; rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y); legend: MDM, MDM with bias correction, PSM, PSM with bias correction; panels: N = 500 and N = 5000]

Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3) small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.
Results: Mean squared error

[Figure: mean squared error of the ATT estimates by matching algorithm; rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y); legend: MDM, MDM with bias correction, PSM, PSM with bias correction; panels: N = 500 and N = 5000]
Results: Relative standard error

[Figure: relative standard error by matching algorithm; rows: nearest-neighbor matching (teffects; 1 neighbor, 5 neighbors), nearest-neighbor matching (bootstrap; 1 neighbor, 5 neighbors), and kernel matching (bootstrap; fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y); legend: MDM, MDM with bias correction, PSM, PSM with bias correction; panels: N = 500 and N = 5000]
Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals by matching algorithm; rows: nearest-neighbor matching (teffects; 1 neighbor, 5 neighbors), nearest-neighbor matching (bootstrap; 1 neighbor, 5 neighbors), and kernel matching (bootstrap; fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y); legend: MDM, MDM with bias correction, PSM, PSM with bias correction; panels: N = 500 and N = 5000]
Conclusions
The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).
Other PSM matching algorithms perform much better because they are less affected by the "random pruning" problem.

Conclusions
Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research the choice may not be that clear.
Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.
Disadvantages of MDM:
- The choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.
One clear conclusion we can draw, however, is:
Do not use propensity scores for pair matching! (But don't use pair matching anyhow.)

Conclusions
Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly OK for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.
To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.
References
Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2): 295–313.
Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1): 77–90.
Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91: 279–290.
Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.
King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.
Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.
Neyman, J. 1990 [1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4): 465–472.
Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70: 41–55.
Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5): 688–701.
Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4): 472–480.
The Command
kmatch new matching software for Stata that has been written overthe last few months available from SSC (ssc install kmatch)Some key featuresI Multivariate Distance Matching (MDM) and Propensity ScoreMatching (PSM) (or MDM and PSM combined)
I Optional exact matchingI Optional regression-adjustment bias-correctionI Kernel matching ridge matching or nearest-neighbor matchingI Automatic bandwidth selection for kernelridge matchingI Flexible specification of scaling matrix for MDMI Joint analysis of multiple subgroups and multiple outcome variablesI Various post-estimation commands for balancing andcommon-support diagnostics
I Computationally efficient implementation
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 32
Some Examples Use the NLSW data to estimate the effect of union membership on wages controlling for some covariated such as education labor market experience or industry sysuse nlsw88 clear(NLSW 1988 extract)
drop if industry==2(4 observations deleted)
Mahalanobis-distance kernel matching
kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage) nate att(computing bandwidth done)
Multivariate-distance kernel matching Number of obs = 1853Kernel = epan
Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south
Matching statistics
Matched Controls Band-Yes No Total Used Unused Total width
Treated 432 25 457 1105 291 1396 13394
Treatment-effects estimation
wage Coef
ATT 6059013NATE 1432913
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 33
Some Examples some balancing statistics kmatch summarize(refitting the model using the generate() option)
Raw Matched(ATT)Means Treated Untrea~d StdDif Treated Untrea~d StdDif
collgrad 321663 224212 219912 319444 319444 0ttl_exp 132685 127323 117584 133205 131425 039036tenure 789205 617658 29735 791744 758347 057888
3industry 006565 012178 -058246 00463 00463 04industry 183807 166905 044425 185185 185185 05industry 105033 027937 312944 085648 085648 06industry 045952 169771 -407129 048611 048611 07industry 019694 102436 -350657 020833 020833 08industry 017505 035817 -113785 009259 009259 09industry 010941 040115 -185669 011574 011574 0
10industry 004376 008596 -052551 002315 002315 011industry 479212 356734 250073 506944 506944 012industry 122538 07235 169707 12037 12037 0
2race 330416 244986 189418 3125 3125 03race 017505 011461 050566 006944 006944 0south 297593 466332 -352408 291667 291667 0
Raw Matched(ATT)Variances Treated Untrea~d Ratio Treated Untrea~d Ratio
collgrad 218674 174066 125628 217904 217904 1ttl_exp 205898 210001 980459 198177 182323 108696tenure 372044 293629 126706 370399 349543 105966
3industry 006536 012038 542928 004619 004619 14industry 150351 139148 108052 151242 151242 15industry 094207 027176 346656 078494 078494 16industry 043936 14105 311496 046355 046355 17industry 019348 092008 210287 020447 020447 18industry 017237 034559 498769 009195 009195 19industry 010845 038533 281445 011467 011467 1
10industry 004367 008528 512039 002315 002315 111industry 250115 229639 108917 250532 250532 112industry 107758 067163 160443 106127 106127 1
2race 221726 1851 119787 215342 215342 13race 017237 011338 152025 006912 006912 1south 20949 249045 841173 207077 207077 1
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 34
Some Examples make a graph of the balancing stats mat M = r(M)
mat V = r(V)
coefplot matrix(M[3]) matrix(M[6]) || matrix(V[3]) matrix(V[6]) || gt bylabels(Std mean difference Variance ratio) gt noci nolabels byopts(xrescale)
addplot 1 xline(0) norescaling legend(order(1 Raw 2 Matched))
addplot 2 xline(1) norescaling
collgradttl_exptenure
3industry4industry5industry6industry7industry8industry9industry
10industry11industry12industry
2race3racesouth
-4 -2 0 2 4 0 1 2 3 4
Std mean difference Variance ratio
Raw Matched
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 35
Some Examples
Propensity-score kernel matching
kmatch ps union collgrad ttl_exp tenure iindustry irace south gt (wage) nate att(computing bandwidth done)
Propensity-score kernel matching Number of obs = 1853Kernel = epan
Treatment union = 1Covariates collgrad ttl_exp tenure iindustry irace southPS model logit (pr)
Matching statistics
Matched Controls Band-Yes No Total Used Unused Total width
Treated 431 26 457 1214 182 1396 00188
Treatment-effects estimation
wage Coef
ATT 3887224NATE 1432913
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 36
Some Examples Kernel density balancing plot kmatch density lw(6 2) lc(5 1)(refitting the model using the generate() option)(applying 0-1 boundary correction to density estimation of propensity score)(bandwidth for propensity score = 06803989)
01
23
0 2 4 6 8 0 2 4 6 8
Raw Matched (ATT)
Untreated Treated
Den
sity
Propensity score
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 37
Some Examples Cumulative distribution balancing plot kmatch cumul lw(6 2) lc(5 1)(refitting the model using the generate() option)
05
1
0 2 4 6 8 0 2 4 6 8
Raw Matched (ATT)
Untreated Treated
Cum
ulat
ive
prob
abilit
y
Propensity score
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 38
Some Examples Balancing box plot kmatch box(refitting the model using the generate() option)
02
46
8Raw Matched (ATT)
Untreated Treated
Prop
ensi
ty s
core
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 39
Some Examples Standard errors kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage) nate ate att atc vce(bootstrap)(computing bandwidth for treated done)(computing bandwidth for untreated done)
(running kmatch on estimation sample)
Bootstrap replications (50)1 2 3 4 5
50
Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan
Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south
Matching statistics
Matched Controls Band-Yes No Total Used Unused Total width
Treated 432 25 457 1105 291 1396 13394Untreated 1386 10 1396 455 2 457 33975Combined 1818 35 1853 1560 293 1853
Treatment-effects estimation
Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]
ATE 4095729 1920853 213 0033 0330928 7860531ATT 6059013 2472069 245 0014 1213846 1090418ATC 3483797 1893653 184 0066 -0227695 7195289
NATE 1432913 2333282 614 0000 9755981 1890228
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 40
Some Examples
Do some tests
lincom ATT-NATE
( 1) ATT - NATE = 0
wage Coef Std Err z Pgt|z| [95 Conf Interval]
(1) -8270117 1810415 -457 0000 -1181847 -4721768
test ATT = ATC
( 1) ATT - ATC = 0
chi2( 1) = 242Prob gt chi2 = 01200
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 41
Some Examples

Nearest-neighbor matching (1 neighbor)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

Multivariate-distance nearest-neighbor matching

                                           Number of obs   =      1,853
                                           Neighbors:  min =          1
Treatment   : union = 1                                max =          1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |        Matched        |        Controls        |  Band-
           |    Yes     No   Total |   Used  Unused   Total |  width
-----------+-----------------------+------------------------+--------
   Treated |    457      0     457 |    328    1068    1396 |      .

Treatment-effects estimation

        wage |     Coef.
-------------+----------
         ATT |  .7246969

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

Treatment-effects estimation               Number of obs      =      1,853
Estimator      : nearest-neighbor matching Matches: requested =          1
Outcome model  : matching                                 min =          1
Distance metric: Mahalanobis                              max =          1

                     |            AI Robust
                wage |     Coef.  Std. Err.      z    P>|z|   [95% Conf. Interval]
---------------------+----------------------------------------------------------
ATET                 |
               union |
 (union vs nonunion) |  .7246969  .2942952    2.46   0.014     .147889   1.301505
Some Examples

Nearest-neighbor matching (5 neighbors)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching

                                           Number of obs   =      1,853
                                           Neighbors:  min =          5
Treatment   : union = 1                                max =          5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |        Matched        |        Controls        |  Band-
           |    Yes     No   Total |   Used  Unused   Total |  width
-----------+-----------------------+------------------------+--------
   Treated |    457      0     457 |    870     526    1396 |      .

Treatment-effects estimation

        wage |     Coef.
-------------+----------
         ATT |  .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation               Number of obs      =      1,853
Estimator      : nearest-neighbor matching Matches: requested =          5
Outcome model  : matching                                 min =          5
Distance metric: Mahalanobis                              max =          6

                     |            AI Robust
                wage |     Coef.  Std. Err.      z    P>|z|   [95% Conf. Interval]
---------------------+----------------------------------------------------------
ATET                 |
               union |
 (union vs nonunion) |  .5590823  .2381752    2.35   0.019    .0922675   1.025897
Some Examples

Bias-correction / regression adjustment

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching

                                           Number of obs   =      1,853
                                           Neighbors:  min =          5
Treatment   : union = 1                                max =          5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |        Matched        |        Controls        |  Band-
           |    Yes     No   Total |   Used  Unused   Total |  width
-----------+-----------------------+------------------------+--------
   Treated |    457      0     457 |    870     526    1396 |      .

Treatment-effects estimation

        wage |     Coef.
-------------+----------
         ATT |  .5288023

adjusted for: collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south)
>     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation               Number of obs      =      1,853
Estimator      : nearest-neighbor matching Matches: requested =          5
Outcome model  : matching                                 min =          5
Distance metric: Mahalanobis                              max =          6

                     |            AI Robust
                wage |     Coef.  Std. Err.      z    P>|z|   [95% Conf. Interval]
---------------------+----------------------------------------------------------
ATET                 |
               union |
 (union vs nonunion) |  .5288023  .2420635    2.18   0.029    .0543666   1.003238
Some Examples

Mahalanobis-distance and propensity-score matching combined

. kmatch md union collgrad ttl_exp tenure (wage), att
>     psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs   =      1,853
                                           Kernel          =       epan
Treatment   : union = 1
Metric      : mahalanobis (modified)
Covariates  : collgrad ttl_exp tenure
PS model    : logit (pr)
PS covars   : i.industry i.race south

Matching statistics

           |        Matched        |        Controls        |  Band-
           |    Yes     No   Total |   Used  Unused   Total |  width
-----------+-----------------------+------------------------+--------
   Treated |    439     18     457 |   1258     138    1396 | .83886

Treatment-effects estimation

        wage |     Coef.
-------------+----------
         ATT |  .6408443
Some Examples

Exact matching

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs   =      1,853
                                           Kernel          =       epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure
Exact       : industry race south

Matching statistics

           |        Matched        |        Controls        |  Band-
           |    Yes     No   Total |   Used  Unused   Total |  width
-----------+-----------------------+------------------------+--------
   Treated |    432     25     457 |   1103     293    1396 | 1.3013

Treatment-effects estimation

        wage |     Coef.
-------------+----------
         ATT |  .6047374
Some Examples

Bandwidth selection: the default (based on the distribution of distances in one-nearest-neighbor matching)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs   =      1,853
                                           Kernel          =       epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |        Matched        |        Controls        |  Band-
           |    Yes     No   Total |   Used  Unused   Total |  width
-----------+-----------------------+------------------------+--------
   Treated |    432     25     457 |   1105     291    1396 | 1.3394

Treatment-effects estimation

        wage |     Coef.
-------------+----------
         ATT |  .6059013
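To make the role of the bandwidth concrete, here is a simplified Python sketch of kernel matching for the ATT on a single covariate. It is an illustration under stated assumptions, not kmatch's implementation: the distance is a plain absolute difference rather than a Mahalanobis distance, the toy data are invented, and the names `epan` and `kernel_att` are ours. Treated units with no control inside the bandwidth are counted as unmatched, which corresponds to the "Matched: No" column in the output above.

```python
def epan(u):
    """Epanechnikov kernel; zero outside |u| < 1."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0

def kernel_att(treated, controls, h):
    """treated, controls: lists of (x, y) pairs; h: bandwidth.

    Returns (att, n_matched, n_unmatched).
    """
    effects = []
    unmatched = 0
    for x_t, y_t in treated:
        weights = [epan(abs(x_t - x_c) / h) for x_c, _ in controls]
        total = sum(weights)
        if total == 0.0:
            unmatched += 1  # no control within the bandwidth: off support
            continue
        # Kernel-weighted mean of control outcomes as the counterfactual
        y0_hat = sum(w * y_c for w, (_, y_c) in zip(weights, controls)) / total
        effects.append(y_t - y0_hat)
    att = sum(effects) / len(effects) if effects else float("nan")
    return att, len(effects), unmatched

# Hypothetical toy data: (covariate, outcome)
treated = [(1.0, 5.0), (2.0, 6.0), (10.0, 9.0)]
controls = [(1.0, 4.0), (2.0, 5.0), (3.0, 6.0)]

att, n_matched, n_unmatched = kernel_att(treated, controls, h=1.5)
print(att, n_matched, n_unmatched)
```

A larger bandwidth matches more treated units but averages over more distant controls; a smaller one does the opposite. The bandwidth-selection examples trade these off.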
Some Examples

Bandwidth selection: cross-validation with respect to X

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs   =      1,853
                                           Kernel          =       epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |        Matched        |        Controls        |  Band-
           |    Yes     No   Total |   Used  Unused   Total |  width
-----------+-----------------------+------------------------+--------
   Treated |    448      9     457 |   1184     212    1396 | 1.8888

Treatment-effects estimation

        wage |     Coef.
-------------+----------
         ATT |  .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MSE against bandwidth; the search steps are labeled by index]
Some Examples

Bandwidth selection: cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs   =      1,853
                                           Kernel          =       epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |        Matched        |        Controls        |  Band-
           |    Yes     No   Total |   Used  Unused   Total |  width
-----------+-----------------------+------------------------+--------
   Treated |    453      4     457 |   1289     107    1396 |  2.433

Treatment-effects estimation

        wage |     Coef.
-------------+----------
         ATT |  .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MISE against bandwidth; the search steps are labeled by index]
Some Examples

Bandwidth selection: weighted cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs   =      1,853
                                           Kernel          =       epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |        Matched        |        Controls        |  Band-
           |    Yes     No   Total |   Used  Unused   Total |  width
-----------+-----------------------+------------------------+--------
   Treated |    455      2     457 |   1356      40    1396 | 2.7626

Treatment-effects estimation

        wage |     Coef.
-------------+----------
         ATT |  .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of weighted MISE against bandwidth; the search steps are labeled by index]
Some Examples

Common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(0.5)

Multivariate-distance kernel matching      Number of obs   =      1,853
                                           Kernel          =       epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |        Matched        |        Controls        |  Band-
           |    Yes     No   Total |   Used  Unused   Total |  width
-----------+-----------------------+------------------------+--------
   Treated |    366     91     457 |    701     695    1396 |     .5

Treatment-effects estimation

        wage |     Coef.
-------------+----------
         ATT |  .3303161

. kmatch csummarize
(refitting the model using the generate() option)

             |   Common support (treated)   |    Standardized difference
       Means |  Matched  Unmatc~d     Total |  (1)-(3)   (2)-(3)   (1)-(2)
-------------+------------------------------+------------------------------
    collgrad |  .322404   .318681   .321663 |  .001585  -.006376   .007962
     ttl_exp |  13.3929   12.7682   13.2685 |  .027413  -.110253   .137666
      tenure |  8.12614   6.95055   7.89205 |  .038378  -.154356   .192734
  3.industry |  .002732   .021978   .006565 | -.047404   .190657  -.238061
  4.industry |  .191257   .153846   .183807 |  .019212  -.077269   .096481
  5.industry |  .062842   .274725   .105033 | -.137462   .552867  -.690329
  6.industry |  .057377         0   .045952 |  .054507  -.219225   .273732
  7.industry |  .019126   .021978   .019694 | -.004083   .016423  -.020506
  8.industry |  .005464   .065934   .017505 | -.091714   .368871  -.460585
  9.industry |  .010929   .010989   .010941 | -.000115   .000462  -.000577
 10.industry |        0   .021978   .004376 | -.066227   .266363  -.332589
 11.industry |  .554645   .175824   .479212 |   .15083  -.606636   .757467
 12.industry |  .092896   .241758   .122538 | -.090299   .363181   -.45348
      2.race |  .243169   .681319   .330416 | -.185284   .745209  -.930494
      3.race |  .002732   .076923   .017505 | -.112525   .452572  -.565097
       south |   .29235   .318681   .297593 | -.011456   .046074   -.05753

             |   Common support (treated)   |             Ratio
   Variances |  Matched  Unmatc~d     Total |  (1)/(3)   (2)/(3)   (1)/(2)
-------------+------------------------------+------------------------------
    collgrad |  .219058   .219536   .218674 |  1.00176   1.00394   .997824
     ttl_exp |  19.4198   25.2474   20.5898 |  .943177   1.22621    .76918
      tenure |  38.3324   31.9242   37.2044 |  1.03032   .858076   1.20073
  3.industry |  .002732   .021734   .006536 |  .418045   3.32537   .125714
  4.industry |  .155101   .131624   .150351 |  1.03159   .875443   1.17837
  5.industry |  .059054   .201465   .094207 |  .626851   2.13854   .293122
  6.industry |  .054233         0   .043936 |  1.23435         0         .
  7.industry |  .018811   .021734   .019348 |  .972252    1.1233   .865531
  8.industry |   .00545   .062271   .017237 |  .316157   3.61269   .087513
  9.industry |  .010839   .010989   .010845 |  .999464   1.01328   .986361
 10.industry |        0   .021734   .004367 |        0   4.97709         0
 11.industry |  .247691    .14652   .250115 |  .990307   .585811   1.69049
 12.industry |  .084497   .185348   .107758 |  .784137   1.72003   .455885
      2.race |  .184542   .219536   .221726 |  .832297   .990121   .840601
      3.race |  .002732   .071795   .017237 |  .158513   4.16522   .038056
       south |  .207448   .219536    .20949 |  .990254   1.04796   .944939

(1) matched, (2) unmatched, (3) total
Some Examples

Make a graph of the common-support statistics

. mat M = r(M)
. coefplot matrix(M[,4]), title("Std. difference") noci nolabels xline(0)

[Figure: coefficient plot of the standardized differences (matched minus total) for collgrad, ttl_exp, tenure, the industry and race indicators, and south; horizontal axis "Std. difference" from -.2 to .2]
Some Examples

Multiple outcome variables

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs   =      1,852
                                           Kernel          =       epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |        Matched        |        Controls        |  Band-
           |    Yes     No   Total |   Used  Unused   Total |  width
-----------+-----------------------+------------------------+--------
   Treated |    432     25     457 |   1104     291    1395 | 1.3392

Treatment-effects estimation

             |     Coef.
-------------+----------
wage         |
         ATT |  .6021049
        NATE |  1.430823
-------------+----------
hours        |
         ATT |  1.263759
        NATE |  1.450303
Some Examples

Multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure)
>     (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs   =      1,852
                                           Kernel          =       epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |        Matched        |        Controls        |  Band-
           |    Yes     No   Total |   Used  Unused   Total |  width
-----------+-----------------------+------------------------+--------
   Treated |    432     25     457 |   1104     291    1395 | 1.3392

Treatment-effects estimation

             |     Coef.
-------------+----------
wage         |
         ATT |  .5152752
        NATE |  1.430823
-------------+----------
hours        |
         ATT |  1.263759
        NATE |  1.450303

wage: adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race
Some Examples

Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage),
>     att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching      Number of obs   =      1,853
                                           Replications    =         50
                                           Kernel          =       epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics

           |        Matched        |        Controls        |  Band-
           |    Yes     No   Total |   Used  Unused   Total |  width
-----------+-----------------------+------------------------+--------
0          |                       |                        |
   Treated |    306     15     321 |    625     120     745 | 1.3199
-----------+-----------------------+------------------------+--------
1          |                       |                        |
   Treated |    126     10     136 |    473     178     651 | 1.3398

Treatment-effects estimation

        wage |  Observed  Bootstrap                      Normal-based
             |     Coef.  Std. Err.      z    P>|z|   [95% Conf. Interval]
-------------+----------------------------------------------------------
0            |
         ATT |  .4586332  .2763358    1.66   0.097    -.082975   1.000241
-------------+----------------------------------------------------------
1            |
         ATT |  .9518705   .406903    2.34   0.019    .1543553   1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

           chi2(  1) =    1.23
         Prob > chi2 =    0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

        wage |     Coef.  Std. Err.      z    P>|z|   [95% Conf. Interval]
-------------+----------------------------------------------------------
         (1) |  .4932373  .4452343    1.11   0.268    -.379406   1.365881
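The `test` and `lincom` results above are two views of the same Wald test: with a single restriction, the chi-squared statistic is the square of the z statistic, and both give the same p-value. A short Python check using the numbers reported by `lincom` (the variable names are ours):

```python
from statistics import NormalDist

# Difference [1]ATT - [0]ATT and its bootstrap standard error (from lincom)
diff = 0.4932373
se = 0.4452343

z = diff / se
chi2 = z * z                                   # Wald statistic with 1 df
p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))     # two-sided p-value

print(f"z = {z:.2f}, chi2(1) = {chi2:.2f}, p = {p:.4f}")
```

The difference itself is just the gap between the two subgroup estimates (.9518705 minus .4586332).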
Simulation

- Population data from the Swiss census of 2000.
- Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using command iskotrei by Hendrickx 2002; values from 6 to 78, mean 44).
- Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.
- Control variables: gender, age, and highest educational degree.
- Population restricted to people between 24 and 60 years old who are working.
- 2'308'006 individuals, of which 17.5% belong to the treatment group.
- Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.
Simulation

- Substantial differences between resident aliens and Swiss nationals on all three covariates.
- Propensity score in population (computed from fully stratified data):

[Figure: density of the propensity score for untreated and treated; horizontal axis "Propensity score" from 0 to 1, vertical axis "Density". McFadden R2 = 0.121]
Simulation

- Raw mean difference in occupational prestige (NATE): -4.79
- Population ATT (computed from fully stratified data): -3.96
- There is some treatment effect heterogeneity (ATE = -3.51, ATC = -3.41).

[Figure, left panel: mean outcome by propensity score for untreated and treated. Right panel: treatment effect by propensity score.]
Results: Variance

[Figure: variance of the estimators for N = 500 and N = 5000. Series: MDM, MDM with bias correction, PSM, PSM with bias correction. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y).]
Notes on the variance results:

In this slide we can see that, for the same algorithm, PSM is typically somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.
Results: Bias reduction (in percent)

[Figure: bias reduction in percent for N = 500 and N = 5000. Series: MDM, MDM with bias correction, PSM, PSM with bias correction. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y).]
Notes on the bias-reduction results:

Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.
Results: Mean squared error

[Figure: mean squared error of the estimators for N = 500 and N = 5000. Series: MDM, MDM with bias correction, PSM, PSM with bias correction. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y).]
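The three performance measures in these results (variance, bias reduction, mean squared error) can be computed from a set of simulation replications as sketched below in Python. Two things here are assumptions, not taken from the slides: the replication estimates are invented for illustration, and percent bias reduction is defined as 100 × (NATE − mean estimate) / (NATE − ATT), a common convention under which 100 means the raw bias is fully removed. The population values NATE = −4.79 and ATT = −3.96 are those reported for the simulation.

```python
from statistics import fmean, pvariance

TRUE_ATT = -3.96   # population ATT reported for the simulation
NATE = -4.79       # raw mean difference reported for the simulation

def summarize(estimates, true_att=TRUE_ATT, nate=NATE):
    """Bias, variance, MSE, and percent bias reduction across replications."""
    mean_est = fmean(estimates)
    bias = mean_est - true_att
    variance = pvariance(estimates)
    mse = fmean([(e - true_att) ** 2 for e in estimates])
    # Assumed definition: share of the raw bias (NATE - ATT) that is removed
    bias_reduction = 100.0 * (nate - mean_est) / (nate - true_att)
    return {"bias": bias, "variance": variance, "mse": mse,
            "bias_reduction": bias_reduction}

# Hypothetical ATT estimates from five replications
estimates = [-4.2, -3.8, -4.0, -3.6, -4.4]
result = summarize(estimates)
print(result)
```

With the population variance, MSE decomposes exactly into variance plus squared bias, which is why an estimator with some bias but low variance can still have the smaller MSE.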
Results: Relative standard error

[Figure: relative standard error for N = 500 and N = 5000. Series: MDM, MDM with bias correction, PSM, PSM with bias correction. Rows: nearest-neighbor matching with teffects (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap (1 neighbor, 5 neighbors), and kernel matching with bootstrap (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y).]
Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals for N = 500 and N = 5000. Series: MDM, MDM with bias correction, PSM, PSM with bias correction. Rows: nearest-neighbor matching with teffects (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap (1 neighbor, 5 neighbors), and kernel matching with bootstrap (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y).]
Conclusions

- The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).
- Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.
Conclusions

Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research, the choice may not be that clear.

+ MDM leaves less scope for post-matching modeling decision biases.
+ Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
+ Fewer restrictions in terms of possible post-matching analyses.
- Choice of scaling matrix largely arbitrary; various suggestions in the literature (somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching! (But don't use pair matching anyhow.)
Conclusions

Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly ok for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.
References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2): 295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1): 77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91: 279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990 [1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4): 465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70: 41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5): 688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4): 472–480.
The Command

kmatch: new matching software for Stata that has been written over the last few months; available from SSC (ssc install kmatch).

Some key features:
- Multivariate Distance Matching (MDM) and Propensity Score Matching (PSM) (or MDM and PSM combined)
- Optional exact matching
- Optional regression-adjustment bias-correction
- Kernel matching, ridge matching, or nearest-neighbor matching
- Automatic bandwidth selection for kernel/ridge matching
- Flexible specification of scaling matrix for MDM
- Joint analysis of multiple subgroups and multiple outcome variables
- Various post-estimation commands for balancing and common-support diagnostics
- Computationally efficient implementation
Some Examples

Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.

. sysuse nlsw88, clear
(NLSW, 1988 extract)

. drop if industry==2
(4 observations deleted)

Mahalanobis-distance kernel matching

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs   =      1,853
                                           Kernel          =       epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |        Matched        |        Controls        |  Band-
           |    Yes     No   Total |   Used  Unused   Total |  width
-----------+-----------------------+------------------------+--------
   Treated |    432     25     457 |   1105     291    1396 | 1.3394

Treatment-effects estimation

        wage |     Coef.
-------------+----------
         ATT |  .6059013
        NATE |  1.432913
Some Examples

Some balancing statistics

. kmatch summarize
(refitting the model using the generate() option)

             |              Raw               |         Matched(ATT)
       Means |  Treated  Untrea~d    StdDif   |  Treated  Untrea~d    StdDif
-------------+--------------------------------+------------------------------
    collgrad |  .321663   .224212   .219912   |  .319444   .319444         0
     ttl_exp |  13.2685   12.7323   .117584   |  13.3205   13.1425   .039036
      tenure |  7.89205   6.17658    .29735   |  7.91744   7.58347   .057888
  3.industry |  .006565   .012178  -.058246   |   .00463    .00463         0
  4.industry |  .183807   .166905   .044425   |  .185185   .185185         0
  5.industry |  .105033   .027937   .312944   |  .085648   .085648         0
  6.industry |  .045952   .169771  -.407129   |  .048611   .048611         0
  7.industry |  .019694   .102436  -.350657   |  .020833   .020833         0
  8.industry |  .017505   .035817  -.113785   |  .009259   .009259         0
  9.industry |  .010941   .040115  -.185669   |  .011574   .011574         0
 10.industry |  .004376   .008596  -.052551   |  .002315   .002315         0
 11.industry |  .479212   .356734   .250073   |  .506944   .506944         0
 12.industry |  .122538    .07235   .169707   |   .12037    .12037         0
      2.race |  .330416   .244986   .189418   |    .3125     .3125         0
      3.race |  .017505   .011461   .050566   |  .006944   .006944         0
       south |  .297593   .466332  -.352408   |  .291667   .291667         0

             |              Raw               |         Matched(ATT)
   Variances |  Treated  Untrea~d     Ratio   |  Treated  Untrea~d     Ratio
-------------+--------------------------------+------------------------------
    collgrad |  .218674   .174066   1.25628   |  .217904   .217904         1
     ttl_exp |  20.5898   21.0001   .980459   |  19.8177   18.2323   1.08696
      tenure |  37.2044   29.3629   1.26706   |  37.0399   34.9543   1.05966
  3.industry |  .006536   .012038   .542928   |  .004619   .004619         1
  4.industry |  .150351   .139148   1.08052   |  .151242   .151242         1
  5.industry |  .094207   .027176   3.46656   |  .078494   .078494         1
  6.industry |  .043936    .14105   .311496   |  .046355   .046355         1
  7.industry |  .019348   .092008   .210287   |  .020447   .020447         1
  8.industry |  .017237   .034559   .498769   |  .009195   .009195         1
  9.industry |  .010845   .038533   .281445   |  .011467   .011467         1
 10.industry |  .004367   .008528   .512039   |  .002315   .002315         1
 11.industry |  .250115   .229639   1.08917   |  .250532   .250532         1
 12.industry |  .107758   .067163   1.60443   |  .106127   .106127         1
      2.race |  .221726     .1851   1.19787   |  .215342   .215342         1
      3.race |  .017237   .011338   1.52025   |  .006912   .006912         1
       south |   .20949   .249045   .841173   |  .207077   .207077         1
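The StdDif and Ratio columns in the raw panel can be reproduced from the means and variances shown. The Python sketch below uses the usual standardized difference (difference in means divided by the square root of the average of the two group variances) and the ratio of treated to untreated variance. That kmatch uses exactly this definition is an assumption on our part, but it matches the reported raw values; the function names are ours.

```python
from math import sqrt

def std_diff(mean_t, mean_c, var_t, var_c):
    """Standardized mean difference with the average variance as scale."""
    return (mean_t - mean_c) / sqrt((var_t + var_c) / 2.0)

def var_ratio(var_t, var_c):
    """Ratio of treated to untreated variance."""
    return var_t / var_c

# Raw means and variances from the table: (mean_t, mean_c, var_t, var_c)
raw = {
    "collgrad": (0.321663, 0.224212, 0.218674, 0.174066),
    "ttl_exp": (13.2685, 12.7323, 20.5898, 21.0001),
    "tenure": (7.89205, 6.17658, 37.2044, 29.3629),
}

for name, (m_t, m_c, v_t, v_c) in raw.items():
    print(f"{name:10s} StdDif = {std_diff(m_t, m_c, v_t, v_c):.6f}"
          f"  Ratio = {var_ratio(v_t, v_c):.6f}")
```

A standardized difference near 0 and a variance ratio near 1 indicate good balance, which is what the matched panel shows for most covariates.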
Some Examples

Make a graph of the balancing statistics

. mat M = r(M)
. mat V = r(V)
. coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || ,
>     bylabels("Std. mean difference" "Variance ratio")
>     noci nolabels byopts(xrescale)
. addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))
. addplot 2: , xline(1) norescaling

[Figure: two-panel coefficient plot, "Std. mean difference" (axis from -.4 to .4) and "Variance ratio" (axis from 0 to 4), showing raw and matched values for collgrad, ttl_exp, tenure, the industry and race indicators, and south]
Some Examples

Propensity-score kernel matching

. kmatch ps union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate att
(computing bandwidth ... done)

Propensity-score kernel matching           Number of obs   =      1,853
                                           Kernel          =       epan
Treatment   : union = 1
Covariates  : collgrad ttl_exp tenure i.industry i.race south
PS model    : logit (pr)

Matching statistics

           |        Matched        |        Controls        |  Band-
           |    Yes     No   Total |   Used  Unused   Total |  width
-----------+-----------------------+------------------------+--------
   Treated |    431     26     457 |   1214     182    1396 | .00188

Treatment-effects estimation

        wage |     Coef.
-------------+----------
         ATT |  .3887224
        NATE |  1.432913
Some Examples

Kernel density balancing plot

. kmatch density, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)
(applying 0-1 boundary correction to density estimation of propensity score)
(bandwidth for propensity score = .06803989)

[Figure: kernel densities of the propensity score for untreated and treated, in two panels, "Raw" and "Matched (ATT)"; horizontal axis "Propensity score", vertical axis "Density"]
Some Examples

Cumulative distribution balancing plot

. kmatch cumul, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)

[Figure: cumulative distributions of the propensity score for untreated and treated, in two panels, "Raw" and "Matched (ATT)"; horizontal axis "Propensity score", vertical axis "Cumulative probability"]
Some Examples

Balancing box plot

. kmatch box
(refitting the model using the generate() option)

[Figure: box plots of the propensity score for untreated and treated, in two panels, "Raw" and "Matched (ATT)"; vertical axis "Propensity score"]
Some Examples Standard errors kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage) nate ate att atc vce(bootstrap)(computing bandwidth for treated done)(computing bandwidth for untreated done)
(running kmatch on estimation sample)
Bootstrap replications (50)1 2 3 4 5
50
Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan
Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south
Matching statistics
Matched Controls Band-Yes No Total Used Unused Total width
Treated 432 25 457 1105 291 1396 13394Untreated 1386 10 1396 455 2 457 33975Combined 1818 35 1853 1560 293 1853
Treatment-effects estimation
Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]
ATE 4095729 1920853 213 0033 0330928 7860531ATT 6059013 2472069 245 0014 1213846 1090418ATC 3483797 1893653 184 0066 -0227695 7195289
NATE 1432913 2333282 614 0000 9755981 1890228
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 40
Some Examples
Do some tests
lincom ATT-NATE
( 1) ATT - NATE = 0
wage Coef Std Err z Pgt|z| [95 Conf Interval]
(1) -8270117 1810415 -457 0000 -1181847 -4721768
test ATT = ATC
( 1) ATT - ATC = 0
chi2( 1) = 242Prob gt chi2 = 01200
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 41
Some Examples Nearest-neighbor matching (1 neighbor) kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att nn
Multivariate-distance nearest-neighbor matching
Number of obs = 1853Neighbors min = 1
Treatment union = 1 max = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south
Matching statistics
Matched Controls Band-Yes No Total Used Unused Total width
Treated 457 0 457 328 1068 1396
Treatment-effects estimation
wage Coef
ATT 7246969
teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) (union) atet
Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 1Outcome model matching min = 1Distance metric Mahalanobis max = 1
AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]
ATETunion
(union vs nonunion) 7246969 2942952 246 0014 147889 1301505
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 42
Some Examples Nearest-neighbor matching (5 neighbors) kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att nn(5)
Multivariate-distance nearest-neighbor matching
Number of obs = 1853Neighbors min = 5
Treatment union = 1 max = 5Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south
Matching statistics
Matched Controls Band-Yes No Total Used Unused Total width
Treated 457 0 457 870 526 1396
Treatment-effects estimation
wage Coef
ATT 5590823
teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) (union) atet nn(5)
Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 5Outcome model matching min = 5Distance metric Mahalanobis max = 6
AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]
ATETunion
(union vs nonunion) 5590823 2381752 235 0019 0922675 1025897
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 43
Some Examples Bias-correction regression adjustment kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure iindustry irace south) att nn(5)
Multivariate-distance nearest-neighbor matching
Number of obs = 1853Neighbors min = 5
Treatment union = 1 max = 5Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south
Matching statistics
Matched Controls Band-Yes No Total Used Unused Total width
Treated 457 0 457 870 526 1396
Treatment-effects estimation
wage Coef
ATT 5288023
adjusted for collgrad ttl_exp tenure iindustry irace south
teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) gt (union) atet nn(5) biasadj(collgrad ttl_exp tenure iindustry irace south)
Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 5Outcome model matching min = 5Distance metric Mahalanobis max = 6
AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]
ATETunion
(union vs nonunion) 5288023 2420635 218 0029 0543666 1003238
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 44
Some Examples
Mahalanobis-distance and propensity-score matching combined
kmatch md union collgrad ttl_exp tenure (wage) att gt psvars(iindustry irace south) psweight(3)(computing bandwidth done)
Multivariate-distance kernel matching Number of obs = 1853Kernel = epan
Treatment union = 1Metric mahalanobis (modified)Covariates collgrad ttl_exp tenurePS model logit (pr)PS covars iindustry irace south
Matching statistics
Matched Controls Band-Yes No Total Used Unused Total width
Treated 439 18 457 1258 138 1396 83886
Treatment-effects estimation
wage Coef
ATT 6408443
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 45
Some Examples
Exact matching
kmatch md union collgrad ttl_exp tenure (wage) att ematch(industry race south)(computing bandwidth done)
Multivariate-distance kernel matching Number of obs = 1853Kernel = epan
Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenureExact industry race south
Matching statistics
Matched Controls Band-Yes No Total Used Unused Total width
Treated 432 25 457 1103 293 1396 13013
Treatment-effects estimation
wage Coef
ATT 6047374
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 46
Some Examples
Bandwidth selection: the default (based on the distribution of distances in one-nearest-neighbor matching)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan

Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched       |       Controls      |  Band-
           |  Yes     No   Total |  Used  Unused Total |  width
 ----------+---------------------+---------------------+-------
   Treated |  432     25     457 |  1105    291   1396 | 1.3394

Treatment-effects estimation

        wage |     Coef.
-------------+----------
         ATT |  .6059013
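The idea of deriving a bandwidth from one-nearest-neighbor distances can be sketched in a few lines of Python. This is an illustration only, not kmatch's code: the factor 1.5 and the 0.90 quantile are assumptions that the slide does not state, the distance is a plain absolute difference on one made-up covariate rather than a Mahalanobis distance, and all function names are hypothetical.

```python
import math

def nearest_control_distances(treated, controls):
    """Distance from each treated value to its closest control (1-NN with replacement)."""
    return [min(abs(t - c) for c in controls) for t in treated]

def quantile(values, q):
    """Empirical quantile with linear interpolation between order statistics."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)

def pair_matching_bandwidth(treated, controls, q=0.90, factor=1.5):
    # q and factor are assumed defaults, not values taken from the slide
    nonzero = [d for d in nearest_control_distances(treated, controls) if d > 0]
    return factor * quantile(nonzero, q)

# Made-up covariate values
treated = [1.0, 2.0, 4.0, 7.0]
controls = [1.0, 2.5, 3.0, 9.0]
bw = pair_matching_bandwidth(treated, controls)
print(round(bw, 4))
```

A rule of this kind ties the bandwidth to how far apart treated and control observations typically are, so that most treated observations find at least one control within the bandwidth while isolated observations remain unmatched.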
Some Examples
Bandwidth selection: cross-validation with respect to X

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan

Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched       |       Controls      |  Band-
           |  Yes     No   Total |  Used  Unused Total |  width
 ----------+---------------------+---------------------+-------
   Treated |  448      9     457 |  1184    212   1396 | 1.8888

Treatment-effects estimation

        wage |     Coef.
-------------+----------
         ATT |  .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot; y-axis: MSE, x-axis: Bandwidth; points labeled by search-step index]
Some Examples
Bandwidth selection: cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan

Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched       |       Controls      |  Band-
           |  Yes     No   Total |  Used  Unused Total |  width
 ----------+---------------------+---------------------+-------
   Treated |  453      4     457 |  1289    107   1396 |  2.433

Treatment-effects estimation

        wage |     Coef.
-------------+----------
         ATT |  .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot; y-axis: MISE, x-axis: Bandwidth; points labeled by search-step index]
Some Examples
Bandwidth selection: weighted cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan

Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched       |       Controls      |  Band-
           |  Yes     No   Total |  Used  Unused Total |  width
 ----------+---------------------+---------------------+-------
   Treated |  455      2     457 |  1356     40   1396 | 2.7626

Treatment-effects estimation

        wage |     Coef.
-------------+----------
         ATT |  .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot; y-axis: Weighted MISE, x-axis: Bandwidth; points labeled by search-step index]
Some Examples
Common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(0.5)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan

Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched       |       Controls      |  Band-
           |  Yes     No   Total |  Used  Unused Total |  width
 ----------+---------------------+---------------------+-------
   Treated |  366     91     457 |   701    695   1396 |     .5

Treatment-effects estimation

        wage |     Coef.
-------------+----------
         ATT |  .3303161

. kmatch csummarize
(refitting the model using the generate() option)

             |   Common support (treated)   |     Standardized difference
       Means |  Matched  Unmatc~d    Total  |  (1)-(3)   (2)-(3)   (1)-(2)
-------------+------------------------------+-----------------------------
    collgrad |  .322404   .318681  .321663  |  .001585  -.006376   .007962
     ttl_exp |  13.3929   12.7682  13.2685  |  .027413  -.110253   .137666
      tenure |  8.12614   6.95055  7.89205  |  .038378  -.154356   .192734
  3.industry |  .002732   .021978  .006565  | -.047404   .190657  -.238061
  4.industry |  .191257   .153846  .183807  |  .019212  -.077269   .096481
  5.industry |  .062842   .274725  .105033  | -.137462   .552867  -.690329
  6.industry |  .057377         0  .045952  |  .054507  -.219225   .273732
  7.industry |  .019126   .021978  .019694  | -.004083   .016423  -.020506
  8.industry |  .005464   .065934  .017505  | -.091714   .368871  -.460585
  9.industry |  .010929   .010989  .010941  | -.000115   .000462  -.000577
 10.industry |        0   .021978  .004376  | -.066227   .266363  -.332589
 11.industry |  .554645   .175824  .479212  |   .15083  -.606636   .757467
 12.industry |  .092896   .241758  .122538  | -.090299   .363181   -.45348
      2.race |  .243169   .681319  .330416  | -.185284   .745209  -.930494
      3.race |  .002732   .076923  .017505  | -.112525   .452572  -.565097
       south |   .29235   .318681  .297593  | -.011456   .046074   -.05753

             |   Common support (treated)   |            Ratio
   Variances |  Matched  Unmatc~d    Total  |  (1)/(3)   (2)/(3)   (1)/(2)
-------------+------------------------------+-----------------------------
    collgrad |  .219058   .219536  .218674  |  1.00176   1.00394   .997824
     ttl_exp |  19.4198   25.2474  20.5898  |  .943177   1.22621    .76918
      tenure |  38.3324   31.9242  37.2044  |  1.03032   .858076   1.20073
  3.industry |  .002732   .021734  .006536  |  .418045   3.32537   .125714
  4.industry |  .155101   .131624  .150351  |  1.03159   .875443   1.17837
  5.industry |  .059054   .201465  .094207  |  .626851   2.13854   .293122
  6.industry |  .054233         0  .043936  |  1.23435         0         .
  7.industry |  .018811   .021734  .019348  |  .972252    1.1233   .865531
  8.industry |   .00545   .062271  .017237  |  .316157   3.61269   .087513
  9.industry |  .010839   .010989  .010845  |  .999464   1.01328   .986361
 10.industry |        0   .021734  .004367  |        0   4.97709         0
 11.industry |  .247691    .14652  .250115  |  .990307   .585811   1.69049
 12.industry |  .084497   .185348  .107758  |  .784137   1.72003   .455885
      2.race |  .184542   .219536  .221726  |  .832297   .990121   .840601
      3.race |  .002732   .071795  .017237  |  .158513   4.16522   .038056
       south |  .207448   .219536   .20949  |  .990254   1.04796   .944939

(1) matched, (2) unmatched, (3) total
Some Examples
Make a graph of the common-support statistics

. mat M = r(M)
. coefplot matrix(M[,4]), title("Std. difference") noci nolabels xline(0)

[Figure: coefficient plot of the standardized difference between matched and total treated observations, one row per covariate (collgrad, ttl_exp, tenure, industry and race indicators, south); x-axis: Std. difference, reference line at 0]
Some Examples
Multiple outcome variables

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1852
                                           Kernel        = epan

Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched       |       Controls      |  Band-
           |  Yes     No   Total |  Used  Unused Total |  width
 ----------+---------------------+---------------------+-------
   Treated |  432     25     457 |  1104    291   1395 | 1.3392

Treatment-effects estimation

             |     Coef.
-------------+----------
wage         |
         ATT |  .6021049
        NATE |  1.430823
-------------+----------
hours        |
         ATT |  1.263759
        NATE |  1.450303
Some Examples
Multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure)
>     (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1852
                                           Kernel        = epan

Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched       |       Controls      |  Band-
           |  Yes     No   Total |  Used  Unused Total |  width
 ----------+---------------------+---------------------+-------
   Treated |  432     25     457 |  1104    291   1395 | 1.3392

Treatment-effects estimation

             |     Coef.
-------------+----------
wage         |
         ATT |  .5152752
        NATE |  1.430823
-------------+----------
hours        |
         ATT |  1.263759
        NATE |  1.450303

wage: adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race
Some Examples
Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage),
>     att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching      Number of obs = 1853
                                           Replications  =   50
                                           Kernel        = epan

Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics

           |       Matched       |       Controls      |  Band-
           |  Yes     No   Total |  Used  Unused Total |  width
 ----------+---------------------+---------------------+-------
0          |                     |                     |
   Treated |  306     15     321 |   625    120    745 | 1.3199
1          |                     |                     |
   Treated |  126     10     136 |   473    178    651 | 1.3398

Treatment-effects estimation

             |  Observed   Bootstrap                      Normal-based
        wage |     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
-------------+-------------------------------------------------------------
0            |
         ATT |  .4586332   .2763358    1.66   0.097    -.082975    1.000241
1            |
         ATT |  .9518705    .406903    2.34   0.019    .1543553    1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

           chi2(  1) =    1.23
         Prob > chi2 =  0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

        wage |     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
-------------+-------------------------------------------------------------
         (1) |  .4932373   .4452343    1.11   0.268    -.379406    1.365881
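The test and lincom results rest on simple arithmetic, which the following Python sketch retraces from the reported numbers. It takes the standard error of the difference as given by lincom (it depends on the bootstrap covariance of the two estimates, which is not shown) and assumes a normal reference distribution; variable names are illustrative.

```python
import math

att_south0 = 0.4586332   # [0]ATT from the output above
att_south1 = 0.9518705   # [1]ATT from the output above
se_diff = 0.4452343      # bootstrap std. err. of the difference, as reported by lincom

diff = att_south1 - att_south0
z = diff / se_diff
chi2 = z ** 2                                   # Wald statistic with 1 degree of freedom
p_value = math.erfc(abs(z) / math.sqrt(2))      # two-sided p-value from the normal distribution
ci = (diff - 1.959964 * se_diff, diff + 1.959964 * se_diff)

print(round(diff, 7), round(z, 2), round(chi2, 2), round(p_value, 4))
print(round(ci[0], 6), round(ci[1], 6))
```

With one restriction the chi-squared statistic is just the squared z statistic, so test and lincom carry the same information here; the difference between the two subgroup effects is not statistically significant at conventional levels.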
Simulation
Population data from the Swiss census of 2000.
Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using the command iskotrei by Hendrickx 2002); values from 6 to 78, mean 44.
Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.
Control variables: gender, age, and highest educational degree.
Population restricted to working people between 24 and 60 years old.
2'308'006 individuals, of which 17.5% belong to the treatment group.
Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.
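The logic of such a simulation can be sketched in Python as follows. Everything in the sketch is an assumption for illustration: the population is synthetic rather than census data, the estimator is exact stratification on a single binary covariate rather than one of the matching estimators compared in the talk, and "bias reduction" is taken to mean the share of the raw gap (NATE minus ATT) that the estimator removes.

```python
import random
import statistics

rng = random.Random(12345)

# Synthetic population (stand-in for the census): binary covariate x raises the
# treatment probability and lowers the outcome; the true effect is -4 for everyone.
population = []
for _ in range(20000):
    x = 1 if rng.random() < 0.4 else 0
    d = 1 if rng.random() < (0.35 if x else 0.10) else 0
    y = 45.0 - 6.0 * x - 4.0 * d + rng.gauss(0.0, 5.0)
    population.append((x, d, y))

def mean(values):
    return sum(values) / len(values)

def nate(data):
    """Naive estimate: raw mean difference between treated and untreated."""
    return mean([y for _, d, y in data if d == 1]) - mean([y for _, d, y in data if d == 0])

def att_stratified(data):
    """ATT by exact stratification on x; strata lacking treated or controls are skipped."""
    total, n_treated = 0.0, 0
    for x in (0, 1):
        y1 = [y for xi, d, y in data if xi == x and d == 1]
        y0 = [y for xi, d, y in data if xi == x and d == 0]
        if y1 and y0:
            total += len(y1) * (mean(y1) - mean(y0))
            n_treated += len(y1)
    return total / n_treated

true_att = att_stratified(population)   # population ATT from fully stratified data
raw_gap = nate(population)

# Repeatedly draw samples of size 500 and apply the estimator
estimates = [att_stratified(rng.sample(population, 500)) for _ in range(200)]
bias = mean(estimates) - true_att
variance = statistics.pvariance(estimates)
mse = mean([(e - true_att) ** 2 for e in estimates])
# Assumed definition of bias reduction in percent
bias_reduction = 100.0 * (raw_gap - mean(estimates)) / (raw_gap - true_att)
print(round(bias, 3), round(variance, 3), round(mse, 3), round(bias_reduction, 1))
```

Under this definition a bias reduction of 100 percent means the estimator is centered on the population ATT, and the mean squared error decomposes into variance plus squared bias.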
Simulation
Substantial differences between resident aliens and Swiss nationals on all three covariates.
Propensity score in population (computed from fully stratified data):

[Figure: density of the propensity score for untreated and treated; x-axis: Propensity score, y-axis: Density]

McFadden R2 = 0.121
Simulation
Raw mean difference in occupational prestige (NATE): -4.79
Population ATT (computed from fully stratified data): -3.96
There is some treatment effect heterogeneity (ATE = -3.51, ATC = -3.41).

[Figure, left panel: mean outcome by propensity score for untreated and treated; x-axis: Propensity score, y-axis: Outcome]
[Figure, right panel: treatment effect by propensity score; x-axis: Propensity score, y-axis: Treatment effect]
Results: Variance

[Figure: variance of the ATT estimates, panels for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]
This slide shows that, for the same algorithm, PSM is typically somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.
For kernel matching the efficiency differences between PSM and MDM are small; additional post-matching regression adjustment further reduces the differences.
Results: Bias reduction (in percent)

[Figure: bias reduction in percent, panels for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]
Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3) small imprecisions can have substantial effects on the results. In practice one would probably use a more refined specification in the large-sample situation, which would reduce bias.
The bias also vanishes once post-matching regression adjustment is applied.
Results: Mean squared error

[Figure: mean squared error of the ATT estimates, panels for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]
Results: Relative standard error

[Figure: relative standard error, panels for N = 500 and N = 5000. Rows: nearest-neighbor matching with teffects standard errors (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap standard errors (1 neighbor, 5 neighbors), and kernel matching with bootstrap standard errors (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]
Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals, panels for N = 500 and N = 5000. Rows: nearest-neighbor matching with teffects standard errors (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap standard errors (1 neighbor, 5 neighbors), and kernel matching with bootstrap standard errors (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]
Conclusions
The arguments brought forward by King and Nielsen against propensity score matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).
Other PSM algorithms perform much better because they are less affected by the random pruning problem.
Conclusions
Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research the choice may not be that clear.
Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.
Disadvantages of MDM:
- The choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.
One clear conclusion we can draw, however, is:
Do not use propensity scores for pair matching! (But don't use pair matching anyhow.)
Conclusions
Some conclusions from the simulation:
- For PSM, applying regression adjustment seems like a great idea (reduction of bias and variance); for MDM the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly OK for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.
To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.
References
Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2): 295–313.
Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1): 77–90.
Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91: 279–290.
Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.
King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.
Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.
Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4): 465–472.
Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70: 41–55.
Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5): 688–701.
Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4): 472–480.
Some Examples
Use the NLSW data to estimate the effect of union membership on wages, controlling for some covariates such as education, labor market experience, or industry.

. sysuse nlsw88, clear
(NLSW, 1988 extract)

. drop if industry==2
(4 observations deleted)

Mahalanobis-distance kernel matching

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan

Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched       |       Controls      |  Band-
           |  Yes     No   Total |  Used  Unused Total |  width
 ----------+---------------------+---------------------+-------
   Treated |  432     25     457 |  1105    291   1396 | 1.3394

Treatment-effects estimation

        wage |     Coef.
-------------+----------
         ATT |  .6059013
        NATE |  1.432913
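What a Mahalanobis-distance kernel matching estimate of the ATT involves can be illustrated with a small Python sketch. It is not kmatch's implementation: the data are made up, only two continuous covariates are used, the covariance matrix is computed from the pooled sample (an assumption), the bandwidth of 0.5 is arbitrary, and all names are hypothetical. The NATE is the raw mean difference between treated and untreated.

```python
import math
import random

rng = random.Random(1)

# Made-up data: x1 and x2 confound treatment d and outcome y; the true effect is 1.0
data = []
for _ in range(400):
    x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)
    d = 1 if rng.random() < 1 / (1 + math.exp(-(x1 + x2 - 1))) else 0
    y = 1.0 * d + 2.0 * x1 + 1.0 * x2 + rng.gauss(0, 0.5)
    data.append((x1, x2, d, y))

def mean(values):
    return sum(values) / len(values)

def inverse_covariance(rows):
    """Inverse of the 2x2 sample covariance matrix of (x1, x2), as (a11, a12, a22)."""
    m1, m2 = mean([r[0] for r in rows]), mean([r[1] for r in rows])
    n = len(rows) - 1
    s11 = sum((r[0] - m1) ** 2 for r in rows) / n
    s22 = sum((r[1] - m2) ** 2 for r in rows) / n
    s12 = sum((r[0] - m1) * (r[1] - m2) for r in rows) / n
    det = s11 * s22 - s12 ** 2
    return (s22 / det, -s12 / det, s11 / det)

def mahalanobis(a, b, inv):
    u, v = a[0] - b[0], a[1] - b[1]
    return math.sqrt(inv[0] * u * u + 2 * inv[1] * u * v + inv[2] * v * v)

def epan(u):
    """Epanechnikov kernel; zero outside |u| < 1."""
    return 0.75 * (1 - u * u) if abs(u) < 1 else 0.0

def kernel_att(rows, bandwidth):
    inv = inverse_covariance(rows)
    treated = [r for r in rows if r[2] == 1]
    controls = [r for r in rows if r[2] == 0]
    effects = []
    for t in treated:
        w = [epan(mahalanobis(t, c, inv) / bandwidth) for c in controls]
        if sum(w) > 0:  # treated units with no control inside the bandwidth stay unmatched
            y0_hat = sum(wi * c[3] for wi, c in zip(w, controls)) / sum(w)
            effects.append(t[3] - y0_hat)
    return mean(effects), len(effects), len(treated)

nate = mean([r[3] for r in data if r[2] == 1]) - mean([r[3] for r in data if r[2] == 0])
att, n_matched, n_treated = kernel_att(data, bandwidth=0.5)
print(round(nate, 3), round(att, 3), n_matched, n_treated)
```

As in the output above, some treated observations may remain unmatched because no control lies within the bandwidth, and the matched estimate is typically much closer to the true effect than the naive difference.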
Some Examples
Some balancing statistics

. kmatch summarize
(refitting the model using the generate() option)

             |             Raw              |        Matched (ATT)
       Means |  Treated  Untrea~d   StdDif  |  Treated  Untrea~d   StdDif
-------------+------------------------------+-----------------------------
    collgrad |  .321663   .224212  .219912  |  .319444   .319444        0
     ttl_exp |  13.2685   12.7323  .117584  |  13.3205   13.1425  .039036
      tenure |  7.89205   6.17658   .29735  |  7.91744   7.58347  .057888
  3.industry |  .006565   .012178 -.058246  |   .00463    .00463        0
  4.industry |  .183807   .166905  .044425  |  .185185   .185185        0
  5.industry |  .105033   .027937  .312944  |  .085648   .085648        0
  6.industry |  .045952   .169771 -.407129  |  .048611   .048611        0
  7.industry |  .019694   .102436 -.350657  |  .020833   .020833        0
  8.industry |  .017505   .035817 -.113785  |  .009259   .009259        0
  9.industry |  .010941   .040115 -.185669  |  .011574   .011574        0
 10.industry |  .004376   .008596 -.052551  |  .002315   .002315        0
 11.industry |  .479212   .356734  .250073  |  .506944   .506944        0
 12.industry |  .122538    .07235  .169707  |   .12037    .12037        0
      2.race |  .330416   .244986  .189418  |    .3125     .3125        0
      3.race |  .017505   .011461  .050566  |  .006944   .006944        0
       south |  .297593   .466332 -.352408  |  .291667   .291667        0

             |             Raw              |        Matched (ATT)
   Variances |  Treated  Untrea~d    Ratio  |  Treated  Untrea~d    Ratio
-------------+------------------------------+-----------------------------
    collgrad |  .218674   .174066  1.25628  |  .217904   .217904        1
     ttl_exp |  20.5898   21.0001  .980459  |  19.8177   18.2323  1.08696
      tenure |  37.2044   29.3629  1.26706  |  37.0399   34.9543  1.05966
  3.industry |  .006536   .012038  .542928  |  .004619   .004619        1
  4.industry |  .150351   .139148  1.08052  |  .151242   .151242        1
  5.industry |  .094207   .027176  3.46656  |  .078494   .078494        1
  6.industry |  .043936    .14105  .311496  |  .046355   .046355        1
  7.industry |  .019348   .092008  .210287  |  .020447   .020447        1
  8.industry |  .017237   .034559  .498769  |  .009195   .009195        1
  9.industry |  .010845   .038533  .281445  |  .011467   .011467        1
 10.industry |  .004367   .008528  .512039  |  .002315   .002315        1
 11.industry |  .250115   .229639  1.08917  |  .250532   .250532        1
 12.industry |  .107758   .067163  1.60443  |  .106127   .106127        1
      2.race |  .221726     .1851  1.19787  |  .215342   .215342        1
      3.race |  .017237   .011338  1.52025  |  .006912   .006912        1
       south |   .20949   .249045  .841173  |  .207077   .207077        1
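The StdDif and Ratio columns can be retraced with a few lines of Python. The formula below (mean difference divided by the square root of the average of the two raw variances, with the raw variances also used to scale the matched difference) is inferred from the numbers in the tables, not stated on the slide; the function name is illustrative.

```python
import math

def std_diff(mean_t, mean_c, var_t, var_c):
    """Mean difference divided by the square root of the average raw variance."""
    return (mean_t - mean_c) / math.sqrt((var_t + var_c) / 2)

# Raw means and variances (treated, untreated) copied from the tables above
raw = {
    "collgrad": (0.321663, 0.224212, 0.218674, 0.174066),
    "tenure": (7.89205, 6.17658, 37.2044, 29.3629),
}
results = {}
for name, (m1, m0, v1, v0) in raw.items():
    results[name] = (std_diff(m1, m0, v1, v0), v1 / v0)
    print(name, round(results[name][0], 4), round(results[name][1], 4))

# Matched tenure means, still scaled by the raw variances
matched_tenure = std_diff(7.91744, 7.58347, 37.2044, 29.3629)
print("tenure (matched)", round(matched_tenure, 4))
```

Scaling the matched difference by the raw variances keeps the two StdDif columns comparable, so the drop for tenure from about 0.30 to about 0.06 reflects better balance rather than a change in the denominator.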
Some Examples
Make a graph of the balancing statistics

. mat M = r(M)
. mat V = r(V)
. coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || ,
>     bylabels("Std. mean difference" "Variance ratio")
>     noci nolabels byopts(xrescale)
. addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))
. addplot 2: , xline(1) norescaling

[Figure: two-panel coefficient plot, one row per covariate (collgrad, ttl_exp, tenure, industry and race indicators, south); left panel: Std. mean difference with reference line at 0; right panel: Variance ratio with reference line at 1; series: Raw and Matched]
Some Examples
Propensity-score kernel matching

. kmatch ps union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate att
(computing bandwidth ... done)

Propensity-score kernel matching           Number of obs = 1853
                                           Kernel        = epan

Treatment   : union = 1
Covariates  : collgrad ttl_exp tenure i.industry i.race south
PS model    : logit (pr)

Matching statistics

           |       Matched       |       Controls      |  Band-
           |  Yes     No   Total |  Used  Unused Total |  width
 ----------+---------------------+---------------------+-------
   Treated |  431     26     457 |  1214    182   1396 | .00188

Treatment-effects estimation

        wage |     Coef.
-------------+----------
         ATT |  .3887224
        NATE |  1.432913
Some Examples
Kernel density balancing plot

. kmatch density, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)
(applying 0-1 boundary correction to density estimation of propensity score)
(bandwidth for propensity score = .06803989)

[Figure: kernel density of the propensity score for untreated and treated, panels Raw and Matched (ATT); x-axis: Propensity score, y-axis: Density]
Some Examples
Cumulative distribution balancing plot

. kmatch cumul, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)

[Figure: cumulative distribution of the propensity score for untreated and treated, panels Raw and Matched (ATT); x-axis: Propensity score, y-axis: Cumulative probability]
Some Examples
Balancing box plot

. kmatch box
(refitting the model using the generate() option)

[Figure: box plots of the propensity score for untreated and treated, panels Raw and Matched (ATT); y-axis: Propensity score]
Some Examples
Standard errors

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching      Number of obs = 1853
                                           Replications  =   50
                                           Kernel        = epan

Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched       |       Controls      |  Band-
           |  Yes     No   Total |  Used  Unused Total |  width
 ----------+---------------------+---------------------+-------
   Treated |  432     25     457 |  1105    291   1396 | 1.3394
 Untreated | 1386     10    1396 |   455      2    457 | 3.3975
  Combined | 1818     35    1853 |  1560    293   1853 |

Treatment-effects estimation

             |  Observed   Bootstrap                      Normal-based
        wage |     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
-------------+-------------------------------------------------------------
         ATE |  .4095729   .1920853    2.13   0.033    .0330928    .7860531
         ATT |  .6059013   .2472069    2.45   0.014    .1213846    1.090418
         ATC |  .3483797   .1893653    1.84   0.066   -.0227695    .7195289
        NATE |  1.432913   .2333282    6.14   0.000    .9755981    1.890228
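The bootstrap logic behind these standard errors can be sketched in Python. The sketch uses made-up data and a simple raw mean difference as the estimator (a stand-in for the matching estimator, which would be re-run on every resample), with 50 replications as in the output above; function and variable names are hypothetical.

```python
import math
import random
import statistics

def mean_difference(rows):
    """Raw difference in mean outcome between treated (d = 1) and untreated (d = 0)."""
    y1 = [y for d, y in rows if d == 1]
    y0 = [y for d, y in rows if d == 0]
    return statistics.mean(y1) - statistics.mean(y0)

def bootstrap_se(rows, estimator, reps=50, seed=1):
    """Standard deviation of the estimator across resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(rows)
    draws = [estimator([rows[rng.randrange(n)] for _ in range(n)]) for _ in range(reps)]
    return statistics.stdev(draws)

# Made-up data: 300 observations, about one quarter treated
rng = random.Random(7)
rows = []
for _ in range(300):
    d = 1 if rng.random() < 0.25 else 0
    rows.append((d, 6.0 + 1.4 * d + rng.gauss(0.0, 4.0)))

estimate = mean_difference(rows)
se = bootstrap_se(rows, mean_difference)
z = estimate / se
p_value = math.erfc(abs(z) / math.sqrt(2))               # two-sided, normal-based
ci = (estimate - 1.959964 * se, estimate + 1.959964 * se)  # normal-based 95% interval
print(round(estimate, 3), round(se, 3), round(z, 2), round(p_value, 3))
```

The z statistics and normal-based intervals in the table follow the same recipe: coefficient divided by bootstrap standard error, and coefficient plus or minus 1.96 standard errors.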
Some Examples
Do some tests

. lincom ATT-NATE

 ( 1)  ATT - NATE = 0

        wage |     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
-------------+-------------------------------------------------------------
         (1) | -.8270117   .1810415   -4.57   0.000   -1.181847   -.4721768

. test ATT = ATC

 ( 1)  ATT - ATC = 0

           chi2(  1) =    2.42
         Prob > chi2 =  0.1200
Some Examples
Nearest-neighbor matching (1 neighbor)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

Multivariate-distance nearest-neighbor matching
                                           Number of obs  = 1853
                                           Neighbors: min =    1
Treatment   : union = 1                               max =    1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched       |       Controls      |  Band-
           |  Yes     No   Total |  Used  Unused Total |  width
 ----------+---------------------+---------------------+-------
   Treated |  457      0     457 |   328   1068   1396 |

Treatment-effects estimation

        wage |     Coef.
-------------+----------
         ATT |  .7246969

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

Treatment-effects estimation                 Number of obs      = 1853
Estimator      : nearest-neighbor matching   Matches: requested =    1
Outcome model  : matching                                   min =    1
Distance metric: Mahalanobis                                max =    1

                      |            AI Robust
                 wage |    Coef.   Std. Err.     z    P>|z|   [95% Conf. Interval]
----------------------+-----------------------------------------------------------
ATET                  |
                union |
  (union vs nonunion) | .7246969   .2942952   2.46   0.014    .147889   1.301505
Some Examples
Nearest-neighbor matching (5 neighbors)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching
                                           Number of obs  = 1853
                                           Neighbors: min =    5
Treatment   : union = 1                               max =    5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched       |       Controls      |  Band-
           |  Yes     No   Total |  Used  Unused Total |  width
 ----------+---------------------+---------------------+-------
   Treated |  457      0     457 |   870    526   1396 |

Treatment-effects estimation

        wage |     Coef.
-------------+----------
         ATT |  .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation                 Number of obs      = 1853
Estimator      : nearest-neighbor matching   Matches: requested =    5
Outcome model  : matching                                   min =    5
Distance metric: Mahalanobis                                max =    6

                      |            AI Robust
                 wage |    Coef.   Std. Err.     z    P>|z|   [95% Conf. Interval]
----------------------+-----------------------------------------------------------
ATET                  |
                union |
  (union vs nonunion) | .5590823   .2381752   2.35   0.019   .0922675   1.025897
Some Examples
Bias correction (regression adjustment)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching
                                           Number of obs  = 1853
                                           Neighbors: min =    5
Treatment   : union = 1                               max =    5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |       Matched       |       Controls      |  Band-
           |  Yes     No   Total |  Used  Unused Total |  width
 ----------+---------------------+---------------------+-------
   Treated |  457      0     457 |   870    526   1396 |

Treatment-effects estimation

        wage |     Coef.
-------------+----------
         ATT |  .5288023

wage: adjusted for collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south)
>     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation                 Number of obs      = 1853
Estimator      : nearest-neighbor matching   Matches: requested =    5
Outcome model  : matching                                   min =    5
Distance metric: Mahalanobis                                max =    6

                      |            AI Robust
                 wage |    Coef.   Std. Err.     z    P>|z|   [95% Conf. Interval]
----------------------+-----------------------------------------------------------
ATET                  |
                union |
  (union vs nonunion) | .5288023   .2420635   2.18   0.029   .0543666   1.003238
Some Examples
Mahalanobis-distance and propensity-score matching combined
kmatch md union collgrad ttl_exp tenure (wage) att gt psvars(iindustry irace south) psweight(3)(computing bandwidth done)
Multivariate-distance kernel matching Number of obs = 1853Kernel = epan
Treatment union = 1Metric mahalanobis (modified)Covariates collgrad ttl_exp tenurePS model logit (pr)PS covars iindustry irace south
Matching statistics
Matched Controls Band-Yes No Total Used Unused Total width
Treated 439 18 457 1258 138 1396 83886
Treatment-effects estimation
wage Coef
ATT 6408443
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 45
Some Examples
Exact matching
kmatch md union collgrad ttl_exp tenure (wage) att ematch(industry race south)(computing bandwidth done)
Multivariate-distance kernel matching Number of obs = 1853Kernel = epan
Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenureExact industry race south
Matching statistics
Matched Controls Band-Yes No Total Used Unused Total width
Treated 432 25 457 1103 293 1396 13013
Treatment-effects estimation
wage Coef
ATT 6047374
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 46
Some Examples
Bandwidth selection the default (based on distribution of distances in
one-nearest-neighbor matching)
kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att(computing bandwidth done)
Multivariate-distance kernel matching Number of obs = 1853Kernel = epan
Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south
Matching statistics
Matched Controls Band-Yes No Total Used Unused Total width
Treated 432 25 457 1105 291 1396 13394
Treatment-effects estimation
wage Coef
ATT 6059013
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 47
Some Examples Bandwidth selection cross validation with respect to X kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv)(computing bandwidth done)
Multivariate-distance kernel matching Number of obs = 1853Kernel = epan
Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south
Matching statistics
Matched Controls Band-Yes No Total Used Unused Total width
Treated 448 9 457 1184 212 1396 18888
Treatment-effects estimation
wage Coef
ATT 6651578
kmatch cvplot ms(o) index mlabposition(1) sort
1
57 915131411128 102 6
4
3
02
04
06
08
1MSE
15 2 25 3Bandwidth
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 48
Some Examples Bandwidth selection cross validation with respect to Y kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage)(computing bandwidth done)
Multivariate-distance kernel matching Number of obs = 1853Kernel = epan
Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south
Matching statistics
Matched Controls Band-Yes No Total Used Unused Total width
Treated 453 4 457 1289 107 1396 2433
Treatment-effects estimation
wage Coef
ATT 6928956
kmatch cvplot ms(o) index mlabposition(1) sort
1
2
579 61110131512
148
4
3
118
12122
124
126
MISE
15 2 25 3Bandwidth
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 49
Some Examples Bandwidth selection weighted cross validation with respect to Y kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage weighted)(computing bandwidth done)
Multivariate-distance kernel matching Number of obs = 1853Kernel = epan
Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south
Matching statistics
Matched Controls Band-Yes No Total Used Unused Total width
Treated 455 2 457 1356 40 1396 27626
Treatment-effects estimation
wage Coef
ATT 7308166
kmatch cvplot ms(o) index mlabposition(1) sort
1
2
6
10121481513119
3
7
5
4
1112
1314
Wei
ghte
d M
ISE
1 2 3 4 5Bandwidth
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 50
Some Examples Common-support diagnostics kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(05)
Multivariate-distance kernel matching Number of obs = 1853Kernel = epan
Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south
Matching statistics
Matched Controls Band-Yes No Total Used Unused Total width
Treated 366 91 457 701 695 1396 5
Treatment-effects estimation
wage Coef
ATT 3303161
kmatch csummarize(refitting the model using the generate() option)
Common support (treated)
                         Means                  Standardized difference
Means          Matched  Unmatc~d     Total   (1)-(3)   (2)-(3)   (1)-(2)
collgrad       .322404   .318681   .321663   .001585  -.006376   .007962
ttl_exp        13.3929   12.7682   13.2685   .027413  -.110253   .137666
tenure         8.12614   6.95055   7.89205   .038378  -.154356   .192734
3.industry     .002732   .021978   .006565  -.047404   .190657  -.238061
4.industry     .191257   .153846   .183807   .019212  -.077269   .096481
5.industry     .062842   .274725   .105033  -.137462   .552867  -.690329
6.industry     .057377         0   .045952   .054507  -.219225   .273732
7.industry     .019126   .021978   .019694  -.004083   .016423  -.020506
8.industry     .005464   .065934   .017505  -.091714   .368871  -.460585
9.industry     .010929   .010989   .010941  -.000115   .000462  -.000577
10.industry          0   .021978   .004376  -.066227   .266363  -.332589
11.industry    .554645   .175824   .479212    .15083  -.606636   .757467
12.industry    .092896   .241758   .122538  -.090299   .363181   -.45348
2.race         .243169   .681319   .330416  -.185284   .745209  -.930494
3.race         .002732   .076923   .017505  -.112525   .452572  -.565097
south           .29235   .318681   .297593  -.011456   .046074   -.05753

Common support (treated)
                       Variances                        Ratio
Variances      Matched  Unmatc~d     Total   (1)/(3)   (2)/(3)   (1)/(2)
collgrad       .219058   .219536   .218674   1.00176   1.00394   .997824
ttl_exp        19.4198   25.2474   20.5898   .943177   1.22621    .76918
tenure         38.3324   31.9242   37.2044   1.03032   .858076   1.20073
3.industry     .002732   .021734   .006536   .418045   3.32537   .125714
4.industry     .155101   .131624   .150351   1.03159   .875443   1.17837
5.industry     .059054   .201465   .094207   .626851   2.13854   .293122
6.industry     .054233         0   .043936   1.23435         0         .
7.industry     .018811   .021734   .019348   .972252    1.1233   .865531
8.industry      .00545   .062271   .017237   .316157   3.61269   .087513
9.industry     .010839   .010989   .010845   .999464   1.01328   .986361
10.industry          0   .021734   .004367         0   4.97709         0
11.industry    .247691    .14652   .250115   .990307   .585811   1.69049
12.industry    .084497   .185348   .107758   .784137   1.72003   .455885
2.race         .184542   .219536   .221726   .832297   .990121   .840601
3.race         .002732   .071795   .017237   .158513   4.16522   .038056
south          .207448   .219536    .20949   .990254   1.04796   .944939

(1) matched, (2) unmatched, (3) total
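The common-support statistics above can be checked by hand. The following is a minimal sketch, assuming (as the reported numbers suggest) that each difference in means is scaled by the standard deviation of the total treated group, column (3). The function names are illustrative and are not part of kmatch.

```python
import math


def std_diff(mean_a, mean_b, var_ref):
    """Difference in means, scaled by the standard deviation of a reference group."""
    return (mean_a - mean_b) / math.sqrt(var_ref)


def var_ratio(var_a, var_b):
    """Ratio of two variances."""
    return var_a / var_b


# Values for ttl_exp: (1) matched, (3) total
mean_matched, mean_total = 13.3929, 13.2685
var_matched, var_total = 19.4198, 20.5898

d13 = std_diff(mean_matched, mean_total, var_total)   # compare with column (1)-(3)
r13 = var_ratio(var_matched, var_total)               # compare with column (1)/(3)
print(round(d13, 4), round(r13, 4))
```

Small deviations from the printed table are expected because the inputs are themselves rounded.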
Some Examples
Make a graph of the common-support stats
. mat M = r(M)
. coefplot matrix(M[,4]), title(Std. difference) noci nolabels xline(0)
[Figure: standardized difference by covariate (collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, south); x-axis: Std. difference, with a reference line at 0]
Some Examples
Multiple outcome variables
. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1852
                                             Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                 Matched             Controls          Band-
             Yes    No  Total    Used  Unused  Total   width
Treated      432    25    457    1104     291   1395  1.3392

Treatment-effects estimation
               Coef.
wage
  ATT       .6021049
  NATE      1.430823
hours
  ATT       1.263759
  NATE      1.450303
Some Examples
Multiple outcome variables with different regression-adjustment equations
. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure)
>     (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1852
                                             Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                 Matched             Controls          Band-
             Yes    No  Total    Used  Unused  Total   width
Treated      432    25    457    1104     291   1395  1.3392

Treatment-effects estimation
               Coef.
wage
  ATT       .5152752
  NATE      1.430823
hours
  ATT       1.263759
  NATE      1.450303
wage: adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race
Some Examples
Treatment effects by subpopulation
. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage),
>     att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)
Bootstrap replications (50)

Multivariate-distance kernel matching        Number of obs = 1853
                                             Replications  = 50
                                             Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race
0: south = 0
1: south = 1

Matching statistics
                 Matched             Controls          Band-
             Yes    No  Total    Used  Unused  Total   width
0: Treated   306    15    321     625     120    745  1.3199
1: Treated   126    10    136     473     178    651  1.3398

Treatment-effects estimation
            Observed  Bootstrap                     Normal-based
wage           Coef.  Std. Err.      z   P>|z|   [95% Conf. Interval]
0: ATT      .4586332   .2763358   1.66   0.097    -.082975   1.000241
1: ATT      .9518705    .406903   2.34   0.019    .1543553   1.749386

. test [0]ATT = [1]ATT
 ( 1)  [0]ATT - [1]ATT = 0
       chi2(1)     = 1.23
       Prob > chi2 = 0.2679

. lincom [1]ATT - [0]ATT
 ( 1)  - [0]ATT + [1]ATT = 0
wage           Coef.  Std. Err.      z   P>|z|   [95% Conf. Interval]
(1)         .4932373   .4452343   1.11   0.268    -.379406   1.365881
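The test and lincom results above report the same Wald test in two forms: for a single restriction, the chi-squared statistic is the square of the z statistic. A minimal sketch, assuming a normal reference distribution (as in the "Normal-based" output); the function name is illustrative.

```python
import math


def wald_test(coef, se):
    """Return z, chi2 (1 df) and the two-sided normal p-value for H0: coef = 0."""
    z = coef / se
    chi2 = z * z
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, chi2, p


# Difference [1]ATT - [0]ATT and its bootstrap standard error from the lincom output
z, chi2, p = wald_test(0.4932373, 0.4452343)
print(round(z, 2), round(chi2, 2), round(p, 3))  # z ~ 1.11, chi2 ~ 1.23, p ~ 0.268
```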
Simulation
Population data from the Swiss census of 2000.
Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using the command iskotrei by Hendrickx 2002); values range from 6 to 78, with a mean of 44.
Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.
Control variables: gender, age, and highest educational degree.
Population restricted to people between 24 and 60 years old who are working.
2'308'006 individuals, of which 17.5% belong to the treatment group.
Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.
Simulation
Substantial differences between resident aliens and Swiss nationals on all three covariates.
Propensity score in population (computed from fully stratified data):
[Figure: density of the propensity score for untreated and treated; x-axis: propensity score (0 to 1), y-axis: density; McFadden R2 = 0.121]
Simulation
Raw mean difference in occupational prestige (NATE): -4.79
Population ATT (computed from fully stratified data): -3.96
There is some treatment effect heterogeneity (ATE = -3.51, ATC = -3.41).
[Figure, left panel: outcome against propensity score for untreated and treated; right panel: treatment effect against propensity score]
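The population values above are mutually consistent: the ATE is the average of ATT and ATC, weighted by the shares of treated and untreated. A minimal sketch using the figures from the simulation slides (17.5% treated); the function name is illustrative.

```python
def ate_from_att_atc(att, atc, share_treated):
    """ATE as the treated-share-weighted average of ATT and ATC."""
    return share_treated * att + (1.0 - share_treated) * atc


ate = ate_from_att_atc(att=-3.96, atc=-3.41, share_treated=0.175)
print(round(ate, 2))  # -3.51
```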
Results: Variance
[Figure: variance of the estimates by matching algorithm, panels N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]
In this slide we can see that, for the same algorithm, PSM is typically somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.
For kernel matching, the efficiency differences between PSM and MDM are small; additional post-matching regression adjustment further reduces the differences.
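To make the kernel-matching idea concrete, here is a minimal sketch of a propensity-score kernel matching estimator of the ATT with an Epanechnikov kernel. It is an illustration under simplifying assumptions (known propensity scores, no regression adjustment, toy data), not kmatch's implementation; all names and numbers are hypothetical.

```python
def epan(u):
    """Epanechnikov kernel; zero outside (-1, 1)."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def kernel_att(ps_treated, y_treated, ps_control, y_control, bw):
    """ATT as the mean of Y minus the kernel-weighted control mean, over matched treated."""
    effects = []
    for p_t, y_t in zip(ps_treated, y_treated):
        weights = [epan((p_c - p_t) / bw) for p_c in ps_control]
        total = sum(weights)
        if total == 0.0:
            continue  # no control within the bandwidth: treated unit stays unmatched
        y0_hat = sum(w * y for w, y in zip(weights, y_control)) / total
        effects.append(y_t - y0_hat)
    if not effects:
        raise ValueError("no treated unit has a control within the bandwidth")
    return sum(effects) / len(effects), len(effects)


# Toy data: the third treated unit has no control within the bandwidth
att, n_matched = kernel_att(
    ps_treated=[0.30, 0.60, 0.95], y_treated=[10.0, 12.0, 20.0],
    ps_control=[0.25, 0.35, 0.55, 0.65], y_control=[8.0, 9.0, 10.0, 11.0],
    bw=0.1,
)
print(round(att, 6), n_matched)
```

Because every matched treated unit is compared with a weighted average of several controls, the estimate uses more information than 1-nearest-neighbor matching, which is one way to read the variance differences described above.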
Results: Bias reduction (in percent)
[Figure: bias reduction in percent by matching algorithm, panels N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]
Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after a propensity score of 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.
The bias also vanishes once post-matching regression adjustment is applied.
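The slides do not spell out how "bias reduction (in percent)" is computed. One common definition, assumed here, is the share of the raw bias (NATE minus the true ATT) that an estimator removes, so that 100 means unbiased and values above 100 mean over-correction. The sketch uses the population values from the simulation slides; the estimate of -4.00 is a hypothetical number chosen for illustration.

```python
def bias_reduction_pct(estimate, raw_diff, true_effect):
    """Percent of the raw bias (raw_diff - true_effect) removed by the estimate.

    Assumed definition: 100 * (raw_diff - estimate) / (raw_diff - true_effect).
    """
    raw_bias = raw_diff - true_effect
    if raw_bias == 0:
        raise ValueError("raw difference is already unbiased")
    return 100.0 * (raw_diff - estimate) / raw_bias


nate, att_true = -4.79, -3.96          # population values from the simulation
reduction = bias_reduction_pct(-4.00, nate, att_true)  # hypothetical estimate
print(round(reduction, 2))
```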
Results: Mean squared error
[Figure: mean squared error by matching algorithm, panels N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]
Results: Relative standard error
[Figure: relative standard error by matching algorithm, panels N = 500 and N = 5000. Rows: nearest-neighbor matching with teffects (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap (1 neighbor, 5 neighbors), and kernel matching with bootstrap (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]
Results: Coverage of 95% CIs
[Figure: coverage of 95% confidence intervals by matching algorithm, panels N = 500 and N = 5000. Rows: nearest-neighbor matching with teffects (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap (1 neighbor, 5 neighbors), and kernel matching with bootstrap (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]
Conclusions
The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).
Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.
Conclusions
Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research, the choice may not be that clear.
Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.
Disadvantages of MDM:
- The choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.
One clear conclusion we can draw, however, is:
Do not use propensity scores for pair matching. (But don't use pair matching anyhow.)
Conclusions
Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly OK for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.
To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.
References
Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2):295–313.
Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1):77–90.
Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91:279–290.
Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.
King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.
Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.
Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4):465–472.
Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70:41–55.
Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5):688–701.
Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4):472–480.
Some Examples
Some balancing statistics
. kmatch summarize
(refitting the model using the generate() option)

                          Raw                       Matched(ATT)
Means          Treated  Untrea~d    StdDif   Treated  Untrea~d    StdDif
collgrad       .321663   .224212   .219912   .319444   .319444         0
ttl_exp        13.2685   12.7323   .117584   13.3205   13.1425   .039036
tenure         7.89205   6.17658    .29735   7.91744   7.58347   .057888
3.industry     .006565   .012178  -.058246    .00463    .00463         0
4.industry     .183807   .166905   .044425   .185185   .185185         0
5.industry     .105033   .027937   .312944   .085648   .085648         0
6.industry     .045952   .169771  -.407129   .048611   .048611         0
7.industry     .019694   .102436  -.350657   .020833   .020833         0
8.industry     .017505   .035817  -.113785   .009259   .009259         0
9.industry     .010941   .040115  -.185669   .011574   .011574         0
10.industry    .004376   .008596  -.052551   .002315   .002315         0
11.industry    .479212   .356734   .250073   .506944   .506944         0
12.industry    .122538    .07235   .169707    .12037    .12037         0
2.race         .330416   .244986   .189418     .3125     .3125         0
3.race         .017505   .011461   .050566   .006944   .006944         0
south          .297593   .466332  -.352408   .291667   .291667         0

                          Raw                       Matched(ATT)
Variances      Treated  Untrea~d     Ratio   Treated  Untrea~d     Ratio
collgrad       .218674   .174066   1.25628   .217904   .217904         1
ttl_exp        20.5898   21.0001   .980459   19.8177   18.2323   1.08696
tenure         37.2044   29.3629   1.26706   37.0399   34.9543   1.05966
3.industry     .006536   .012038   .542928   .004619   .004619         1
4.industry     .150351   .139148   1.08052   .151242   .151242         1
5.industry     .094207   .027176   3.46656   .078494   .078494         1
6.industry     .043936    .14105   .311496   .046355   .046355         1
7.industry     .019348   .092008   .210287   .020447   .020447         1
8.industry     .017237   .034559   .498769   .009195   .009195         1
9.industry     .010845   .038533   .281445   .011467   .011467         1
10.industry    .004367   .008528   .512039   .002315   .002315         1
11.industry    .250115   .229639   1.08917   .250532   .250532         1
12.industry    .107758   .067163   1.60443   .106127   .106127         1
2.race         .221726     .1851   1.19787   .215342   .215342         1
3.race         .017237   .011338   1.52025   .006912   .006912         1
south           .20949   .249045   .841173   .207077   .207077         1
Some Examples
Make a graph of the balancing stats
. mat M = r(M)
. mat V = r(V)
. coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) ||,
>     bylabels("Std. mean difference" "Variance ratio")
>     noci nolabels byopts(xrescale)
. addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))
. addplot 2: , xline(1) norescaling
[Figure: two panels by covariate (collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, south). Left: Std. mean difference, reference line at 0. Right: Variance ratio, reference line at 1. Series: Raw, Matched.]
Some Examples
Propensity-score kernel matching
. kmatch ps union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate att
(computing bandwidth ... done)

Propensity-score kernel matching             Number of obs = 1853
                                             Kernel        = epan
Treatment  : union = 1
Covariates : collgrad ttl_exp tenure i.industry i.race south
PS model   : logit (pr)

Matching statistics
                 Matched             Controls          Band-
             Yes    No  Total    Used  Unused  Total   width
Treated      431    26    457    1214     182   1396  .00188

Treatment-effects estimation
wage           Coef.
ATT         .3887224
NATE        1.432913
Some Examples
Kernel density balancing plot
. kmatch density, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)
(applying 0-1 boundary correction to density estimation of propensity score)
(bandwidth for propensity score = .06803989)
[Figure: density of the propensity score for untreated and treated; panels: Raw, Matched (ATT); x-axis: propensity score, y-axis: density]
Some Examples
Cumulative distribution balancing plot
. kmatch cumul, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)
[Figure: cumulative distribution of the propensity score for untreated and treated; panels: Raw, Matched (ATT); x-axis: propensity score, y-axis: cumulative probability]
Some Examples
Balancing box plot
. kmatch box
(refitting the model using the generate() option)
[Figure: box plots of the propensity score for untreated and treated; panels: Raw, Matched (ATT); y-axis: propensity score]
Some Examples
Standard errors
. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)
(running kmatch on estimation sample)
Bootstrap replications (50)

Multivariate-distance kernel matching        Number of obs = 1853
                                             Replications  = 50
                                             Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                 Matched             Controls          Band-
             Yes    No  Total    Used  Unused  Total   width
Treated      432    25    457    1105     291   1396  1.3394
Untreated   1386    10   1396     455       2    457  3.3975
Combined    1818    35   1853    1560     293   1853       .

Treatment-effects estimation
            Observed  Bootstrap                     Normal-based
wage           Coef.  Std. Err.      z   P>|z|   [95% Conf. Interval]
ATE         .4095729   .1920853   2.13   0.033    .0330928   .7860531
ATT         .6059013   .2472069   2.45   0.014    .1213846   1.090418
ATC         .3483797   .1893653   1.84   0.066   -.0227695   .7195289
NATE        1.432913   .2333282   6.14   0.000    .9755981   1.890228
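The "Normal-based" intervals in the table above follow from the coefficient and its bootstrap standard error. A minimal sketch that reproduces the ATT row; the function name is illustrative.

```python
from statistics import NormalDist


def normal_ci(coef, se, level=0.95):
    """Normal-based confidence interval: coef +/- z_crit * se."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for a 95% interval
    return coef - z_crit * se, coef + z_crit * se


# ATT and its bootstrap standard error from the output above
lower, upper = normal_ci(0.6059013, 0.2472069)
z_stat = 0.6059013 / 0.2472069
print(round(lower, 4), round(upper, 4), round(z_stat, 2))
```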
Some Examples
Do some tests
. lincom ATT-NATE
 ( 1)  ATT - NATE = 0
wage           Coef.  Std. Err.      z   P>|z|   [95% Conf. Interval]
(1)        -.8270117   .1810415  -4.57   0.000   -1.181847  -.4721768

. test ATT = ATC
 ( 1)  ATT - ATC = 0
       chi2(1)     = 2.42
       Prob > chi2 = 0.1200
Some Examples
Nearest-neighbor matching (1 neighbor)
. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

Multivariate-distance nearest-neighbor matching
                                             Number of obs  = 1853
                                             Neighbors: min = 1
Treatment  : union = 1                                  max = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                 Matched             Controls          Band-
             Yes    No  Total    Used  Unused  Total   width
Treated      457     0    457     328    1068   1396       .

Treatment-effects estimation
wage           Coef.
ATT         .7246969

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

Treatment-effects estimation                 Number of obs      = 1853
Estimator      : nearest-neighbor matching   Matches: requested = 1
Outcome model  : matching                                   min = 1
Distance metric: Mahalanobis                                max = 1

                                 AI Robust
wage                      Coef.  Std. Err.      z   P>|z|   [95% Conf. Interval]
ATET
union
 (union vs nonunion)   .7246969   .2942952   2.46   0.014     .147889   1.301505
Some Examples
Nearest-neighbor matching (5 neighbors)
. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching
                                             Number of obs  = 1853
                                             Neighbors: min = 5
Treatment  : union = 1                                  max = 5
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                 Matched             Controls          Band-
             Yes    No  Total    Used  Unused  Total   width
Treated      457     0    457     870     526   1396       .

Treatment-effects estimation
wage           Coef.
ATT         .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation                 Number of obs      = 1853
Estimator      : nearest-neighbor matching   Matches: requested = 5
Outcome model  : matching                                   min = 5
Distance metric: Mahalanobis                                max = 6

                                 AI Robust
wage                      Coef.  Std. Err.      z   P>|z|   [95% Conf. Interval]
ATET
union
 (union vs nonunion)   .5590823   .2381752   2.35   0.019    .0922675   1.025897
Some Examples
Bias-correction (regression adjustment)
. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching
                                             Number of obs  = 1853
                                             Neighbors: min = 5
Treatment  : union = 1                                  max = 5
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                 Matched             Controls          Band-
             Yes    No  Total    Used  Unused  Total   width
Treated      457     0    457     870     526   1396       .

Treatment-effects estimation
wage           Coef.
ATT         .5288023
adjusted for collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south)
>     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation                 Number of obs      = 1853
Estimator      : nearest-neighbor matching   Matches: requested = 5
Outcome model  : matching                                   min = 5
Distance metric: Mahalanobis                                max = 6

                                 AI Robust
wage                      Coef.  Std. Err.      z   P>|z|   [95% Conf. Interval]
ATET
union
 (union vs nonunion)   .5288023   .2420635   2.18   0.029    .0543666   1.003238
Some Examples
Mahalanobis-distance and propensity-score matching combined
. kmatch md union collgrad ttl_exp tenure (wage), att
>     psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1853
                                             Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis (modified)
Covariates : collgrad ttl_exp tenure
PS model   : logit (pr)
PS covars  : i.industry i.race south

Matching statistics
                 Matched             Controls          Band-
             Yes    No  Total    Used  Unused  Total   width
Treated      439    18    457    1258     138   1396  .83886

Treatment-effects estimation
wage           Coef.
ATT         .6408443
Some Examples
Exact matching
. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1853
                                             Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure
Exact      : industry race south

Matching statistics
                 Matched             Controls          Band-
             Yes    No  Total    Used  Unused  Total   width
Treated      432    25    457    1103     293   1396  1.3013

Treatment-effects estimation
wage           Coef.
ATT         .6047374
Some Examples
Bandwidth selection: the default (based on the distribution of distances in one-nearest-neighbor matching)
. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1853
                                             Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                 Matched             Controls          Band-
             Yes    No  Total    Used  Unused  Total   width
Treated      432    25    457    1105     291   1396  1.3394

Treatment-effects estimation
wage           Coef.
ATT         .6059013
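The slide only says that the default bandwidth is based on the distribution of distances in one-nearest-neighbor matching. The sketch below illustrates one rule of that kind: a multiple of a high quantile of the non-zero nearest-neighbor distances. The quantile (0.90), the factor (1.5), the simple quantile definition, and the one-dimensional distance are all assumptions made for illustration; they are not taken from the slides.

```python
import math


def pair_matching_bandwidth(treated, controls, q=0.90, factor=1.5):
    """Bandwidth as factor times the q-quantile of non-zero nearest-neighbor distances."""
    # Distance from each treated unit to its closest control (one-dimensional here)
    nearest = [min(abs(t - c) for c in controls) for t in treated]
    nonzero = sorted(d for d in nearest if d > 0)
    if not nonzero:
        raise ValueError("all nearest-neighbor distances are zero")
    # Simple empirical quantile: smallest value with at least a share q at or below it
    k = max(0, math.ceil(q * len(nonzero)) - 1)
    return factor * nonzero[k]


# Toy data (hypothetical): nearest distances are 0.5, 0, 0.4, 0.1, 0.9
bw = pair_matching_bandwidth(
    treated=[1.0, 2.0, 3.0, 4.0, 5.0],
    controls=[1.5, 2.0, 3.4, 4.1, 7.0],
)
print(round(bw, 4))
```

A rule like this ties the bandwidth to how far apart treated units and their closest controls typically are, so most treated units find at least one control within the bandwidth.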
Some Examples
Bandwidth selection: cross-validation with respect to X
. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1853
                                             Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                 Matched             Controls          Band-
             Yes    No  Total    Used  Unused  Total   width
Treated      448     9    457    1184     212   1396  1.8888

Treatment-effects estimation
wage           Coef.
ATT         .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort
[Figure: cross-validation plot of MSE (y-axis) against bandwidth (x-axis, 1.5 to 3); markers are labeled with their evaluation index]
Some Examples
Bandwidth selection: cross-validation with respect to Y
. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1853
                                             Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                 Matched             Controls          Band-
             Yes    No  Total    Used  Unused  Total   width
Treated      453     4    457    1289     107   1396   2.433

Treatment-effects estimation
wage           Coef.
ATT         .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort
[Figure: cross-validation plot of MISE (y-axis) against bandwidth (x-axis, 1.5 to 3); markers are labeled with their evaluation index]
Results: Variance

[Figure: variance of the estimators for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]

Notes: On this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.
Results: Bias reduction (in percent)

[Figure: bias reduction in percent for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]

Notes: Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.
Results: Mean squared error

[Figure: mean squared error for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]

Results: Relative standard error

[Figure: relative standard error for N = 500 and N = 5000. Rows: nearest-neighbor matching (teffects) with 1 and 5 neighbors; nearest-neighbor matching (bootstrap) with 1 and 5 neighbors; kernel matching (bootstrap) with fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, and weighted CV with respect to Y. Series: MDM and PSM, each with and without bias correction.]

Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals for N = 500 and N = 5000, with the same rows and series as for the relative standard error.]
Conclusions

The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).

Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.

Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research, the choice may not be that clear.

Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.

Disadvantages of MDM:
- The choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching. (But don't use pair matching anyhow.)

Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly ok for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.
References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2):295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1):77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91:279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802. Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990 [1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4):465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70:41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5):688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4):472–480.
Some Examples

Make a graph of the balancing stats

. mat M = r(M)
. mat V = r(V)
. coefplot matrix(M[,3]) matrix(M[,6]) || matrix(V[,3]) matrix(V[,6]) || , ///
>     bylabels("Std. mean difference" "Variance ratio") ///
>     noci nolabels byopts(xrescale)
. addplot 1: , xline(0) norescaling legend(order(1 "Raw" 2 "Matched"))
. addplot 2: , xline(1) norescaling

[Figure: coefplot with two panels, "Std. mean difference" and "Variance ratio", comparing raw and matched samples for collgrad, ttl_exp, tenure, the industry and race indicators, and south.]
Some Examples

Propensity-score kernel matching

. kmatch ps union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage), nate att
(computing bandwidth ... done)

Propensity-score kernel matching               Number of obs = 1,853
                                               Kernel        = epan
Treatment  : union = 1
Covariates : collgrad ttl_exp tenure i.industry i.race south
PS model   : logit (pr)

Matching statistics
                   Matched                 Controls           Band-
               Yes     No   Total     Used  Unused   Total    width
Treated        431     26     457     1214     182    1396   .00188

Treatment-effects estimation
    wage         Coef.
     ATT      .3887224
    NATE      1.432913
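To make the idea behind propensity-score kernel matching concrete, here is a minimal Python sketch. It is not kmatch's implementation: the function name `att_kernel`, the toy data, and the fixed bandwidth are assumptions for illustration, and kmatch additionally selects the bandwidth and estimates the propensity score itself. Each treated unit is compared with an Epanechnikov-weighted average of the controls whose propensity score lies within the bandwidth; treated units without any such control count as unmatched.

```python
import numpy as np

def att_kernel(ps, d, y, bw):
    """ATT by kernel matching on a scalar score with an Epanechnikov kernel."""
    ps = np.asarray(ps, dtype=float)
    d = np.asarray(d).astype(bool)
    y = np.asarray(y, dtype=float)
    ps_c, y_c = ps[~d], y[~d]
    effects = []
    n_unmatched = 0
    for p_i, y_i in zip(ps[d], y[d]):
        u = (ps_c - p_i) / bw
        # Epanechnikov kernel: positive weight only for |u| < 1.
        w = np.where(np.abs(u) < 1.0, 0.75 * (1.0 - u ** 2), 0.0)
        if w.sum() == 0.0:
            n_unmatched += 1          # no control within the bandwidth
            continue
        effects.append(y_i - np.sum(w * y_c) / w.sum())
    return float(np.mean(effects)), n_unmatched

# Toy data (hypothetical): two treated units, three controls.
ps = [0.30, 0.65, 0.20, 0.40, 0.90]
d = [1, 1, 0, 0, 0]
y = [10.0, 7.0, 4.0, 6.0, 1.0]
att, n_unmatched = att_kernel(ps, d, y, bw=0.2)
print(att, n_unmatched)
```

In the toy data the second treated unit has no control within the bandwidth and is dropped, which mirrors the "Matched: No" column in the matching statistics.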
Some Examples

Kernel density balancing plot

. kmatch density, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)
(applying 0-1 boundary correction to density estimation of propensity score)
(bandwidth for propensity score = .06803989)

[Figure: kernel density of the propensity score for untreated and treated; panels "Raw" and "Matched (ATT)"; x-axis: propensity score, y-axis: density.]

Some Examples

Cumulative distribution balancing plot

. kmatch cumul, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)

[Figure: cumulative distribution of the propensity score for untreated and treated; panels "Raw" and "Matched (ATT)"; x-axis: propensity score, y-axis: cumulative probability.]

Some Examples

Balancing box plot

. kmatch box
(refitting the model using the generate() option)

[Figure: box plots of the propensity score for untreated and treated; panels "Raw" and "Matched (ATT)".]
Some Examples

Standard errors

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)

Multivariate-distance kernel matching          Number of obs = 1,853
                                               Replications  = 50
                                               Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                   Matched                 Controls           Band-
               Yes     No   Total     Used  Unused   Total    width
Treated        432     25     457     1105     291    1396   1.3394
Untreated     1386     10    1396      455       2     457   3.3975
Combined      1818     35    1853     1560     293    1853

Treatment-effects estimation
              Observed   Bootstrap                      Normal-based
    wage         Coef.   Std. Err.      z   P>|z|   [95% Conf. Interval]
     ATE      .4095729    .1920853   2.13   0.033   .0330928    .7860531
     ATT      .6059013    .2472069   2.45   0.014   .1213846    1.090418
     ATC      .3483797    .1893653   1.84   0.066  -.0227695    .7195289
    NATE      1.432913    .2333282   6.14   0.000   .9755981    1.890228
Some Examples

Do some tests

. lincom ATT-NATE

 ( 1)  ATT - NATE = 0

    wage         Coef.   Std. Err.      z   P>|z|   [95% Conf. Interval]
     (1)     -.8270117    .1810415  -4.57   0.000  -1.181847   -.4721768

. test ATT = ATC

 ( 1)  ATT - ATC = 0

           chi2(  1) =    2.42
         Prob > chi2 =    0.1200
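The lincom result can be checked by hand from the reported quantities. The short Python sketch below recomputes the point estimate, z statistic, p-value, and normal-based 95% interval for ATT − NATE. The helper name `wald_summary` is hypothetical; the bootstrap standard error of the difference is taken from the output as given, because it cannot be derived from the two separate standard errors without their covariance.

```python
from statistics import NormalDist

def wald_summary(coef, se, level=0.95):
    """z statistic, two-sided p-value, and normal-based confidence interval."""
    z = coef / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    return z, p, (coef - crit * se, coef + crit * se)

att, nate = 0.6059013, 1.432913   # estimates reported by kmatch
se_diff = 0.1810415               # bootstrap SE of ATT - NATE reported by lincom
z, p, (lo, hi) = wald_summary(att - nate, se_diff)
print(round(att - nate, 7), round(z, 2), round(lo, 6), round(hi, 7))
```

The difference of about −0.83 indicates how much of the raw union wage gap is removed by matching in this example.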
Some Examples

Nearest-neighbor matching (1 neighbor)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

Multivariate-distance nearest-neighbor matching
                                               Number of obs  = 1,853
                                               Neighbors: min = 1
Treatment  : union = 1                                    max = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                   Matched                 Controls           Band-
               Yes     No   Total     Used  Unused   Total    width
Treated        457      0     457      328    1068    1396

Treatment-effects estimation
    wage         Coef.
     ATT      .7246969

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

Treatment-effects estimation                   Number of obs      = 1,853
Estimator      : nearest-neighbor matching     Matches: requested = 1
Outcome model  : matching                                     min = 1
Distance metric: Mahalanobis                                  max = 1

                                       AI Robust
               wage         Coef.   Std. Err.      z   P>|z|   [95% Conf. Interval]
ATET
              union
(union vs nonunion)      .7246969    .2942952   2.46   0.014    .147889    1.301505
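Both commands above return the same point estimate because they implement the same idea: each treated unit is compared with its nearest control(s) in Mahalanobis distance, with replacement. A minimal Python sketch of that idea follows. The function name `att_nn_mahalanobis` and the toy data are assumptions, the covariance matrix of the pooled sample is used as scaling matrix (an assumption about the metric), and ties are not handled the way teffects or kmatch handle them.

```python
import numpy as np

def att_nn_mahalanobis(X, d, y, k=1):
    """ATT by k-nearest-neighbor matching (with replacement) on Mahalanobis distance."""
    X = np.asarray(X, dtype=float)
    d = np.asarray(d).astype(bool)
    y = np.asarray(y, dtype=float)
    # Inverse covariance of the covariates (pooled sample) defines the metric.
    VI = np.linalg.inv(np.atleast_2d(np.cov(X, rowvar=False)))
    X_c, y_c = X[~d], y[~d]
    effects = []
    for x_i, y_i in zip(X[d], y[d]):
        diff = X_c - x_i
        dist2 = np.einsum("ij,jk,ik->i", diff, VI, diff)   # squared distances
        nearest = np.argsort(dist2, kind="stable")[:k]     # indices of the k closest controls
        effects.append(y_i - y_c[nearest].mean())
    return float(np.mean(effects))

# Toy data (hypothetical): two treated units followed by three controls.
X = [[0.0, 1.0], [2.0, 0.0], [0.1, 1.1], [1.0, 5.0], [1.9, 0.2]]
d = [1, 1, 0, 0, 0]
y = [5.0, 9.0, 3.0, 100.0, 4.0]
att_1nn = att_nn_mahalanobis(X, d, y, k=1)
print(att_1nn)
```

Controls can be reused, which is why the output above lists only 328 of 1396 controls as "Used" for 457 treated units.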
Some Examples

Nearest-neighbor matching (5 neighbors)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching
                                               Number of obs  = 1,853
                                               Neighbors: min = 5
Treatment  : union = 1                                    max = 5
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                   Matched                 Controls           Band-
               Yes     No   Total     Used  Unused   Total    width
Treated        457      0     457      870     526    1396

Treatment-effects estimation
    wage         Coef.
     ATT      .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation                   Number of obs      = 1,853
Estimator      : nearest-neighbor matching     Matches: requested = 5
Outcome model  : matching                                     min = 5
Distance metric: Mahalanobis                                  max = 6

                                       AI Robust
               wage         Coef.   Std. Err.      z   P>|z|   [95% Conf. Interval]
ATET
              union
(union vs nonunion)      .5590823    .2381752   2.35   0.019   .0922675    1.025897

Some Examples

Bias-correction (regression adjustment)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching
                                               Number of obs  = 1,853
                                               Neighbors: min = 5
Treatment  : union = 1                                    max = 5
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                   Matched                 Controls           Band-
               Yes     No   Total     Used  Unused   Total    width
Treated        457      0     457      870     526    1396

Treatment-effects estimation
    wage         Coef.
     ATT      .5288023
adjusted for: collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) ///
>     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation                   Number of obs      = 1,853
Estimator      : nearest-neighbor matching     Matches: requested = 5
Outcome model  : matching                                     min = 5
Distance metric: Mahalanobis                                  max = 6

                                       AI Robust
               wage         Coef.   Std. Err.      z   P>|z|   [95% Conf. Interval]
ATET
              union
(union vs nonunion)      .5288023    .2420635   2.18   0.029   .0543666    1.003238
Some Examples

Mahalanobis-distance and propensity-score matching combined

. kmatch md union collgrad ttl_exp tenure (wage), att ///
>     psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs = 1,853
                                               Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis (modified)
Covariates : collgrad ttl_exp tenure
PS model   : logit (pr)
PS covars  : i.industry i.race south

Matching statistics
                   Matched                 Controls           Band-
               Yes     No   Total     Used  Unused   Total    width
Treated        439     18     457     1258     138    1396   .83886

Treatment-effects estimation
    wage         Coef.
     ATT      .6408443

Some Examples

Exact matching

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs = 1,853
                                               Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure
Exact      : industry race south

Matching statistics
                   Matched                 Controls           Band-
               Yes     No   Total     Used  Unused   Total    width
Treated        432     25     457     1103     293    1396   1.3013

Treatment-effects estimation
    wage         Coef.
     ATT      .6047374

Some Examples

Bandwidth selection: the default (based on the distribution of distances in one-nearest-neighbor matching)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs = 1,853
                                               Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                   Matched                 Controls           Band-
               Yes     No   Total     Used  Unused   Total    width
Treated        432     25     457     1105     291    1396   1.3394

Treatment-effects estimation
    wage         Coef.
     ATT      .6059013
Some Examples

Bandwidth selection: cross validation with respect to X

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs = 1,853
                                               Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                   Matched                 Controls           Band-
               Yes     No   Total     Used  Unused   Total    width
Treated        448      9     457     1184     212    1396   1.8888

Treatment-effects estimation
    wage         Coef.
     ATT      .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MSE against bandwidth (about 1.5 to 3), with evaluation points labeled by index.]

Some Examples

Bandwidth selection: cross validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs = 1,853
                                               Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                   Matched                 Controls           Band-
               Yes     No   Total     Used  Unused   Total    width
Treated        453      4     457     1289     107    1396    2.433

Treatment-effects estimation
    wage         Coef.
     ATT      .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MISE against bandwidth (about 1.5 to 3), with evaluation points labeled by index.]

Some Examples

Bandwidth selection: weighted cross validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs = 1,853
                                               Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                   Matched                 Controls           Band-
               Yes     No   Total     Used  Unused   Total    width
Treated        455      2     457     1356      40    1396   2.7626

Treatment-effects estimation
    wage         Coef.
     ATT      .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of weighted MISE against bandwidth (about 1 to 5), with evaluation points labeled by index.]
Some Examples

Common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(0.5)

Multivariate-distance kernel matching          Number of obs = 1,853
                                               Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                   Matched                 Controls           Band-
               Yes     No   Total     Used  Unused   Total    width
Treated        366     91     457      701     695    1396       .5

Treatment-effects estimation
    wage         Coef.
     ATT      .3303161

. kmatch csummarize
(refitting the model using the generate() option)

Common support (treated)                        Standardized difference
       Means    Matched  Unmatc~d     Total    (1)-(3)    (2)-(3)    (1)-(2)
    collgrad    .322404   .318681   .321663    .001585   -.006376    .007962
     ttl_exp    13.3929   12.7682   13.2685    .027413   -.110253    .137666
      tenure    8.12614   6.95055   7.89205    .038378   -.154356    .192734
  3.industry    .002732   .021978   .006565   -.047404    .190657   -.238061
  4.industry    .191257   .153846   .183807    .019212   -.077269    .096481
  5.industry    .062842   .274725   .105033   -.137462    .552867   -.690329
  6.industry    .057377         0   .045952    .054507   -.219225    .273732
  7.industry    .019126   .021978   .019694   -.004083    .016423   -.020506
  8.industry    .005464   .065934   .017505   -.091714    .368871   -.460585
  9.industry    .010929   .010989   .010941   -.000115    .000462   -.000577
 10.industry          0   .021978   .004376   -.066227    .266363   -.332589
 11.industry    .554645   .175824   .479212     .15083   -.606636    .757467
 12.industry    .092896   .241758   .122538   -.090299    .363181    -.45348
      2.race    .243169   .681319   .330416   -.185284    .745209   -.930494
      3.race    .002732   .076923   .017505   -.112525    .452572   -.565097
       south     .29235   .318681   .297593   -.011456    .046074    -.05753

Common support (treated)                                 Ratio
   Variances    Matched  Unmatc~d     Total    (1)/(3)    (2)/(3)    (1)/(2)
    collgrad    .219058   .219536   .218674    1.00176    1.00394    .997824
     ttl_exp    19.4198   25.2474   20.5898    .943177    1.22621     .76918
      tenure    38.3324   31.9242   37.2044    1.03032    .858076    1.20073
  3.industry    .002732   .021734   .006536    .418045    3.32537    .125714
  4.industry    .155101   .131624   .150351    1.03159    .875443    1.17837
  5.industry    .059054   .201465   .094207    .626851    2.13854    .293122
  6.industry    .054233         0   .043936    1.23435          0          .
  7.industry    .018811   .021734   .019348    .972252     1.1233    .865531
  8.industry     .00545   .062271   .017237    .316157    3.61269    .087513
  9.industry    .010839   .010989   .010845    .999464    1.01328    .986361
 10.industry          0   .021734   .004367          0    4.97709          0
 11.industry    .247691    .14652   .250115    .990307    .585811    1.69049
 12.industry    .084497   .185348   .107758    .784137    1.72003    .455885
      2.race    .184542   .219536   .221726    .832297    .990121    .840601
      3.race    .002732   .071795   .017237    .158513    4.16522    .038056
       south    .207448   .219536    .20949    .990254    1.04796    .944939

(1) matched, (2) unmatched, (3) total
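The standardized differences in the common-support table can be reproduced from the reported means and variances. Judging from the numbers, each difference in means appears to be divided by the standard deviation of the total treated group; this is inferred from the table, not taken from the kmatch documentation. A small Python check for the tenure row, with the hypothetical helper `support_stats`:

```python
import math

def support_stats(mean_matched, mean_unmatched, mean_total, var_total):
    """Standardized differences (1)-(3), (2)-(3), (1)-(2), scaled by the total SD."""
    sd = math.sqrt(var_total)
    return (
        (mean_matched - mean_total) / sd,       # (1)-(3)
        (mean_unmatched - mean_total) / sd,     # (2)-(3)
        (mean_matched - mean_unmatched) / sd,   # (1)-(2)
    )

# Values for tenure as reported by kmatch csummarize.
tenure = support_stats(8.12614, 6.95055, 7.89205, 37.2044)
variance_ratio = 38.3324 / 37.2044              # ratio (1)/(3) for tenure
print([round(v, 6) for v in tenure], round(variance_ratio, 5))
```

A large (2)-(3) value, as for 5.industry or 2.race, signals that the treated units lost to the common-support restriction differ systematically from the treated group as a whole.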
Some Examples

Make a graph of the common-support stats

. mat M = r(M)
. coefplot matrix(M[,4]), title("Std. difference") noci nolabels xline(0)

[Figure: coefplot of the standardized differences (about -.2 to .2) for collgrad, ttl_exp, tenure, the industry and race indicators, and south.]

Some Examples

Multiple outcome variables

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs = 1,852
                                               Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                   Matched                 Controls           Band-
               Yes     No   Total     Used  Unused   Total    width
Treated        432     25     457     1104     291    1395   1.3392

Treatment-effects estimation
                 Coef.
wage
     ATT      .6021049
    NATE      1.430823
hours
     ATT      1.263759
    NATE      1.450303

Some Examples

Multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage = collgrad ttl_exp tenure) ///
>     (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs = 1,852
                                               Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
                   Matched                 Controls           Band-
               Yes     No   Total     Used  Unused   Total    width
Treated        432     25     457     1104     291    1395   1.3392

Treatment-effects estimation
                 Coef.
wage
     ATT      .5152752
    NATE      1.430823
hours
     ATT      1.263759
    NATE      1.450303
wage: adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race
Some Examples

Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage), ///
>     att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)

Multivariate-distance kernel matching          Number of obs = 1,853
                                               Replications  = 50
                                               Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics
                   Matched                 Controls           Band-
               Yes     No   Total     Used  Unused   Total    width
0
Treated        306     15     321      625     120     745   1.3199
1
Treated        126     10     136      473     178     651   1.3398

Treatment-effects estimation
              Observed   Bootstrap                      Normal-based
    wage         Coef.   Std. Err.      z   P>|z|   [95% Conf. Interval]
0
     ATT      .4586332    .2763358   1.66   0.097   -.082975    1.000241
1
     ATT      .9518705     .406903   2.34   0.019   .1543553    1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

           chi2(  1) =    1.23
         Prob > chi2 =    0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

    wage         Coef.   Std. Err.      z   P>|z|   [95% Conf. Interval]
     (1)      .4932373    .4452343   1.11   0.268   -.379406    1.365881
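The test and lincom results for the subgroup comparison are two views of the same Wald test: with one restriction, the chi-squared statistic is the square of the z statistic, and both give the same p-value. A short Python check using only the numbers reported above (variable names are illustrative):

```python
from statistics import NormalDist

coef, se = 0.4932373, 0.4452343        # lincom [1]ATT - [0]ATT: estimate and bootstrap SE
z = coef / se
chi2 = z ** 2                          # Wald chi-squared with 1 degree of freedom
p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
print(round(z, 2), round(chi2, 2), round(p_value, 4))
```

So the union effect does not differ significantly between south = 0 and south = 1 in this example, even though the point estimates differ by about 0.49.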
Simulation
Population data from Swiss census of 2000
Outcome Treiman occupational prestige (recoded from ISCO codesof the current job using command iskotrei by Hendrickx 2002)(values from 6 to 78 mean 44)
Estimand ATT of nationality on occupational prestige withresident aliens as the treatment group and Swiss nationals as thecontrol group
Control variables gender age and highest educational degree
Population restricted to people between 24 to 60 years old who areworking
2rsquo308rsquo006 individuals of which 175 belong to the treatmentgroup
Draw random samples (N = 500 1000 or 5000) from populationand compute various matching estimators
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 56
SimulationSubstantial differences between resident aliens and Swiss nationalson all three covariates
Propensity score in population (computed from fully stratified data)
0
1
2
3
4
5
6
7
Den
sity
0 1 2 3 4 5 6 7 8 9 1Propensity score
UntreatedTreated
McFadden R2 = 0121Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 57
Simulation
Raw mean difference in occupational prestige (NATE) minus479Population ATT (computed from fully stratified data) minus396There is some treatment effect heterogeneity (ATE = minus351 ATC= minus341)
30
35
40
45
50
55
Out
com
e
0 1 2 3 4 5 6 7 8Propensity score
UntreatedTreated
-6
-5
-4
-3
-2
-1
Trea
tmen
t effe
ct
0 1 2 3 4 5 6 7 8Propensity score
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 58
1 neighbor
5 neighbors
fixed bandwidth
pair-matchingbandwidth
cross-validationwith respect to X
cross-validationwith respect to Y
weighted CVwith respect to Y
Nearest-neighbormatching
Kernel matching
15 2 25 3 35 4 45 15 2 25 3 35 4 45
N = 500 N = 5000
MDMwith biascorrection
PSMwith biascorrection
Results Variance
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 59
1 neighbor
5 neighbors
fixed bandwidth
pair-matchingbandwidth
cross-validationwith respect to X
cross-validationwith respect to Y
weighted CVwith respect to Y
Nearest-neighbormatching
Kernel matching
15 2 25 3 35 4 45 15 2 25 3 35 4 45
N = 500 N = 5000
MDMwith biascorrection
PSMwith biascorrection
Results Variance
2017-06-24
Propensity Scores MatchingIllustration using kmatch
In this slide we can see that for the same algorithm PSM typically issomewhat less efficient than MDM but that across algorithms PSMcan also be much more efficient than MDM For example kernelmatching PSM has a much smaller variance than 1-nearest-neighborMDM That is the choice of algorithm matters much more than thechoice between PSM and MDM
For kernel matching the efficiency differences between PSM and MDMare only small additional post-matching regression adjustment furtherreduces the differences
1 neighbor
5 neighbors
fixed bandwidth
pair-matchingbandwidth
cross-validationwith respect to X
cross-validationwith respect to Y
weighted CVwith respect to Y
Nearest-neighbormatching
Kernel matching
70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135
N = 500 N = 5000
MDMwith biascorrection
PSMwith biascorrection
Results Bias reduction (in percent)
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 60
1 neighbor
5 neighbors
fixed bandwidth
pair-matchingbandwidth
cross-validationwith respect to X
cross-validationwith respect to Y
weighted CVwith respect to Y
Nearest-neighbormatching
Kernel matching
70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135
N = 500 N = 5000
MDMwith biascorrection
PSMwith biascorrection
Results Bias reduction (in percent)
2017-06-24
Propensity Scores MatchingIllustration using kmatch
Here we see that PSM has a bias that does not vanish as the samplesize increases The reason is that the same propensity-score modelspecification is used for both sample sizes The model is rather simple(linear effect of age no interactions) and due to the specific pattern ofthe data (in particular the sharp drop in the outcome variable afterpropensity score 03) small imprecisions can have substantial effects onthe results In practice one would probably use a more refinedspecification in the large-sample situation which would reduce bias
The bias also vanishes once post-matching regression adjustment isapplied
1 neighbor
5 neighbors
fixed bandwidth
pair-matchingbandwidth
cross-validationwith respect to X
cross-validationwith respect to Y
weighted CVwith respect to Y
Nearest-neighbormatching
Kernel matching
15 2 25 3 35 4 45 15 2 25 3 35 4 45 5
N = 500 N = 5000
MDMwith biascorrection
PSMwith biascorrection
Results Mean squared error
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 61
1 neighbor
5 neighbors
1 neighbor
5 neighbors
fixed bandwidth
pair-matchingbandwidth
cross-validationwith respect to Xcross-validation
with respect to Yweighted CV
with respect to Y
Nearest-neighbormatching (teffects)
Nearest-neighbormatching (bootstrap)
Kernel matching(bootstrap)
9 95 1 105 11 115 12 95 1 105 11 115 12 125
N = 500 N = 5000
MDMwith biascorrection
PSMwith biascorrection
Results Relative standard error
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 62
1 neighbor
5 neighbors
1 neighbor
5 neighbors
fixed bandwidth
pair-matchingbandwidth
cross-validationwith respect to Xcross-validation
with respect to Yweighted CV
with respect to Y
Nearest-neighbormatching (teffects)
Nearest-neighbormatching (bootstrap)
Kernel matching(bootstrap)
92 93 94 95 96 97 98 9 92 94 96 98
N = 500 N = 5000
MDMwith biascorrection
PSMwith biascorrection
Results Coverage of 95 CIs
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 63
1 Potential Outcomes and Causal Inference
2 Matching
3 Propensity Score Matching
4 King and Nielsenrsquos ldquoWhy Propensity Scores Should Not Be Used forMatchingrdquo
5 Are King and Nielsen right
6 Illustration using kmatch
7 Conclusions
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 64
Conclusions
The arguments brought forward by King and Nielsen againstPropensity Score Matching are valid but they mostly apply to onespecific form of PSM pair matching (one-to-one matching withoutreplacement)
Other PSM matching algorithms perform much better because theyare less affected by the random pruning problem
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 65
Conclusions
Overall I agree that MDM has advantages over PSM but it alsohas some disadvantages In applied research the choice may not bethat clear- MDM leaves less scope for post-matching modeling decision biases- Theoretical results (see eg Froumllich 2007) suggest that MDM will
generally tend to outperform PSM in terms of efficiency (butdifferences are likely to be small)
- Less restrictions in terms of possible post-matching analyses Choice of scaling matrix largely arbitrary various suggestions in the
literature (somewhat unclear eg how categorical variables should betreated)
Computational complexity
One clear conclusion we can draw however is
Do not use propensity scores for pair matching(But donrsquot use pair matching anyhow)
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 66
Conclusions
Some conclusions from the simulationI For PSM application of regression-adjustment seems like a great idea(reduction of bias and variance) for MDM the advantages ofregression-adjustment are less clear
I Bootstrap standard errorconfidence interval estimation seems to bemostly ok for kernelridge matching this is in contrast tonearest-neighbor matching where bootstrap standard errors areclearly biased
To doI Run some simulations comparable to the ones by King and Nielsenusing various matching algorithms
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 67
References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2):295–313.
Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1):77–90.
Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91:279–290.
Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.
King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw
Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.
Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4):465–472.
Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70:41–55.
Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5):688–701.
Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4):472–480.
Some Examples: Propensity-score kernel matching

. kmatch ps union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate att
(computing bandwidth ... done)

Propensity-score kernel matching           Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Covariates  : collgrad ttl_exp tenure i.industry i.race south
PS model    : logit (pr)

Matching statistics

           |      Matched       |      Controls       |  Band-
           |  Yes    No   Total | Used  Unused  Total |  width
   Treated |  431    26     457 | 1214     182   1396 | .00188

Treatment-effects estimation

        wage |     Coef.
         ATT |  .3887224
        NATE |  1.432913
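To make the idea behind kernel matching on the propensity score concrete, here is a minimal Python sketch. It is not kmatch's implementation: the function names, the toy data, and the bandwidth of 0.1 are assumptions chosen for illustration. Each treated observation is compared with a kernel-weighted average of the controls within the bandwidth; treated observations with no control inside the bandwidth stay unmatched, which corresponds to the "Matched: No" column in the matching statistics.

```python
def epan(u):
    """Epanechnikov kernel; zero outside |u| < 1."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def kernel_att(treated, controls, bandwidth):
    """ATT by kernel matching on a scalar score.

    treated, controls: lists of (score, outcome) pairs.
    Returns (att, n_matched, n_unmatched).
    """
    effects = []
    unmatched = 0
    for score_t, y_t in treated:
        weights = [epan((score_c - score_t) / bandwidth) for score_c, _ in controls]
        total = sum(weights)
        if total == 0.0:
            # no control within the bandwidth: outside common support
            unmatched += 1
            continue
        y0_hat = sum(w * y_c for w, (_, y_c) in zip(weights, controls)) / total
        effects.append(y_t - y0_hat)
    att = sum(effects) / len(effects) if effects else float("nan")
    return att, len(effects), unmatched


# Hypothetical toy data: (propensity score, outcome)
treated = [(0.30, 8.0), (0.50, 9.0), (0.95, 12.0)]
controls = [(0.28, 7.0), (0.32, 7.5), (0.50, 8.0), (0.55, 8.5)]
att, n_matched, n_unmatched = kernel_att(treated, controls, bandwidth=0.1)
print(att, n_matched, n_unmatched)
```

A larger bandwidth matches more treated observations but averages over less similar controls, which is the trade-off the bandwidth-selection examples address.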
Some Examples: Kernel density balancing plot

. kmatch density, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)
(applying 0-1 boundary correction to density estimation of propensity score)
(bandwidth for propensity score = .06803989)

[Figure: kernel density of the propensity score for untreated and treated observations; panels: Raw, Matched (ATT).]

Some Examples: Cumulative distribution balancing plot

. kmatch cumul, lw(*6 *2) lc(*.5 *1)
(refitting the model using the generate() option)

[Figure: cumulative probability of the propensity score for untreated and treated observations; panels: Raw, Matched (ATT).]

Some Examples: Balancing box plot

. kmatch box
(refitting the model using the generate() option)

[Figure: box plots of the propensity score for untreated and treated observations; panels: Raw, Matched (ATT).]
Some Examples: Standard errors

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching      Number of obs = 1853
                                           Replications  = 50
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |      Matched       |      Controls       |  Band-
           |  Yes    No   Total | Used  Unused  Total |  width
   Treated |  432    25     457 | 1105     291   1396 | 1.3394
 Untreated | 1386    10    1396 |  455       2    457 | 3.3975
  Combined | 1818    35    1853 | 1560     293   1853 |

Treatment-effects estimation

        |  Observed  Bootstrap                       Normal-based
   wage |     Coef.  Std. Err.      z   P>|z|   [95% Conf. Interval]
    ATE |  .4095729   .1920853   2.13   0.033    .0330928   .7860531
    ATT |  .6059013   .2472069   2.45   0.014    .1213846   1.090418
    ATC |  .3483797   .1893653   1.84   0.066   -.0227695   .7195289
   NATE |  1.432913   .2333282   6.14   0.000    .9755981   1.890228
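The "Normal-based" columns follow from the coefficient and its bootstrap standard error alone. The following Python sketch (function name is an assumption; standard library only) recomputes z, the two-sided p-value, and the 95% interval for the ATT row of the bootstrap table.

```python
from statistics import NormalDist


def normal_based_ci(coef, se, level=0.95):
    """Return (z, two-sided p-value, lower, upper) under a normal approximation."""
    z = coef / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    return z, p, coef - crit * se, coef + crit * se


# ATT coefficient and bootstrap standard error from the slide
z, p, lower, upper = normal_based_ci(0.6059013, 0.2472069)
print(round(z, 2), round(p, 3), round(lower, 7), round(upper, 6))
```

The same function applied to the other rows can be used to check that the reported intervals are consistent with the reported standard errors.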
Some Examples: Do some tests

. lincom ATT-NATE

 ( 1)  ATT - NATE = 0

   wage |     Coef.  Std. Err.      z   P>|z|   [95% Conf. Interval]
    (1) | -.8270117   .1810415  -4.57   0.000   -1.181847  -.4721768

. test ATT = ATC

 ( 1)  ATT - ATC = 0

           chi2(  1) =    2.42
         Prob > chi2 =    0.1200
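For a single linear restriction, the chi-squared statistic reported by test is the square of the corresponding z statistic, so its p-value can be obtained from the normal distribution. A small Python sketch under that assumption (helper names are illustrative):

```python
from math import sqrt
from statistics import NormalDist


def chi2_1df_pvalue(chi2):
    """Upper-tail p-value of a chi-squared statistic with 1 degree of freedom."""
    return 2.0 * (1.0 - NormalDist().cdf(sqrt(chi2)))


def wald_chi2(estimate, se):
    """Wald statistic for H0: estimate = 0 (one restriction)."""
    return (estimate / se) ** 2


# lincom ATT - NATE from the slide: squaring z gives the equivalent chi2(1)
chi2_lincom = wald_chi2(-0.8270117, 0.1810415)
# test ATT = ATC from the slide reports chi2(1) = 2.42 (rounded)
p_test = chi2_1df_pvalue(2.42)
print(round(chi2_lincom, 2), round(p_test, 3))
```

Because the slide shows the chi-squared value rounded to two decimals, the recomputed p-value agrees with the reported 0.1200 only up to rounding.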
Some Examples: Nearest-neighbor matching (1 neighbor)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

Multivariate-distance nearest-neighbor matching

Number of obs = 1853                       Neighbors: min = 1
Treatment   : union = 1                               max = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |      Matched       |      Controls       |  Band-
           |  Yes    No   Total | Used  Unused  Total |  width
   Treated |  457     0     457 |  328    1068   1396 |

Treatment-effects estimation

        wage |     Coef.
         ATT |  .7246969

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

Treatment-effects estimation               Number of obs      = 1853
Estimator      : nearest-neighbor matching Matches: requested = 1
Outcome model  : matching                                min  = 1
Distance metric: Mahalanobis                             max  = 1

                    |            AI Robust
               wage |     Coef.  Std. Err.      z   P>|z|   [95% Conf. Interval]
ATET                |
              union |
(union vs nonunion) |  .7246969   .2942952   2.46   0.014     .147889   1.301505
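As a rough sketch of what Mahalanobis-distance nearest-neighbor matching computes, the Python code below matches each treated observation to its k closest controls (with replacement) and averages the outcome differences. It is an illustration only: it handles two covariates, uses the pooled sample covariance as the scaling matrix, and ignores ties, all of which are simplifying assumptions rather than a description of kmatch or teffects nnmatch.

```python
def covariance_2d(points):
    """Sample covariance of 2-D points, returned as (a, b, c) for [[a, b], [b, c]]."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    a = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    c = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    b = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return a, b, c


def mahalanobis_sq(u, v, cov):
    """Squared Mahalanobis distance between 2-D points u and v."""
    a, b, c = cov
    det = a * c - b * b
    dx, dy = u[0] - v[0], u[1] - v[1]
    # inverse of [[a, b], [b, c]] is [[c, -b], [-b, a]] / det
    return (c * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det


def nn_att(treated, controls, k=1):
    """ATT from k-nearest-neighbor matching with replacement.

    treated, controls: lists of ((x1, x2), outcome).
    """
    cov = covariance_2d([x for x, _ in treated + controls])
    effects = []
    for x_t, y_t in treated:
        ranked = sorted(controls, key=lambda c: mahalanobis_sq(x_t, c[0], cov))
        y0_hat = sum(y for _, y in ranked[:k]) / k
        effects.append(y_t - y0_hat)
    return sum(effects) / len(effects)


# Hypothetical toy data: ((binary covariate, continuous covariate), outcome)
treated = [((1.0, 10.0), 9.0), ((0.0, 4.0), 7.0)]
controls = [((1.0, 9.0), 8.0), ((0.0, 5.0), 6.5), ((0.0, 12.0), 7.0), ((1.0, 2.0), 6.0)]
att_1 = nn_att(treated, controls, k=1)
att_2 = nn_att(treated, controls, k=2)
print(att_1, att_2)
```

With k = 1 every treated observation finds a match regardless of how far away the nearest control is, which is why the nearest-neighbor examples report no unmatched treated observations and no bandwidth.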
Some Examples: Nearest-neighbor matching (5 neighbors)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853                       Neighbors: min = 5
Treatment   : union = 1                               max = 5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |      Matched       |      Controls       |  Band-
           |  Yes    No   Total | Used  Unused  Total |  width
   Treated |  457     0     457 |  870     526   1396 |

Treatment-effects estimation

        wage |     Coef.
         ATT |  .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation               Number of obs      = 1853
Estimator      : nearest-neighbor matching Matches: requested = 5
Outcome model  : matching                                min  = 5
Distance metric: Mahalanobis                             max  = 6

                    |            AI Robust
               wage |     Coef.  Std. Err.      z   P>|z|   [95% Conf. Interval]
ATET                |
              union |
(union vs nonunion) |  .5590823   .2381752   2.35   0.019    .0922675   1.025897
Some Examples: Bias correction / regression adjustment

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853                       Neighbors: min = 5
Treatment   : union = 1                               max = 5
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |      Matched       |      Controls       |  Band-
           |  Yes    No   Total | Used  Unused  Total |  width
   Treated |  457     0     457 |  870     526   1396 |

Treatment-effects estimation

        wage |     Coef.
         ATT |  .5288023

adjusted for: collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south)
>     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation               Number of obs      = 1853
Estimator      : nearest-neighbor matching Matches: requested = 5
Outcome model  : matching                                min  = 5
Distance metric: Mahalanobis                             max  = 6

                    |            AI Robust
               wage |     Coef.  Std. Err.      z   P>|z|   [95% Conf. Interval]
ATET                |
              union |
(union vs nonunion) |  .5288023   .2420635   2.18   0.029    .0543666   1.003238
Some Examples: Mahalanobis-distance and propensity-score matching combined

. kmatch md union collgrad ttl_exp tenure (wage), att
>     psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis (modified)
Covariates  : collgrad ttl_exp tenure
PS model    : logit (pr)
PS covars   : i.industry i.race south

Matching statistics

           |      Matched       |      Controls       |  Band-
           |  Yes    No   Total | Used  Unused  Total |  width
   Treated |  439    18     457 | 1258     138   1396 | .83886

Treatment-effects estimation

        wage |     Coef.
         ATT |  .6408443
Some Examples: Exact matching

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure
Exact       : industry race south

Matching statistics

           |      Matched       |      Controls       |  Band-
           |  Yes    No   Total | Used  Unused  Total |  width
   Treated |  432    25     457 | 1103     293   1396 | 1.3013

Treatment-effects estimation

        wage |     Coef.
         ATT |  .6047374
Some Examples: Bandwidth selection: the default (based on distribution of distances in one-nearest-neighbor matching)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |      Matched       |      Controls       |  Band-
           |  Yes    No   Total | Used  Unused  Total |  width
   Treated |  432    25     457 | 1105     291   1396 | 1.3394

Treatment-effects estimation

        wage |     Coef.
         ATT |  .6059013
Some Examples: Bandwidth selection: cross validation with respect to X

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |      Matched       |      Controls       |  Band-
           |  Yes    No   Total | Used  Unused  Total |  width
   Treated |  448     9     457 | 1184     212   1396 | 1.8888

Treatment-effects estimation

        wage |     Coef.
         ATT |  .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation criterion (MSE) against bandwidth; points labeled by search step.]
Some Examples: Bandwidth selection: cross validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |      Matched       |      Controls       |  Band-
           |  Yes    No   Total | Used  Unused  Total |  width
   Treated |  453     4     457 | 1289     107   1396 |  2.433

Treatment-effects estimation

        wage |     Coef.
         ATT |  .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation criterion (MISE) against bandwidth; points labeled by search step.]
Some Examples: Bandwidth selection: weighted cross validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |      Matched       |      Controls       |  Band-
           |  Yes    No   Total | Used  Unused  Total |  width
   Treated |  455     2     457 | 1356      40   1396 | 2.7626

Treatment-effects estimation

        wage |     Coef.
         ATT |  .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation criterion (weighted MISE) against bandwidth; points labeled by search step.]
Some Examples: Common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(0.5)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |      Matched       |      Controls       |  Band-
           |  Yes    No   Total | Used  Unused  Total |  width
   Treated |  366    91     457 |  701     695   1396 |     .5

Treatment-effects estimation

        wage |     Coef.
         ATT |  .3303161

. kmatch csummarize
(refitting the model using the generate() option)

Common support (treated)

              |        Means                    |   Standardized difference
              |  Matched  Unmatched     Total   |  (1)-(3)    (2)-(3)    (1)-(2)
     collgrad |  .322404    .318681   .321663   |  .001585   -.006376    .007962
      ttl_exp |  13.3929    12.7682   13.2685   |  .027413   -.110253    .137666
       tenure |  8.12614    6.95055   7.89205   |  .038378   -.154356    .192734
   3.industry |  .002732    .021978   .006565   | -.047404    .190657   -.238061
   4.industry |  .191257    .153846   .183807   |  .019212   -.077269    .096481
   5.industry |  .062842    .274725   .105033   | -.137462    .552867   -.690329
   6.industry |  .057377          0   .045952   |  .054507   -.219225    .273732
   7.industry |  .019126    .021978   .019694   | -.004083    .016423   -.020506
   8.industry |  .005464    .065934   .017505   | -.091714    .368871   -.460585
   9.industry |  .010929    .010989   .010941   | -.000115    .000462   -.000577
  10.industry |        0    .021978   .004376   | -.066227    .266363   -.332589
  11.industry |  .554645    .175824   .479212   |   .15083   -.606636    .757467
  12.industry |  .092896    .241758   .122538   | -.090299    .363181    -.45348
       2.race |  .243169    .681319   .330416   | -.185284    .745209   -.930494
       3.race |  .002732    .076923   .017505   | -.112525    .452572   -.565097
        south |   .29235    .318681   .297593   | -.011456    .046074    -.05753

Common support (treated)

              |        Variances                |           Ratio
              |  Matched  Unmatched     Total   |  (1)/(3)    (2)/(3)    (1)/(2)
     collgrad |  .219058    .219536   .218674   |  1.00176    1.00394    .997824
      ttl_exp |  19.4198    25.2474   20.5898   |  .943177    1.22621     .76918
       tenure |  38.3324    31.9242   37.2044   |  1.03032    .858076    1.20073
   3.industry |  .002732    .021734   .006536   |  .418045    3.32537    .125714
   4.industry |  .155101    .131624   .150351   |  1.03159    .875443    1.17837
   5.industry |  .059054    .201465   .094207   |  .626851    2.13854    .293122
   6.industry |  .054233          0   .043936   |  1.23435          0          .
   7.industry |  .018811    .021734   .019348   |  .972252     1.1233    .865531
   8.industry |   .00545    .062271   .017237   |  .316157    3.61269    .087513
   9.industry |  .010839    .010989   .010845   |  .999464    1.01328    .986361
  10.industry |        0    .021734   .004367   |        0    4.97709          0
  11.industry |  .247691     .14652   .250115   |  .990307    .585811    1.69049
  12.industry |  .084497    .185348   .107758   |  .784137    1.72003    .455885
       2.race |  .184542    .219536   .221726   |  .832297    .990121    .840601
       3.race |  .002732    .071795   .017237   |  .158513    4.16522    .038056
        south |  .207448    .219536    .20949   |  .990254    1.04796    .944939

(1) matched, (2) unmatched, (3) total
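The reported numbers are consistent with standardized differences computed as the difference in means divided by the standard deviation in the total treated group, and with ratios computed as plain variance ratios. That scaling is inferred from the figures on the slide, not taken from the kmatch documentation, so treat it as an assumption. A short Python check for the ttl_exp row:

```python
from math import sqrt


def std_difference(mean_a, mean_b, var_ref):
    """Difference in means scaled by the reference standard deviation."""
    return (mean_a - mean_b) / sqrt(var_ref)


def variance_ratio(var_a, var_b):
    """Ratio of two variances."""
    return var_a / var_b


# ttl_exp row: means and variances for the matched (1), unmatched (2),
# and total (3) treated observations, as printed by kmatch csummarize
mean_matched, mean_unmatched, mean_total = 13.3929, 12.7682, 13.2685
var_matched, var_unmatched, var_total = 19.4198, 25.2474, 20.5898

d13 = std_difference(mean_matched, mean_total, var_total)
d12 = std_difference(mean_matched, mean_unmatched, var_total)
r13 = variance_ratio(var_matched, var_total)
print(round(d13, 4), round(d12, 4), round(r13, 4))
```

Small discrepancies in the last digits are expected because the means and variances on the slide are themselves rounded.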
Some Examples: make a graph of the common-support stats

. mat M = r(M)
. coefplot matrix(M[,4]), title("Std. difference") noci nolabels xline(0)

[Figure: standardized differences, one marker per covariate (collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, south), with a reference line at 0.]
Some Examples: Multiple outcome variables

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1852
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |      Matched       |      Controls       |  Band-
           |  Yes    No   Total | Used  Unused  Total |  width
   Treated |  432    25     457 | 1104     291   1395 | 1.3392

Treatment-effects estimation

             |     Coef.
wage         |
         ATT |  .6021049
        NATE |  1.430823
hours        |
         ATT |  1.263759
        NATE |  1.450303
Some Examples: Multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure)
>     (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1852
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |      Matched       |      Controls       |  Band-
           |  Yes    No   Total | Used  Unused  Total |  width
   Treated |  432    25     457 | 1104     291   1395 | 1.3392

Treatment-effects estimation

             |     Coef.
wage         |
         ATT |  .5152752
        NATE |  1.430823
hours        |
         ATT |  1.263759
        NATE |  1.450303

wage: adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race
Some Examples: Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage),
>     att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching      Number of obs = 1853
                                           Replications  = 50
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics

           |      Matched       |      Controls       |  Band-
           |  Yes    No   Total | Used  Unused  Total |  width
0          |
   Treated |  306    15     321 |  625     120    745 | 1.3199
1          |
   Treated |  126    10     136 |  473     178    651 | 1.3398

Treatment-effects estimation

        |  Observed  Bootstrap                       Normal-based
   wage |     Coef.  Std. Err.      z   P>|z|   [95% Conf. Interval]
0       |
    ATT |  .4586332   .2763358   1.66   0.097    -.082975   1.000241
1       |
    ATT |  .9518705    .406903   2.34   0.019    .1543553   1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

           chi2(  1) =    1.23
         Prob > chi2 =    0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

   wage |     Coef.  Std. Err.      z   P>|z|   [95% Conf. Interval]
    (1) |  .4932373   .4452343   1.11   0.268    -.379406   1.365881
Simulation

Population data from the Swiss census of 2000.
Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using command iskotrei by Hendrickx 2002; values from 6 to 78, mean 44).
Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.
Control variables: gender, age, and highest educational degree.
Population restricted to people between 24 and 60 years old who are working.
2'308'006 individuals, of which 17.5% belong to the treatment group.
Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.
Substantial differences between resident aliens and Swiss nationals on all three covariates.
Propensity score in population (computed from fully stratified data):

[Figure: density of the propensity score for untreated and treated; McFadden R2 = 0.121.]
Raw mean difference in occupational prestige (NATE): −4.79.
Population ATT (computed from fully stratified data): −3.96.
There is some treatment effect heterogeneity (ATE = −3.51, ATC = −3.41).

[Figure, left panel: mean outcome by propensity score for untreated and treated. Right panel: treatment effect by propensity score.]
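The population quantities on this slide can be cross-checked with simple arithmetic: the ATE is the average of ATT and ATC weighted by the treated share, and the gap between NATE and ATT is the bias a matching estimator has to remove. The Python sketch below assumes a treated share of 17.5%, as stated in the simulation setup; the function name is illustrative.

```python
def ate_from_att_atc(att, atc, share_treated):
    """ATE as the treated-share-weighted average of ATT and ATC."""
    return share_treated * att + (1.0 - share_treated) * atc


# Population values reported for the simulation
nate, att, atc = -4.79, -3.96, -3.41
share_treated = 0.175

ate = ate_from_att_atc(att, atc, share_treated)
raw_bias = nate - att  # bias of the naive mean difference as an estimate of the ATT
print(round(ate, 2), round(raw_bias, 2))
```

The weighted average reproduces the reported ATE up to rounding, which also confirms that the four population values are mutually consistent.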
Results: Variance

[Figure: variance of each estimator for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each also shown with bias correction.]

In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that, across algorithms, PSM can also be much more efficient than MDM. For example, kernel matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.
Results: Bias reduction (in percent)

[Figure: bias reduction in percent of each estimator for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each also shown with bias correction.]

Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.
Results: Mean squared error

[Figure: mean squared error of each estimator for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each also shown with bias correction.]
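The three result measures are linked: the mean squared error equals the variance plus the squared bias. The Python sketch below shows how bias, variance, and MSE could be computed from Monte Carlo estimates. The simulated "estimator" is a hypothetical stand-in (normal draws around the true ATT), and the bias-reduction formula is an assumed definition, since the slides do not spell it out.

```python
import random
from statistics import fmean, pvariance


def summarize(estimates, truth):
    """Bias, variance, and MSE of a list of Monte Carlo estimates."""
    bias = fmean(estimates) - truth
    variance = pvariance(estimates)
    mse = fmean((e - truth) ** 2 for e in estimates)
    return bias, variance, mse


def bias_reduction_pct(bias, raw_bias):
    """Share of the raw bias removed by an estimator, in percent (assumed definition)."""
    return 100.0 * (1.0 - bias / raw_bias)


random.seed(1)
truth = -3.96  # population ATT from the simulation setup
# Hypothetical estimator: slightly biased, noisy draws around the true ATT
estimates = [random.gauss(truth - 0.05, 0.4) for _ in range(2000)]
bias, variance, mse = summarize(estimates, truth)
print(bias, variance, mse, bias_reduction_pct(bias, raw_bias=-0.83))
```

Under this definition, values above 100 percent arise when an estimator overcorrects, that is, when its remaining bias has the opposite sign of the raw bias.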
Results: Relative standard error

[Figure: relative standard error of each estimator for N = 500 and N = 5000. Rows: nearest-neighbor matching with teffects standard errors (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap standard errors (1 neighbor, 5 neighbors), and kernel matching with bootstrap standard errors (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each also shown with bias correction.]
Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals (in percent) for N = 500 and N = 5000. Rows: nearest-neighbor matching with teffects standard errors (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap standard errors (1 neighbor, 5 neighbors), and kernel matching with bootstrap standard errors (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each also shown with bias correction.]
Conclusions

The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).

Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.
Conclusions
Overall I agree that MDM has advantages over PSM but it alsohas some disadvantages In applied research the choice may not bethat clear- MDM leaves less scope for post-matching modeling decision biases- Theoretical results (see eg Froumllich 2007) suggest that MDM will
generally tend to outperform PSM in terms of efficiency (butdifferences are likely to be small)
- Less restrictions in terms of possible post-matching analyses Choice of scaling matrix largely arbitrary various suggestions in the
literature (somewhat unclear eg how categorical variables should betreated)
Computational complexity
One clear conclusion we can draw however is
Do not use propensity scores for pair matching(But donrsquot use pair matching anyhow)
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 66
Conclusions
Some conclusions from the simulationI For PSM application of regression-adjustment seems like a great idea(reduction of bias and variance) for MDM the advantages ofregression-adjustment are less clear
I Bootstrap standard errorconfidence interval estimation seems to bemostly ok for kernelridge matching this is in contrast tonearest-neighbor matching where bootstrap standard errors areclearly biased
To doI Run some simulations comparable to the ones by King and Nielsenusing various matching algorithms
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 67
References I
Cochran WG 1968 The Effectiveness of Adjustment by Subclassificationin Removing Bias in Observational Studies Biometrics 24(2)295ndash313
Froumllich M 2004 Finite-sample properties of propensity-score matchingand weighting estimators The Review of Economics and Statistics86(1)77ndash90
Froumllich M 2007 On the inefficiency of propensity score matching AStA91279ndash290
Hendrickx J 2002 ISKO Stata module to recode 4 digit ISCO-88occupational codes Statistical Software Components S425802 BostonCollege Department of Economics
King G R Nielsen 2016 Why Propensity Scores Should Not Be Usedfor Matching Working Paper Available from httpjmp1sexgVw
Mill JS 2002 A System of Logic Reprinted from the 1981 edition (firstpublished 1843) Honolulu Hawaii University Press of the Pacific
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 68
References II
Neyman J 1990[1923] On the Application of Probability Theory toAgricultural Experiments Essay on Principles Section 9 (Translated andedited by DM Dabrowska and TP Speed from the Polish original)Statistical Science 5(4)465ndash472
Rosenbaum PR DB Rubin 1983 The Central Role of the PropensityScore in Observational Studies for Causal Effects Biometrika 7041ndash55
Rubin DB 1974 Estimating causal effects of treatments in randomizedand nonrandomized studies Journal of Educational Psychology66(5)688ndash701
Rubin DB 1990 Comment Neyman (1923) and Causal Inference inExperiments and Observational Studies Statistical Science 5(4)472ndash480
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 69
Some Examples Kernel density balancing plot kmatch density lw(6 2) lc(5 1)(refitting the model using the generate() option)(applying 0-1 boundary correction to density estimation of propensity score)(bandwidth for propensity score = 06803989)
01
23
0 2 4 6 8 0 2 4 6 8
Raw Matched (ATT)
Untreated Treated
Den
sity
Propensity score
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 37
Some Examples Cumulative distribution balancing plot kmatch cumul lw(6 2) lc(5 1)(refitting the model using the generate() option)
05
1
0 2 4 6 8 0 2 4 6 8
Raw Matched (ATT)
Untreated Treated
Cum
ulat
ive
prob
abilit
y
Propensity score
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 38
Some Examples Balancing box plot kmatch box(refitting the model using the generate() option)
02
46
8Raw Matched (ATT)
Untreated Treated
Prop
ensi
ty s
core
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 39
Some Examples Standard errors kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage) nate ate att atc vce(bootstrap)(computing bandwidth for treated done)(computing bandwidth for untreated done)
(running kmatch on estimation sample)
Bootstrap replications (50)1 2 3 4 5
50
Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan
Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south
Matching statistics
Matched Controls Band-Yes No Total Used Unused Total width
Treated 432 25 457 1105 291 1396 13394Untreated 1386 10 1396 455 2 457 33975Combined 1818 35 1853 1560 293 1853
Treatment-effects estimation
Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]
ATE 4095729 1920853 213 0033 0330928 7860531ATT 6059013 2472069 245 0014 1213846 1090418ATC 3483797 1893653 184 0066 -0227695 7195289
NATE 1432913 2333282 614 0000 9755981 1890228
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 40
Some Examples
Do some tests
lincom ATT-NATE
( 1) ATT - NATE = 0
wage Coef Std Err z Pgt|z| [95 Conf Interval]
(1) -8270117 1810415 -457 0000 -1181847 -4721768
test ATT = ATC
( 1) ATT - ATC = 0
chi2( 1) = 242Prob gt chi2 = 01200
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 41
Some Examples Nearest-neighbor matching (1 neighbor) kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att nn
Multivariate-distance nearest-neighbor matching
Number of obs = 1853Neighbors min = 1
Treatment union = 1 max = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south
Matching statistics
Matched Controls Band-Yes No Total Used Unused Total width
Treated 457 0 457 328 1068 1396
Treatment-effects estimation
wage Coef
ATT 7246969
teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) (union) atet
Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 1Outcome model matching min = 1Distance metric Mahalanobis max = 1
AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]
ATETunion
(union vs nonunion) 7246969 2942952 246 0014 147889 1301505
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 42
Some Examples Nearest-neighbor matching (5 neighbors) kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att nn(5)
Multivariate-distance nearest-neighbor matching
Number of obs = 1853Neighbors min = 5
Treatment union = 1 max = 5Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south
Matching statistics
Matched Controls Band-Yes No Total Used Unused Total width
Treated 457 0 457 870 526 1396
Treatment-effects estimation
wage Coef
ATT 5590823
teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) (union) atet nn(5)
Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 5Outcome model matching min = 5Distance metric Mahalanobis max = 6
AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]
ATETunion
(union vs nonunion) 5590823 2381752 235 0019 0922675 1025897
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 43
Some Examples Bias-correction regression adjustment kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure iindustry irace south) att nn(5)
Multivariate-distance nearest-neighbor matching
Number of obs = 1853Neighbors min = 5
Treatment union = 1 max = 5Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south
Matching statistics
Matched Controls Band-Yes No Total Used Unused Total width
Treated 457 0 457 870 526 1396
Treatment-effects estimation
wage Coef
ATT 5288023
adjusted for collgrad ttl_exp tenure iindustry irace south
teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) gt (union) atet nn(5) biasadj(collgrad ttl_exp tenure iindustry irace south)
Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 5Outcome model matching min = 5Distance metric Mahalanobis max = 6
AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]
ATETunion
(union vs nonunion) 5288023 2420635 218 0029 0543666 1003238
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 44
Some Examples
Mahalanobis-distance and propensity-score matching combined
kmatch md union collgrad ttl_exp tenure (wage) att gt psvars(iindustry irace south) psweight(3)(computing bandwidth done)
Multivariate-distance kernel matching Number of obs = 1853Kernel = epan
Treatment union = 1Metric mahalanobis (modified)Covariates collgrad ttl_exp tenurePS model logit (pr)PS covars iindustry irace south
Matching statistics
Matched Controls Band-Yes No Total Used Unused Total width
Treated 439 18 457 1258 138 1396 83886
Treatment-effects estimation
wage Coef
ATT 6408443
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 45
Some Examples
Exact matching
kmatch md union collgrad ttl_exp tenure (wage) att ematch(industry race south)(computing bandwidth done)
Multivariate-distance kernel matching Number of obs = 1853Kernel = epan
Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenureExact industry race south
Matching statistics
Matched Controls Band-Yes No Total Used Unused Total width
Treated 432 25 457 1103 293 1396 13013
Treatment-effects estimation
wage Coef
ATT 6047374
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 46
Some Examples

Bandwidth selection: the default (based on the distribution of distances in one-nearest-neighbor matching)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching    Number of obs = 1,853
                                         Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

               Matched               Controls           Band-
            Yes    No  Total     Used  Unused  Total    width
Treated     432    25    457     1105     291   1396   1.3394

Treatment-effects estimation

wage        Coef.
ATT      .6059013
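The kernel-matching logic behind the outputs above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not kmatch's implementation: it matches on a single scalar covariate with absolute distance (kmatch md uses a Mahalanobis metric over several covariates), and the function names and toy data are hypothetical.

```python
def epan(u):
    """Epanechnikov kernel: positive inside |u| < 1, zero outside."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def kernel_att(treated, controls, bandwidth):
    """ATT by kernel matching on one scalar covariate.

    treated, controls: lists of (x, y) pairs.
    Returns (att, n_matched). Treated units with no control inside the
    bandwidth are dropped (they count as "Matched: No").
    """
    effects = []
    for x_t, y_t in treated:
        weights = [epan((x_c - x_t) / bandwidth) for x_c, _ in controls]
        total = sum(weights)
        if total == 0.0:
            continue  # no control within the bandwidth: unmatched
        # Kernel-weighted mean of control outcomes = estimated counterfactual
        y0_hat = sum(w * y_c for w, (_, y_c) in zip(weights, controls)) / total
        effects.append(y_t - y0_hat)
    if not effects:
        raise ValueError("no treated unit has a control within the bandwidth")
    return sum(effects) / len(effects), len(effects)


# Toy data (hypothetical): y = x for controls and y = x + 2 for treated units.
controls = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
treated = [(1.0, 3.0), (2.0, 4.0), (10.0, 12.0)]

att, n_matched = kernel_att(treated, controls, bandwidth=1.5)
print(att, n_matched)
```

The third treated unit lies far from every control and is dropped, which mirrors the "Matched: Yes/No" columns in the matching statistics: a smaller bandwidth leaves more treated units unmatched.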
Some Examples

Bandwidth selection: cross-validation with respect to X

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching    Number of obs = 1,853
                                         Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

               Matched               Controls           Band-
            Yes    No  Total     Used  Unused  Total    width
Treated     448     9    457     1184     212   1396   1.8888

Treatment-effects estimation

wage        Coef.
ATT      .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot. MSE (y-axis) against bandwidth (x-axis); points are labeled with their evaluation index.]
Some Examples

Bandwidth selection: cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching    Number of obs = 1,853
                                         Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

               Matched               Controls           Band-
            Yes    No  Total     Used  Unused  Total    width
Treated     453     4    457     1289     107   1396    2.433

Treatment-effects estimation

wage        Coef.
ATT      .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot. MISE (y-axis) against bandwidth (x-axis); points are labeled with their evaluation index.]
Some Examples

Bandwidth selection: weighted cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching    Number of obs = 1,853
                                         Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

               Matched               Controls           Band-
            Yes    No  Total     Used  Unused  Total    width
Treated     455     2    457     1356      40   1396   2.7626

Treatment-effects estimation

wage        Coef.
ATT      .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot. Weighted MISE (y-axis) against bandwidth (x-axis); points are labeled with their evaluation index.]
Some Examples

Common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(0.5)

Multivariate-distance kernel matching    Number of obs = 1,853
                                         Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

               Matched               Controls           Band-
            Yes    No  Total     Used  Unused  Total    width
Treated     366    91    457      701     695   1396       .5

Treatment-effects estimation

wage        Coef.
ATT      .3303161

. kmatch csummarize
(refitting the model using the generate() option)

Common support (treated)                       Standardized difference
Means          Matched  Unmatc~d     Total    (1)-(3)   (2)-(3)   (1)-(2)
collgrad       .322404   .318681   .321663    .001585  -.006376   .007962
ttl_exp        13.3929   12.7682   13.2685    .027413  -.110253   .137666
tenure         8.12614   6.95055   7.89205    .038378  -.154356   .192734
3.industry     .002732   .021978   .006565   -.047404   .190657  -.238061
4.industry     .191257   .153846   .183807    .019212  -.077269   .096481
5.industry     .062842   .274725   .105033   -.137462   .552867  -.690329
6.industry     .057377         0   .045952    .054507  -.219225   .273732
7.industry     .019126   .021978   .019694   -.004083   .016423  -.020506
8.industry     .005464   .065934   .017505   -.091714   .368871  -.460585
9.industry     .010929   .010989   .010941   -.000115   .000462  -.000577
10.industry          0   .021978   .004376   -.066227   .266363  -.332589
11.industry    .554645   .175824   .479212     .15083  -.606636   .757467
12.industry    .092896   .241758   .122538   -.090299   .363181    -.45348
2.race         .243169   .681319   .330416   -.185284   .745209  -.930494
3.race         .002732   .076923   .017505   -.112525   .452572  -.565097
south           .29235   .318681   .297593   -.011456   .046074    -.05753

Common support (treated)                                Ratio
Variances      Matched  Unmatc~d     Total    (1)/(3)   (2)/(3)   (1)/(2)
collgrad       .219058   .219536   .218674    1.00176   1.00394   .997824
ttl_exp        19.4198   25.2474   20.5898    .943177   1.22621    .76918
tenure         38.3324   31.9242   37.2044    1.03032   .858076   1.20073
3.industry     .002732   .021734   .006536    .418045   3.32537   .125714
4.industry     .155101   .131624   .150351    1.03159   .875443   1.17837
5.industry     .059054   .201465   .094207    .626851   2.13854   .293122
6.industry     .054233         0   .043936    1.23435         0         .
7.industry     .018811   .021734   .019348    .972252    1.1233   .865531
8.industry      .00545   .062271   .017237    .316157   3.61269   .087513
9.industry     .010839   .010989   .010845    .999464   1.01328   .986361
10.industry          0   .021734   .004367          0   4.97709         0
11.industry    .247691    .14652   .250115    .990307   .585811   1.69049
12.industry    .084497   .185348   .107758    .784137   1.72003   .455885
2.race         .184542   .219536   .221726    .832297   .990121   .840601
3.race         .002732   .071795   .017237    .158513   4.16522   .038056
south          .207448   .219536    .20949    .990254   1.04796   .944939

(1) matched, (2) unmatched, (3) total
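The two diagnostics in the common-support tables can be recomputed by hand. The sketch below uses the collgrad row and assumes that the difference in means is divided by the standard deviation in the total treated group; that scaling is inferred from the reported numbers, not taken from the kmatch documentation, and the function names are hypothetical.

```python
import math


def std_diff(mean_a, mean_b, var_ref):
    """Difference in means, scaled by the SD of a reference group."""
    return (mean_a - mean_b) / math.sqrt(var_ref)


def var_ratio(var_a, var_b):
    """Ratio of two variances."""
    return var_a / var_b


# collgrad row: (1) matched treated and (3) all treated
mean_matched, mean_total = 0.322404, 0.321663
var_matched, var_total = 0.219058, 0.218674

d13 = std_diff(mean_matched, mean_total, var_total)   # column (1)-(3)
r13 = var_ratio(var_matched, var_total)               # column (1)/(3)
print(round(d13, 6), round(r13, 5))
```

Values close to 0 for the standardized difference and close to 1 for the variance ratio indicate that the matched treated units resemble the full treated group on that covariate.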
Some Examples

. // make a graph of the common-support stats
. mat M = r(M)
. coefplot matrix(M[,4]), title(Std. difference) noci nolabels xline(0)

[Figure: coefficient plot of the standardized differences for collgrad, ttl_exp, tenure, the industry and race indicators, and south; x-axis: Std. difference.]
Some Examples

Multiple outcome variables

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching    Number of obs = 1,852
                                         Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

               Matched               Controls           Band-
            Yes    No  Total     Used  Unused  Total    width
Treated     432    25    457     1104     291   1395   1.3392

Treatment-effects estimation

               Coef.
wage
  ATT       .6021049
  NATE      1.430823
hours
  ATT       1.263759
  NATE      1.450303
Some Examples

Multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure)
>     (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching    Number of obs = 1,852
                                         Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

               Matched               Controls           Band-
            Yes    No  Total     Used  Unused  Total    width
Treated     432    25    457     1104     291   1395   1.3392

Treatment-effects estimation

               Coef.
wage
  ATT       .5152752
  NATE      1.430823
hours
  ATT       1.263759
  NATE      1.450303

wage: adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race
Some Examples

Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage),
>     att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)

Multivariate-distance kernel matching    Number of obs = 1,853
                                         Replications  =    50
                                         Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics

               Matched               Controls           Band-
            Yes    No  Total     Used  Unused  Total    width
0
  Treated   306    15    321      625     120    745   1.3199
1
  Treated   126    10    136      473     178    651   1.3398

Treatment-effects estimation

         Observed   Bootstrap                    Normal-based
wage        Coef.   Std. Err.      z   P>|z|   [95% Conf. Interval]
0
  ATT    .4586332   .2763358    1.66   0.097   -.082975   1.000241
1
  ATT    .9518705    .406903    2.34   0.019   .1543553   1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

           chi2(  1) =    1.23
         Prob > chi2 =  0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

wage        Coef.   Std. Err.      z   P>|z|   [95% Conf. Interval]
(1)      .4932373   .4452343    1.11   0.268   -.379406   1.365881
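The test and lincom results above are two views of the same Wald test: with one restriction, the chi-squared statistic is the square of the z statistic. A short Python check using the reported difference and bootstrap standard error (the helper name is hypothetical):

```python
import math


def two_sided_p(z):
    """Two-sided p-value under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


diff, se = 0.4932373, 0.4452343   # lincom [1]ATT - [0]ATT and its bootstrap SE

z = diff / se
chi2 = z * z                      # Wald chi-squared with 1 degree of freedom
p = two_sided_p(z)
print(round(z, 2), round(chi2, 2), round(p, 4))
```

Both commands therefore lead to the same conclusion: the difference in the ATT between the two subpopulations is not statistically significant at conventional levels.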
Simulation

Population data from the Swiss census of 2000.

Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using the command iskotrei by Hendrickx 2002; values from 6 to 78, mean 44).

Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.

Control variables: gender, age, and highest educational degree.

Population restricted to working people between 24 and 60 years old.

2'308'006 individuals, of which 17.5% belong to the treatment group.

Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.
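The simulation design can be sketched as follows. Everything in this block is an assumption made for illustration: the population is synthetic (the census data are not available here), the estimator is simple exact stratification rather than one of the kmatch estimators, and only two of the three sample sizes are run to keep it fast.

```python
import random
import statistics


def stratified_att(data):
    """ATT from exact stratification on x: treated-weighted mean of the
    within-stratum differences in mean outcomes. Strata without both
    treated and control units are skipped."""
    num, den = 0.0, 0
    for x in {row[0] for row in data}:
        y1 = [y for xx, d, y in data if xx == x and d == 1]
        y0 = [y for xx, d, y in data if xx == x and d == 0]
        if y1 and y0:
            num += len(y1) * (statistics.fmean(y1) - statistics.fmean(y0))
            den += len(y1)
    return num / den


def make_population(size, rng):
    """Hypothetical population: x in {0, 1, 2} drives treatment and outcome."""
    pop = []
    for _ in range(size):
        x = rng.choice([0, 1, 2])
        d = 1 if rng.random() < 0.10 + 0.15 * x else 0
        y = 40.0 + 5.0 * x - 4.0 * d + rng.gauss(0.0, 10.0)
        pop.append((x, d, y))
    return pop


def monte_carlo(pop, n, reps, estimator, truth, rng):
    """Draw `reps` random samples of size n and summarize the estimator."""
    estimates = [estimator(rng.sample(pop, n)) for _ in range(reps)]
    bias = statistics.fmean(estimates) - truth
    variance = statistics.pvariance(estimates)
    return {"bias": bias, "variance": variance, "mse": variance + bias ** 2}


rng = random.Random(12345)
population = make_population(20000, rng)
true_att = stratified_att(population)   # benchmark from fully stratified population data

for n in (500, 1000):
    print(n, monte_carlo(population, n, 200, stratified_att, true_att, rng))
```

The benchmark is computed once from the full population, as on the slides, and each estimator is then judged by its bias, variance, and mean squared error across the repeated samples.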
Simulation

Substantial differences between resident aliens and Swiss nationals on all three covariates.

Propensity score in population (computed from fully stratified data)

[Figure: density of the propensity score for untreated and treated; x-axis: Propensity score (0 to 1), y-axis: Density.]

McFadden R2 = 0.121
Simulation

Raw mean difference in occupational prestige (NATE): −4.79
Population ATT (computed from fully stratified data): −3.96
There is some treatment effect heterogeneity (ATE = −3.51, ATC = −3.41).

[Figure, left panel: outcome by propensity score for untreated and treated; x-axis: Propensity score, y-axis: Outcome.]
[Figure, right panel: treatment effect by propensity score; x-axis: Propensity score, y-axis: Treatment effect.]
Results: Variance

[Figure: two panels (N = 500, N = 5000) comparing MDM and PSM, each with and without bias correction, for nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y).]

Notes: In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.
Results: Bias reduction (in percent)

[Figure: two panels (N = 500, N = 5000) comparing MDM and PSM, each with and without bias correction, for nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y).]

Notes: Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.
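The slides do not state how "bias reduction (in percent)" is computed. A common definition, assumed here, is the share of the raw bias (NATE minus true ATT) that an estimator removes. The function name and the example estimates are hypothetical; NATE and ATT are the population values reported earlier.

```python
def bias_reduction(estimate, raw_diff, true_effect):
    """Percent of the raw (unadjusted) bias removed by an estimator.

    100 means the estimate equals the true effect, 0 means it is no better
    than the raw mean difference, and values above 100 mean overcorrection.
    """
    raw_bias = raw_diff - true_effect
    if raw_bias == 0:
        raise ValueError("raw difference is unbiased; reduction is undefined")
    return 100.0 * (1.0 - (estimate - true_effect) / raw_bias)


NATE, ATT = -4.79, -3.96   # population values reported on the slides

for est in (-4.79, -4.20, -3.96, -3.80):   # hypothetical estimates
    print(est, round(bias_reduction(est, NATE, ATT), 1))
```

Under this definition an estimator that lands exactly on the population ATT scores 100, which is the reference point for reading the bias-reduction figure.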
Results: Mean squared error

[Figure: two panels (N = 500, N = 5000) comparing MDM and PSM, each with and without bias correction, for nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y).]
Results: Relative standard error

[Figure: two panels (N = 500, N = 5000) comparing MDM and PSM, each with and without bias correction, for nearest-neighbor matching with teffects standard errors (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap standard errors (1 neighbor, 5 neighbors), and kernel matching with bootstrap standard errors (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y).]
Results: Coverage of 95% CIs

[Figure: two panels (N = 500, N = 5000) comparing MDM and PSM, each with and without bias correction, for nearest-neighbor matching with teffects standard errors (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap standard errors (1 neighbor, 5 neighbors), and kernel matching with bootstrap standard errors (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y).]
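The last two result measures can be computed from the replications of a simulation. The definitions below are assumptions, since the slides do not spell them out: relative standard error is taken as the average estimated standard error divided by the standard deviation of the point estimates, and coverage as the share of normal-based 95% intervals that contain the true value. The replication values are hypothetical.

```python
import statistics


def relative_se(estimates, std_errors):
    """Mean estimated SE divided by the SD of the estimates across
    replications; 1 means the SEs are right on average."""
    return statistics.fmean(std_errors) / statistics.stdev(estimates)


def coverage(estimates, std_errors, truth, crit=1.959964):
    """Share of normal-based confidence intervals that contain the truth."""
    hits = sum(
        1 for b, se in zip(estimates, std_errors) if abs(b - truth) <= crit * se
    )
    return hits / len(estimates)


# Hypothetical replications of an ATT estimator and its standard errors
estimates = [-4.1, -3.7, -4.4, -3.9, -3.2, -4.0]
std_errors = [0.30, 0.28, 0.33, 0.31, 0.29, 0.30]
truth = -3.96

rel = relative_se(estimates, std_errors)
cov = coverage(estimates, std_errors, truth)
print(round(rel, 3), round(cov, 3))
```

A relative standard error below 1 means the standard errors understate the true sampling variability, which goes together with coverage below the nominal 95%.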
Conclusions

The arguments brought forward by King and Nielsen against propensity score matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).

Other PSM algorithms perform much better because they are less affected by the random pruning problem.
Conclusions

Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research the choice may not be that clear.

Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.

Disadvantages of MDM:
- The choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching! (But don't use pair matching anyhow.)
Conclusions

Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly OK for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:
- Run some simulations comparable to the ones by King and Nielsen, using various matching algorithms.
References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2): 295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1): 77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91: 279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4): 465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70: 41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5): 688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4): 472–480.
Some Examples

Cumulative distribution balancing plot

. kmatch cumul, lw(6 2) lc(5 1)
(refitting the model using the generate() option)

[Figure: cumulative distribution of the propensity score for untreated and treated, in two panels: Raw and Matched (ATT); x-axis: Propensity score, y-axis: Cumulative probability.]
Some Examples

Balancing box plot

. kmatch box
(refitting the model using the generate() option)

[Figure: box plots of the propensity score for untreated and treated, in two panels: Raw and Matched (ATT); y-axis: Propensity score.]
Some Examples

Standard errors

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)

Multivariate-distance kernel matching    Number of obs = 1,853
                                         Replications  =    50
                                         Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                 Matched               Controls           Band-
              Yes    No  Total     Used  Unused  Total    width
Treated       432    25    457     1105     291   1396   1.3394
Untreated    1386    10   1396      455       2    457   3.3975
Combined     1818    35   1853     1560     293   1853

Treatment-effects estimation

         Observed   Bootstrap                    Normal-based
wage        Coef.   Std. Err.      z   P>|z|   [95% Conf. Interval]
ATE      .4095729   .1920853    2.13   0.033    .0330928   .7860531
ATT      .6059013   .2472069    2.45   0.014    .1213846   1.090418
ATC      .3483797   .1893653    1.84   0.066   -.0227695   .7195289
NATE     1.432913   .2333282    6.14   0.000    .9755981   1.890228
Some Examples

Do some tests

. lincom ATT-NATE

 ( 1)  ATT - NATE = 0

wage        Coef.   Std. Err.      z   P>|z|   [95% Conf. Interval]
(1)     -.8270117   .1810415   -4.57   0.000   -1.181847  -.4721768

. test ATT = ATC

 ( 1)  ATT - ATC = 0

           chi2(  1) =    2.42
         Prob > chi2 =  0.1200
Some Examples

Nearest-neighbor matching (1 neighbor)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

Multivariate-distance nearest-neighbor matching
                                         Number of obs  = 1,853
                                         Neighbors: min =     1
                                                    max =     1
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

               Matched               Controls           Band-
            Yes    No  Total     Used  Unused  Total    width
Treated     457     0    457      328    1068   1396        .

Treatment-effects estimation

wage        Coef.
ATT      .7246969

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

Treatment-effects estimation                  Number of obs      =  1,853
Estimator      : nearest-neighbor matching    Matches: requested =      1
Outcome model  : matching                                    min =      1
Distance metric: Mahalanobis                                 max =      1

                                  AI Robust
wage                     Coef.    Std. Err.      z   P>|z|   [95% Conf. Interval]
ATET
  union
  (union vs nonunion)  .7246969   .2942952    2.46   0.014    .147889   1.301505
Some Examples

Nearest-neighbor matching (5 neighbors)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching
                                         Number of obs  = 1,853
                                         Neighbors: min =     5
                                                    max =     5
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

               Matched               Controls           Band-
            Yes    No  Total     Used  Unused  Total    width
Treated     457     0    457      870     526   1396        .

Treatment-effects estimation

wage        Coef.
ATT      .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation                  Number of obs      =  1,853
Estimator      : nearest-neighbor matching    Matches: requested =      5
Outcome model  : matching                                    min =      5
Distance metric: Mahalanobis                                 max =      6

                                  AI Robust
wage                     Coef.    Std. Err.      z   P>|z|   [95% Conf. Interval]
ATET
  union
  (union vs nonunion)  .5590823   .2381752    2.35   0.019   .0922675   1.025897
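Nearest-neighbor matching with replacement, as used in the two examples above, can be sketched in Python as follows. This is a simplified illustration under stated assumptions: one scalar covariate with absolute distance instead of a Mahalanobis metric, no special handling of ties (teffects can use more than the requested number of matches when distances tie), and hypothetical names and toy data.

```python
def nn_att(treated, controls, k=1):
    """ATT by k-nearest-neighbor matching with replacement on a scalar
    covariate. Returns (att, number of distinct controls used)."""
    if k < 1 or k > len(controls):
        raise ValueError("k must be between 1 and the number of controls")
    effects = []
    used = set()
    for x_t, y_t in treated:
        # Control indices ordered by distance to this treated unit
        order = sorted(range(len(controls)),
                       key=lambda j: abs(controls[j][0] - x_t))
        nearest = order[:k]
        used.update(nearest)
        # Mean outcome of the k nearest controls = estimated counterfactual
        y0_hat = sum(controls[j][1] for j in nearest) / k
        effects.append(y_t - y0_hat)
    return sum(effects) / len(effects), len(used)


# Toy data (hypothetical): y = x for controls and y = x + 2 for treated units.
controls = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (8.0, 8.0)]
treated = [(1.1, 3.1), (1.9, 3.9), (2.2, 4.2)]

att1, used1 = nn_att(treated, controls, k=1)
att3, used3 = nn_att(treated, controls, k=3)
print(att1, used1, att3, used3)
```

Every treated unit is matched regardless of how far its neighbors are, and the number of distinct controls used grows with k; the same pattern appears in the "Used" column of the outputs above (328 with one neighbor, 870 with five).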
Some Examples

Bias-correction / regression adjustment

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching
                                         Number of obs  = 1,853
                                         Neighbors: min =     5
                                                    max =     5
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

               Matched               Controls           Band-
            Yes    No  Total     Used  Unused  Total    width
Treated     457     0    457      870     526   1396        .

Treatment-effects estimation

wage        Coef.
ATT      .5288023

(adjusted for collgrad ttl_exp tenure i.industry i.race south)
teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) gt (union) atet nn(5) biasadj(collgrad ttl_exp tenure iindustry irace south)
Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 5Outcome model matching min = 5Distance metric Mahalanobis max = 6
AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]
ATETunion
(union vs nonunion) 5288023 2420635 218 0029 0543666 1003238
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 44
Some Examples
Mahalanobis-distance and propensity-score matching combined
kmatch md union collgrad ttl_exp tenure (wage) att gt psvars(iindustry irace south) psweight(3)(computing bandwidth done)
Multivariate-distance kernel matching Number of obs = 1853Kernel = epan
Treatment union = 1Metric mahalanobis (modified)Covariates collgrad ttl_exp tenurePS model logit (pr)PS covars iindustry irace south
Matching statistics
Matched Controls Band-Yes No Total Used Unused Total width
Treated 439 18 457 1258 138 1396 83886
Treatment-effects estimation
wage Coef
ATT 6408443
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 45
Some Examples
Exact matching
kmatch md union collgrad ttl_exp tenure (wage) att ematch(industry race south)(computing bandwidth done)
Multivariate-distance kernel matching Number of obs = 1853Kernel = epan
Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenureExact industry race south
Matching statistics
Matched Controls Band-Yes No Total Used Unused Total width
Treated 432 25 457 1103 293 1396 13013
Treatment-effects estimation
wage Coef
ATT 6047374
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 46
Some Examples
Bandwidth selection the default (based on distribution of distances in
one-nearest-neighbor matching)
kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att(computing bandwidth done)
Multivariate-distance kernel matching Number of obs = 1853Kernel = epan
Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south
Matching statistics
Matched Controls Band-Yes No Total Used Unused Total width
Treated 432 25 457 1105 291 1396 13394
Treatment-effects estimation
wage Coef
ATT 6059013
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 47
Some Examples Bandwidth selection cross validation with respect to X kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv)(computing bandwidth done)
Multivariate-distance kernel matching Number of obs = 1853Kernel = epan
Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south
Matching statistics
Matched Controls Band-Yes No Total Used Unused Total width
Treated 448 9 457 1184 212 1396 18888
Treatment-effects estimation
wage Coef
ATT 6651578
kmatch cvplot ms(o) index mlabposition(1) sort
1
57 915131411128 102 6
4
3
02
04
06
08
1MSE
15 2 25 3Bandwidth
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 48
Some Examples Bandwidth selection cross validation with respect to Y kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage)(computing bandwidth done)
Multivariate-distance kernel matching Number of obs = 1853Kernel = epan
Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south
Matching statistics
Matched Controls Band-Yes No Total Used Unused Total width
Treated 453 4 457 1289 107 1396 2433
Treatment-effects estimation
wage Coef
ATT 6928956
kmatch cvplot ms(o) index mlabposition(1) sort
1
2
579 61110131512
148
4
3
118
12122
124
126
MISE
15 2 25 3Bandwidth
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 49
Some Examples Bandwidth selection weighted cross validation with respect to Y kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage weighted)(computing bandwidth done)
Multivariate-distance kernel matching Number of obs = 1853Kernel = epan
Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south
Matching statistics
Matched Controls Band-Yes No Total Used Unused Total width
Treated 455 2 457 1356 40 1396 27626
Treatment-effects estimation
wage Coef
ATT 7308166
kmatch cvplot ms(o) index mlabposition(1) sort
1
2
6
10121481513119
3
7
5
4
1112
1314
Wei
ghte
d M
ISE
1 2 3 4 5Bandwidth
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 50
Some Examples Common-support diagnostics kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(05)
Multivariate-distance kernel matching Number of obs = 1853Kernel = epan
Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south
Matching statistics
Matched Controls Band-Yes No Total Used Unused Total width
Treated 366 91 457 701 695 1396 5
Treatment-effects estimation
wage Coef
ATT 3303161
kmatch csummarize(refitting the model using the generate() option)
Common support (treated) Standardized differenceMeans Matched Unmatc~d Total (1)-(3) (2)-(3) (1)-(2)
collgrad 322404 318681 321663 001585 -006376 007962ttl_exp 133929 127682 132685 027413 -110253 137666tenure 812614 695055 789205 038378 -154356 192734
3industry 002732 021978 006565 -047404 190657 -2380614industry 191257 153846 183807 019212 -077269 0964815industry 062842 274725 105033 -137462 552867 -6903296industry 057377 0 045952 054507 -219225 2737327industry 019126 021978 019694 -004083 016423 -0205068industry 005464 065934 017505 -091714 368871 -4605859industry 010929 010989 010941 -000115 000462 -000577
10industry 0 021978 004376 -066227 266363 -33258911industry 554645 175824 479212 15083 -606636 75746712industry 092896 241758 122538 -090299 363181 -45348
2race 243169 681319 330416 -185284 745209 -9304943race 002732 076923 017505 -112525 452572 -565097south 29235 318681 297593 -011456 046074 -05753
Common support (treated) RatioVariances Matched Unmatc~d Total (1)(3) (2)(3) (1)(2)
collgrad 219058 219536 218674 100176 100394 997824ttl_exp 194198 252474 205898 943177 122621 76918tenure 383324 319242 372044 103032 858076 120073
3industry 002732 021734 006536 418045 332537 1257144industry 155101 131624 150351 103159 875443 1178375industry 059054 201465 094207 626851 213854 2931226industry 054233 0 043936 123435 0 7industry 018811 021734 019348 972252 11233 8655318industry 00545 062271 017237 316157 361269 0875139industry 010839 010989 010845 999464 101328 986361
10industry 0 021734 004367 0 497709 011industry 247691 14652 250115 990307 585811 16904912industry 084497 185348 107758 784137 172003 455885
2race 184542 219536 221726 832297 990121 8406013race 002732 071795 017237 158513 416522 038056south 207448 219536 20949 990254 104796 944939
(1) matched (2) unmatched (3) total
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 51
Some Examples make a graph of the common-support stats mat M = r(M)
coefplot matrix(M[4]) title(Std difference) noci nolabels xline(0)
collgradttl_exptenure
3industry4industry5industry6industry7industry8industry9industry
10industry11industry12industry
2race3racesouth
-2 -1 0 1 2
Std difference
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 52
Some Examples
Multiple outcome variables
kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage hours) nate att(computing bandwidth done)
Multivariate-distance kernel matching Number of obs = 1852Kernel = epan
Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south
Matching statistics
Matched Controls Band-Yes No Total Used Unused Total width
Treated 432 25 457 1104 291 1395 13392
Treatment-effects estimation
Coef
wageATT 6021049
NATE 1430823
hoursATT 1263759
NATE 1450303
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 53
Some Examples Multiple outcome variables with different regression-adjustment equations kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure) gt (hours = iindustry irace) nate att(computing bandwidth done)
Multivariate-distance kernel matching Number of obs = 1852Kernel = epan
Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south
Matching statistics
Matched Controls Band-Yes No Total Used Unused Total width
Treated 432 25 457 1104 291 1395 13392
Treatment-effects estimation
Coef
wageATT 5152752
NATE 1430823
hoursATT 1263759
NATE 1450303
wage adjusted for collgrad ttl_exp tenurehours adjusted for iindustry irace
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 54
Some Examples Treatment effects by subpopulation kmatch md union collgrad ttl_exp tenure iindustry irace (wage) gt att vce(boot) over(south)(south=0 computing bandwidth done)(south=1 computing bandwidth done)
(running kmatch on estimation sample)
Bootstrap replications (50)1 2 3 4 5
50
Multivariate-distance kernel matching Number of obs = 1853Replications = 50Kernel = epan
Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace
0 south = 01 south = 1
Matching statistics
Matched Controls Band-Yes No Total Used Unused Total width
0Treated 306 15 321 625 120 745 13199
1Treated 126 10 136 473 178 651 13398
Treatment-effects estimation
Observed Bootstrap Normal-basedwage Coef Std Err z Pgt|z| [95 Conf Interval]
0ATT 4586332 2763358 166 0097 -082975 1000241
1ATT 9518705 406903 234 0019 1543553 1749386
test [0]ATT = [1]ATT
( 1) [0]ATT - [1]ATT = 0
chi2( 1) = 123Prob gt chi2 = 02679
lincom [1]ATT - [0]ATT
( 1) - [0]ATT + [1]ATT = 0
wage Coef Std Err z Pgt|z| [95 Conf Interval]
(1) 4932373 4452343 111 0268 -379406 1365881
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 55
Simulation
Population data from Swiss census of 2000
Outcome Treiman occupational prestige (recoded from ISCO codesof the current job using command iskotrei by Hendrickx 2002)(values from 6 to 78 mean 44)
Estimand ATT of nationality on occupational prestige withresident aliens as the treatment group and Swiss nationals as thecontrol group
Control variables gender age and highest educational degree
Population restricted to people between 24 to 60 years old who areworking
2rsquo308rsquo006 individuals of which 175 belong to the treatmentgroup
Draw random samples (N = 500 1000 or 5000) from populationand compute various matching estimators
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 56
Simulation

- Substantial differences between resident aliens and Swiss nationals on all three covariates.
- Propensity score in population (computed from fully stratified data):

[Figure: density of the propensity score for untreated and treated; x axis: propensity score (0 to 1); y axis: density (0 to 7). McFadden R2 = 0.121]
Simulation

- Raw mean difference in occupational prestige (NATE): -4.79
- Population ATT (computed from fully stratified data): -3.96
- There is some treatment effect heterogeneity (ATE = -3.51, ATC = -3.41).

[Figure, left panel: outcome (30 to 55) by propensity score (0 to .8) for untreated and treated. Right panel: treatment effect (-6 to -1) by propensity score (0 to .8).]
Results: Variance

[Figure: variance of the estimates (x axis 1.5 to 4.5); panels N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]

Notes: In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.
Results: Bias reduction (in percent)

[Figure: bias reduction in percent (x axis 70 to 130 for N = 500, 95 to 135 for N = 5000); panels N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]

Notes: Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.
Results: Mean squared error

[Figure: mean squared error of the estimates (x axis 1.5 to 4.5 for N = 500, 1.5 to 5 for N = 5000); panels N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]
Results: Relative standard error

[Figure: relative standard error (x axis .9 to 1.2 for N = 500, .95 to 1.25 for N = 5000); panels N = 500 and N = 5000. Rows: nearest-neighbor matching with teffects standard errors (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap standard errors (1 neighbor, 5 neighbors), and kernel matching with bootstrap standard errors (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]
Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals (x axis .92 to .98 for N = 500, .9 to .98 for N = 5000); panels N = 500 and N = 5000. Rows: nearest-neighbor matching with teffects standard errors (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap standard errors (1 neighbor, 5 neighbors), and kernel matching with bootstrap standard errors (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]
Conclusions

- The arguments brought forward by King and Nielsen against propensity score matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).
- Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.
- Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research, the choice may not be that clear.
  Advantages of MDM:
  - MDM leaves less scope for post-matching modeling decision biases.
  - Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
  - Fewer restrictions in terms of possible post-matching analyses.
  Disadvantages of MDM:
  - The choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
  - Computational complexity.
- One clear conclusion we can draw, however, is: Do not use propensity scores for pair matching! (But don't use pair matching anyhow.)
- Some conclusions from the simulation:
  - For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
  - Bootstrap standard error/confidence interval estimation seems to be mostly OK for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.
- To do:
  - Run some simulations comparable to the ones by King and Nielsen, using various matching algorithms.
References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2):295–313.
Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1):77–90.
Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91:279–290.
Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.
King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.
Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.
Neyman, J. 1990 [1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4):465–472.
Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70:41–55.
Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5):688–701.
Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4):472–480.
Some Examples: Balancing box plot

. kmatch box
(refitting the model using the generate() option)

[Figure: box plots of the propensity score (y axis 0 to .8) for untreated and treated; panels Raw and Matched (ATT).]
Some Examples: Standard errors

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Replications  = 50
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

             |      Matched       |       Controls        |  Band-
             |  Yes    No   Total |  Used  Unused   Total |  width
 ------------+--------------------+-----------------------+--------
     Treated |  432    25     457 |  1105     291    1396 | 1.3394
   Untreated | 1386    10    1396 |   455       2     457 | 3.3975
    Combined | 1818    35    1853 |  1560     293    1853 |

Treatment-effects estimation

             |  Observed   Bootstrap                       Normal-based
        wage |     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
 ------------+--------------------------------------------------------------
         ATE |  .4095729   .1920853    2.13   0.033    .0330928   .7860531
         ATT |  .6059013   .2472069    2.45   0.014    .1213846   1.090418
         ATC |  .3483797   .1893653    1.84   0.066   -.0227695   .7195289
        NATE |  1.432913   .2333282    6.14   0.000    .9755981   1.890228
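The standard errors above come from 50 bootstrap replications: the data are resampled with replacement and the estimator is rerun on each resample. A minimal Python sketch of that idea follows. It is an illustration only: the data are simulated, the estimator is a plain difference in means standing in for a matching estimator, and the function names are hypothetical rather than anything from kmatch.

```python
import random
import statistics

def mean_difference(data):
    """Stand-in estimator: treated mean minus control mean."""
    y1 = [y for d, y in data if d == 1]
    y0 = [y for d, y in data if d == 0]
    return statistics.fmean(y1) - statistics.fmean(y0)

def bootstrap_se(data, estimator, reps, rng):
    """Standard deviation of the estimator over resamples drawn with replacement."""
    n = len(data)
    estimates = []
    for _ in range(reps):
        resample = [data[rng.randrange(n)] for _ in range(n)]
        estimates.append(estimator(resample))   # the whole estimator is rerun each time
    return statistics.stdev(estimates)

rng = random.Random(42)
# Toy data: 200 controls and 100 treated units with a true difference of 1.5
data = [(0, rng.gauss(6.0, 4.0)) for _ in range(200)] + \
       [(1, rng.gauss(7.5, 4.0)) for _ in range(100)]

estimate = mean_difference(data)
se = bootstrap_se(data, mean_difference, 50, rng)
print(round(estimate, 3), round(se, 3), round(estimate / se, 2))
```

The z statistic and normal-based interval in the output table are then formed from the observed coefficient and this bootstrap standard error.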
Some Examples: Do some tests

. lincom ATT-NATE

 ( 1)  ATT - NATE = 0

        wage |     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
 ------------+--------------------------------------------------------------
         (1) | -.8270117   .1810415   -4.57   0.000   -1.181847  -.4721768

. test ATT = ATC

 ( 1)  ATT - ATC = 0

           chi2(  1) =    2.42
         Prob > chi2 =    0.1200
Some Examples: Nearest-neighbor matching (1 neighbor)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

Multivariate-distance nearest-neighbor matching

                                           Number of obs  = 1853
                                           Neighbors: min = 1
Treatment  : union = 1                                max = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

             |      Matched       |       Controls        |  Band-
             |  Yes    No   Total |  Used  Unused   Total |  width
 ------------+--------------------+-----------------------+--------
     Treated |  457     0     457 |   328    1068    1396 |      .

Treatment-effects estimation

        wage |     Coef.
 ------------+-----------
         ATT |  .7246969

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

Treatment-effects estimation               Number of obs      = 1853
Estimator      : nearest-neighbor matching Matches: requested = 1
Outcome model  : matching                               min   = 1
Distance metric: Mahalanobis                            max   = 1

                     |            AI Robust
                wage |     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
 --------------------+--------------------------------------------------------------
 ATET                |
               union |
 (union vs nonunion) |  .7246969   .2942952    2.46   0.014     .147889   1.301505
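Both commands above match each treated observation to its nearest control in Mahalanobis distance and average the outcome differences; because controls can be reused, 457 treated observations end up using only 328 distinct controls. The following Python sketch shows the mechanics on made-up data with two covariates. It is not the kmatch or teffects implementation: the covariance matrix is taken from the pooled sample by assumption, and all names are hypothetical.

```python
import statistics

def inverse_cov_2d(points):
    """Inverse of the 2x2 sample covariance matrix of (x1, x2) pairs."""
    x1 = [p[0] for p in points]
    x2 = [p[1] for p in points]
    m1, m2 = statistics.fmean(x1), statistics.fmean(x2)
    n = len(points)
    s11 = sum((a - m1) ** 2 for a in x1) / (n - 1)
    s22 = sum((b - m2) ** 2 for b in x2) / (n - 1)
    s12 = sum((a - m1) * (b - m2) for a, b in points) / (n - 1)
    det = s11 * s22 - s12 ** 2
    return (s22 / det, -s12 / det, s11 / det)   # (i11, i12, i22)

def mahalanobis_sq(p, q, inv):
    """Squared Mahalanobis distance between two points."""
    d1, d2 = p[0] - q[0], p[1] - q[1]
    i11, i12, i22 = inv
    return d1 * d1 * i11 + 2.0 * d1 * d2 * i12 + d2 * d2 * i22

def att_nn(treated, controls, k=1):
    """ATT from k-nearest-neighbor matching with replacement.
    Each unit is ((x1, x2), y). Returns (att, number of distinct controls used)."""
    inv = inverse_cov_2d([x for x, _ in treated + controls])
    effects, used = [], set()
    for x_t, y_t in treated:
        ranked = sorted(range(len(controls)),
                        key=lambda j: mahalanobis_sq(x_t, controls[j][0], inv))
        nearest = ranked[:k]
        used.update(nearest)                     # a control may serve several treated units
        y0_hat = statistics.fmean(controls[j][1] for j in nearest)
        effects.append(y_t - y0_hat)
    return statistics.fmean(effects), len(used)

treated = [((1.0, 10.0), 8.0), ((2.0, 14.0), 9.5)]
controls = [((1.0, 11.0), 7.0), ((2.0, 13.0), 8.0),
            ((5.0, 30.0), 12.0), ((0.0, 2.0), 5.0)]
att, n_used = att_nn(treated, controls, k=1)
print(att, n_used)
```

Setting k to 5 corresponds to the five-neighbor variant; ties, which the teffects output handles by allowing more than the requested number of matches, are ignored in this sketch.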
Some Examples: Nearest-neighbor matching (5 neighbors)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching

                                           Number of obs  = 1853
                                           Neighbors: min = 5
Treatment  : union = 1                                max = 5
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

             |      Matched       |       Controls        |  Band-
             |  Yes    No   Total |  Used  Unused   Total |  width
 ------------+--------------------+-----------------------+--------
     Treated |  457     0     457 |   870     526    1396 |      .

Treatment-effects estimation

        wage |     Coef.
 ------------+-----------
         ATT |  .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation               Number of obs      = 1853
Estimator      : nearest-neighbor matching Matches: requested = 5
Outcome model  : matching                               min   = 5
Distance metric: Mahalanobis                            max   = 6

                     |            AI Robust
                wage |     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
 --------------------+--------------------------------------------------------------
 ATET                |
               union |
 (union vs nonunion) |  .5590823   .2381752    2.35   0.019    .0922675   1.025897
Some Examples: Bias-correction / regression adjustment

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching

                                           Number of obs  = 1853
                                           Neighbors: min = 5
Treatment  : union = 1                                max = 5
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

             |      Matched       |       Controls        |  Band-
             |  Yes    No   Total |  Used  Unused   Total |  width
 ------------+--------------------+-----------------------+--------
     Treated |  457     0     457 |   870     526    1396 |      .

Treatment-effects estimation

        wage |     Coef.
 ------------+-----------
         ATT |  .5288023

wage: adjusted for collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south)
>     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation               Number of obs      = 1853
Estimator      : nearest-neighbor matching Matches: requested = 5
Outcome model  : matching                               min   = 5
Distance metric: Mahalanobis                            max   = 6

                     |            AI Robust
                wage |     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
 --------------------+--------------------------------------------------------------
 ATET                |
               union |
 (union vs nonunion) |  .5288023   .2420635    2.18   0.029    .0543666   1.003238
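Regression adjustment corrects for the covariate gap that remains between a treated observation and its matched controls. One common way to do this, sketched below in Python, is to fit a regression of the outcome on the covariate among the controls and shift each matched outcome by the predicted difference. This is a generic illustration on noise-free toy data with one covariate; it is not claimed to reproduce the internals of kmatch or of the biasadj() option, and the function names are hypothetical.

```python
def ols_line(xs, ys):
    """Intercept and slope of a simple least-squares regression of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope

def att_nn1(treated, controls, bias_adjust=False):
    """1-nearest-neighbor matching on one covariate; units are (x, y) pairs."""
    _, slope = ols_line([x for x, _ in controls], [y for _, y in controls])
    effects = []
    for x_t, y_t in treated:
        x_c, y_c = min(controls, key=lambda c: abs(c[0] - x_t))
        y0_hat = y_c
        if bias_adjust:
            # Move the matched outcome to the treated unit's covariate value
            y0_hat += slope * (x_t - x_c)
        effects.append(y_t - y0_hat)
    return sum(effects) / len(effects)

controls = [(x, 2.0 * x) for x in (0.0, 1.0, 2.0, 3.0)]   # untreated outcome: y0 = 2x
treated = [(x, 2.0 * x + 1.0) for x in (1.4, 2.3, 3.5)]   # constant effect of 1

raw = att_nn1(treated, controls)
adjusted = att_nn1(treated, controls, bias_adjust=True)
print(round(raw, 3), round(adjusted, 3))
```

In this toy setup the unadjusted estimate is inflated because every treated unit has a larger covariate value than its match, and the adjustment removes that gap.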
Some Examples: Mahalanobis-distance and propensity-score matching combined

. kmatch md union collgrad ttl_exp tenure (wage), att
>     psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis (modified)
Covariates : collgrad ttl_exp tenure
PS model   : logit (pr)
PS covars  : i.industry i.race south

Matching statistics

             |      Matched       |       Controls        |  Band-
             |  Yes    No   Total |  Used  Unused   Total |  width
 ------------+--------------------+-----------------------+--------
     Treated |  439    18     457 |  1258     138    1396 | .83886

Treatment-effects estimation

        wage |     Coef.
 ------------+-----------
         ATT |  .6408443
Some Examples: Exact matching

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure
Exact      : industry race south

Matching statistics

             |      Matched       |       Controls        |  Band-
             |  Yes    No   Total |  Used  Unused   Total |  width
 ------------+--------------------+-----------------------+--------
     Treated |  432    25     457 |  1103     293    1396 | 1.3013

Treatment-effects estimation

        wage |     Coef.
 ------------+-----------
         ATT |  .6047374
Some Examples: Bandwidth selection: the default (based on the distribution of distances in one-nearest-neighbor matching)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

             |      Matched       |       Controls        |  Band-
             |  Yes    No   Total |  Used  Unused   Total |  width
 ------------+--------------------+-----------------------+--------
     Treated |  432    25     457 |  1105     291    1396 | 1.3394

Treatment-effects estimation

        wage |     Coef.
 ------------+-----------
         ATT |  .6059013
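The role of the bandwidth in kernel matching can be seen in a small sketch: each treated observation is compared with a kernel-weighted average of the controls within the bandwidth, and a treated observation with no control inside the bandwidth stays unmatched. The Python code below illustrates this with an Epanechnikov kernel on one covariate. It is a simplified illustration on made-up data, not the kmatch implementation, and the function names are hypothetical.

```python
def epan(u):
    """Epanechnikov kernel; zero for |u| >= 1."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0

def att_kernel(treated, controls, bandwidth):
    """Kernel-matching ATT on one covariate; units are (x, y) pairs.
    Returns (att, number of matched treated, number of controls used)."""
    effects, used = [], set()
    for x_t, y_t in treated:
        weights = [epan((x_c - x_t) / bandwidth) for x_c, _ in controls]
        total = sum(weights)
        if total == 0.0:
            continue   # no control within the bandwidth: unit remains unmatched
        used.update(j for j, w in enumerate(weights) if w > 0.0)
        y0_hat = sum(w * y for w, (_, y) in zip(weights, controls)) / total
        effects.append(y_t - y0_hat)
    att = sum(effects) / len(effects) if effects else float("nan")
    return att, len(effects), len(used)

controls = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0), (3.0, 4.0)]
treated = [(1.0, 4.0), (2.5, 5.5), (6.0, 9.0)]

for h in (0.5, 1.5):
    print(h, att_kernel(treated, controls, h))
```

A smaller bandwidth leaves more treated observations unmatched and uses fewer controls, which is the trade-off the matching statistics report in the Matched and Controls columns.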
Some Examples: Bandwidth selection: cross-validation with respect to X

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

             |      Matched       |       Controls        |  Band-
             |  Yes    No   Total |  Used  Unused   Total |  width
 ------------+--------------------+-----------------------+--------
     Treated |  448     9     457 |  1184     212    1396 | 1.8888

Treatment-effects estimation

        wage |     Coef.
 ------------+-----------
         ATT |  .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MSE against bandwidth (1.5 to 3); points labeled by search index 1 to 15.]
Some Examples: Bandwidth selection: cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

             |      Matched       |       Controls        |  Band-
             |  Yes    No   Total |  Used  Unused   Total |  width
 ------------+--------------------+-----------------------+--------
     Treated |  453     4     457 |  1289     107    1396 |  2.433

Treatment-effects estimation

        wage |     Coef.
 ------------+-----------
         ATT |  .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MISE against bandwidth (1.5 to 3); points labeled by search index 1 to 15.]
Some Examples: Bandwidth selection: weighted cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

             |      Matched       |       Controls        |  Band-
             |  Yes    No   Total |  Used  Unused   Total |  width
 ------------+--------------------+-----------------------+--------
     Treated |  455     2     457 |  1356      40    1396 | 2.7626

Treatment-effects estimation

        wage |     Coef.
 ------------+-----------
         ATT |  .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of weighted MISE against bandwidth (1 to 5); points labeled by search index 1 to 15.]
Some Examples: Common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(0.5)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

             |      Matched       |       Controls        |  Band-
             |  Yes    No   Total |  Used  Unused   Total |  width
 ------------+--------------------+-----------------------+--------
     Treated |  366    91     457 |   701     695    1396 |     .5

Treatment-effects estimation

        wage |     Coef.
 ------------+-----------
         ATT |  .3303161

. kmatch csummarize
(refitting the model using the generate() option)

Common support (treated): Means

             |  Matched  Unmatched     Total |    Standardized difference
             |      (1)        (2)       (3) |  (1)-(3)   (2)-(3)   (1)-(2)
 ------------+-------------------------------+------------------------------
    collgrad |  .322404    .318681   .321663 |  .001585  -.006376   .007962
     ttl_exp |  13.3929    12.7682   13.2685 |  .027413  -.110253   .137666
      tenure |  8.12614    6.95055   7.89205 |  .038378  -.154356   .192734
  3.industry |  .002732    .021978   .006565 | -.047404   .190657  -.238061
  4.industry |  .191257    .153846   .183807 |  .019212  -.077269   .096481
  5.industry |  .062842    .274725   .105033 | -.137462   .552867  -.690329
  6.industry |  .057377          0   .045952 |  .054507  -.219225   .273732
  7.industry |  .019126    .021978   .019694 | -.004083   .016423  -.020506
  8.industry |  .005464    .065934   .017505 | -.091714   .368871  -.460585
  9.industry |  .010929    .010989   .010941 | -.000115   .000462  -.000577
 10.industry |        0    .021978   .004376 | -.066227   .266363  -.332589
 11.industry |  .554645    .175824   .479212 |   .15083  -.606636   .757467
 12.industry |  .092896    .241758   .122538 | -.090299   .363181   -.45348
      2.race |  .243169    .681319   .330416 | -.185284   .745209  -.930494
      3.race |  .002732    .076923   .017505 | -.112525   .452572  -.565097
       south |   .29235    .318681   .297593 | -.011456   .046074   -.05753

Common support (treated): Variances

             |  Matched  Unmatched     Total |             Ratio
             |      (1)        (2)       (3) |  (1)/(3)   (2)/(3)   (1)/(2)
 ------------+-------------------------------+------------------------------
    collgrad |  .219058    .219536   .218674 |  1.00176   1.00394   .997824
     ttl_exp |  19.4198    25.2474   20.5898 |  .943177   1.22621    .76918
      tenure |  38.3324    31.9242   37.2044 |  1.03032   .858076   1.20073
  3.industry |  .002732    .021734   .006536 |  .418045   3.32537   .125714
  4.industry |  .155101    .131624   .150351 |  1.03159   .875443   1.17837
  5.industry |  .059054    .201465   .094207 |  .626851   2.13854   .293122
  6.industry |  .054233          0   .043936 |  1.23435         0         .
  7.industry |  .018811    .021734   .019348 |  .972252    1.1233   .865531
  8.industry |   .00545    .062271   .017237 |  .316157   3.61269   .087513
  9.industry |  .010839    .010989   .010845 |  .999464   1.01328   .986361
 10.industry |        0    .021734   .004367 |        0   4.97709         0
 11.industry |  .247691     .14652   .250115 |  .990307   .585811   1.69049
 12.industry |  .084497    .185348   .107758 |  .784137   1.72003   .455885
      2.race |  .184542    .219536   .221726 |  .832297   .990121   .840601
      3.race |  .002732    .071795   .017237 |  .158513   4.16522   .038056
       south |  .207448    .219536    .20949 |  .990254   1.04796   .944939

(1) matched, (2) unmatched, (3) total
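The two diagnostics in these tables can be recomputed by hand. The reported values are consistent with a standardized difference defined as the difference in means divided by the standard deviation of the total treated group (3), and with a plain ratio of variances; that definition is inferred from the numbers here, not stated in the output. A short Python check using the collgrad row (function names are hypothetical):

```python
import math

def std_difference(mean_a, mean_b, var_ref):
    """Difference in means scaled by the standard deviation of a reference group."""
    return (mean_a - mean_b) / math.sqrt(var_ref)

def variance_ratio(var_a, var_b):
    """Ratio of two variances; 1 indicates equal spread."""
    return var_a / var_b

# collgrad among the treated: (1) matched, (2) unmatched, (3) total
mean = {"matched": 0.322404, "unmatched": 0.318681, "total": 0.321663}
var = {"matched": 0.219058, "unmatched": 0.219536, "total": 0.218674}

d13 = std_difference(mean["matched"], mean["total"], var["total"])
d23 = std_difference(mean["unmatched"], mean["total"], var["total"])
r13 = variance_ratio(var["matched"], var["total"])
print(round(d13, 6), round(d23, 6), round(r13, 5))
```

Values of the standardized difference near zero and ratios near one indicate that dropping the unmatched treated observations changes the composition of the treated group only slightly for that covariate.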
Some Examples: make a graph of the common-support stats

. mat M = r(M)
. coefplot matrix(M[,4]), title(Std. difference) noci nolabels xline(0)

[Figure: coefficient plot titled "Std. difference" showing the standardized differences for collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, and south; x axis from -.2 to .2 with a reference line at 0.]
Some Examples: Multiple outcome variables

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1852
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

             |      Matched       |       Controls        |  Band-
             |  Yes    No   Total |  Used  Unused   Total |  width
 ------------+--------------------+-----------------------+--------
     Treated |  432    25     457 |  1104     291    1395 | 1.3392

Treatment-effects estimation

             |     Coef.
 ------------+-----------
 wage        |
         ATT |  .6021049
        NATE |  1.430823
 ------------+-----------
 hours       |
         ATT |  1.263759
        NATE |  1.450303
Some Examples: Multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure)
>     (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1852
                                           Kernel        = epan
Treatment  : union = 1
Metric     : mahalanobis
Covariates : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

             |      Matched       |       Controls        |  Band-
             |  Yes    No   Total |  Used  Unused   Total |  width
 ------------+--------------------+-----------------------+--------
     Treated |  432    25     457 |  1104     291    1395 | 1.3392

Treatment-effects estimation

             |     Coef.
 ------------+-----------
 wage        |
         ATT |  .5152752
        NATE |  1.430823
 ------------+-----------
 hours       |
         ATT |  1.263759
        NATE |  1.450303

wage: adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race
Some Examples
Standard errors

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage), nate ate att atc vce(bootstrap)
(computing bandwidth for treated ... done)
(computing bandwidth for untreated ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)

Multivariate-distance kernel matching        Number of obs = 1853
                                             Replications  = 50
                                             Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |        Matched         |        Controls        |   Band-
           |    Yes     No   Total  |   Used  Unused   Total |   width
-----------+------------------------+------------------------+---------
   Treated |    432     25     457  |   1105     291    1396 |  1.3394
 Untreated |   1386     10    1396  |    455       2     457 |  3.3975
  Combined |   1818     35    1853  |   1560     293    1853 |

Treatment-effects estimation

           |   Observed   Bootstrap                        Normal-based
      wage |      Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
-----------+-----------------------------------------------------------------
       ATE |   .4095729   .1920853    2.13   0.033     .0330928    .7860531
       ATT |   .6059013   .2472069    2.45   0.014     .1213846    1.090418
       ATC |   .3483797   .1893653    1.84   0.066    -.0227695    .7195289
      NATE |   1.432913   .2333282    6.14   0.000     .9755981    1.890228
Some Examples
Do some tests

. lincom ATT-NATE

 ( 1)  ATT - NATE = 0

      wage |      Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
-----------+-----------------------------------------------------------------
       (1) |  -.8270117   .1810415   -4.57   0.000    -1.181847   -.4721768

. test ATT = ATC

 ( 1)  ATT - ATC = 0

           chi2(  1) =    2.42
         Prob > chi2 =    0.1200
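The lincom result above can be checked by hand from the reported coefficient and standard error. The following is a minimal sketch using only built-in Stata functions; the scalar names are illustrative, and the numbers are copied from the lincom and test output.

```stata
* Recompute the z statistic, p-value, and normal-based 95% CI for
* ATT - NATE from the reported point estimate and bootstrap standard error.
scalar b  = -.8270117          // ATT - NATE, from the lincom output
scalar se =  .1810415          // bootstrap standard error
scalar z  = b/se
scalar p  = 2*normal(-abs(z))
scalar lo = b - invnormal(.975)*se
scalar hi = b + invnormal(.975)*se
display "z  = " %6.2f z
display "p  = " %6.3f p
display "CI = [" %9.6f lo ", " %9.6f hi "]"

* p-value of the Wald test of ATT = ATC from the reported chi2(1) statistic
display "Prob > chi2 = " %6.4f chi2tail(1, 2.42)
```

Because the chi2 statistic is only reported to two decimals, the last p-value can differ slightly from the one shown by test.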
Some Examples
Nearest-neighbor matching (1 neighbor)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

Multivariate-distance nearest-neighbor matching

                                             Number of obs  = 1853
                                             Neighbors: min = 1
                                                        max = 1
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |        Matched         |        Controls        |   Band-
           |    Yes     No   Total  |   Used  Unused   Total |   width
-----------+------------------------+------------------------+---------
   Treated |    457      0     457  |    328    1068    1396 |

Treatment-effects estimation

      wage |      Coef.
-----------+-----------
       ATT |   .7246969

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

Treatment-effects estimation                 Number of obs      = 1853
Estimator      : nearest-neighbor matching   Matches: requested = 1
Outcome model  : matching                                   min = 1
Distance metric: Mahalanobis                                max = 1

                      |              AI Robust
                 wage |      Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
----------------------+----------------------------------------------------------------
ATET                  |
                union |
 (union vs nonunion)  |   .7246969   .2942952    2.46   0.014      .147889    1.301505
Some Examples
Nearest-neighbor matching (5 neighbors)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching

                                             Number of obs  = 1853
                                             Neighbors: min = 5
                                                        max = 5
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |        Matched         |        Controls        |   Band-
           |    Yes     No   Total  |   Used  Unused   Total |   width
-----------+------------------------+------------------------+---------
   Treated |    457      0     457  |    870     526    1396 |

Treatment-effects estimation

      wage |      Coef.
-----------+-----------
       ATT |   .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation                 Number of obs      = 1853
Estimator      : nearest-neighbor matching   Matches: requested = 5
Outcome model  : matching                                   min = 5
Distance metric: Mahalanobis                                max = 6

                      |              AI Robust
                 wage |      Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
----------------------+----------------------------------------------------------------
ATET                  |
                union |
 (union vs nonunion)  |   .5590823   .2381752    2.35   0.019     .0922675    1.025897
Some Examples
Bias-correction / regression adjustment

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching

                                             Number of obs  = 1853
                                             Neighbors: min = 5
                                                        max = 5
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |        Matched         |        Controls        |   Band-
           |    Yes     No   Total  |   Used  Unused   Total |   width
-----------+------------------------+------------------------+---------
   Treated |    457      0     457  |    870     526    1396 |

Treatment-effects estimation

      wage |      Coef.
-----------+-----------
       ATT |   .5288023

adjusted for: collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south)
>     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation                 Number of obs      = 1853
Estimator      : nearest-neighbor matching   Matches: requested = 5
Outcome model  : matching                                   min = 5
Distance metric: Mahalanobis                                max = 6

                      |              AI Robust
                 wage |      Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
----------------------+----------------------------------------------------------------
ATET                  |
                union |
 (union vs nonunion)  |   .5288023   .2420635    2.18   0.029     .0543666    1.003238
Some Examples
Mahalanobis-distance and propensity-score matching combined

. kmatch md union collgrad ttl_exp tenure (wage), att
>     psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1853
                                             Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis (modified)
Covariates  : collgrad ttl_exp tenure
PS model    : logit (pr)
PS covars   : i.industry i.race south

Matching statistics

           |        Matched         |        Controls        |   Band-
           |    Yes     No   Total  |   Used  Unused   Total |   width
-----------+------------------------+------------------------+---------
   Treated |    439     18     457  |   1258     138    1396 |  .83886

Treatment-effects estimation

      wage |      Coef.
-----------+-----------
       ATT |   .6408443
Some Examples
Exact matching

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1853
                                             Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure
Exact       : industry race south

Matching statistics

           |        Matched         |        Controls        |   Band-
           |    Yes     No   Total  |   Used  Unused   Total |   width
-----------+------------------------+------------------------+---------
   Treated |    432     25     457  |   1103     293    1396 |  1.3013

Treatment-effects estimation

      wage |      Coef.
-----------+-----------
       ATT |   .6047374
Some Examples
Bandwidth selection: the default (based on the distribution of distances in one-nearest-neighbor matching)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1853
                                             Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |        Matched         |        Controls        |   Band-
           |    Yes     No   Total  |   Used  Unused   Total |   width
-----------+------------------------+------------------------+---------
   Treated |    432     25     457  |   1105     291    1396 |  1.3394

Treatment-effects estimation

      wage |      Coef.
-----------+-----------
       ATT |   .6059013
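To make the role of the bandwidth concrete, here is a small sketch of how Epanechnikov kernel weights could be computed for a single treated observation. The four distances are made up for illustration, the bandwidth of 1.3394 is the assumed reading of the default bandwidth in the output above, and the code is not taken from kmatch itself; kmatch may scale or normalize its weights differently.

```stata
* Hypothetical Mahalanobis distances from one treated unit to four controls
clear
input double d
0.2
0.8
1.2
1.5
end
scalar h = 1.3394
* Epanechnikov kernel: positive weight only for controls closer than h
generate double k = cond(d/h < 1, 0.75*(1 - (d/h)^2), 0)
* Normalize so that the weights of the matched controls sum to one
egen double ksum = total(k)
generate double w = k/ksum
list d k w
```

The control at distance 1.5 lies outside the bandwidth and receives zero weight; a treated unit with no control inside the bandwidth would be unmatched, which is how the "Matched: No" counts arise.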
Some Examples
Bandwidth selection: cross validation with respect to X

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1853
                                             Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |        Matched         |        Controls        |   Band-
           |    Yes     No   Total  |   Used  Unused   Total |   width
-----------+------------------------+------------------------+---------
   Treated |    448      9     457  |   1184     212    1396 |  1.8888

Treatment-effects estimation

      wage |      Coef.
-----------+-----------
       ATT |   .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MSE against bandwidth; evaluation points are labeled by index]
Some Examples
Bandwidth selection: cross validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1853
                                             Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |        Matched         |        Controls        |   Band-
           |    Yes     No   Total  |   Used  Unused   Total |   width
-----------+------------------------+------------------------+---------
   Treated |    453      4     457  |   1289     107    1396 |   2.433

Treatment-effects estimation

      wage |      Coef.
-----------+-----------
       ATT |   .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MISE against bandwidth; evaluation points are labeled by index]
Some Examples
Bandwidth selection: weighted cross validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1853
                                             Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |        Matched         |        Controls        |   Band-
           |    Yes     No   Total  |   Used  Unused   Total |   width
-----------+------------------------+------------------------+---------
   Treated |    455      2     457  |   1356      40    1396 |  2.7626

Treatment-effects estimation

      wage |      Coef.
-----------+-----------
       ATT |   .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of weighted MISE against bandwidth; evaluation points are labeled by index]
Some Examples
Common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(0.5)

Multivariate-distance kernel matching        Number of obs = 1853
                                             Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |        Matched         |        Controls        |   Band-
           |    Yes     No   Total  |   Used  Unused   Total |   width
-----------+------------------------+------------------------+---------
   Treated |    366     91     457  |    701     695    1396 |      .5

Treatment-effects estimation

      wage |      Coef.
-----------+-----------
       ATT |   .3303161

. kmatch csummarize
(refitting the model using the generate() option)

Common support (treated)

             |              Means              |    Standardized difference
             |  Matched  Unmatc~d      Total   |  (1)-(3)    (2)-(3)    (1)-(2)
-------------+---------------------------------+--------------------------------
    collgrad |  .322404   .318681    .321663   |  .001585   -.006376    .007962
     ttl_exp |  13.3929   12.7682    13.2685   |  .027413   -.110253    .137666
      tenure |  8.12614   6.95055    7.89205   |  .038378   -.154356    .192734
  3.industry |  .002732   .021978    .006565   | -.047404    .190657   -.238061
  4.industry |  .191257   .153846    .183807   |  .019212   -.077269    .096481
  5.industry |  .062842   .274725    .105033   | -.137462    .552867   -.690329
  6.industry |  .057377         0    .045952   |  .054507   -.219225    .273732
  7.industry |  .019126   .021978    .019694   | -.004083    .016423   -.020506
  8.industry |  .005464   .065934    .017505   | -.091714    .368871   -.460585
  9.industry |  .010929   .010989    .010941   | -.000115    .000462   -.000577
 10.industry |        0   .021978    .004376   | -.066227    .266363   -.332589
 11.industry |  .554645   .175824    .479212   |   .15083   -.606636    .757467
 12.industry |  .092896   .241758    .122538   | -.090299    .363181    -.45348
      2.race |  .243169   .681319    .330416   | -.185284    .745209   -.930494
      3.race |  .002732   .076923    .017505   | -.112525    .452572   -.565097
       south |   .29235   .318681    .297593   | -.011456    .046074    -.05753

Common support (treated)

             |            Variances            |             Ratio
             |  Matched  Unmatc~d      Total   |  (1)/(3)    (2)/(3)    (1)/(2)
-------------+---------------------------------+--------------------------------
    collgrad |  .219058   .219536    .218674   |  1.00176    1.00394    .997824
     ttl_exp |  19.4198   25.2474    20.5898   |  .943177    1.22621     .76918
      tenure |  38.3324   31.9242    37.2044   |  1.03032    .858076    1.20073
  3.industry |  .002732   .021734    .006536   |  .418045    3.32537    .125714
  4.industry |  .155101   .131624    .150351   |  1.03159    .875443    1.17837
  5.industry |  .059054   .201465    .094207   |  .626851    2.13854    .293122
  6.industry |  .054233         0    .043936   |  1.23435          0          .
  7.industry |  .018811   .021734    .019348   |  .972252     1.1233    .865531
  8.industry |   .00545   .062271    .017237   |  .316157    3.61269    .087513
  9.industry |  .010839   .010989    .010845   |  .999464    1.01328    .986361
 10.industry |        0   .021734    .004367   |        0    4.97709          0
 11.industry |  .247691    .14652    .250115   |  .990307    .585811    1.69049
 12.industry |  .084497   .185348    .107758   |  .784137    1.72003    .455885
      2.race |  .184542   .219536    .221726   |  .832297    .990121    .840601
      3.race |  .002732   .071795    .017237   |  .158513    4.16522    .038056
       south |  .207448   .219536     .20949   |  .990254    1.04796    .944939

(1) matched, (2) unmatched, (3) total
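The standardized differences reported by kmatch csummarize appear to be mean differences divided by the standard deviation in the total group (column 3); the reported numbers are at least consistent with that reading, although the scaling is inferred here and not taken from the kmatch documentation. A quick check for ttl_exp, using the means 13.3929, 12.7682, and 13.2685 and the total variance 20.5898 from the output, and therefore accurate only to about four decimals:

```stata
* ttl_exp: means of matched (1), unmatched (2), and total (3) treated,
* and the variance in the total group, copied from the csummarize output
scalar m1 = 13.3929
scalar m2 = 12.7682
scalar m3 = 13.2685
scalar v3 = 20.5898
display "(1)-(3): " %8.6f (m1 - m3)/sqrt(v3)   // reported: .027413
display "(2)-(3): " %8.6f (m2 - m3)/sqrt(v3)   // reported: -.110253
display "(1)-(2): " %8.6f (m1 - m2)/sqrt(v3)   // reported: .137666
```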
Some Examples
Make a graph of the common-support stats

. mat M = r(M)
. coefplot matrix(M[,4]), title("Std. difference") noci nolabels xline(0)

[Figure: coefficient plot titled "Std. difference" showing the standardized differences for collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, and south, with a reference line at 0]
Some Examples
Multiple outcome variables

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1852
                                             Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |        Matched         |        Controls        |   Band-
           |    Yes     No   Total  |   Used  Unused   Total |   width
-----------+------------------------+------------------------+---------
   Treated |    432     25     457  |   1104     291    1395 |  1.3392

Treatment-effects estimation

           |      Coef.
-----------+-----------
wage       |
       ATT |   .6021049
      NATE |   1.430823
-----------+-----------
hours      |
       ATT |   1.263759
      NATE |   1.450303
Some Examples
Multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure)
>     (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching        Number of obs = 1852
                                             Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |        Matched         |        Controls        |   Band-
           |    Yes     No   Total  |   Used  Unused   Total |   width
-----------+------------------------+------------------------+---------
   Treated |    432     25     457  |   1104     291    1395 |  1.3392

Treatment-effects estimation

           |      Coef.
-----------+-----------
wage       |
       ATT |   .5152752
      NATE |   1.430823
-----------+-----------
hours      |
       ATT |   1.263759
      NATE |   1.450303

wage: adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race
Some Examples
Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage),
>     att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)

Multivariate-distance kernel matching        Number of obs = 1853
                                             Replications  = 50
                                             Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics

           |        Matched         |        Controls        |   Band-
           |    Yes     No   Total  |   Used  Unused   Total |   width
-----------+------------------------+------------------------+---------
0          |                        |                        |
   Treated |    306     15     321  |    625     120     745 |  1.3199
-----------+------------------------+------------------------+---------
1          |                        |                        |
   Treated |    126     10     136  |    473     178     651 |  1.3398

Treatment-effects estimation

           |   Observed   Bootstrap                        Normal-based
      wage |      Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
-----------+-----------------------------------------------------------------
0          |
       ATT |   .4586332   .2763358    1.66   0.097     -.082975    1.000241
-----------+-----------------------------------------------------------------
1          |
       ATT |   .9518705    .406903    2.34   0.019     .1543553    1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

           chi2(  1) =    1.23
         Prob > chi2 =    0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

      wage |      Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
-----------+-----------------------------------------------------------------
       (1) |   .4932373   .4452343    1.11   0.268     -.379406    1.365881
Simulation
- Population data from the Swiss census of 2000.
- Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using the command iskotrei by Hendrickx 2002); values from 6 to 78, mean 44.
- Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.
- Control variables: gender, age, and highest educational degree.
- Population restricted to people between 24 and 60 years old who are working.
- 2'308'006 individuals, of which 17.5% belong to the treatment group.
- Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.
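The sampling design described above could be sketched roughly as follows. This is an illustration only, not the code used for the simulation: the census data are replaced by a synthetic population, the variable names alien, prestige, female, age, and educ and all data-generating parameters are made up, the number of replications is arbitrary, and kmatch ps with its default settings is just one of the estimators that would be compared. The sketch assumes that kmatch is installed and that the ATT coefficient can be accessed as _b[ATT].

```stata
* Synthetic stand-in for the population (all names and parameters are made up)
clear
set seed 12345
set obs 20000
generate byte female = runiform() < 0.45
generate age = 24 + floor(37*runiform())          // ages 24 to 60
generate byte educ = 1 + floor(3*runiform())      // three education levels
generate byte alien = runiform() < invlogit(-1 - 0.5*educ + 0.02*(age - 40))
generate prestige = 44 + 5*educ - 2*female - 4*alien + rnormal(0, 10)

* Draw 50 samples of N = 500 and store the kernel-matching ATT of each draw
tempname sims
postfile `sims' att using simresults, replace
forvalues r = 1/50 {
    preserve
    sample 500, count
    quietly kmatch ps alien female age i.educ (prestige), att
    post `sims' (_b[ATT])     // assumes the coefficient is named ATT
    restore
}
postclose `sims'

use simresults, clear
summarize att                 // mean and variance of the estimates across draws
```

Because the synthetic treatment effect is a constant -4, the mean of att can be compared with -4 to gauge bias, and its variance across draws corresponds to the kind of quantity summarized in the results slides.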
Simulation
- Substantial differences between resident aliens and Swiss nationals on all three covariates.
- Propensity score in population (computed from fully stratified data):

[Figure: density of the propensity score for untreated and treated; McFadden R2 = 0.121]
Simulation
- Raw mean difference in occupational prestige (NATE): −4.79
- Population ATT (computed from fully stratified data): −3.96
- There is some treatment effect heterogeneity (ATE = −3.51, ATC = −3.41).

[Figure: two panels plotted against the propensity score; left: mean outcome for untreated and treated; right: treatment effect]
Results: Variance

[Figure: variance of the estimates for N = 500 and N = 5000, comparing MDM and PSM, each with and without bias correction, for nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y)]

In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.
For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.
Results: Bias reduction (in percent)

[Figure: bias reduction in percent for N = 500 and N = 5000, comparing MDM and PSM, each with and without bias correction, for nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y)]

Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.
The bias also vanishes once post-matching regression adjustment is applied.
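A common way to express bias reduction in percent, and one plausible reading of the figure, is the share of the raw bias (NATE minus ATT) that an estimator removes. Whether the slides use exactly this definition is an assumption. With the population values reported for the simulation (NATE = −4.79, ATT = −3.96) and a made-up estimate of −4.10, the calculation would look like this:

```stata
scalar nate = -4.79            // raw mean difference in the population
scalar att  = -3.96            // population ATT
scalar est  = -4.10            // hypothetical matching estimate (made up)
scalar rawbias = nate - att    // bias of the unadjusted comparison
scalar bias    = est - att     // remaining bias of the estimator
scalar reduction = 100*(1 - bias/rawbias)
display "bias reduction: " %5.1f reduction " percent"
* values above 100 would mean the estimator overshoots the population ATT
```

Under this definition, 100 percent corresponds to an unbiased estimator, which matches the figure's scale being centered around 100.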
Results: Mean squared error

[Figure: mean squared error for N = 500 and N = 5000, comparing MDM and PSM, each with and without bias correction, for nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y)]

Results: Relative standard error

[Figure: relative standard error for N = 500 and N = 5000, comparing MDM and PSM, each with and without bias correction, for nearest-neighbor matching with teffects (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap (1 neighbor, 5 neighbors), and kernel matching with bootstrap (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y)]

Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals for N = 500 and N = 5000, comparing MDM and PSM, each with and without bias correction, for the same estimators as in the relative standard error figure]
Conclusions
- The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).
- Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.
- Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research, the choice may not be that clear.
Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.
Disadvantages of MDM:
- The choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.
One clear conclusion we can draw, however, is:
Do not use propensity scores for pair matching! (But don't use pair matching anyhow.)

Some conclusions from the simulation
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly OK for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.
To do
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.
Some Examples
Do some tests
lincom ATT-NATE
( 1) ATT - NATE = 0
wage Coef Std Err z Pgt|z| [95 Conf Interval]
(1) -8270117 1810415 -457 0000 -1181847 -4721768
test ATT = ATC
( 1) ATT - ATC = 0
chi2( 1) = 242Prob gt chi2 = 01200
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 41
Some Examples Nearest-neighbor matching (1 neighbor) kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att nn
Multivariate-distance nearest-neighbor matching
Number of obs = 1853Neighbors min = 1
Treatment union = 1 max = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south
Matching statistics
Matched Controls Band-Yes No Total Used Unused Total width
Treated 457 0 457 328 1068 1396
Treatment-effects estimation
wage Coef
ATT 7246969
teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) (union) atet
Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 1Outcome model matching min = 1Distance metric Mahalanobis max = 1
AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]
ATETunion
(union vs nonunion) 7246969 2942952 246 0014 147889 1301505
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 42
Some Examples Nearest-neighbor matching (5 neighbors) kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att nn(5)
Multivariate-distance nearest-neighbor matching
Number of obs = 1853Neighbors min = 5
Treatment union = 1 max = 5Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south
Matching statistics
Matched Controls Band-Yes No Total Used Unused Total width
Treated 457 0 457 870 526 1396
Treatment-effects estimation
wage Coef
ATT 5590823
teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) (union) atet nn(5)
Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 5Outcome model matching min = 5Distance metric Mahalanobis max = 6
AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]
ATETunion
(union vs nonunion) 5590823 2381752 235 0019 0922675 1025897
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 43
Some Examples Bias-correction regression adjustment kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure iindustry irace south) att nn(5)
Multivariate-distance nearest-neighbor matching
Number of obs = 1853Neighbors min = 5
Treatment union = 1 max = 5Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south
Matching statistics
Matched Controls Band-Yes No Total Used Unused Total width
Treated 457 0 457 870 526 1396
Treatment-effects estimation
wage Coef
ATT 5288023
adjusted for collgrad ttl_exp tenure iindustry irace south
teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) gt (union) atet nn(5) biasadj(collgrad ttl_exp tenure iindustry irace south)
Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 5Outcome model matching min = 5Distance metric Mahalanobis max = 6
AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]
ATETunion
(union vs nonunion) 5288023 2420635 218 0029 0543666 1003238
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 44
Some Examples
Mahalanobis-distance and propensity-score matching combined
kmatch md union collgrad ttl_exp tenure (wage) att gt psvars(iindustry irace south) psweight(3)(computing bandwidth done)
Multivariate-distance kernel matching Number of obs = 1853Kernel = epan
Treatment union = 1Metric mahalanobis (modified)Covariates collgrad ttl_exp tenurePS model logit (pr)PS covars iindustry irace south
Matching statistics
Matched Controls Band-Yes No Total Used Unused Total width
Treated 439 18 457 1258 138 1396 83886
Treatment-effects estimation
wage Coef
ATT 6408443
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 45
Some Examples
Exact matching
kmatch md union collgrad ttl_exp tenure (wage) att ematch(industry race south)(computing bandwidth done)
Multivariate-distance kernel matching Number of obs = 1853Kernel = epan
Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenureExact industry race south
Matching statistics
Matched Controls Band-Yes No Total Used Unused Total width
Treated 432 25 457 1103 293 1396 13013
Treatment-effects estimation
wage Coef
ATT 6047374
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 46
Some Examples
Bandwidth selection the default (based on distribution of distances in
one-nearest-neighbor matching)
kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att(computing bandwidth done)
Multivariate-distance kernel matching Number of obs = 1853Kernel = epan
Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south
Matching statistics
Matched Controls Band-Yes No Total Used Unused Total width
Treated 432 25 457 1105 291 1396 13394
Treatment-effects estimation
wage Coef
ATT 6059013
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 47
Some Examples Bandwidth selection cross validation with respect to X kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv)(computing bandwidth done)
Multivariate-distance kernel matching Number of obs = 1853Kernel = epan
Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south
Matching statistics
Matched Controls Band-Yes No Total Used Unused Total width
Treated 448 9 457 1184 212 1396 18888
Treatment-effects estimation
wage Coef
ATT 6651578
kmatch cvplot ms(o) index mlabposition(1) sort
1
57 915131411128 102 6
4
3
02
04
06
08
1MSE
15 2 25 3Bandwidth
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 48
Some Examples Bandwidth selection cross validation with respect to Y kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage)(computing bandwidth done)
Multivariate-distance kernel matching Number of obs = 1853Kernel = epan
Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south
Matching statistics
Matched Controls Band-Yes No Total Used Unused Total width
Treated 453 4 457 1289 107 1396 2433
Treatment-effects estimation
wage Coef
ATT 6928956
kmatch cvplot ms(o) index mlabposition(1) sort
1
2
579 61110131512
148
4
3
118
12122
124
126
MISE
15 2 25 3Bandwidth
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 49
Some Examples Bandwidth selection weighted cross validation with respect to Y kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage weighted)(computing bandwidth done)
Multivariate-distance kernel matching Number of obs = 1853Kernel = epan
Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south
Matching statistics
Matched Controls Band-Yes No Total Used Unused Total width
Treated 455 2 457 1356 40 1396 27626
Treatment-effects estimation
wage Coef
ATT 7308166
kmatch cvplot ms(o) index mlabposition(1) sort
1
2
6
10121481513119
3
7
5
4
1112
1314
Wei
ghte
d M
ISE
1 2 3 4 5Bandwidth
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 50
Some Examples: Common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(0.5)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |        Matched        |        Controls       |  Band-
           |   Yes     No   Total  |  Used  Unused   Total |  width
   Treated |   366     91     457  |   701     695    1396 |     .5

Treatment-effects estimation

        wage |     Coef.
         ATT |  .3303161

. kmatch csummarize
(refitting the model using the generate() option)

Common support (treated): means and standardized differences

                 |             Means             |   Standardized difference
                 |  Matched  Unmatc~d     Total  |  (1)-(3)   (2)-(3)   (1)-(2)
        collgrad |  .322404   .318681   .321663  |  .001585  -.006376   .007962
         ttl_exp |  13.3929   12.7682   13.2685  |  .027413  -.110253   .137666
          tenure |  8.12614   6.95055   7.89205  |  .038378  -.154356   .192734
      3.industry |  .002732   .021978   .006565  | -.047404   .190657  -.238061
      4.industry |  .191257   .153846   .183807  |  .019212  -.077269   .096481
      5.industry |  .062842   .274725   .105033  | -.137462   .552867  -.690329
      6.industry |  .057377         0   .045952  |  .054507  -.219225   .273732
      7.industry |  .019126   .021978   .019694  | -.004083   .016423  -.020506
      8.industry |  .005464   .065934   .017505  | -.091714   .368871  -.460585
      9.industry |  .010929   .010989   .010941  | -.000115   .000462  -.000577
     10.industry |        0   .021978   .004376  | -.066227   .266363  -.332589
     11.industry |  .554645   .175824   .479212  |   .15083  -.606636   .757467
     12.industry |  .092896   .241758   .122538  | -.090299   .363181   -.45348
          2.race |  .243169   .681319   .330416  | -.185284   .745209  -.930494
          3.race |  .002732   .076923   .017505  | -.112525   .452572  -.565097
           south |   .29235   .318681   .297593  | -.011456   .046074   -.05753

Common support (treated): variances and variance ratios

                 |           Variances           |            Ratio
                 |  Matched  Unmatc~d     Total  |  (1)/(3)   (2)/(3)   (1)/(2)
        collgrad |  .219058   .219536   .218674  |  1.00176   1.00394   .997824
         ttl_exp |  19.4198   25.2474   20.5898  |  .943177   1.22621    .76918
          tenure |  38.3324   31.9242   37.2044  |  1.03032   .858076   1.20073
      3.industry |  .002732   .021734   .006536  |  .418045   3.32537   .125714
      4.industry |  .155101   .131624   .150351  |  1.03159   .875443   1.17837
      5.industry |  .059054   .201465   .094207  |  .626851   2.13854   .293122
      6.industry |  .054233         0   .043936  |  1.23435         0         .
      7.industry |  .018811   .021734   .019348  |  .972252    1.1233   .865531
      8.industry |   .00545   .062271   .017237  |  .316157   3.61269   .087513
      9.industry |  .010839   .010989   .010845  |  .999464   1.01328   .986361
     10.industry |        0   .021734   .004367  |        0   4.97709         0
     11.industry |  .247691    .14652   .250115  |  .990307   .585811   1.69049
     12.industry |  .084497   .185348   .107758  |  .784137   1.72003   .455885
          2.race |  .184542   .219536   .221726  |  .832297   .990121   .840601
          3.race |  .002732   .071795   .017237  |  .158513   4.16522   .038056
           south |  .207448   .219536    .20949  |  .990254   1.04796   .944939

(1) matched, (2) unmatched, (3) total
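As a quick plausibility check, the standardized difference and the variance ratio reported for ttl_exp can be reproduced from the means and variances in the same row. The sketch below assumes that the standardized difference is scaled by the standard deviation of the total treated group (column 3); this scaling is inferred from the reported numbers and is not taken from the kmatch documentation. The scalar names are chosen here for illustration.

```stata
* Values for ttl_exp copied from the kmatch csummarize output
scalar m_matched = 13.3929
scalar m_total   = 13.2685
scalar v_matched = 19.4198
scalar v_total   = 20.5898

* Standardized difference (1)-(3), assuming scaling by the SD of column (3)
display "std. diff. (1)-(3) = " %8.6f ((m_matched - m_total)/sqrt(v_total))

* Variance ratio (1)/(3)
display "var. ratio (1)/(3) = " %8.6f (v_matched/v_total)
```

Both results agree with the reported .027413 and .943177 up to rounding of the displayed inputs.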
Some Examples: Common-support diagnostics (continued)

Make a graph of the common-support statistics:

. mat M = r(M)
. coefplot matrix(M[,4]), title(Std. difference) noci nolabels xline(0)

[Figure: coefficient plot titled "Std. difference" showing the standardized differences between matched and total treated observations for collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race and south; x-axis from -.2 to .2 with a reference line at 0.]
Some Examples: Multiple outcome variables

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1852
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |        Matched        |        Controls       |  Band-
           |   Yes     No   Total  |  Used  Unused   Total |  width
   Treated |   432     25     457  |  1104     291    1395 | 1.3392

Treatment-effects estimation

             |     Coef.
wage         |
         ATT |  .6021049
        NATE |  1.430823
hours        |
         ATT |  1.263759
        NATE |  1.450303
Some Examples: Multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage = collgrad ttl_exp tenure) ///
>     (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1852
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |        Matched        |        Controls       |  Band-
           |   Yes     No   Total  |  Used  Unused   Total |  width
   Treated |   432     25     457  |  1104     291    1395 | 1.3392

Treatment-effects estimation

             |     Coef.
wage         |
         ATT |  .5152752
        NATE |  1.430823
hours        |
         ATT |  1.263759
        NATE |  1.450303

wage:  adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race
Some Examples: Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage), ///
>     att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching      Number of obs = 1853
                                           Replications  = 50
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics

           |        Matched        |        Controls       |  Band-
           |   Yes     No   Total  |  Used  Unused   Total |  width
0  Treated |   306     15     321  |   625     120     745 | 1.3199
1  Treated |   126     10     136  |   473     178     651 | 1.3398

Treatment-effects estimation

             |  Observed  Bootstrap                       Normal-based
        wage |     Coef.  Std. Err.      z   P>|z|   [95% Conf. Interval]
0        ATT |  .4586332   .2763358   1.66   0.097   -.082975    1.000241
1        ATT |  .9518705    .406903   2.34   0.019   .1543553    1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

           chi2(  1) =    1.23
         Prob > chi2 =    0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

        wage |     Coef.  Std. Err.      z   P>|z|   [95% Conf. Interval]
         (1) |  .4932373   .4452343   1.11   0.268   -.379406    1.365881
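The test and lincom results above describe the same Wald test: with a single linear restriction, the chi-squared statistic is the square of the z statistic of the difference. The following sketch reproduces the reported figures from the displayed coefficients and the bootstrap standard error of the difference; the scalar names are chosen here for illustration.

```stata
* ATT estimates and bootstrap SE of the difference, copied from the output
scalar att0    = .4586332
scalar att1    = .9518705
scalar se_diff = .4452343

scalar diff = att1 - att0
scalar z    = diff/se_diff

display "difference = " %9.7f (diff)
display "z          = " %5.2f (z)
display "chi2(1)    = " %5.2f (z^2)
display "p-value    = " %6.4f (chi2tail(1, z^2))

* Normal-based 95% confidence interval for the difference
display "lower      = " %9.6f (diff - invnormal(.975)*se_diff)
display "upper      = " %9.6f (diff + invnormal(.975)*se_diff)
```

Because the inputs are rounded as displayed, the last digit of the interval bounds may differ slightly from the lincom output.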
Simulation

Population data from the Swiss census of 2000.

Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using the command iskotrei by Hendrickx 2002) (values from 6 to 78, mean 44).

Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.

Control variables: gender, age, and highest educational degree.

Population restricted to people between 24 and 60 years old who are working.

2'308'006 individuals, of which 17.5% belong to the treatment group.

Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.
Simulation

Substantial differences between resident aliens and Swiss nationals on all three covariates.

Propensity score in population (computed from fully stratified data):

[Figure: density of the propensity score (x-axis from 0 to 1) for untreated and treated observations.]

McFadden R2 = 0.121
Simulation

Raw mean difference in occupational prestige (NATE): −4.79

Population ATT (computed from fully stratified data): −3.96

There is some treatment effect heterogeneity (ATE = −3.51, ATC = −3.41).

[Figure, left panel: outcome (y-axis from 30 to 55) by propensity score (x-axis from 0 to .8) for untreated and treated observations.]

[Figure, right panel: treatment effect (y-axis from -6 to -1) by propensity score (x-axis from 0 to .8).]
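The reported population values are internally consistent: the ATE is the average of the ATT and the ATC, weighted by the shares of treated and untreated individuals. A minimal check, assuming the treated share of 17.5% stated in the description of the population (scalar names are chosen here for illustration):

```stata
* Population values copied from the slides
scalar p_treated = .175
scalar att       = -3.96
scalar atc       = -3.41

* ATE = P(D=1)*ATT + P(D=0)*ATC
display "implied ATE = " %5.2f (p_treated*att + (1 - p_treated)*atc)
```

The implied value rounds to the ATE of −3.51 given on the slide.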
Results: Variance

[Figure: variance of the estimators (x-axis from 1.5 to 4.5) in two panels, N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM, MDM with bias correction, PSM, PSM with bias correction.]

Notes: In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.
Results: Bias reduction (in percent)

[Figure: bias reduction in percent in two panels, N = 500 (x-axis from 70 to 130) and N = 5000 (x-axis from 95 to 135). Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM, MDM with bias correction, PSM, PSM with bias correction.]

Notes: Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.
Results: Mean squared error

[Figure: mean squared error of the estimators in two panels, N = 500 (x-axis from 1.5 to 4.5) and N = 5000 (x-axis from 1.5 to 5). Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM, MDM with bias correction, PSM, PSM with bias correction.]
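The variance, bias, and mean squared error results are linked by the usual decomposition, which is why a slightly biased estimator with a small variance can still have a competitive mean squared error. In the expression below, \hat{\tau} denotes a matching estimator of the ATT and \tau the population ATT; this notation is introduced here for illustration and does not appear in the slides.

```latex
\operatorname{MSE}(\hat{\tau})
  = \mathbb{E}\big[(\hat{\tau} - \tau)^2\big]
  = \operatorname{Var}(\hat{\tau}) + \big(\mathbb{E}[\hat{\tau}] - \tau\big)^2
```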
Results: Relative standard error

[Figure: relative standard error in two panels, N = 500 (x-axis from .9 to 1.2) and N = 5000 (x-axis from .95 to 1.25). Rows: nearest-neighbor matching with teffects (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap (1 neighbor, 5 neighbors), and kernel matching with bootstrap (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM, MDM with bias correction, PSM, PSM with bias correction.]

Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals in two panels, N = 500 (x-axis from .92 to .98) and N = 5000 (x-axis from .9 to .98). Rows and series as in the relative standard error figure.]
Conclusions

The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).

Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.

Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research the choice may not be that clear.

Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.

Disadvantages of MDM:
- Choice of scaling matrix largely arbitrary; various suggestions in the literature (somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching! (But don't use pair matching anyhow.)

Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly ok for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.
References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2):295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1):77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91:279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4):465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70:41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5):688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4):472–480.
Some Examples: Nearest-neighbor matching (1 neighbor)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn

Multivariate-distance nearest-neighbor matching
                                           Number of obs  = 1853
                                           Neighbors: min = 1
                                                      max = 1
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |        Matched        |        Controls       |  Band-
           |   Yes     No   Total  |  Used  Unused   Total |  width
   Treated |   457      0     457  |   328    1068    1396 |      .

Treatment-effects estimation

        wage |     Coef.
         ATT |  .7246969

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet

Treatment-effects estimation               Number of obs      = 1853
Estimator      : nearest-neighbor matching Matches: requested = 1
Outcome model  : matching                               min   = 1
Distance metric: Mahalanobis                            max   = 1

                    |             AI Robust
               wage |     Coef.  Std. Err.      z   P>|z|   [95% Conf. Interval]
ATET                |
              union |
(union vs nonunion) |  .7246969   .2942952   2.46   0.014    .147889    1.301505
Some Examples: Nearest-neighbor matching (5 neighbors)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching
                                           Number of obs  = 1853
                                           Neighbors: min = 5
                                                      max = 5
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |        Matched        |        Controls       |  Band-
           |   Yes     No   Total  |  Used  Unused   Total |  width
   Treated |   457      0     457  |   870     526    1396 |      .

Treatment-effects estimation

        wage |     Coef.
         ATT |  .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation               Number of obs      = 1853
Estimator      : nearest-neighbor matching Matches: requested = 5
Outcome model  : matching                               min   = 5
Distance metric: Mahalanobis                            max   = 6

                    |             AI Robust
               wage |     Coef.  Std. Err.      z   P>|z|   [95% Conf. Interval]
ATET                |
              union |
(union vs nonunion) |  .5590823   .2381752   2.35   0.019   .0922675    1.025897
Some Examples: Bias-correction (regression adjustment)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching
                                           Number of obs  = 1853
                                           Neighbors: min = 5
                                                      max = 5
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |        Matched        |        Controls       |  Band-
           |   Yes     No   Total  |  Used  Unused   Total |  width
   Treated |   457      0     457  |   870     526    1396 |      .

Treatment-effects estimation

        wage |     Coef.
         ATT |  .5288023

adjusted for collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) ///
>     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation               Number of obs      = 1853
Estimator      : nearest-neighbor matching Matches: requested = 5
Outcome model  : matching                               min   = 5
Distance metric: Mahalanobis                            max   = 6

                    |             AI Robust
               wage |     Coef.  Std. Err.      z   P>|z|   [95% Conf. Interval]
ATET                |
              union |
(union vs nonunion) |  .5288023   .2420635   2.18   0.029   .0543666    1.003238
Some Examples: Mahalanobis-distance and propensity-score matching combined

. kmatch md union collgrad ttl_exp tenure (wage), att ///
>     psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis (modified)
Covariates  : collgrad ttl_exp tenure
PS model    : logit (pr)
PS covars   : i.industry i.race south

Matching statistics

           |        Matched        |        Controls       |  Band-
           |   Yes     No   Total  |  Used  Unused   Total |  width
   Treated |   439     18     457  |  1258     138    1396 | .83886

Treatment-effects estimation

        wage |     Coef.
         ATT |  .6408443
Some Examples: Exact matching

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure
Exact       : industry race south

Matching statistics

           |        Matched        |        Controls       |  Band-
           |   Yes     No   Total  |  Used  Unused   Total |  width
   Treated |   432     25     457  |  1103     293    1396 | 1.3013

Treatment-effects estimation

        wage |     Coef.
         ATT |  .6047374
Some Examples: Bandwidth selection, the default (based on the distribution of distances in one-nearest-neighbor matching)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |        Matched        |        Controls       |  Band-
           |   Yes     No   Total  |  Used  Unused   Total |  width
   Treated |   432     25     457  |  1105     291    1396 | 1.3394

Treatment-effects estimation

        wage |     Coef.
         ATT |  .6059013
Some Examples: Bandwidth selection, cross-validation with respect to X

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |        Matched        |        Controls       |  Band-
           |   Yes     No   Total  |  Used  Unused   Total |  width
   Treated |   448      9     457  |  1184     212    1396 | 1.8888

Treatment-effects estimation

        wage |     Coef.
         ATT |  .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation criterion (MSE) plotted against the bandwidth (x-axis from 1.5 to 3); points are labeled with their search index, 1 to 15.]
Some Examples: Bandwidth selection, cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |        Matched        |        Controls       |  Band-
           |   Yes     No   Total  |  Used  Unused   Total |  width
   Treated |   453      4     457  |  1289     107    1396 |  2.433

Treatment-effects estimation

        wage |     Coef.
         ATT |  .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation criterion (MISE) plotted against the bandwidth (x-axis from 1.5 to 3); points are labeled with their search index, 1 to 15.]
Some Examples: Bandwidth selection, weighted cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

           |        Matched        |        Controls       |  Band-
           |   Yes     No   Total  |  Used  Unused   Total |  width
   Treated |   455      2     457  |  1356      40    1396 | 2.7626

Treatment-effects estimation

        wage |     Coef.
         ATT |  .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation criterion (weighted MISE) plotted against the bandwidth (x-axis from 1 to 5); points are labeled with their search index, 1 to 15.]
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 69
Some Examples: Nearest-neighbor matching (5 neighbors)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853
Neighbors:  min = 5, max = 5
Treatment:  union = 1
Metric:     mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                   Matched                  Controls         Band-
              Yes      No   Total     Used  Unused   Total   width
   Treated    457       0     457      870     526    1396

Treatment-effects estimation

      wage       Coef.
       ATT    .5590823

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south) (union), atet nn(5)

Treatment-effects estimation
Number of obs   = 1853
Estimator:        nearest-neighbor matching
Outcome model:    matching
Distance metric:  Mahalanobis
Matches:          requested = 5, min = 5, max = 6

                                    AI Robust
                 wage       Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
  ATET: union
  (union vs nonunion)    .5590823   .2381752    2.35    0.019    .0922675    1.025897
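The commands on these slides need some setup that is not shown in this part of the deck. A minimal sketch follows, assuming the data are the NLSW 1988 extract shipped with Stata (the variable names union, collgrad, ttl_exp, tenure, industry, race, south, wage, and hours all exist there) and that kmatch and its companion packages are installed from SSC. The dataset choice is an assumption, not something stated in this section.

```stata
* Assumed setup for the examples (not taken from the slides)
ssc install kmatch,   replace   // the matching command used in the examples
ssc install moremata, replace   // Mata library that kmatch depends on
ssc install kdens,    replace   // density estimation package that kmatch depends on
ssc install coefplot, replace   // used for the common-support graph

* Assumption: the examples use the NLSW 1988 extract
sysuse nlsw88, clear
describe union collgrad ttl_exp tenure industry race south wage hours
```

If the estimation sample differs from the 1853 observations reported on the slides, the data assumption is probably wrong.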
Some Examples: Bias correction / regression adjustment

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure i.industry i.race south), att nn(5)

Multivariate-distance nearest-neighbor matching

Number of obs = 1853
Neighbors:  min = 5, max = 5
Treatment:  union = 1
Metric:     mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                   Matched                  Controls         Band-
              Yes      No   Total     Used  Unused   Total   width
   Treated    457       0     457      870     526    1396

Treatment-effects estimation

      wage       Coef.
       ATT    .5288023

wage: adjusted for collgrad ttl_exp tenure i.industry i.race south

. teffects nnmatch (wage collgrad ttl_exp tenure i.industry i.race south)
>     (union), atet nn(5) biasadj(collgrad ttl_exp tenure i.industry i.race south)

Treatment-effects estimation
Number of obs   = 1853
Estimator:        nearest-neighbor matching
Outcome model:    matching
Distance metric:  Mahalanobis
Matches:          requested = 5, min = 5, max = 6

                                    AI Robust
                 wage       Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
  ATET: union
  (union vs nonunion)    .5288023   .2420635    2.18    0.029    .0543666    1.003238
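The bias correction used in the example above can be written out as follows. This is the textbook form of a regression-adjusted matching estimator of the ATT, given here as a sketch; the symbols w_{ij} (matching weights that sum to one over the controls j) and \hat{\mu}_0 (a regression of the outcome on the adjustment variables, fitted among controls) are notation introduced for illustration, and the internal details of kmatch and teffects may differ.

```latex
\hat{\tau}^{\,\text{adj}}_{ATT}
  = \frac{1}{N_1} \sum_{i:\,D_i = 1}
    \Bigl[\, Y_i \;-\; \sum_{j:\,D_j = 0} w_{ij}
      \bigl(\, Y_j + \hat{\mu}_0(X_i) - \hat{\mu}_0(X_j) \,\bigr) \Bigr]
```

With five nearest neighbors and no ties, each matched control receives w_{ij} = 1/5. The term \hat{\mu}_0(X_i) - \hat{\mu}_0(X_j) corrects the matched outcome for the remaining covariate difference between a treated observation and its match.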
Some Examples: Mahalanobis-distance and propensity-score matching combined

. kmatch md union collgrad ttl_exp tenure (wage), att
>     psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1853
Kernel:     epan
Treatment:  union = 1
Metric:     mahalanobis (modified)
Covariates: collgrad ttl_exp tenure
PS model:   logit (pr)
PS covars:  i.industry i.race south

Matching statistics

                   Matched                  Controls         Band-
              Yes      No   Total     Used  Unused   Total   width
   Treated    439      18     457     1258     138    1396   .83886

Treatment-effects estimation

      wage       Coef.
       ATT    .6408443
Some Examples: Exact matching

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1853
Kernel:     epan
Treatment:  union = 1
Metric:     mahalanobis
Covariates: collgrad ttl_exp tenure
Exact:      industry race south

Matching statistics

                   Matched                  Controls         Band-
              Yes      No   Total     Used  Unused   Total   width
   Treated    432      25     457     1103     293    1396   1.3013

Treatment-effects estimation

      wage       Coef.
       ATT    .6047374
Some Examples: Bandwidth selection, the default (based on the distribution of distances in one-nearest-neighbor matching)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1853
Kernel:     epan
Treatment:  union = 1
Metric:     mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                   Matched                  Controls         Band-
              Yes      No   Total     Used  Unused   Total   width
   Treated    432      25     457     1105     291    1396   1.3394

Treatment-effects estimation

      wage       Coef.
       ATT    .6059013
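For readers who want to see what the bandwidth does, the following sketch writes out kernel-matching weights with the Epanechnikov kernel (the "Kernel = epan" shown in the output). The notation d_{ij} (Mahalanobis distance between treated observation i and control j), h (bandwidth), and N_1^{m} (number of matched treated observations) is introduced here for illustration; the exact computations inside kmatch may differ in detail.

```latex
K(u) = \tfrac{3}{4}\,(1 - u^2)\,\mathbf{1}\{|u| < 1\},
\qquad
w_{ij} = \frac{K(d_{ij}/h)}{\sum_{k:\,D_k = 0} K(d_{ik}/h)},
\qquad
\hat{\tau}_{ATT} = \frac{1}{N_1^{m}} \sum_{i\ \text{matched}}
   \Bigl( Y_i - \sum_{j:\,D_j = 0} w_{ij}\, Y_j \Bigr)
```

A treated observation with no control closer than h has all kernel values equal to zero, so its weights are undefined and it is left unmatched. This is what the "Matched: No" column counts (25 of the 457 treated observations in the default-bandwidth example). A larger bandwidth matches more treated observations at the cost of using more distant controls.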
Some Examples: Bandwidth selection, cross-validation with respect to X

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1853
Kernel:     epan
Treatment:  union = 1
Metric:     mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                   Matched                  Controls         Band-
              Yes      No   Total     Used  Unused   Total   width
   Treated    448       9     457     1184     212    1396   1.8888

Treatment-effects estimation

      wage       Coef.
       ATT    .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot. x-axis: bandwidth (1.5 to 3); y-axis: MSE; points labeled with the index of the evaluation step.]
Some Examples: Bandwidth selection, cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1853
Kernel:     epan
Treatment:  union = 1
Metric:     mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                   Matched                  Controls         Band-
              Yes      No   Total     Used  Unused   Total   width
   Treated    453       4     457     1289     107    1396    2.433

Treatment-effects estimation

      wage       Coef.
       ATT    .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot. x-axis: bandwidth (1.5 to 3); y-axis: MISE; points labeled with the index of the evaluation step.]
Some Examples: Bandwidth selection, weighted cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1853
Kernel:     epan
Treatment:  union = 1
Metric:     mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                   Matched                  Controls         Band-
              Yes      No   Total     Used  Unused   Total   width
   Treated    455       2     457     1356      40    1396   2.7626

Treatment-effects estimation

      wage       Coef.
       ATT    .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot. x-axis: bandwidth (1 to 5); y-axis: weighted MISE; points labeled with the index of the evaluation step.]
Some Examples: Common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(0.5)

Multivariate-distance kernel matching

Number of obs = 1853
Kernel:     epan
Treatment:  union = 1
Metric:     mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                   Matched                  Controls         Band-
              Yes      No   Total     Used  Unused   Total   width
   Treated    366      91     457      701     695    1396       .5

Treatment-effects estimation

      wage       Coef.
       ATT    .3303161

. kmatch csummarize
(refitting the model using the generate() option)

Common support (treated)                         Standardized difference
       Means    Matched  Unmatc~d     Total    (1)-(3)   (2)-(3)   (1)-(2)
    collgrad    .322404   .318681   .321663    .001585  -.006376   .007962
     ttl_exp    13.3929   12.7682   13.2685    .027413  -.110253   .137666
      tenure    8.12614   6.95055   7.89205    .038378  -.154356   .192734
  3.industry    .002732   .021978   .006565   -.047404   .190657  -.238061
  4.industry    .191257   .153846   .183807    .019212  -.077269   .096481
  5.industry    .062842   .274725   .105033   -.137462   .552867  -.690329
  6.industry    .057377         0   .045952    .054507  -.219225   .273732
  7.industry    .019126   .021978   .019694   -.004083   .016423  -.020506
  8.industry    .005464   .065934   .017505   -.091714   .368871  -.460585
  9.industry    .010929   .010989   .010941   -.000115   .000462  -.000577
 10.industry          0   .021978   .004376   -.066227   .266363  -.332589
 11.industry    .554645   .175824   .479212     .15083  -.606636   .757467
 12.industry    .092896   .241758   .122538   -.090299   .363181   -.45348
      2.race    .243169   .681319   .330416   -.185284   .745209  -.930494
      3.race    .002732   .076923   .017505   -.112525   .452572  -.565097
       south     .29235   .318681   .297593   -.011456   .046074   -.05753

Common support (treated)                                  Ratio
   Variances    Matched  Unmatc~d     Total    (1)/(3)   (2)/(3)   (1)/(2)
    collgrad    .219058   .219536   .218674    1.00176   1.00394   .997824
     ttl_exp    19.4198   25.2474   20.5898    .943177   1.22621    .76918
      tenure    38.3324   31.9242   37.2044    1.03032   .858076   1.20073
  3.industry    .002732   .021734   .006536    .418045   3.32537   .125714
  4.industry    .155101   .131624   .150351    1.03159   .875443   1.17837
  5.industry    .059054   .201465   .094207    .626851   2.13854   .293122
  6.industry    .054233         0   .043936    1.23435         0         .
  7.industry    .018811   .021734   .019348    .972252    1.1233   .865531
  8.industry     .00545   .062271   .017237    .316157   3.61269   .087513
  9.industry    .010839   .010989   .010845    .999464   1.01328   .986361
 10.industry          0   .021734   .004367          0   4.97709         0
 11.industry    .247691    .14652   .250115    .990307   .585811   1.69049
 12.industry    .084497   .185348   .107758    .784137   1.72003   .455885
      2.race    .184542   .219536   .221726    .832297   .990121   .840601
      3.race    .002732   .071795   .017237    .158513   4.16522   .038056
       south    .207448   .219536    .20949    .990254   1.04796   .944939

(1) matched, (2) unmatched, (3) total
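The standardized differences in the first table can be checked by hand from the means and variances. The sketch below assumes that each difference in means is divided by the standard deviation of the total treated group, column (3). That choice is inferred from the reported numbers, not taken from the kmatch documentation. The subscripts (1) and (3) refer to the matched and total columns.

```latex
d_{(1)-(3)} = \frac{\bar{x}_{(1)} - \bar{x}_{(3)}}{s_{(3)}},
\qquad
\text{collgrad:}\quad
\frac{0.322404 - 0.321663}{\sqrt{0.218674}}
  \approx \frac{0.000741}{0.46763}
  \approx 0.001585
```

The same denominator reproduces the other two columns, and the columns are additive: (1)-(2) equals (1)-(3) minus (2)-(3), for example 0.001585 - (-0.006376) ≈ 0.007962 for collgrad, up to rounding.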
Some Examples: Common-support diagnostics (graph)

Make a graph of the common-support statistics:

. mat M = r(M)
. coefplot matrix(M[,4]), title("Std. difference") noci nolabels xline(0)

[Figure: standardized differences by covariate (collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, south). x-axis: -.2 to .2, reference line at 0; title: Std. difference.]
Some Examples: Multiple outcome variables

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1852
Kernel:     epan
Treatment:  union = 1
Metric:     mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                   Matched                  Controls         Band-
              Yes      No   Total     Used  Unused   Total   width
   Treated    432      25     457     1104     291    1395   1.3392

Treatment-effects estimation

                 Coef.
  wage
       ATT    .6021049
      NATE    1.430823
  hours
       ATT    1.263759
      NATE    1.450303
Some Examples: Multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure)
>     (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching

Number of obs = 1852
Kernel:     epan
Treatment:  union = 1
Metric:     mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race south

Matching statistics

                   Matched                  Controls         Band-
              Yes      No   Total     Used  Unused   Total   width
   Treated    432      25     457     1104     291    1395   1.3392

Treatment-effects estimation

                 Coef.
  wage
       ATT    .5152752
      NATE    1.430823
  hours
       ATT    1.263759
      NATE    1.450303

wage:  adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race
Some Examples: Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage),
>     att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)

Multivariate-distance kernel matching

Number of obs = 1853
Replications  = 50
Kernel:     epan
Treatment:  union = 1
Metric:     mahalanobis
Covariates: collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics

                   Matched                  Controls         Band-
              Yes      No   Total     Used  Unused   Total   width
0: Treated    306      15     321      625     120     745   1.3199
1: Treated    126      10     136      473     178     651   1.3398

Treatment-effects estimation

               Observed   Bootstrap                        Normal-based
      wage        Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
  0:   ATT     .4586332   .2763358    1.66    0.097    -.082975    1.000241
  1:   ATT     .9518705    .406903    2.34    0.019    .1543553    1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

       chi2(  1) =    1.23
     Prob > chi2 =    0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

      wage        Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
       (1)     .4932373   .4452343    1.11    0.268    -.379406    1.365881
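The test and lincom results above are two views of the same comparison: the chi-squared statistic with one degree of freedom is the square of the z statistic. The short sketch below recomputes both from the reported coefficients and the reported standard error of the difference. It only uses built-in Stata functions, and the scalar names are chosen here for illustration.

```stata
* Recompute the subgroup comparison from the numbers reported on the slide
scalar diff = .9518705 - .4586332        // [1]ATT - [0]ATT
scalar z    = diff / .4452343            // bootstrap SE of the difference (from lincom)

display "difference = " %9.7f diff
display "z          = " %5.2f z
display "chi2(1)    = " %5.2f z^2
display "p-value    = " %6.4f 2*normal(-abs(z))
```

Note that the standard error of the difference cannot be recovered from the two subgroup standard errors alone in general; here it is taken directly from the lincom output.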
Simulation

- Population: data from the Swiss census of 2000.
- Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using the command iskotrei by Hendrickx 2002); values from 6 to 78, mean 44.
- Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.
- Control variables: gender, age, and highest educational degree.
- Population restricted to people between 24 and 60 years old who are working.
- 2'308'006 individuals, of which 17.5% belong to the treatment group.
- Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.
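The design above (draw many samples, estimate the ATT in each, then summarize) can be sketched in a few lines of Stata. The census data are not available here, so this sketch swaps in a purely synthetic data-generating process with a known, homogeneous effect of -4. The program name onedraw, the covariate x, and all numeric settings are hypothetical, and teffects nnmatch stands in for the estimators compared in the talk.

```stata
* Hypothetical Monte Carlo sketch with synthetic data (not the census population)
clear
set seed 20170623

capture program drop onedraw
program define onedraw, rclass
    // one sample of size 500 from a simple synthetic process
    drop _all
    set obs 500
    generate double x = rnormal()
    generate byte   d = runiform() < invlogit(-1.5 + x)   // treatment depends on x
    generate double y = 44 + 3*x - 4*d + rnormal(0, 10)   // true effect is -4 for everyone
    // 5-nearest-neighbor matching on x, estimating the ATT
    teffects nnmatch (y x) (d), atet nneighbor(5)
    matrix b = e(b)
    matrix V = e(V)
    return scalar att = b[1,1]
    return scalar se  = sqrt(V[1,1])
end

simulate att = r(att) se = r(se), reps(200): onedraw

* performance measures across replications (true ATT = -4 by construction)
generate byte covered = abs(att - (-4)) <= invnormal(0.975)*se
summarize att
scalar bias = r(mean) - (-4)
scalar var  = r(Var)
display "bias     = " bias
display "variance = " var
display "MSE      = " bias^2 + var
summarize covered
display "coverage of 95% CI = " r(mean)
```

Replacing the synthetic draw with a random sample from a fixed population file, and the estimator with the kmatch variants, would bring the sketch closer to the design described on the slide.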
Substantial differences between resident aliens and Swiss nationals on all three covariates.

Propensity score in population (computed from fully stratified data):

[Figure: density of the propensity score for untreated and treated. x-axis: propensity score (0 to 1); y-axis: density. McFadden R2 = 0.121.]
- Raw mean difference in occupational prestige (NATE): −4.79
- Population ATT (computed from fully stratified data): −3.96
- There is some treatment effect heterogeneity (ATE = −3.51, ATC = −3.41).

[Figure, left panel: outcome by propensity score for untreated and treated. x-axis: propensity score (0 to .8); y-axis: outcome (30 to 55).]

[Figure, right panel: treatment effect by propensity score. x-axis: propensity score (0 to .8); y-axis: treatment effect (−6 to −1).]
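The three population effects reported above are tied together by the share of treated individuals. As a quick consistency check, with \pi denoting the treated share (17.5% in the simulation population; the symbol is introduced here for illustration):

```latex
ATE = \pi \cdot ATT + (1 - \pi) \cdot ATC
    = 0.175 \times (-3.96) + 0.825 \times (-3.41)
    \approx -0.69 - 2.81
    \approx -3.51
```

The raw difference of −4.79 overstates the ATT of −3.96 by about 0.83 prestige points, which is the confounding bias the matching estimators are meant to remove.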
Results: Variance

[Figure: Results: Variance. Panels: N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM, MDM with bias correction, PSM, PSM with bias correction.]
In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.
Results: Bias reduction (in percent)

[Figure: Results: Bias reduction (in percent). Panels: N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM, MDM with bias correction, PSM, PSM with bias correction.]
Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.
Results: Mean squared error

[Figure: Results: Mean squared error. Panels: N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM, MDM with bias correction, PSM, PSM with bias correction.]
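The mean squared error combines the two quantities shown on the preceding result slides. For an estimator \hat{\tau} of the population ATT \tau, the standard decomposition is:

```latex
\operatorname{MSE}(\hat{\tau})
  = \mathbb{E}\bigl[(\hat{\tau} - \tau)^2\bigr]
  = \operatorname{Var}(\hat{\tau})
  + \bigl(\mathbb{E}[\hat{\tau}] - \tau\bigr)^2
```

An estimator with a small remaining bias can therefore still have the lower MSE if its variance is sufficiently smaller, which is why the ranking of algorithms by MSE need not match the ranking by bias reduction.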
Results: Relative standard error

[Figure: Results: Relative standard error. Panels: N = 500 and N = 5000. Rows: nearest-neighbor matching with teffects (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap (1 neighbor, 5 neighbors), and kernel matching with bootstrap (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM, MDM with bias correction, PSM, PSM with bias correction.]
Results: Coverage of 95% CIs

[Figure: Results: Coverage of 95% CIs. Panels: N = 500 and N = 5000. Rows: nearest-neighbor matching with teffects (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap (1 neighbor, 5 neighbors), and kernel matching with bootstrap (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM, MDM with bias correction, PSM, PSM with bias correction.]
Conclusions

The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).

Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.
Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research the choice may not be that clear.

Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.

Disadvantages of MDM:
- The choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.

One clear conclusion we can draw, however, is:

Do not use propensity scores for pair matching! (But don't use pair matching anyhow.)
Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly OK for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.

To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.
References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2):295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1):77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91:279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990 [1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4):465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70:41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5):688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4):472–480.
Some Examples Bias-correction regression adjustment kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage = collgrad ttl_exp tenure iindustry irace south) att nn(5)
Multivariate-distance nearest-neighbor matching
Number of obs = 1853Neighbors min = 5
Treatment union = 1 max = 5Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south
Matching statistics
Matched Controls Band-Yes No Total Used Unused Total width
Treated 457 0 457 870 526 1396
Treatment-effects estimation
wage Coef
ATT 5288023
adjusted for collgrad ttl_exp tenure iindustry irace south
teffects nnmatch (wage collgrad ttl_exp tenure iindustry irace south) gt (union) atet nn(5) biasadj(collgrad ttl_exp tenure iindustry irace south)
Treatment-effects estimation Number of obs = 1853Estimator nearest-neighbor matching Matches requested = 5Outcome model matching min = 5Distance metric Mahalanobis max = 6
AI Robustwage Coef Std Err z Pgt|z| [95 Conf Interval]
ATETunion
(union vs nonunion) 5288023 2420635 218 0029 0543666 1003238
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 44
Some Examples
Mahalanobis-distance and propensity-score matching combined
kmatch md union collgrad ttl_exp tenure (wage) att gt psvars(iindustry irace south) psweight(3)(computing bandwidth done)
Multivariate-distance kernel matching Number of obs = 1853Kernel = epan
Treatment union = 1Metric mahalanobis (modified)Covariates collgrad ttl_exp tenurePS model logit (pr)PS covars iindustry irace south
Matching statistics
Matched Controls Band-Yes No Total Used Unused Total width
Treated 439 18 457 1258 138 1396 83886
Treatment-effects estimation
wage Coef
ATT 6408443
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 45
Some Examples
Exact matching
kmatch md union collgrad ttl_exp tenure (wage) att ematch(industry race south)(computing bandwidth done)
Multivariate-distance kernel matching Number of obs = 1853Kernel = epan
Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenureExact industry race south
Matching statistics
Matched Controls Band-Yes No Total Used Unused Total width
Treated 432 25 457 1103 293 1396 13013
Treatment-effects estimation
wage Coef
ATT 6047374
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 46
Some Examples
Bandwidth selection the default (based on distribution of distances in
one-nearest-neighbor matching)
kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att(computing bandwidth done)
Multivariate-distance kernel matching Number of obs = 1853Kernel = epan
Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south
Matching statistics
Matched Controls Band-Yes No Total Used Unused Total width
Treated 432 25 457 1105 291 1396 13394
Treatment-effects estimation
wage Coef
ATT 6059013
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 47
Some Examples Bandwidth selection cross validation with respect to X kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv)(computing bandwidth done)
Multivariate-distance kernel matching Number of obs = 1853Kernel = epan
Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south
Matching statistics
Matched Controls Band-Yes No Total Used Unused Total width
Treated 448 9 457 1184 212 1396 18888
Treatment-effects estimation
wage Coef
ATT 6651578
kmatch cvplot ms(o) index mlabposition(1) sort
1
57 915131411128 102 6
4
3
02
04
06
08
1MSE
15 2 25 3Bandwidth
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 48
Some Examples Bandwidth selection cross validation with respect to Y kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage)(computing bandwidth done)
Multivariate-distance kernel matching Number of obs = 1853Kernel = epan
Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south
Matching statistics
Matched Controls Band-Yes No Total Used Unused Total width
Treated 453 4 457 1289 107 1396 2433
Treatment-effects estimation
wage Coef
ATT 6928956
kmatch cvplot ms(o) index mlabposition(1) sort
1
2
579 61110131512
148
4
3
118
12122
124
126
MISE
15 2 25 3Bandwidth
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 49
Some Examples Bandwidth selection weighted cross validation with respect to Y kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage weighted)(computing bandwidth done)
Multivariate-distance kernel matching Number of obs = 1853Kernel = epan
Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south
Matching statistics
Matched Controls Band-Yes No Total Used Unused Total width
Treated 455 2 457 1356 40 1396 27626
Treatment-effects estimation
wage Coef
ATT 7308166
kmatch cvplot ms(o) index mlabposition(1) sort
1
2
6
10121481513119
3
7
5
4
1112
1314
Wei
ghte
d M
ISE
1 2 3 4 5Bandwidth
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 50
Some Examples Common-support diagnostics kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(05)
Multivariate-distance kernel matching Number of obs = 1853Kernel = epan
Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south
Matching statistics
Matched Controls Band-Yes No Total Used Unused Total width
Treated 366 91 457 701 695 1396 5
Treatment-effects estimation
wage Coef
ATT 3303161
kmatch csummarize(refitting the model using the generate() option)
Common support (treated) Standardized differenceMeans Matched Unmatc~d Total (1)-(3) (2)-(3) (1)-(2)
collgrad 322404 318681 321663 001585 -006376 007962ttl_exp 133929 127682 132685 027413 -110253 137666tenure 812614 695055 789205 038378 -154356 192734
3industry 002732 021978 006565 -047404 190657 -2380614industry 191257 153846 183807 019212 -077269 0964815industry 062842 274725 105033 -137462 552867 -6903296industry 057377 0 045952 054507 -219225 2737327industry 019126 021978 019694 -004083 016423 -0205068industry 005464 065934 017505 -091714 368871 -4605859industry 010929 010989 010941 -000115 000462 -000577
10industry 0 021978 004376 -066227 266363 -33258911industry 554645 175824 479212 15083 -606636 75746712industry 092896 241758 122538 -090299 363181 -45348
2race 243169 681319 330416 -185284 745209 -9304943race 002732 076923 017505 -112525 452572 -565097south 29235 318681 297593 -011456 046074 -05753
Common support (treated) RatioVariances Matched Unmatc~d Total (1)(3) (2)(3) (1)(2)
collgrad 219058 219536 218674 100176 100394 997824ttl_exp 194198 252474 205898 943177 122621 76918tenure 383324 319242 372044 103032 858076 120073
3industry 002732 021734 006536 418045 332537 1257144industry 155101 131624 150351 103159 875443 1178375industry 059054 201465 094207 626851 213854 2931226industry 054233 0 043936 123435 0 7industry 018811 021734 019348 972252 11233 8655318industry 00545 062271 017237 316157 361269 0875139industry 010839 010989 010845 999464 101328 986361
10industry 0 021734 004367 0 497709 011industry 247691 14652 250115 990307 585811 16904912industry 084497 185348 107758 784137 172003 455885
2race 184542 219536 221726 832297 990121 8406013race 002732 071795 017237 158513 416522 038056south 207448 219536 20949 990254 104796 944939
(1) matched (2) unmatched (3) total
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 51
Some Examples make a graph of the common-support stats mat M = r(M)
coefplot matrix(M[4]) title(Std difference) noci nolabels xline(0)
collgradttl_exptenure
3industry4industry5industry6industry7industry8industry9industry
10industry11industry12industry
2race3racesouth
-2 -1 0 1 2
Std difference
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 52
Some Examples
Multiple outcome variables
kmatch md union collgrad ttl_exp tenure iindustry irace south gt (wage hours) nate att(computing bandwidth done)
Multivariate-distance kernel matching Number of obs = 1852Kernel = epan
Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south
Matching statistics
Matched Controls Band-Yes No Total Used Unused Total width
Treated 432 25 457 1104 291 1395 13392
Treatment-effects estimation
Coef
wageATT 6021049
NATE 1430823
hoursATT 1263759
NATE 1450303
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 53
Some Examples: Multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure)
>     (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,852
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |       Matched        |       Controls       |  Band-
           |   Yes     No  Total  |  Used Unused  Total  |  width
   Treated |   432     25    457  |  1104    291   1395  | 1.3392

Treatment-effects estimation
             |     Coef.
wage         |
         ATT |  .5152752
        NATE |  1.430823
hours        |
         ATT |  1.263759
        NATE |  1.450303
wage: adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race
Some Examples: Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage),
>     att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)
Bootstrap replications (50)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Replications  = 50
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics
           |       Matched        |       Controls       |  Band-
           |   Yes     No  Total  |  Used Unused  Total  |  width
0  Treated |   306     15    321  |   625    120    745  | 1.3199
1  Treated |   126     10    136  |   473    178    651  | 1.3398

Treatment-effects estimation
             |  Observed  Bootstrap                        Normal-based
        wage |     Coef.  Std. Err.      z    P>|z|    [95% Conf. Interval]
0        ATT |  .4586332   .2763358   1.66    0.097    -.082975    1.000241
1        ATT |  .9518705    .406903   2.34    0.019    .1543553    1.749386

. test [0]ATT = [1]ATT
 ( 1)  [0]ATT - [1]ATT = 0
       chi2(  1) =    1.23
     Prob > chi2 =    0.2679

. lincom [1]ATT - [0]ATT
 ( 1)  - [0]ATT + [1]ATT = 0
        wage |     Coef.  Std. Err.      z    P>|z|    [95% Conf. Interval]
         (1) |  .4932373   .4452343   1.11    0.268    -.379406    1.365881
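The test and lincom results above can be checked by hand. The following is a minimal sketch, assuming the decimal points in the extracted output are placed as reconstructed here (ATT of about .459 for south = 0 and .952 for south = 1, bootstrap standard error of the difference about .445); the function name `wald_difference` is made up for this illustration and is not part of Stata or kmatch.

```python
import math


def wald_difference(diff, se):
    """Return (z, chi2 with 1 df, two-sided normal p-value) for H0: diff = 0."""
    z = diff / se
    chi2 = z * z
    # Two-sided p-value under the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, chi2, p


# Values read from the output above (decimal points are an assumption).
att_south0 = 0.4586332
att_south1 = 0.9518705
se_diff = 0.4452343  # bootstrap standard error reported by lincom

diff = att_south1 - att_south0
z, chi2, p = wald_difference(diff, se_diff)
print(f"diff = {diff:.7f}  z = {z:.2f}  chi2(1) = {chi2:.2f}  p = {p:.4f}")

# For comparison: the standard error one would get if the two subgroup
# estimates were treated as uncorrelated.
se_uncorrelated = math.sqrt(0.2763358 ** 2 + 0.406903 ** 2)
print(f"SE ignoring covariance = {se_uncorrelated:.4f}")
```

The reported standard error of the difference is somewhat smaller than the value obtained by ignoring the covariance, which suggests that lincom uses the full bootstrap covariance matrix of the two estimates rather than the two standard errors alone.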
Simulation
Population data from the Swiss census of 2000.
Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using command iskotrei by Hendrickx 2002); values from 6 to 78, mean 44.
Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.
Control variables: gender, age, and highest educational degree.
Population restricted to people between 24 and 60 years old who are working.
2'308'006 individuals, of which 17.5% belong to the treatment group.
Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.
Simulation
Substantial differences between resident aliens and Swiss nationals on all three covariates.
Propensity score in population (computed from fully stratified data):
[Figure: density of the propensity score (x-axis from 0 to 1) for the untreated and the treated; McFadden R2 = 0.121]
Simulation
Raw mean difference in occupational prestige (NATE): -4.79
Population ATT (computed from fully stratified data): -3.96
There is some treatment effect heterogeneity (ATE = -3.51, ATC = -3.41).
[Figure, left panel: outcome (y-axis from 30 to 55) by propensity score (x-axis from 0 to .8) for the untreated and the treated. Right panel: treatment effect (y-axis from -6 to -1) by propensity score (x-axis from 0 to .8)]
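The population quantities on this slide are tied together by the identity ATE = p * ATT + (1 - p) * ATC, where p is the share of treated. A small sketch, assuming the decimal points are as reconstructed here (treated share 17.5%, ATT = -3.96, ATC = -3.41, NATE = -4.79); the function name is hypothetical.

```python
def ate_from_att_atc(att, atc, share_treated):
    """Average treatment effect as the share-weighted mean of ATT and ATC."""
    return share_treated * att + (1.0 - share_treated) * atc


# Values read from the slides (decimal points are an assumption).
att = -3.96
atc = -3.41
nate = -4.79
share_treated = 0.175

ate = ate_from_att_atc(att, atc, share_treated)
print(f"ATE implied by ATT and ATC: {ate:.2f}")

# Part of the raw difference that is not a treatment effect on the treated,
# taking the fully stratified population ATT as the benchmark.
raw_bias = nate - att
print(f"NATE - ATT: {raw_bias:.2f}")
```

The implied ATE agrees with the reported value after rounding, which supports the reconstructed decimal points. The gap between NATE and ATT is the bias that the matching estimators in the simulation are supposed to remove.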
Results: Variance
[Figure: variance of the estimates for N = 500 and N = 5000, by algorithm: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y); markers for MDM and PSM, each with and without bias correction]
In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.
For kernel matching, the efficiency differences between PSM and MDM are small; additional post-matching regression adjustment further reduces the differences.
Results: Bias reduction (in percent)
[Figure: bias reduction in percent for N = 500 and N = 5000, by algorithm: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y); markers for MDM and PSM, each with and without bias correction]
Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3) small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.
The bias also vanishes once post-matching regression adjustment is applied.
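The result slides report variance, bias reduction, and mean squared error across simulation replications. The sketch below shows how such summaries can be computed from a list of estimates. It is an illustration under assumptions: the slides do not define "bias reduction (in percent)", so the formula used here (100 for an unbiased estimator, 0 for an estimator as biased as the raw difference NATE) is a guess, the function name is hypothetical, and the four estimates are made-up numbers, not simulation results.

```python
from statistics import fmean, pvariance


def summarize_simulation(estimates, true_att, nate):
    """Bias, variance, MSE, and percent bias reduction of simulated estimates."""
    bias = fmean(estimates) - true_att
    variance = pvariance(estimates)  # variance across replications
    mse = fmean([(e - true_att) ** 2 for e in estimates])
    # Assumed definition: share of the raw bias (NATE - ATT) that is removed.
    bias_reduction = 100.0 * (1.0 - bias / (nate - true_att))
    return {"bias": bias, "variance": variance, "mse": mse,
            "bias_reduction": bias_reduction}


# Made-up estimates from four hypothetical replications.
estimates = [-4.2, -3.8, -4.0, -3.6]
result = summarize_simulation(estimates, true_att=-3.96, nate=-4.79)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

With the variance taken across replications, the mean squared error equals variance plus squared bias. Under the assumed definition, a value above 100 percent means the estimator overcorrects the raw difference.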
Results: Mean squared error
[Figure: mean squared error of the estimates for N = 500 and N = 5000, by algorithm: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y); markers for MDM and PSM, each with and without bias correction]
Results: Relative standard error
[Figure: relative standard error for N = 500 and N = 5000, by algorithm: nearest-neighbor matching with teffects (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap (1 neighbor, 5 neighbors), and kernel matching with bootstrap (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y); markers for MDM and PSM, each with and without bias correction]
Results: Coverage of 95% CIs
[Figure: coverage of 95% confidence intervals for N = 500 and N = 5000, by algorithm: nearest-neighbor matching with teffects (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap (1 neighbor, 5 neighbors), and kernel matching with bootstrap (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y); markers for MDM and PSM, each with and without bias correction]
Conclusions
The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).
Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.
Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research the choice may not be that clear.
Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.
Disadvantages of MDM:
- The choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.
One clear conclusion we can draw, however, is:
Do not use propensity scores for pair matching! (But don't use pair matching anyhow.)
Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly OK for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.
To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.
References
Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2):295–313.
Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1):77–90.
Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91:279–290.
Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.
King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.
Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.
Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4):465–472.
Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70:41–55.
Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5):688–701.
Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4):472–480.
Some Examples: Mahalanobis-distance and propensity-score matching combined

. kmatch md union collgrad ttl_exp tenure (wage), att
>     psvars(i.industry i.race south) psweight(3)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis (modified)
Covariates  : collgrad ttl_exp tenure
PS model    : logit (pr)
PS covars   : i.industry i.race south

Matching statistics
           |       Matched        |       Controls       |  Band-
           |   Yes     No  Total  |  Used Unused  Total  |  width
   Treated |   439     18    457  |  1258    138   1396  | .83886

Treatment-effects estimation
        wage |     Coef.
         ATT |  .6408443
Some Examples: Exact matching

. kmatch md union collgrad ttl_exp tenure (wage), att ematch(industry race south)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure
Exact       : industry race south

Matching statistics
           |       Matched        |       Controls       |  Band-
           |   Yes     No  Total  |  Used Unused  Total  |  width
   Treated |   432     25    457  |  1103    293   1396  | 1.3013

Treatment-effects estimation
        wage |     Coef.
         ATT |  .6047374
Some Examples: Bandwidth selection: the default (based on distribution of distances in one-nearest-neighbor matching)

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |       Matched        |       Controls       |  Band-
           |   Yes     No  Total  |  Used Unused  Total  |  width
   Treated |   432     25    457  |  1105    291   1396  | 1.3394

Treatment-effects estimation
        wage |     Coef.
         ATT |  .6059013
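To make the role of the bandwidth concrete, here is a minimal sketch of kernel matching for the ATT with an Epanechnikov kernel. It is a one-covariate illustration with made-up data, not kmatch's implementation: kmatch works with a multivariate distance (here Mahalanobis) and selects the bandwidth automatically, whereas this sketch uses the absolute difference in a single covariate and a bandwidth chosen by hand. All names are hypothetical.

```python
def epanechnikov(u):
    """Epanechnikov kernel: positive only for |u| < 1."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def kernel_matching_att(treated, controls, bandwidth):
    """ATT by kernel matching on one covariate.

    treated and controls are lists of (x, y) pairs. A treated unit is
    "matched" if at least one control lies within the bandwidth; unmatched
    treated units are dropped from the estimate.
    """
    effects = []
    n_unmatched = 0
    for x_t, y_t in treated:
        weights = [epanechnikov((x_c - x_t) / bandwidth) for x_c, _ in controls]
        total = sum(weights)
        if total == 0.0:
            n_unmatched += 1
            continue
        # Kernel-weighted mean of control outcomes as the counterfactual.
        y0_hat = sum(w * y_c for w, (_, y_c) in zip(weights, controls)) / total
        effects.append(y_t - y0_hat)
    if not effects:
        raise ValueError("no treated unit has a control within the bandwidth")
    return sum(effects) / len(effects), len(effects), n_unmatched


# Made-up data: (covariate, outcome).
treated = [(1.0, 5.0), (2.0, 6.0), (10.0, 9.0)]
controls = [(0.8, 4.0), (1.2, 4.0), (2.1, 5.0), (3.5, 7.0)]

att, n_matched, n_unmatched = kernel_matching_att(treated, controls, bandwidth=0.5)
print(f"ATT = {att:.3f}, matched treated = {n_matched}, unmatched = {n_unmatched}")
```

The third treated unit has no control within the bandwidth and is dropped, which corresponds to the "Matched: No" column in the matching statistics. A larger bandwidth matches more treated units and uses more controls, at the price of comparing less similar observations.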
Some Examples: Bandwidth selection: cross-validation with respect to X

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |       Matched        |       Controls       |  Band-
           |   Yes     No  Total  |  Used Unused  Total  |  width
   Treated |   448      9    457  |  1184    212   1396  | 1.8888

Treatment-effects estimation
        wage |     Coef.
         ATT |  .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MSE against bandwidth (x-axis from 1.5 to 3); points are labeled with their search index, 1 to 15]
Some Examples: Bandwidth selection: cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |       Matched        |       Controls       |  Band-
           |   Yes     No  Total  |  Used Unused  Total  |  width
   Treated |   453      4    457  |  1289    107   1396  |  2.433

Treatment-effects estimation
        wage |     Coef.
         ATT |  .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of MISE against bandwidth (x-axis from 1.5 to 3); points are labeled with their search index, 1 to 15]
Some Examples: Bandwidth selection: weighted cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |       Matched        |       Controls       |  Band-
           |   Yes     No  Total  |  Used Unused  Total  |  width
   Treated |   455      2    457  |  1356     40   1396  | 2.7626

Treatment-effects estimation
        wage |     Coef.
         ATT |  .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of weighted MISE against bandwidth (x-axis from 1 to 5); points are labeled with their search index, 1 to 15]
Some Examples: Common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage),
>     att bwidth(0.5)

Multivariate-distance kernel matching      Number of obs = 1,853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |       Matched        |       Controls       |  Band-
           |   Yes     No  Total  |  Used Unused  Total  |  width
   Treated |   366     91    457  |   701    695   1396  |     .5

Treatment-effects estimation
        wage |     Coef.
         ATT |  .3303161

. kmatch csummarize
(refitting the model using the generate() option)

Common support (treated)

             |      Means                     |  Standardized difference
             |  Matched  Unmatc~d     Total   |  (1)-(3)   (2)-(3)   (1)-(2)
collgrad     |  .322404   .318681   .321663   |  .001585  -.006376   .007962
ttl_exp      |  13.3929   12.7682   13.2685   |  .027413  -.110253   .137666
tenure       |  8.12614   6.95055   7.89205   |  .038378  -.154356   .192734
3.industry   |  .002732   .021978   .006565   | -.047404   .190657  -.238061
4.industry   |  .191257   .153846   .183807   |  .019212  -.077269   .096481
5.industry   |  .062842   .274725   .105033   | -.137462   .552867  -.690329
6.industry   |  .057377         0   .045952   |  .054507  -.219225   .273732
7.industry   |  .019126   .021978   .019694   | -.004083   .016423  -.020506
8.industry   |  .005464   .065934   .017505   | -.091714   .368871  -.460585
9.industry   |  .010929   .010989   .010941   | -.000115   .000462  -.000577
10.industry  |        0   .021978   .004376   | -.066227   .266363  -.332589
11.industry  |  .554645   .175824   .479212   |   .15083  -.606636   .757467
12.industry  |  .092896   .241758   .122538   | -.090299   .363181   -.45348
2.race       |  .243169   .681319   .330416   | -.185284   .745209  -.930494
3.race       |  .002732   .076923   .017505   | -.112525   .452572  -.565097
south        |   .29235   .318681   .297593   | -.011456   .046074   -.05753

             |      Variances                 |  Ratio
             |  Matched  Unmatc~d     Total   |  (1)/(3)   (2)/(3)   (1)/(2)
collgrad     |  .219058   .219536   .218674   |  1.00176   1.00394   .997824
ttl_exp      |  19.4198   25.2474   20.5898   |  .943177   1.22621    .76918
tenure       |  38.3324   31.9242   37.2044   |  1.03032   .858076   1.20073
3.industry   |  .002732   .021734   .006536   |  .418045   3.32537   .125714
4.industry   |  .155101   .131624   .150351   |  1.03159   .875443   1.17837
5.industry   |  .059054   .201465   .094207   |  .626851   2.13854   .293122
6.industry   |  .054233         0   .043936   |  1.23435         0         .
7.industry   |  .018811   .021734   .019348   |  .972252    1.1233   .865531
8.industry   |   .00545   .062271   .017237   |  .316157   3.61269   .087513
9.industry   |  .010839   .010989   .010845   |  .999464   1.01328   .986361
10.industry  |        0   .021734   .004367   |        0   4.97709         0
11.industry  |  .247691    .14652   .250115   |  .990307   .585811   1.69049
12.industry  |  .084497   .185348   .107758   |  .784137   1.72003   .455885
2.race       |  .184542   .219536   .221726   |  .832297   .990121   .840601
3.race       |  .002732   .071795   .017237   |  .158513   4.16522   .038056
south        |  .207448   .219536    .20949   |  .990254   1.04796   .944939
(1) matched, (2) unmatched, (3) total
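The standardized differences and variance ratios in the common-support table can be reproduced from the means and variances printed next to them. The sketch below does this for ttl_exp. It rests on two assumptions: the decimal points are as reconstructed here, and each difference in means is divided by the standard deviation of the total treated group, column (3). That scaling is inferred from the printed numbers, not taken from the kmatch documentation, and the function name is hypothetical.

```python
import math


def std_diff(mean_a, mean_b, var_ref):
    """Difference in means in units of the reference standard deviation."""
    return (mean_a - mean_b) / math.sqrt(var_ref)


# ttl_exp row of the table (decimal points are an assumption).
mean_matched, mean_unmatched, mean_total = 13.3929, 12.7682, 13.2685
var_matched, var_unmatched, var_total = 19.4198, 25.2474, 20.5898

d13 = std_diff(mean_matched, mean_total, var_total)      # (1)-(3)
d23 = std_diff(mean_unmatched, mean_total, var_total)    # (2)-(3)
d12 = std_diff(mean_matched, mean_unmatched, var_total)  # (1)-(2)
ratio13 = var_matched / var_total                        # (1)/(3)

print(f"(1)-(3) = {d13:.4f}, (2)-(3) = {d23:.4f}, (1)-(2) = {d12:.4f}")
print(f"variance ratio (1)/(3) = {ratio13:.4f}")
```

The results agree with the table up to rounding of the printed means. In this example, the treated units that remain matched at bandwidth 0.5 differ only slightly from all treated units on ttl_exp, while the unmatched ones differ more.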
Some Examples
Exact matching
kmatch md union collgrad ttl_exp tenure (wage) att ematch(industry race south)(computing bandwidth done)
Multivariate-distance kernel matching Number of obs = 1853Kernel = epan
Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenureExact industry race south
Matching statistics
Matched Controls Band-Yes No Total Used Unused Total width
Treated 432 25 457 1103 293 1396 13013
Treatment-effects estimation
wage Coef
ATT 6047374
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 46
Some Examples
Bandwidth selection the default (based on distribution of distances in
one-nearest-neighbor matching)
kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) att(computing bandwidth done)
Multivariate-distance kernel matching Number of obs = 1853Kernel = epan
Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south
Matching statistics
Matched Controls Band-Yes No Total Used Unused Total width
Treated 432 25 457 1105 291 1396 13394
Treatment-effects estimation
wage Coef
ATT 6059013
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 47
Some Examples Bandwidth selection cross validation with respect to X kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv)(computing bandwidth done)
Multivariate-distance kernel matching Number of obs = 1853Kernel = epan
Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south
Matching statistics
Matched Controls Band-Yes No Total Used Unused Total width
Treated 448 9 457 1184 212 1396 18888
Treatment-effects estimation
wage Coef
ATT 6651578
kmatch cvplot ms(o) index mlabposition(1) sort
1
57 915131411128 102 6
4
3
02
04
06
08
1MSE
15 2 25 3Bandwidth
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 48
Some Examples Bandwidth selection cross validation with respect to Y kmatch md union collgrad ttl_exp tenure iindustry irace south (wage) gt att bwidth(cv wage)(computing bandwidth done)
Multivariate-distance kernel matching Number of obs = 1853Kernel = epan
Treatment union = 1Metric mahalanobisCovariates collgrad ttl_exp tenure iindustry irace south
Matching statistics
Matched Controls Band-Yes No Total Used Unused Total width
Treated 453 4 457 1289 107 1396 2433
Treatment-effects estimation
wage Coef
ATT 6928956
kmatch cvplot ms(o) index mlabposition(1) sort
1
2
579 61110131512
148
4
3
118
12122
124
126
MISE
15 2 25 3Bandwidth
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 49
Some Examples
Bandwidth selection: weighted cross validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |        Matched                   Controls           Band-
           |   Yes      No   Total     Used  Unused   Total     width
-----------+----------------------------------------------------------
   Treated |   455       2     457     1356      40    1396    2.7626

Treatment-effects estimation
      wage |     Coef.
-----------+-----------
       ATT |  .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort

[Figure: cross-validation plot of weighted MISE (y-axis) against bandwidth (x-axis, 1 to 5); the evaluated points are numbered 1 to 15.]
Some Examples
Common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(0.5)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |        Matched                   Controls           Band-
           |   Yes      No   Total     Used  Unused   Total     width
-----------+----------------------------------------------------------
   Treated |   366      91     457      701     695    1396        .5

Treatment-effects estimation
      wage |     Coef.
-----------+-----------
       ATT |  .3303161

. kmatch csummarize
(refitting the model using the generate() option)

Common support (treated)
             |              Means               |    Standardized difference
             |   Matched  Unmatched      Total  |   (1)-(3)    (2)-(3)    (1)-(2)
-------------+----------------------------------+--------------------------------
    collgrad |   .322404    .318681    .321663  |   .001585   -.006376    .007962
     ttl_exp |   13.3929    12.7682    13.2685  |   .027413   -.110253    .137666
      tenure |   8.12614    6.95055    7.89205  |   .038378   -.154356    .192734
  3.industry |   .002732    .021978    .006565  |  -.047404    .190657   -.238061
  4.industry |   .191257    .153846    .183807  |   .019212   -.077269    .096481
  5.industry |   .062842    .274725    .105033  |  -.137462    .552867   -.690329
  6.industry |   .057377          0    .045952  |   .054507   -.219225    .273732
  7.industry |   .019126    .021978    .019694  |  -.004083    .016423   -.020506
  8.industry |   .005464    .065934    .017505  |  -.091714    .368871   -.460585
  9.industry |   .010929    .010989    .010941  |  -.000115    .000462   -.000577
 10.industry |         0    .021978    .004376  |  -.066227    .266363   -.332589
 11.industry |   .554645    .175824    .479212  |    .15083   -.606636    .757467
 12.industry |   .092896    .241758    .122538  |  -.090299    .363181    -.45348
      2.race |   .243169    .681319    .330416  |  -.185284    .745209   -.930494
      3.race |   .002732    .076923    .017505  |  -.112525    .452572   -.565097
       south |    .29235    .318681    .297593  |  -.011456    .046074    -.05753

Common support (treated)
             |            Variances             |             Ratio
             |   Matched  Unmatched      Total  |   (1)/(3)    (2)/(3)    (1)/(2)
-------------+----------------------------------+--------------------------------
    collgrad |   .219058    .219536    .218674  |   1.00176    1.00394    .997824
     ttl_exp |   19.4198    25.2474    20.5898  |   .943177    1.22621     .76918
      tenure |   38.3324    31.9242    37.2044  |   1.03032    .858076    1.20073
  3.industry |   .002732    .021734    .006536  |   .418045    3.32537    .125714
  4.industry |   .155101    .131624    .150351  |   1.03159    .875443    1.17837
  5.industry |   .059054    .201465    .094207  |   .626851    2.13854    .293122
  6.industry |   .054233          0    .043936  |   1.23435          0          .
  7.industry |   .018811    .021734    .019348  |   .972252     1.1233    .865531
  8.industry |    .00545    .062271    .017237  |   .316157    3.61269    .087513
  9.industry |   .010839    .010989    .010845  |   .999464    1.01328    .986361
 10.industry |         0    .021734    .004367  |         0    4.97709          0
 11.industry |   .247691     .14652    .250115  |   .990307    .585811    1.69049
 12.industry |   .084497    .185348    .107758  |   .784137    1.72003    .455885
      2.race |   .184542    .219536    .221726  |   .832297    .990121    .840601
      3.race |   .002732    .071795    .017237  |   .158513    4.16522    .038056
       south |   .207448    .219536     .20949  |   .990254    1.04796    .944939

(1) matched, (2) unmatched, (3) total
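The two diagnostics in the csummarize tables can be checked by hand. The sketch below assumes that each standardized difference is the difference in means divided by the standard deviation of the total treated group, column (3), and that each ratio is a plain ratio of variances. The collgrad figures are read from the tables with decimal points restored (0.322404, 0.321663, and so on), and under that reading the assumed formulas reproduce the reported values. The function names are hypothetical and not part of kmatch.

```python
import math


def std_diff(mean_a, mean_b, var_ref):
    """Difference in means scaled by the SD of a reference group (assumption)."""
    return (mean_a - mean_b) / math.sqrt(var_ref)


def var_ratio(var_a, var_b):
    """Ratio of two variances; undefined (nan) when the denominator is zero."""
    return var_a / var_b if var_b != 0 else float("nan")


# collgrad row, decimal points restored from the slide (assumed reading)
mean_matched, mean_total = 0.322404, 0.321663
var_matched, var_total = 0.219058, 0.218674

d_13 = std_diff(mean_matched, mean_total, var_total)  # column (1)-(3)
r_13 = var_ratio(var_matched, var_total)              # column (1)/(3)
print(round(d_13, 6), round(r_13, 5))
```

A standardized difference near zero and a variance ratio near one for (1) versus (3) indicate that dropping the unmatched treated units changed the covariate distribution of the treated group only slightly.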
Some Examples
Make a graph of the common-support stats:

. mat M = r(M)
. coefplot matrix(M[,4]), title(Std. difference) noci nolabels xline(0)

[Figure: coefficient plot of the standardized differences, one marker per covariate (collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, south), with a reference line at 0; x-axis: Std. difference, about -.2 to .2.]
Some Examples
Multiple outcome variables

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1852
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |        Matched                   Controls           Band-
           |   Yes      No   Total     Used  Unused   Total     width
-----------+----------------------------------------------------------
   Treated |   432      25     457     1104     291    1395    1.3392

Treatment-effects estimation
           |     Coef.
-----------+-----------
wage       |
       ATT |  .6021049
      NATE |  1.430823
hours      |
       ATT |  1.263759
      NATE |  1.450303
Some Examples
Multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage = collgrad ttl_exp tenure) ///
>     (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1852
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |        Matched                   Controls           Band-
           |   Yes      No   Total     Used  Unused   Total     width
-----------+----------------------------------------------------------
   Treated |   432      25     457     1104     291    1395    1.3392

Treatment-effects estimation
           |     Coef.
-----------+-----------
wage       |
       ATT |  .5152752
      NATE |  1.430823
hours      |
       ATT |  1.263759
      NATE |  1.450303

wage: adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race
Some Examples
Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage), ///
>     att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching      Number of obs = 1853
                                           Replications  = 50
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics
           |        Matched                   Controls           Band-
           |   Yes      No   Total     Used  Unused   Total     width
-----------+----------------------------------------------------------
 0 Treated |   306      15     321      625     120     745    1.3199
 1 Treated |   126      10     136      473     178     651    1.3398

Treatment-effects estimation
           |   Observed   Bootstrap                       Normal-based
      wage |      Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
-----------+--------------------------------------------------------------
 0     ATT |   .4586332    .2763358   1.66    0.097   -.082975    1.000241
 1     ATT |   .9518705     .406903   2.34    0.019   .1543553    1.749386

. test [0]ATT = [1]ATT
 ( 1)  [0]ATT - [1]ATT = 0
           chi2(  1) =    1.23
         Prob > chi2 =    0.2679

. lincom [1]ATT - [0]ATT
 ( 1)  - [0]ATT + [1]ATT = 0
      wage |      Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
-----------+--------------------------------------------------------------
       (1) |   .4932373    .4452343   1.11    0.268   -.379406    1.365881
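The test and lincom results describe the same comparison, and the link can be verified with a few lines of arithmetic. The sketch below assumes the lincom line reads as a coefficient of 0.4932373 with a bootstrap standard error of 0.4452343 (decimal points restored from the slide). It computes the z statistic, squares it to obtain the one-degree-of-freedom Wald chi-squared, and derives the p-value from the normal distribution. The function name is hypothetical.

```python
import math


def wald_from_estimate(coef, se):
    """Return (z, chi2, p) for H0: coef = 0 with a normal approximation."""
    z = coef / se
    chi2 = z * z  # Wald statistic with 1 degree of freedom
    # P(chi2_1 > z^2) equals the two-sided normal tail probability
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, chi2, p


z, chi2, p = wald_from_estimate(0.4932373, 0.4452343)
print(round(z, 2), round(chi2, 2), round(p, 4))
```

Note that the standard error of the difference is smaller than the value implied by treating the two subgroup estimates as independent, because the bootstrap covariance between them enters the calculation.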
Simulation
Population data from the Swiss census of 2000.
Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using the command iskotrei by Hendrickx 2002; values from 6 to 78, mean 44).
Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.
Control variables: gender, age, and highest educational degree.
Population restricted to people between 24 and 60 years old who are working.
2'308'006 individuals, of which 17.5% belong to the treatment group.
Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.
Simulation
Substantial differences between resident aliens and Swiss nationals on all three covariates.
Propensity score in population (computed from fully stratified data):
[Figure: density of the propensity score (x-axis, 0 to 1) for the untreated and the treated.]
McFadden R2 = 0.121
Simulation
Raw mean difference in occupational prestige (NATE): -4.79
Population ATT (computed from fully stratified data): -3.96
There is some treatment effect heterogeneity (ATE = -3.51, ATC = -3.41).
[Figure, left panel: mean outcome (y-axis, 30 to 55) against the propensity score (x-axis, 0 to .8) for the untreated and the treated. Right panel: treatment effect (y-axis, -6 to -1) against the propensity score (x-axis, 0 to .8).]
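The population quantities on this slide can be cross-checked with a short calculation. The sketch below relies on the identity that the ATE is the average of the ATT and the ATC weighted by the treated share, and it assumes the figures read as ATT = -3.96, ATC = -3.41, NATE = -4.79, and a treated share of 17.5% (decimal points restored from the slides). The function name is hypothetical.

```python
def ate_from_att_atc(att, atc, share_treated):
    """ATE as the treated-share-weighted average of ATT and ATC."""
    return share_treated * att + (1.0 - share_treated) * atc


att, atc, nate = -3.96, -3.41, -4.79
ate = ate_from_att_atc(att, atc, share_treated=0.175)

# Selection bias of the naive comparison relative to the ATT
raw_bias = nate - att

print(round(ate, 2), round(raw_bias, 2))
```

Under these readings the weighted average agrees with the reported ATE of -3.51, and the naive estimate overstates the disadvantage of the treatment group by about 0.83 prestige points, which is the bias the matching estimators are meant to remove.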
Results: Variance
[Figure: variance of the ATT estimates by matching algorithm, in two panels (N = 500 and N = 5000), for MDM and PSM, each with and without bias correction. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y).]
In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.
For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.
Results: Bias reduction (in percent)
[Figure: bias reduction in percent by matching algorithm, in two panels (N = 500 and N = 5000), for MDM and PSM, each with and without bias correction. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y).]
Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.
The bias also vanishes once post-matching regression adjustment is applied.
Results: Mean squared error
[Figure: mean squared error of the ATT estimates by matching algorithm, in two panels (N = 500 and N = 5000), for MDM and PSM, each with and without bias correction. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y).]
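The three result slides (variance, bias reduction, mean squared error) are linked by a simple decomposition that a small sketch can make explicit. The slides do not spell out the formulas, so the following are assumptions: bias is the mean estimate minus the population ATT, bias reduction is measured relative to the bias of the naive estimate (NATE minus ATT), and the variance uses the 1/R divisor so that MSE equals variance plus squared bias exactly. The replicate estimates below are made up for illustration and are not the simulation results.

```python
def mc_summary(estimates, true_att, nate):
    """Summarize Monte Carlo replicates of an ATT estimator (assumed formulas)."""
    r = len(estimates)
    mean_est = sum(estimates) / r
    bias = mean_est - true_att
    variance = sum((e - mean_est) ** 2 for e in estimates) / r
    mse = sum((e - true_att) ** 2 for e in estimates) / r
    raw_bias = nate - true_att  # bias of the unadjusted comparison
    bias_reduction = 100.0 * (1.0 - bias / raw_bias)
    return {
        "bias": bias,
        "variance": variance,
        "mse": mse,
        "bias_reduction_pct": bias_reduction,
    }


# Hypothetical replicate estimates, not taken from the slides
replicates = [-4.1, -3.9, -4.3, -3.7]
summary = mc_summary(replicates, true_att=-3.96, nate=-4.79)
print(summary)
```

Under this definition a value above 100 percent means the estimator overshoots, that is, its remaining bias has the opposite sign of the raw bias, which would explain why the horizontal axis of the bias-reduction figure extends beyond 100.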
Results: Relative standard error
[Figure: relative standard error (x-axis, about .9 to 1.25) by matching algorithm, in two panels (N = 500 and N = 5000), for MDM and PSM, each with and without bias correction. Rows: nearest-neighbor matching with teffects standard errors (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap standard errors (1 neighbor, 5 neighbors), and kernel matching with bootstrap standard errors (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y).]
Results: Coverage of 95% CIs
[Figure: coverage of 95% confidence intervals (x-axis, about .9 to .98) by matching algorithm, in two panels (N = 500 and N = 5000), for MDM and PSM, each with and without bias correction. Rows: nearest-neighbor matching with teffects standard errors (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap standard errors (1 neighbor, 5 neighbors), and kernel matching with bootstrap standard errors (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y).]
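The two standard-error diagnostics can be computed from simulation replicates as sketched below. The slides do not define them, so the sketch assumes that the relative standard error is the average estimated standard error divided by the standard deviation of the point estimates across replications (a value of 1 means the standard errors are right on average), and that coverage is the share of normal-based 95% intervals containing the population ATT. The function name and the replicate values are hypothetical.

```python
import math


def se_diagnostics(estimates, std_errors, true_value, z=1.96):
    """Return (relative SE, CI coverage) across Monte Carlo replicates."""
    r = len(estimates)
    mean_est = sum(estimates) / r
    # Empirical SD of the point estimates (the "true" sampling variability)
    sd_est = math.sqrt(sum((e - mean_est) ** 2 for e in estimates) / (r - 1))
    rel_se = (sum(std_errors) / r) / sd_est
    covered = sum(
        1
        for e, s in zip(estimates, std_errors)
        if e - z * s <= true_value <= e + z * s
    )
    return rel_se, covered / r


# Hypothetical replicates, not taken from the slides
est = [-4.1, -3.9, -4.3, -3.7]
ses = [0.2, 0.2, 0.2, 0.2]
rel_se, coverage = se_diagnostics(est, ses, true_value=-3.96)
print(round(rel_se, 3), coverage)
```

A relative standard error below 1 signals standard errors that are too small on average, which tends to go together with coverage below the nominal 95 percent.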
Conclusions
The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).
Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.
Conclusions
Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research the choice may not be that clear.
(+) MDM leaves less scope for post-matching modeling decision biases.
(+) Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
(+) Fewer restrictions in terms of possible post-matching analyses.
(-) Choice of scaling matrix largely arbitrary; various suggestions in the literature (somewhat unclear, e.g., how categorical variables should be treated).
(-) Computational complexity.
One clear conclusion we can draw, however, is:
Do not use propensity scores for pair matching. (But don't use pair matching anyhow.)
Conclusions
Some conclusions from the simulation:
  - For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
  - Bootstrap standard error/confidence interval estimation seems to be mostly ok for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.
To do:
  - Run some simulations comparable to the ones by King and Nielsen, using various matching algorithms.
References
Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2):295–313.
Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1):77–90.
Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91:279–290.
Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.
King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.
Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.
Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4):465–472.
Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70:41–55.
Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5):688–701.
Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4):472–480.
Some Examples
Bandwidth selection: cross validation with respect to X

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(cv)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
              Matched               Controls          Band-
            Yes    No  Total     Used  Unused  Total  width
Treated     448     9    457     1184     212   1396  1.8888

Treatment-effects estimation
wage        Coef.
ATT      .6651578

. kmatch cvplot, ms(o) index mlabposition(1) sort
[Figure: cross-validation plot of MSE against bandwidth (about 1.5 to 3); markers are labeled with the index of the search step.]
Some Examples
Bandwidth selection: cross validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(cv wage)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
              Matched               Controls          Band-
            Yes    No  Total     Used  Unused  Total  width
Treated     453     4    457     1289     107   1396  2.433

Treatment-effects estimation
wage        Coef.
ATT      .6928956

. kmatch cvplot, ms(o) index mlabposition(1) sort
[Figure: cross-validation plot of MISE against bandwidth (about 1.5 to 3); markers are labeled with the index of the search step.]
Some Examples
Bandwidth selection: weighted cross validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
              Matched               Controls          Band-
            Yes    No  Total     Used  Unused  Total  width
Treated     455     2    457     1356      40   1396  2.7626

Treatment-effects estimation
wage        Coef.
ATT      .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort
[Figure: cross-validation plot of weighted MISE against bandwidth (about 1 to 5); markers are labeled with the index of the search step.]
Some Examples
Common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), att bwidth(0.5)

Multivariate-distance kernel matching      Number of obs = 1853
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
              Matched               Controls          Band-
            Yes    No  Total     Used  Unused  Total  width
Treated     366    91    457      701     695   1396     .5

Treatment-effects estimation
wage        Coef.
ATT      .3303161

. kmatch csummarize
(refitting the model using the generate() option)

Common support (treated)                      Standardized difference
Means         Matched  Unmatc~d     Total     (1)-(3)    (2)-(3)    (1)-(2)
collgrad      .322404   .318681   .321663     .001585   -.006376    .007962
ttl_exp       13.3929   12.7682   13.2685     .027413   -.110253    .137666
tenure        8.12614   6.95055   7.89205     .038378   -.154356    .192734
3.industry    .002732   .021978   .006565    -.047404    .190657   -.238061
4.industry    .191257   .153846   .183807     .019212   -.077269    .096481
5.industry    .062842   .274725   .105033    -.137462    .552867   -.690329
6.industry    .057377         0   .045952     .054507   -.219225    .273732
7.industry    .019126   .021978   .019694    -.004083    .016423   -.020506
8.industry    .005464   .065934   .017505    -.091714    .368871   -.460585
9.industry    .010929   .010989   .010941    -.000115    .000462   -.000577
10.industry         0   .021978   .004376    -.066227    .266363   -.332589
11.industry   .554645   .175824   .479212      .15083   -.606636    .757467
12.industry   .092896   .241758   .122538    -.090299    .363181    -.45348
2.race        .243169   .681319   .330416    -.185284    .745209   -.930494
3.race        .002732   .076923   .017505    -.112525    .452572   -.565097
south          .29235   .318681   .297593    -.011456    .046074    -.05753

Common support (treated)                      Ratio
Variances     Matched  Unmatc~d     Total     (1)/(3)    (2)/(3)    (1)/(2)
collgrad      .219058   .219536   .218674     1.00176    1.00394    .997824
ttl_exp       19.4198   25.2474   20.5898     .943177    1.22621     .76918
tenure        38.3324   31.9242   37.2044     1.03032    .858076    1.20073
3.industry    .002732   .021734   .006536     .418045    3.32537    .125714
4.industry    .155101   .131624   .150351     1.03159    .875443    1.17837
5.industry    .059054   .201465   .094207     .626851    2.13854    .293122
6.industry    .054233         0   .043936     1.23435          0          .
7.industry    .018811   .021734   .019348     .972252     1.1233    .865531
8.industry     .00545   .062271   .017237     .316157    3.61269    .087513
9.industry    .010839   .010989   .010845     .999464    1.01328    .986361
10.industry         0   .021734   .004367           0    4.97709          0
11.industry   .247691    .14652   .250115     .990307    .585811    1.69049
12.industry   .084497   .185348   .107758     .784137    1.72003    .455885
2.race        .184542   .219536   .221726     .832297    .990121    .840601
3.race        .002732   .071795   .017237     .158513    4.16522    .038056
south         .207448   .219536    .20949     .990254    1.04796    .944939

(1) matched, (2) unmatched, (3) total
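The entries of the common-support tables can be checked by hand. The lines below are a minimal sketch for the collgrad row, reading the printed values with their decimal points (means .322404 and .321663, variances .219058 and .218674). The scaling of the standardized difference by the standard deviation in the total treated group is an assumption inferred from the printed numbers, not taken from the kmatch documentation; the scalar names are hypothetical.

```stata
* collgrad row of the common-support tables (values as printed by kmatch csummarize)
scalar m_matched = .322404      // mean among matched treated, column (1)
scalar m_total   = .321663      // mean among all treated, column (3)
scalar v_matched = .219058      // variance among matched treated, column (1)
scalar v_total   = .218674      // variance among all treated, column (3)

* standardized difference (1)-(3): ASSUMED to be scaled by the total-group SD
scalar sdiff13 = (m_matched - m_total)/sqrt(v_total)
display "std. difference (1)-(3) = " %9.6f sdiff13

* variance ratio (1)/(3)
scalar vratio13 = v_matched/v_total
display "variance ratio (1)/(3)  = " %9.5f vratio13
```

Both results should land close to the printed entries (.001585 and 1.00176). The same arithmetic applied to other rows is a quick way to confirm where the decimal points belong in the flattened output.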
Some Examples

. // make a graph of the common-support stats
. mat M = r(M)
. coefplot matrix(M[,4]), title(Std difference) noci nolabels xline(0)
[Figure: coefficient plot of the standardized differences (matched vs. total) for collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, and south, with a reference line at 0.]
Some Examples
Multiple outcome variables

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1852
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
              Matched               Controls          Band-
            Yes    No  Total     Used  Unused  Total  width
Treated     432    25    457     1104     291   1395  1.3392

Treatment-effects estimation
               Coef.
wage
  ATT       .6021049
  NATE      1.430823
hours
  ATT       1.263759
  NATE      1.450303
Some Examples
Multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage = collgrad ttl_exp tenure) (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1852
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
              Matched               Controls          Band-
            Yes    No  Total     Used  Unused  Total  width
Treated     432    25    457     1104     291   1395  1.3392

Treatment-effects estimation
               Coef.
wage
  ATT       .5152752
  NATE      1.430823
hours
  ATT       1.263759
  NATE      1.450303
wage: adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race
Some Examples
Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage), att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching      Number of obs = 1853
                                           Replications  = 50
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics
                Matched               Controls          Band-
              Yes    No  Total     Used  Unused  Total  width
0  Treated    306    15    321      625     120    745  1.3199
1  Treated    126    10    136      473     178    651  1.3398

Treatment-effects estimation
             Observed   Bootstrap                      Normal-based
wage            Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
0  ATT       .4586332    .2763358   1.66    0.097    -.082975   1.000241
1  ATT       .9518705     .406903   2.34    0.019    .1543553   1.749386

. test [0]ATT = [1]ATT
 ( 1)  [0]ATT - [1]ATT = 0
           chi2(  1) =    1.23
         Prob > chi2 =    0.2679

. lincom [1]ATT - [0]ATT
 ( 1)  - [0]ATT + [1]ATT = 0
wage            Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
(1)          .4932373    .4452343   1.11    0.268    -.379406   1.365881
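The test and lincom results above are two views of the same comparison, which a few display statements make explicit. This is only a sketch that retypes the printed estimates (subgroup ATTs of .4586332 and .9518705, bootstrap standard error of the difference .4452343); the scalar names are hypothetical and nothing is re-estimated.

```stata
* subgroup ATTs and the bootstrap SE of their difference, as printed in the output
scalar att0    = .4586332     // ATT for south = 0
scalar att1    = .9518705     // ATT for south = 1
scalar se_diff = .4452343     // standard error reported by lincom

scalar d = att1 - att0        // point estimate of the difference
scalar z = d/se_diff          // z statistic reported by lincom

display "difference = " %9.7f d
display "z          = " %5.2f z
display "chi2(1)    = " %5.2f (z^2)
display "p-value    = " %6.4f (2*normal(-abs(z)))
display "95% CI     = [" %9.6f (d - invnormal(.975)*se_diff) ", " %9.6f (d + invnormal(.975)*se_diff) "]"
```

The squared z statistic corresponds to the chi2(1) value from test, so both commands lead to the same p-value of roughly 0.27: the difference between the two subpopulation ATTs is not statistically significant here.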
Simulation
Population data from the Swiss census of 2000.
Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using command iskotrei by Hendrickx 2002) (values from 6 to 78; mean 44).
Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.
Control variables: gender, age, and highest educational degree.
Population restricted to people between 24 and 60 years old who are working.
2'308'006 individuals, of which 17.5% belong to the treatment group.
Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.
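The sampling design described here (draw a sample from a fixed population, compute several matching estimators, repeat) can be sketched in a few lines. The census data are not available, so the example below uses the nlsw88 dataset shipped with Stata as a stand-in population; the covariate list, the 20 replications, the seed, and the matrix name R are all illustrative assumptions, and the user-written kmatch package (with its dependencies) is assumed to be installed. It is not the simulation code behind the slides.

```stata
* Sketch only: nlsw88 stands in for the census population
sysuse nlsw88, clear
set seed 2017
local reps 20
matrix R = J(`reps', 2, .)              // one row per replication
matrix colnames R = PSM MDM

forvalues r = 1/`reps' {
    preserve
    sample 500, count                   // draw N = 500 without replacement
    * kernel propensity-score matching
    quietly kmatch ps union collgrad ttl_exp tenure (wage), att
    matrix R[`r', 1] = _b[ATT]
    * kernel Mahalanobis-distance matching
    quietly kmatch md union collgrad ttl_exp tenure (wage), att
    matrix R[`r', 2] = _b[ATT]
    restore                             // back to the full "population"
}
matrix list R
```

With the estimates collected in R, bias, variance, and mean squared error of each estimator follow from the column means and variances relative to the known population value, which is how the results on the following slides are organized.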
Simulation
Substantial differences between resident aliens and Swiss nationals on all three covariates.
Propensity score in population (computed from fully stratified data); McFadden R2 = 0.121.
[Figure: density of the propensity score (x-axis from 0 to 1) for untreated and treated.]
Simulation
Raw mean difference in occupational prestige (NATE): -4.79
Population ATT (computed from fully stratified data): -3.96
There is some treatment effect heterogeneity (ATE = -3.51, ATC = -3.41).
[Figure, two panels: outcome against propensity score for untreated and treated (left); treatment effect against propensity score (right).]
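The population figures quoted on this slide are internally consistent, which can be verified with simple arithmetic. The sketch below assumes the values are read as NATE = -4.79, ATT = -3.96, ATC = -3.41, and a treated share of 17.5% (from the simulation setup); the scalar names are hypothetical.

```stata
* population quantities quoted on the slides
scalar p_treat = .175      // share of resident aliens (treatment group)
scalar att     = -3.96     // average treatment effect on the treated
scalar atc     = -3.41     // average treatment effect on the untreated
scalar nate    = -4.79     // raw mean difference

* ATE as the share-weighted average of ATT and ATC
scalar ate_implied = p_treat*att + (1 - p_treat)*atc
display "implied ATE = " %5.2f ate_implied

* bias of the raw comparison relative to the ATT
scalar raw_bias = nate - att
display "NATE - ATT  = " %5.2f raw_bias
```

The implied ATE rounds to the -3.51 stated on the slide, and the raw comparison overstates the ATT by about 0.83 prestige points; this is the bias that the matching estimators are supposed to remove.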
Results: Variance
[Figure: variance of the estimators for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]
Notes: In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that, across algorithms, PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.
For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.
Results: Bias reduction (in percent)
[Figure: bias reduction in percent for N = 500 (x-axis 70 to 130) and N = 5000 (x-axis 95 to 135). Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]
Notes: Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.
The bias also vanishes once post-matching regression adjustment is applied.
Results: Mean squared error
[Figure: mean squared error of the estimators for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]
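The mean squared error combines the two preceding results (variance and bias). As a reminder, under the usual definitions, and writing \tau for the population ATT and \hat\tau for a matching estimator (this notation is an assumption introduced here, not taken from the slides):

```latex
\mathrm{MSE}(\hat\tau)
  = \mathrm{E}\!\left[(\hat\tau - \tau)^2\right]
  = \mathrm{Var}(\hat\tau) + \mathrm{Bias}(\hat\tau)^2,
\qquad
\mathrm{Bias}(\hat\tau) = \mathrm{E}[\hat\tau] - \tau .

% One common convention for "bias reduction in percent" (ASSUMED; the slides
% do not state the exact formula), relative to the raw comparison NATE:
\text{bias reduction} = 100 \times
  \left( 1 - \frac{\mathrm{E}[\hat\tau] - \tau}{\mathrm{NATE} - \tau} \right)
```

Under that convention a value of 100 means the raw bias is removed completely, and values above 100 indicate overcorrection. An estimator with small variance but persistent bias can therefore still lose on mean squared error in large samples, where the variance term shrinks and the squared bias dominates.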
Results: Relative standard error
[Figure: relative standard error for N = 500 and N = 5000. Rows: nearest-neighbor matching (teffects) with 1 and 5 neighbors; nearest-neighbor matching (bootstrap) with 1 and 5 neighbors; kernel matching (bootstrap) with fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, and weighted CV with respect to Y. Series: MDM and PSM, each with and without bias correction.]
Results: Coverage of 95% CIs
[Figure: coverage of 95% confidence intervals (in percent) for N = 500 and N = 5000. Rows: nearest-neighbor matching (teffects) with 1 and 5 neighbors; nearest-neighbor matching (bootstrap) with 1 and 5 neighbors; kernel matching (bootstrap) with fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, and weighted CV with respect to Y. Series: MDM and PSM, each with and without bias correction.]
Conclusions
The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).
Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.
Conclusions
Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research, the choice may not be that clear.
Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.
Disadvantages of MDM:
- Choice of scaling matrix largely arbitrary; various suggestions in the literature (somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.
One clear conclusion we can draw, however, is:
Do not use propensity scores for pair matching. (But don't use pair matching anyhow.)
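In practical terms, the alternative to pair matching discussed in this talk is kernel matching, optionally combined with post-matching regression adjustment, which the simulation found helpful for PSM. The lines below are a minimal sketch of what such a comparison might look like with kmatch on the nlsw88 data shipped with Stata; the reduced covariate list is an illustrative assumption, and the user-written kmatch package is assumed to be installed.

```stata
* Sketch only: requires the user-written kmatch package
sysuse nlsw88, clear

* kernel PSM without regression adjustment
kmatch ps union collgrad ttl_exp tenure (wage), att

* kernel PSM with post-matching regression adjustment on the same covariates
kmatch ps union collgrad ttl_exp tenure (wage = collgrad ttl_exp tenure), att

* kernel MDM on the same covariates for comparison
kmatch md union collgrad ttl_exp tenure (wage), att
```

Comparing the three ATT estimates on one dataset says nothing about bias or variance by itself; it only shows how little the syntax changes between PSM and MDM, so that trying both in an application is cheap.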
Some Examples
Bandwidth selection: weighted cross-validation with respect to Y

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(cv wage, weighted)
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs = 1,853
                                               Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |       Matched        |       Controls        |  Band-
           |   Yes     No  Total  |  Used  Unused  Total  |  width
   Treated |   455      2    457  |  1356      40   1396  | 2.7626

Treatment-effects estimation
      wage |     Coef.
       ATT |  .7308166

. kmatch cvplot, ms(o) index mlabposition(1) sort
[Figure: weighted MISE (y axis) plotted against bandwidth (x axis, 1 to 5); markers are labeled with the index of the evaluation step, 1 to 15.]
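The matching statistics above (treated units matched or unmatched, controls used within a bandwidth, Epanechnikov kernel) can be illustrated with a minimal sketch of kernel matching. This is not kmatch's implementation: it assumes a single covariate with absolute distance in place of the Mahalanobis metric, and the toy data and the names `epan` and `kernel_att` are hypothetical.

```python
def epan(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) for |u| < 1, zero otherwise."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def kernel_att(treated, controls, h):
    """Kernel-matching ATT on one covariate (illustrative assumption).

    treated, controls: lists of (x, y) pairs; h: bandwidth.
    Returns (att, number of matched treated units).
    """
    effects = []
    for xt, yt in treated:
        weights = [epan((xc - xt) / h) for xc, _ in controls]
        total = sum(weights)
        if total == 0.0:
            continue  # no control within the bandwidth: unit stays unmatched
        # Counterfactual outcome: kernel-weighted mean of control outcomes
        y0 = sum(w * yc for w, (_, yc) in zip(weights, controls)) / total
        effects.append(yt - y0)
    att = sum(effects) / len(effects) if effects else float("nan")
    return att, len(effects)


# Toy data (made up): the third treated unit has no control nearby
treated = [(1.0, 5.0), (2.0, 7.0), (10.0, 9.0)]
controls = [(0.5, 4.0), (1.5, 5.0), (2.5, 6.0)]
att, n = kernel_att(treated, controls, h=1.0)
print(att, n)
```

A smaller bandwidth leaves more treated units unmatched, which is the trade-off the cross-validation step is meant to balance.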
Some Examples
Common-support diagnostics

. kmatch md union collgrad ttl_exp tenure i.industry i.race south (wage), ///
>     att bwidth(0.5)

Multivariate-distance kernel matching          Number of obs = 1,853
                                               Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |       Matched        |       Controls        |  Band-
           |   Yes     No  Total  |  Used  Unused  Total  |  width
   Treated |   366     91    457  |   701     695   1396  |     .5

Treatment-effects estimation
      wage |     Coef.
       ATT |  .3303161

. kmatch csummarize
(refitting the model using the generate() option)

Common support (treated)
             |             Means              |    Standardized difference
             |  Matched  Unmatched     Total  |  (1)-(3)   (2)-(3)   (1)-(2)
    collgrad |  .322404    .318681   .321663  |  .001585  -.006376   .007962
     ttl_exp |  13.3929    12.7682   13.2685  |  .027413  -.110253   .137666
      tenure |  8.12614    6.95055   7.89205  |  .038378  -.154356   .192734
  3.industry |  .002732    .021978   .006565  | -.047404   .190657  -.238061
  4.industry |  .191257    .153846   .183807  |  .019212  -.077269   .096481
  5.industry |  .062842    .274725   .105033  | -.137462   .552867  -.690329
  6.industry |  .057377          0   .045952  |  .054507  -.219225   .273732
  7.industry |  .019126    .021978   .019694  | -.004083   .016423  -.020506
  8.industry |  .005464    .065934   .017505  | -.091714   .368871  -.460585
  9.industry |  .010929    .010989   .010941  | -.000115   .000462  -.000577
 10.industry |        0    .021978   .004376  | -.066227   .266363  -.332589
 11.industry |  .554645    .175824   .479212  |   .15083  -.606636   .757467
 12.industry |  .092896    .241758   .122538  | -.090299   .363181   -.45348
      2.race |  .243169    .681319   .330416  | -.185284   .745209  -.930494
      3.race |  .002732    .076923   .017505  | -.112525   .452572  -.565097
       south |   .29235    .318681   .297593  | -.011456   .046074   -.05753

Common support (treated)
             |           Variances            |             Ratio
             |  Matched  Unmatched     Total  |  (1)/(3)   (2)/(3)   (1)/(2)
    collgrad |  .219058    .219536   .218674  |  1.00176   1.00394   .997824
     ttl_exp |  19.4198    25.2474   20.5898  |  .943177   1.22621    .76918
      tenure |  38.3324    31.9242   37.2044  |  1.03032   .858076   1.20073
  3.industry |  .002732    .021734   .006536  |  .418045   3.32537   .125714
  4.industry |  .155101    .131624   .150351  |  1.03159   .875443   1.17837
  5.industry |  .059054    .201465   .094207  |  .626851   2.13854   .293122
  6.industry |  .054233          0   .043936  |  1.23435         0         .
  7.industry |  .018811    .021734   .019348  |  .972252    1.1233   .865531
  8.industry |   .00545    .062271   .017237  |  .316157   3.61269   .087513
  9.industry |  .010839    .010989   .010845  |  .999464   1.01328   .986361
 10.industry |        0    .021734   .004367  |        0   4.97709         0
 11.industry |  .247691     .14652   .250115  |  .990307   .585811   1.69049
 12.industry |  .084497    .185348   .107758  |  .784137   1.72003   .455885
      2.race |  .184542    .219536   .221726  |  .832297   .990121   .840601
      3.race |  .002732    .071795   .017237  |  .158513   4.16522   .038056
       south |  .207448    .219536    .20949  |  .990254   1.04796   .944939

(1) matched, (2) unmatched, (3) total
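The diagnostics in this output can be checked by hand. The sketch below reads the collgrad row as means .322404 (matched), .318681 (unmatched), .321663 (total) and variances .219058, .219536, .218674, and assumes that each standardized difference is the difference in means divided by the standard deviation of the total treated group. That scaling is inferred from the reported numbers, not taken from the kmatch documentation; the function name `std_diff` is hypothetical.

```python
import math


def std_diff(mean_a, mean_b, var_ref):
    """Difference in means, scaled by the SD of a reference group (assumed)."""
    return (mean_a - mean_b) / math.sqrt(var_ref)


# collgrad row: (1) matched, (2) unmatched, (3) total treated
m_matched, m_unmatched, m_total = 0.322404, 0.318681, 0.321663
v_matched, v_unmatched, v_total = 0.219058, 0.219536, 0.218674

d13 = std_diff(m_matched, m_total, v_total)      # reported as .001585
d12 = std_diff(m_matched, m_unmatched, v_total)  # reported as .007962
r13 = v_matched / v_total                        # reported as 1.00176
r12 = v_matched / v_unmatched                    # reported as .997824

print(d13, d12, r13, r12)
```

Under this assumption the computed values agree with the reported ones up to rounding of the inputs. Values close to 0 for the differences and close to 1 for the ratios indicate that the matched treated units resemble the full treated group.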
Some Examples

. // make a graph of the common-support stats
. mat M = r(M)
. coefplot matrix(M[,4]), title(Std. difference) noci nolabels xline(0)
[Figure: dot plot of the standardized differences (1)-(3) for collgrad, ttl_exp, tenure, 3.industry to 12.industry, 2.race, 3.race, and south; x axis from -.2 to .2 with a reference line at 0; title "Std. difference".]
Some Examples
Multiple outcome variables

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs = 1,852
                                               Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |       Matched        |       Controls        |  Band-
           |   Yes     No  Total  |  Used  Unused  Total  |  width
   Treated |   432     25    457  |  1104     291   1395  | 1.3392

Treatment-effects estimation
           |     Coef.
wage       |
       ATT |  .6021049
      NATE |  1.430823
hours      |
       ATT |  1.263759
      NATE |  1.450303
Some Examples
Multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south ///
>     (wage = collgrad ttl_exp tenure) ///
>     (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching          Number of obs = 1,852
                                               Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics
           |       Matched        |       Controls        |  Band-
           |   Yes     No  Total  |  Used  Unused  Total  |  width
   Treated |   432     25    457  |  1104     291   1395  | 1.3392

Treatment-effects estimation
           |     Coef.
wage       |
       ATT |  .5152752
      NATE |  1.430823
hours      |
       ATT |  1.263759
      NATE |  1.450303
wage: adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race
Some Examples
Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage), ///
>     att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching          Number of obs = 1,853
                                               Replications  = 50
                                               Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics
           |       Matched        |       Controls        |  Band-
           |   Yes     No  Total  |  Used  Unused  Total  |  width
0  Treated |   306     15    321  |   625     120    745  | 1.3199
1  Treated |   126     10    136  |   473     178    651  | 1.3398

Treatment-effects estimation
           |  Observed   Bootstrap                       Normal-based
      wage |     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
0      ATT |  .4586332   .2763358    1.66   0.097    -.082975   1.000241
1      ATT |  .9518705    .406903    2.34   0.019    .1543553   1.749386

. test [0]ATT = [1]ATT
 ( 1)  [0]ATT - [1]ATT = 0
           chi2(  1) =    1.23
         Prob > chi2 =    0.2679

. lincom [1]ATT - [0]ATT
 ( 1)  - [0]ATT + [1]ATT = 0
      wage |     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
       (1) |  .4932373   .4452343    1.11   0.268    -.379406   1.365881
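The test and lincom results can be reproduced from the reported coefficient and standard error. The sketch below reads the difference between the two ATTs as .4932373 with bootstrap standard error .4452343 and applies a normal-based Wald test. It is a check of the arithmetic only, not a re-estimation; the variable names are illustrative.

```python
import math

# Reported subgroup ATTs and the lincom standard error of their difference
att_south0 = 0.4586332
att_south1 = 0.9518705
se_diff = 0.4452343

diff = att_south1 - att_south0        # lincom coefficient
z = diff / se_diff                    # z statistic
chi2 = z * z                          # Wald chi-squared with 1 df
p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value

# Normal-based 95% confidence interval
z_crit = 1.959964
lo = diff - z_crit * se_diff
hi = diff + z_crit * se_diff

print(round(diff, 7), round(z, 2), round(chi2, 2), round(p, 4))
print(round(lo, 6), round(hi, 6))
```

With one restriction the chi-squared statistic is the square of z, so test and lincom carry the same information here; lincom additionally reports the size of the difference and its interval.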
Simulation
Population data from the Swiss census of 2000.
Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using command iskotrei by Hendrickx 2002); values from 6 to 78, mean 44.
Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.
Control variables: gender, age, and highest educational degree.
Population restricted to people between 24 and 60 years old who are working.
2'308'006 individuals, of which 17.5% belong to the treatment group.
Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.
Simulation
Substantial differences between resident aliens and Swiss nationals on all three covariates.
Propensity score in population (computed from fully stratified data):
[Figure: density of the propensity score (x axis, 0 to 1) for the untreated and the treated. McFadden R2 = 0.121.]
Simulation
Raw mean difference in occupational prestige (NATE): -4.79
Population ATT (computed from fully stratified data): -3.96
There is some treatment effect heterogeneity (ATE = -3.51, ATC = -3.41).
[Figure, left panel: outcome (y axis, 30 to 55) against propensity score (x axis, 0 to .8) for the untreated and the treated. Right panel: treatment effect (y axis, -6 to -1) against propensity score (x axis, 0 to .8).]
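The population quantities on this slide fit together arithmetically. As a rough check, assuming the treated share of 17.5% stated in the simulation setup, the ATE should equal the share-weighted average of ATT and ATC, and the raw bias that matching has to remove is the gap between NATE and ATT. The variable names are illustrative.

```python
# Population quantities as reported on the slides
nate = -4.79       # raw mean difference
att = -3.96        # average effect on the treated
atc = -3.41        # average effect on the untreated
p_treated = 0.175  # share of the population in the treatment group

# ATE as the weighted average of the two subgroup effects
ate = p_treated * att + (1 - p_treated) * atc

# Selection bias in the raw comparison when the target is the ATT
raw_bias = nate - att

print(round(ate, 2), round(raw_bias, 2))
```

The weighted average rounds to the reported ATE of -3.51, and the raw comparison overstates the ATT by 0.83 prestige points in absolute terms.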
Results: Variance
[Figure: variance of the estimates for N = 500 (left panel) and N = 5000 (right panel). Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each without and with bias correction.]
In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.
For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.
Results: Bias reduction (in percent)
[Figure: bias reduction in percent for N = 500 (left panel, x axis 70 to 130) and N = 5000 (right panel, x axis 95 to 135). Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each without and with bias correction.]
Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.
The bias also vanishes once post-matching regression adjustment is applied.
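To make the "bias reduction (in percent)" scale concrete, here is a minimal sketch under an assumed definition: the share of the raw bias (NATE minus ATT) that an estimator removes. The slides do not spell out the formula, so both the definition and the name `bias_reduction` are assumptions; the population values -4.79 (NATE) and -3.96 (ATT) are those reported for the simulation.

```python
def bias_reduction(estimate, att=-3.96, nate=-4.79):
    """Percent of the raw bias removed by an estimator (assumed definition).

    100 means the average estimate equals the population ATT,
    0 means it is no better than the raw mean difference,
    and values above 100 mean the estimator over-corrects.
    """
    return 100.0 * (1.0 - (estimate - att) / (nate - att))


print(bias_reduction(-3.96))  # unbiased estimator
print(bias_reduction(-4.79))  # raw difference, nothing removed
print(bias_reduction(-3.90))  # over-correction, above 100
```

Under this reading, values persistently away from 100 at N = 5000 correspond to the non-vanishing bias described above.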
Results: Mean squared error
[Figure: mean squared error of the estimates for N = 500 (left panel) and N = 5000 (right panel). Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each without and with bias correction.]
Results: Relative standard error
[Figure: relative standard error for N = 500 (left panel) and N = 5000 (right panel). Rows: nearest-neighbor matching with teffects standard errors (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap standard errors (1 neighbor, 5 neighbors), and kernel matching with bootstrap standard errors (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each without and with bias correction.]
Results: Coverage of 95% CIs
[Figure: coverage of 95% confidence intervals (in percent) for N = 500 (left panel) and N = 5000 (right panel), with the same rows and series as in the relative standard error figure.]
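The performance measures shown in these result slides can be computed from simulation replicates along the following lines. This is a sketch under stated assumptions, not the code used for the talk: relative standard error is taken to be the average estimated standard error divided by the standard deviation of the estimates across replications, coverage uses normal-based 95% intervals, and the function name `summarize` and the toy replicates are made up.

```python
import statistics


def summarize(estimates, std_errs, truth, z_crit=1.959964):
    """Summarize simulation replicates of one estimator (assumed definitions)."""
    mean_est = statistics.fmean(estimates)
    variance = statistics.pvariance(estimates)  # spread across replications
    bias = mean_est - truth
    mse = statistics.fmean([(e - truth) ** 2 for e in estimates])
    # Ratio of 1 means the estimated SEs match the true sampling variability
    rel_se = statistics.fmean(std_errs) / variance ** 0.5
    # Share of replications whose interval contains the true value
    hits = [abs(e - truth) <= z_crit * s for e, s in zip(estimates, std_errs)]
    coverage = 100.0 * sum(hits) / len(hits)
    return {"variance": variance, "bias": bias, "mse": mse,
            "rel_se": rel_se, "coverage": coverage}


# Toy replicates (made up) around the population ATT of -3.96
res = summarize(estimates=[-4.5, -4.0, -3.5, -4.0],
                std_errs=[0.2, 0.2, 0.2, 0.2],
                truth=-3.96)
print(res)
```

The mean squared error equals the variance plus the squared bias, which is why an estimator with some bias but small variance can still rank well on that slide. A relative standard error below 1 goes together with coverage below the nominal 95%.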
Some Examples make a graph of the common-support stats mat M = r(M)
coefplot matrix(M[4]) title(Std difference) noci nolabels xline(0)
collgradttl_exptenure
3industry4industry5industry6industry7industry8industry9industry
10industry11industry12industry
2race3racesouth
-2 -1 0 1 2
Std difference
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 52
Some Examples: Multiple outcome variables

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage hours), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1852
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

          |      Matched            Controls          Band-
          |  Yes    No  Total   Used  Unused  Total   width
  Treated |  432    25    457   1104     291   1395   1.3392

Treatment-effects estimation

          |     Coef.
  wage    |
      ATT |  .6021049
     NATE |  1.430823
  hours   |
      ATT |  1.263759
     NATE |  1.450303
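The output above reports the metric as mahalanobis. As a rough illustration of what such a metric measures, the following Python sketch computes a Mahalanobis distance between two covariate vectors for the two-covariate case. The function name, the covariance matrix, and the example vectors are hypothetical and chosen only for illustration; this is not kmatch's implementation, which may construct its scaling matrix differently.

```python
import math

def mahalanobis_2d(x, y, cov):
    """Mahalanobis distance between 2-vectors x and y, given a 2x2 covariance matrix."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    # Inverse of the 2x2 matrix, written out explicitly.
    inv = ((d / det, -b / det), (-c / det, a / det))
    dx, dy = x[0] - y[0], x[1] - y[1]
    # Quadratic form diff' * inv(cov) * diff.
    q = dx * (inv[0][0] * dx + inv[0][1] * dy) + dy * (inv[1][0] * dx + inv[1][1] * dy)
    return math.sqrt(q)

# Hypothetical covariance of two positively correlated covariates.
cov = ((2.0, 1.0), (1.0, 2.0))
treated = (3.0, 2.0)
control = (2.0, 1.0)
d = mahalanobis_2d(treated, control, cov)
print(f"Mahalanobis distance: {d:.4f}")
```

With an identity matrix as the scaling matrix the function reduces to the Euclidean distance, which is one way to see that the choice of scaling matrix determines which covariate differences count as "large".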
Some Examples: Multiple outcome variables with different regression-adjustment equations

. kmatch md union collgrad ttl_exp tenure i.industry i.race south
>     (wage = collgrad ttl_exp tenure)
>     (hours = i.industry i.race), nate att
(computing bandwidth ... done)

Multivariate-distance kernel matching      Number of obs = 1852
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race south

Matching statistics

          |      Matched            Controls          Band-
          |  Yes    No  Total   Used  Unused  Total   width
  Treated |  432    25    457   1104     291   1395   1.3392

Treatment-effects estimation

          |     Coef.
  wage    |
      ATT |  .5152752
     NATE |  1.430823
  hours   |
      ATT |  1.263759
     NATE |  1.450303

wage:  adjusted for collgrad ttl_exp tenure
hours: adjusted for i.industry i.race
Some Examples: Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage),
>     att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching      Number of obs = 1853
                                           Replications  = 50
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics

            |      Matched            Controls          Band-
            |  Yes    No  Total   Used  Unused  Total   width
0   Treated |  306    15    321    625     120    745   1.3199
1   Treated |  126    10    136    473     178    651   1.3398

Treatment-effects estimation

         |  Observed   Bootstrap                      Normal-based
    wage |     Coef.   Std. Err.      z   P>|z|   [95% Conf. Interval]
0    ATT |  .4586332    .2763358   1.66   0.097   -.082975   1.000241
1    ATT |  .9518705     .406903   2.34   0.019   .1543553   1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

           chi2(  1) =    1.23
         Prob > chi2 =    0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

    wage |     Coef.   Std. Err.      z   P>|z|   [95% Conf. Interval]
     (1) |  .4932373    .4452343   1.11   0.268   -.379406   1.365881
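The test and lincom results above are two views of the same comparison: the Wald chi-squared statistic with one degree of freedom is the square of the z statistic of the difference. The following Python sketch re-derives the reported figures from the lincom coefficient and standard error, reading them as 0.4932373 and 0.4452343 (the decimal placement is an assumption, implied by the reported z of 1.11). It is only an arithmetic check, not part of kmatch.

```python
import math

# Values read from the lincom output (decimal placement assumed, see lead-in).
diff = 0.4932373   # [1]ATT - [0]ATT
se = 0.4452343     # bootstrap standard error of the difference

z = diff / se
chi2 = z ** 2                                # Wald statistic, 1 degree of freedom
p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided p-value, normal approximation

# Normal-based 95% confidence interval.
crit = 1.959964
ci_low = diff - crit * se
ci_high = diff + crit * se

print(f"z = {z:.2f}, chi2 = {chi2:.2f}, p = {p_value:.4f}")
print(f"95% CI = [{ci_low:.6f}, {ci_high:.6f}]")
```

The difference between the two subpopulations is therefore not statistically significant at conventional levels, even though only the ATT for south = 1 is individually significant at the 5% level.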
Simulation

- Population data from the Swiss census of 2000.
- Outcome: Treiman occupational prestige (recoded from the ISCO codes of the current job using the command iskotrei by Hendrickx 2002); values from 6 to 78, mean 44.
- Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group.
- Control variables: gender, age, and highest educational degree.
- Population restricted to working people between 24 and 60 years old.
- 2'308'006 individuals, of which 17.5% belong to the treatment group.
- Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators.
Simulation

Substantial differences between resident aliens and Swiss nationals on all three covariates.

[Figure: density of the propensity score in the population (computed from fully stratified data), shown separately for untreated and treated. x-axis: propensity score; y-axis: density. McFadden R2 = 0.121.]
Simulation

- Raw mean difference in occupational prestige (NATE): -4.79
- Population ATT (computed from fully stratified data): -3.96
- There is some treatment effect heterogeneity (ATE = -3.51, ATC = -3.41).

[Figure, two panels, x-axis: propensity score. Left panel: outcome by propensity score for untreated and treated. Right panel: treatment effect by propensity score.]
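The population figures on this slide can be cross-checked against each other, because the ATE is the average of the ATT and the ATC weighted by the shares of treated and untreated. The Python sketch below does this arithmetic, reading the slide's figures as ATT = -3.96, ATC = -3.41, and NATE = -4.79, and the treatment-group share from the setup slide as 17.5% (the decimal placement of all four numbers is an assumption about the extracted text).

```python
# Population quantities as read from the slides (decimal placement assumed).
att = -3.96            # average treatment effect on the treated
atc = -3.41            # average treatment effect on the untreated
nate = -4.79           # raw mean difference between treated and untreated
share_treated = 0.175  # share of the population in the treatment group

# ATE as the share-weighted average of ATT and ATC.
ate = share_treated * att + (1 - share_treated) * atc

# Part of the raw difference that is not a treatment effect on the treated,
# i.e. the bias a matching estimator of the ATT has to remove.
raw_bias = nate - att

print(f"implied ATE = {ate:.2f}")
print(f"NATE - ATT  = {raw_bias:.2f}")
```

The implied ATE agrees with the value reported on the slide after rounding, which supports this reading of the numbers.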
Results: Variance

[Figure: variance of the estimators, panels for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM, MDM with bias correction, PSM, PSM with bias correction.]
On this slide we can see that, for the same algorithm, PSM is typically somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.

For kernel matching, the efficiency differences between PSM and MDM are small, and additional post-matching regression adjustment reduces them further.
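One intuition for why the algorithm matters so much: 1-nearest-neighbor matching builds each counterfactual from a single control, whereas kernel matching averages over all controls within a bandwidth. The Python sketch below contrasts the two for one treated unit. The data, the bandwidth, and the function names are hypothetical, and the code is a toy illustration with an Epanechnikov kernel, not kmatch's implementation.

```python
def epanechnikov(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) for |u| < 1, zero elsewhere."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0

def nn_counterfactual(score, controls):
    """Outcome of the single control whose score is closest to the treated unit's."""
    return min(controls, key=lambda c: abs(c[0] - score))[1]

def kernel_counterfactual(score, controls, bandwidth):
    """Kernel-weighted mean outcome of all controls within the bandwidth."""
    weights = [epanechnikov((c[0] - score) / bandwidth) for c in controls]
    total = sum(weights)
    if total == 0.0:
        return None  # no control within the bandwidth: the unit stays unmatched
    return sum(w * c[1] for w, c in zip(weights, controls)) / total

# Hypothetical controls as (score, outcome) pairs and one treated unit.
controls = [(0.40, 10.0), (0.55, 12.0), (0.90, 20.0)]
treated_score = 0.50

y0_nn = nn_counterfactual(treated_score, controls)
y0_kernel = kernel_counterfactual(treated_score, controls, bandwidth=0.2)

print(f"nearest-neighbor counterfactual: {y0_nn:.3f}")
print(f"kernel counterfactual:           {y0_kernel:.3f}")
```

In this toy example the kernel estimate combines the two controls inside the bandwidth and ignores the distant third one; averaging over several controls is what tends to lower the variance relative to using a single neighbor.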
Results: Bias reduction (in percent)

[Figure: bias reduction in percent, panels for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM, MDM with bias correction, PSM, PSM with bias correction.]
Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after a propensity score of 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.

The bias also vanishes once post-matching regression adjustment is applied.
Results: Mean squared error

[Figure: mean squared error of the estimators, panels for N = 500 and N = 5000. Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM, MDM with bias correction, PSM, PSM with bias correction.]

Results: Relative standard error

[Figure: relative standard error, panels for N = 500 and N = 5000. Rows: nearest-neighbor matching (teffects) with 1 and 5 neighbors, nearest-neighbor matching (bootstrap) with 1 and 5 neighbors, and kernel matching (bootstrap) with fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, and weighted CV with respect to Y. Series: MDM, MDM with bias correction, PSM, PSM with bias correction.]

Results: Coverage of 95% CIs

[Figure: coverage of 95% confidence intervals, panels for N = 500 and N = 5000. Rows and series as in the relative standard error figure.]
Conclusions

- The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).
- Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.
Conclusions

- Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research the choice may not be that clear.
  Advantages of MDM:
  - MDM leaves less scope for post-matching modeling decision biases.
  - Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
  - Fewer restrictions in terms of possible post-matching analyses.
  Disadvantages of MDM:
  - Choice of scaling matrix largely arbitrary; various suggestions in the literature (somewhat unclear, e.g., how categorical variables should be treated).
  - Computational complexity.
- One clear conclusion we can draw, however, is: Do not use propensity scores for pair matching. (But don't use pair matching anyhow.)
Conclusions

- Some conclusions from the simulation:
  - For PSM, applying regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
  - Bootstrap standard error/confidence interval estimation seems to be mostly OK for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.
- To do:
  - Run some simulations comparable to the ones by King and Nielsen, using various matching algorithms.
References

Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2):295–313.

Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1):77–90.

Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91:279–290.

Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.

King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw

Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.

Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4):465–472.

Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70:41–55.

Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5):688–701.

Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4):472–480.
Rosenbaum PR DB Rubin 1983 The Central Role of the PropensityScore in Observational Studies for Causal Effects Biometrika 7041ndash55
Rubin DB 1974 Estimating causal effects of treatments in randomizedand nonrandomized studies Journal of Educational Psychology66(5)688ndash701
Rubin DB 1990 Comment Neyman (1923) and Causal Inference inExperiments and Observational Studies Statistical Science 5(4)472ndash480
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 69
Some Examples: Treatment effects by subpopulation

. kmatch md union collgrad ttl_exp tenure i.industry i.race (wage),
>     att vce(boot) over(south)
(south=0: computing bandwidth ... done)
(south=1: computing bandwidth ... done)
(running kmatch on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Multivariate-distance kernel matching      Number of obs = 1853
                                           Replications  = 50
                                           Kernel        = epan
Treatment   : union = 1
Metric      : mahalanobis
Covariates  : collgrad ttl_exp tenure i.industry i.race

0: south = 0
1: south = 1

Matching statistics

                  Matched                 Controls          Band-
              Yes     No  Total     Used  Unused  Total     width
0  Treated    306     15    321      625     120    745    1.3199
1  Treated    126     10    136      473     178    651    1.3398

Treatment-effects estimation

             Observed   Bootstrap                      Normal-based
wage            Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
0  ATT       .4586332   .2763358    1.66    0.097   -.082975    1.000241
1  ATT       .9518705    .406903    2.34    0.019   .1543553    1.749386

. test [0]ATT = [1]ATT

 ( 1)  [0]ATT - [1]ATT = 0

       chi2(  1) =    1.23
     Prob > chi2 =    0.2679

. lincom [1]ATT - [0]ATT

 ( 1)  - [0]ATT + [1]ATT = 0

wage            Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
(1)          .4932373   .4452343    1.11    0.268   -.379406    1.365881
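The subgroup comparison above uses Mahalanobis-distance matching (kmatch md). As a minimal sketch, the same comparison could be run on the propensity score by swapping in kmatch ps. This assumes that the example data are the nlsw88 dataset shipped with Stata (the variable names match, but the slide does not say so), that kmatch and its dependency moremata are installed from SSC, and that kmatch ps labels the subgroup equations [0]ATT and [1]ATT in the same way as the md output.

```stata
* Sketch only; assumptions: nlsw88 data, kmatch and moremata installed
* ssc install kmatch
* ssc install moremata
sysuse nlsw88, clear

* Same covariates and options as in the md example, but matching on the
* estimated propensity score; ATT by subpopulation with bootstrap SEs
kmatch ps union collgrad ttl_exp tenure i.industry i.race (wage), ///
    att vce(boot) over(south)

* Difference in the ATT between south = 1 and south = 0
lincom [1]ATT - [0]ATT
```

Because the standard errors are bootstrapped, results will vary from run to run unless a seed is set beforehand.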
Simulation
Population data from Swiss census of 2000
Outcome: Treiman occupational prestige (recoded from ISCO codes of the current job using the command iskotrei by Hendrickx 2002; values from 6 to 78, mean 44)
Estimand: ATT of nationality on occupational prestige, with resident aliens as the treatment group and Swiss nationals as the control group
Control variables: gender, age, and highest educational degree
Population restricted to people between 24 and 60 years old who are working
2'308'006 individuals, of which 17.5% belong to the treatment group
Draw random samples (N = 500, 1000, or 5000) from the population and compute various matching estimators
Simulation
Substantial differences between resident aliens and Swiss nationals on all three covariates
[Figure: density of the propensity score in the population (computed from fully stratified data), shown separately for untreated and treated; x-axis: propensity score (0 to 1); y-axis: density. McFadden R2 = 0.121]
Simulation
Raw mean difference in occupational prestige (NATE): -4.79
Population ATT (computed from fully stratified data): -3.96
There is some treatment effect heterogeneity (ATE = -3.51, ATC = -3.41)
[Figure, two panels, both with the propensity score (0 to 0.8) on the x-axis. First panel: outcome (about 30 to 55) for untreated and treated. Second panel: treatment effect (about -6 to -1).]
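To make the benchmark numbers of this slide concrete, the worked arithmetic below restates them in the potential-outcomes notation used earlier in the talk. It assumes the values read as NATE = -4.79, ATT = -3.96, ATE = -3.51, ATC = -3.41, and a treated share of 17.5%. The decomposition of the ATE as a share-weighted average of ATT and ATC is a standard identity and is not stated on the slide itself.

```latex
\begin{align*}
\text{NATE} &= E[Y \mid D=1] - E[Y \mid D=0] = -4.79 \\
\text{ATT}  &= E[Y^1 - Y^0 \mid D=1] = -3.96 \\
\text{NATE} - \text{ATT} &= -4.79 - (-3.96) = -0.83
  \quad \text{(bias of the raw comparison)} \\
\text{ATE}  &= P(D=1)\,\text{ATT} + P(D=0)\,\text{ATC} \\
            &= 0.175 \times (-3.96) + 0.825 \times (-3.41) \approx -3.51
\end{align*}
```

Under these readings, a matching estimator has to remove a bias of about 0.83 prestige points, and the reported ATE is consistent with the ATT, the ATC, and the treated share.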
Results: Variance
[Figure: variance of the ATT estimates, panels for N = 500 and N = 5000 (x-axis about 1.5 to 4.5). Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM, MDM with bias correction, PSM, PSM with bias correction.]
In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.
For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.
Results: Bias reduction (in percent)
[Figure: bias reduction in percent, panels for N = 500 (x-axis about 70 to 130) and N = 5000 (x-axis about 95 to 135). Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM, MDM with bias correction, PSM, PSM with bias correction.]
Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions) and, due to the specific pattern of the data (in particular the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.
The bias also vanishes once post-matching regression adjustment is applied.
Results: Mean squared error
[Figure: mean squared error of the ATT estimates, panels for N = 500 and N = 5000 (x-axis about 1.5 to 5). Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM, MDM with bias correction, PSM, PSM with bias correction.]
Results: Relative standard error
[Figure: relative standard error, panels for N = 500 and N = 5000 (x-axis about 0.9 to 1.25). Rows: nearest-neighbor matching (teffects) with 1 and 5 neighbors; nearest-neighbor matching (bootstrap) with 1 and 5 neighbors; kernel matching (bootstrap) with fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y. Series: MDM, MDM with bias correction, PSM, PSM with bias correction.]
Results: Coverage of 95% CIs
[Figure: coverage of 95% confidence intervals, panels for N = 500 (x-axis about 0.92 to 0.98) and N = 5000 (x-axis about 0.90 to 0.98). Rows and series as in the relative standard error figure.]
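The result slides report variance, bias reduction, mean squared error, relative standard error, and coverage, but do not spell out how these were computed. The block below gives one common set of definitions for a simulation with R replications. It is a sketch under assumptions: the symbols (R, the estimate and estimated standard error in replication r, the population ATT written as tau) and the exact normalizations are introduced here for illustration, and the author's definitions may differ in detail.

```latex
\begin{align*}
\bar{\tau} &= \frac{1}{R}\sum_{r=1}^{R}\hat{\tau}_r, \qquad
\text{Bias} = \bar{\tau} - \tau, \qquad
\text{Var} = \frac{1}{R}\sum_{r=1}^{R}\bigl(\hat{\tau}_r - \bar{\tau}\bigr)^2 \\
\text{MSE} &= \frac{1}{R}\sum_{r=1}^{R}\bigl(\hat{\tau}_r - \tau\bigr)^2
            = \text{Var} + \text{Bias}^2 \\
\text{Bias reduction (\%)} &= 100 \times
  \left(1 - \frac{\bar{\tau} - \tau}{\text{NATE} - \tau}\right) \\
\text{Relative SE} &= \frac{\frac{1}{R}\sum_{r=1}^{R}\widehat{se}_r}{\sqrt{\text{Var}}} \\
\text{Coverage} &= \frac{1}{R}\sum_{r=1}^{R}
  \mathbf{1}\bigl\{\,|\hat{\tau}_r - \tau| \le 1.96\,\widehat{se}_r\bigr\}
\end{align*}
```

Read this way, a bias reduction of 100 means the estimator removes the raw bias completely, values above 100 indicate overcorrection, a relative standard error of 1 means the estimated standard errors match the actual sampling variability on average, and coverage should be close to 0.95.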
Conclusions
The arguments brought forward by King and Nielsen against propensity score matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).
Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.
Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research the choice may not be that clear.
Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.
Disadvantages of MDM:
- Choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.
One clear conclusion we can draw, however, is:
Do not use propensity scores for pair matching! (But don't use pair matching anyhow.)
Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly ok for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.
To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.
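The two simulation conclusions can be illustrated with a short sketch. It assumes the nlsw88 example data and an installed kmatch (with moremata); the choice of adjustment covariates is purely illustrative. In kmatch, variables listed after "=" inside the outcome parentheses are, to my understanding of its syntax, used for post-matching regression adjustment; treat that as an assumption to check against the kmatch help file. The second command uses official Stata's teffects nnmatch, whose default standard errors are the analytic Abadie-Imbens ones rather than bootstrap standard errors.

```stata
* Sketch only; assumptions: nlsw88 data, kmatch and moremata installed
sysuse nlsw88, clear

* Kernel matching on the propensity score with post-matching regression
* adjustment (adjustment covariates after "="), bootstrap standard errors
kmatch ps union collgrad ttl_exp tenure i.industry i.race ///
    (wage = collgrad ttl_exp tenure), att vce(boot)

* Nearest-neighbor matching (5 neighbors, Mahalanobis metric by default)
* with analytic standard errors instead of the bootstrap
teffects nnmatch (wage collgrad ttl_exp tenure) (union), ///
    atet nneighbor(5)
```

The teffects call uses a reduced covariate list to keep the sketch simple, so its estimate is not directly comparable to the kmatch result.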
References
Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2): 295–313.
Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1): 77–90.
Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91: 279–290.
Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.
King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw
Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.
Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4): 465–472.
Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70: 41–55.
Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5): 688–701.
Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4): 472–480.
Simulation
Raw mean difference in occupational prestige (NATE) minus479Population ATT (computed from fully stratified data) minus396There is some treatment effect heterogeneity (ATE = minus351 ATC= minus341)
30
35
40
45
50
55
Out
com
e
0 1 2 3 4 5 6 7 8Propensity score
UntreatedTreated
-6
-5
-4
-3
-2
-1
Trea
tmen
t effe
ct
0 1 2 3 4 5 6 7 8Propensity score
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 58
Results: Variance
[Figure: variance of the estimates by matching algorithm, in two panels (N = 500 and N = 5000). Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]
In this slide we can see that, for the same algorithm, PSM typically is somewhat less efficient than MDM, but that across algorithms PSM can also be much more efficient than MDM. For example, kernel-matching PSM has a much smaller variance than 1-nearest-neighbor MDM. That is, the choice of algorithm matters much more than the choice between PSM and MDM.
For kernel matching, the efficiency differences between PSM and MDM are only small; additional post-matching regression adjustment further reduces the differences.
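To make the difference between the two families of algorithms concrete, here is a minimal sketch of 1-nearest-neighbor matching and Epanechnikov kernel matching on a propensity score. It is not the kmatch implementation: the function names `nn_att`, `kernel_att` and `simulate`, the data-generating process, the fixed bandwidth of 0.1 and the use of the true (rather than estimated) propensity score are all assumptions made for illustration.

```python
import random
import statistics


def nn_att(treated, controls, k=1):
    """ATT by k-nearest-neighbor matching (with replacement) on the propensity score."""
    effects = []
    for p_t, y_t in treated:
        nearest = sorted(controls, key=lambda c: abs(c[0] - p_t))[:k]
        effects.append(y_t - sum(y for _, y in nearest) / len(nearest))
    return sum(effects) / len(effects)


def kernel_att(treated, controls, bw=0.1):
    """ATT by Epanechnikov kernel matching on the propensity score."""
    effects = []
    for p_t, y_t in treated:
        weights = [max(0.0, 0.75 * (1 - ((p_c - p_t) / bw) ** 2)) for p_c, _ in controls]
        total = sum(weights)
        if total > 0:  # treated units with no control inside the bandwidth are dropped
            y_hat = sum(w * y for w, (_, y) in zip(weights, controls)) / total
            effects.append(y_t - y_hat)
    return sum(effects) / len(effects)


def simulate(n, rng):
    """Hypothetical data: each unit is (propensity score, observed outcome)."""
    treated, controls = [], []
    for _ in range(n):
        x = rng.random()
        p = 0.2 + 0.5 * x                    # true propensity score, assumed known here
        y0 = 50 - 10 * x + rng.gauss(0, 5)
        if rng.random() < p:
            treated.append((p, y0 - 4))      # constant treatment effect of -4
        else:
            controls.append((p, y0))
    return treated, controls


rng = random.Random(2017)
nn_estimates, kernel_estimates = [], []
for _ in range(100):
    treated, controls = simulate(300, rng)
    nn_estimates.append(nn_att(treated, controls, k=1))
    kernel_estimates.append(kernel_att(treated, controls, bw=0.1))

print("variance, 1 nearest neighbor:", statistics.variance(nn_estimates))
print("variance, kernel matching:   ", statistics.variance(kernel_estimates))
```

The intuition is that kernel matching averages over many controls per treated unit, whereas 1-nearest-neighbor matching uses a single control, so the matched control mean is noisier. How large the difference is depends on the assumed design and bandwidth.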
Results: Bias reduction (in percent)
[Figure: bias reduction in percent by matching algorithm, in two panels (N = 500 and N = 5000). Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]
Here we see that PSM has a bias that does not vanish as the sample size increases. The reason is that the same propensity-score model specification is used for both sample sizes. The model is rather simple (linear effect of age, no interactions), and due to the specific pattern of the data (in particular, the sharp drop in the outcome variable after propensity score 0.3), small imprecisions can have substantial effects on the results. In practice, one would probably use a more refined specification in the large-sample situation, which would reduce bias.
The bias also vanishes once post-matching regression adjustment is applied.
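The idea behind post-matching regression adjustment can be sketched as follows. This is a deliberately simplified stand-in, not the procedure used by kmatch or in the simulation: it matches on a single covariate `x` instead of a propensity score, uses one linear regression fitted on all controls, and the names `ols_line` and `matched_att` as well as the toy data are hypothetical. It shows that imperfect matches leave a bias in the raw matching estimate, and that shifting each matched control outcome along the fitted regression line removes it when the outcome model is correct.

```python
def ols_line(xs, ys):
    """Slope and intercept of a simple least-squares fit of ys on xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def matched_att(treated, controls):
    """Return (raw, adjusted) ATT estimates from 1-nearest-neighbor matching on x."""
    slope, _ = ols_line([x for x, _ in controls], [y for _, y in controls])
    raw, adjusted = [], []
    for x_t, y_t in treated:
        x_c, y_c = min(controls, key=lambda c: abs(c[0] - x_t))
        raw.append(y_t - y_c)
        # Shift the matched control outcome to the treated unit's covariate value.
        adjusted.append(y_t - (y_c + slope * (x_t - x_c)))
    return sum(raw) / len(raw), sum(adjusted) / len(adjusted)


# Toy data (hypothetical): y0 = 2 + 3x without noise, true effect of -4.
controls = [(0.0, 2.0), (1.0, 5.0), (2.0, 8.0)]
treated = [(0.4, -0.8), (1.7, 3.1)]
raw_att, adjusted_att = matched_att(treated, controls)
print("raw:", raw_att, "adjusted:", adjusted_att)
```

In this toy case the adjustment works perfectly because the outcome is exactly linear in `x`. With a misspecified outcome model the adjustment would only reduce, not eliminate, the bias from inexact matches.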
Results: Mean squared error
[Figure: mean squared error by matching algorithm, in two panels (N = 500 and N = 5000). Rows: nearest-neighbor matching (1 neighbor, 5 neighbors) and kernel matching (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]
Results: Relative standard error
[Figure: relative standard error by matching algorithm, in two panels (N = 500 and N = 5000). Rows: nearest-neighbor matching with teffects (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap (1 neighbor, 5 neighbors), and kernel matching with bootstrap (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]
Results: Coverage of 95% CIs
[Figure: coverage of 95% confidence intervals by matching algorithm, in two panels (N = 500 and N = 5000). Rows: nearest-neighbor matching with teffects (1 neighbor, 5 neighbors), nearest-neighbor matching with bootstrap (1 neighbor, 5 neighbors), and kernel matching with bootstrap (fixed bandwidth, pair-matching bandwidth, cross-validation with respect to X, cross-validation with respect to Y, weighted CV with respect to Y). Series: MDM and PSM, each with and without bias correction.]
Conclusions
- The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).
- Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.
Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research the choice may not be that clear.
Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.
Disadvantages of MDM:
- The choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.
One clear conclusion we can draw, however, is:
Do not use propensity scores for pair matching. (But don't use pair matching anyhow.)
Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly OK for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.
To do:
- Run some simulations comparable to the ones by King and Nielsen, using various matching algorithms.
References
Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2):295–313.
Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1):77–90.
Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91:279–290.
Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.
King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.
Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.
Neyman, J. 1990[1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4):465–472.
Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70:41–55.
Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5):688–701.
Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4):472–480.
1 neighbor
5 neighbors
fixed bandwidth
pair-matchingbandwidth
cross-validationwith respect to X
cross-validationwith respect to Y
weighted CVwith respect to Y
Nearest-neighbormatching
Kernel matching
70 80 90 100 110 120 130 95 100 105 110 115 120 125 130 135
N = 500 N = 5000
MDMwith biascorrection
PSMwith biascorrection
Results Bias reduction (in percent)
2017-06-24
Propensity Scores MatchingIllustration using kmatch
Here we see that PSM has a bias that does not vanish as the samplesize increases The reason is that the same propensity-score modelspecification is used for both sample sizes The model is rather simple(linear effect of age no interactions) and due to the specific pattern ofthe data (in particular the sharp drop in the outcome variable afterpropensity score 03) small imprecisions can have substantial effects onthe results In practice one would probably use a more refinedspecification in the large-sample situation which would reduce bias
The bias also vanishes once post-matching regression adjustment isapplied
1 neighbor
5 neighbors
fixed bandwidth
pair-matchingbandwidth
cross-validationwith respect to X
cross-validationwith respect to Y
weighted CVwith respect to Y
Nearest-neighbormatching
Kernel matching
15 2 25 3 35 4 45 15 2 25 3 35 4 45 5
N = 500 N = 5000
MDMwith biascorrection
PSMwith biascorrection
Results Mean squared error
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 61
1 neighbor
5 neighbors
1 neighbor
5 neighbors
fixed bandwidth
pair-matchingbandwidth
cross-validationwith respect to Xcross-validation
with respect to Yweighted CV
with respect to Y
Nearest-neighbormatching (teffects)
Nearest-neighbormatching (bootstrap)
Kernel matching(bootstrap)
9 95 1 105 11 115 12 95 1 105 11 115 12 125
N = 500 N = 5000
MDMwith biascorrection
PSMwith biascorrection
Results Relative standard error
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 62
1 neighbor
5 neighbors
1 neighbor
5 neighbors
fixed bandwidth
pair-matchingbandwidth
cross-validationwith respect to Xcross-validation
with respect to Yweighted CV
with respect to Y
Nearest-neighbormatching (teffects)
Nearest-neighbormatching (bootstrap)
Kernel matching(bootstrap)
92 93 94 95 96 97 98 9 92 94 96 98
N = 500 N = 5000
MDMwith biascorrection
PSMwith biascorrection
Results Coverage of 95 CIs
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 63
1 Potential Outcomes and Causal Inference
2 Matching
3 Propensity Score Matching
4 King and Nielsenrsquos ldquoWhy Propensity Scores Should Not Be Used forMatchingrdquo
5 Are King and Nielsen right
6 Illustration using kmatch
7 Conclusions
Ben Jann (University of Bern) Propensity Scores Matching Berlin 23062017 64
Conclusions
The arguments brought forward by King and Nielsen against Propensity Score Matching are valid, but they mostly apply to one specific form of PSM: pair matching (one-to-one matching without replacement).
Other PSM matching algorithms perform much better because they are less affected by the random pruning problem.
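To make the mechanical difference between the two families of algorithms concrete, the following Python sketch contrasts pair matching with kernel matching on a scalar propensity score. It only shows which controls each approach retains. It does not reproduce King and Nielsen's results or the simulations in these slides. The function names, the greedy matching order, and the Epanechnikov kernel are assumptions made for illustration.

```python
def pair_match(treated_scores, control_scores):
    """Greedy one-to-one matching without replacement.

    Each treated unit takes the closest control that is still available;
    every control that is never picked is pruned from the analysis.
    ASSUMPTION: treated units are processed in the order given.
    """
    available = dict(enumerate(control_scores))
    pairs = []
    for i, p in enumerate(treated_scores):
        if not available:
            break
        j = min(available, key=lambda k: abs(available[k] - p))
        pairs.append((i, j))
        del available[j]
    return pairs


def kernel_weights(treated_scores, control_scores, bandwidth):
    """Kernel matching weights, normalized to sum to 1 per treated unit.

    ASSUMPTION: Epanechnikov kernel. All controls within the bandwidth
    contribute, with closer controls receiving more weight.
    """
    weights = []
    for p in treated_scores:
        raw = []
        for q in control_scores:
            u = (q - p) / bandwidth
            raw.append(0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0)
        total = sum(raw)
        if total > 0:
            weights.append([w / total for w in raw])
        else:
            # No control within the bandwidth: the treated unit is unmatched.
            weights.append([0.0] * len(raw))
    return weights


treated = [0.30, 0.40]
controls = [0.28, 0.31, 0.39, 0.80]
pairs = pair_match(treated, controls)
weights = kernel_weights(treated, controls, bandwidth=0.05)
```

In this toy example, pair matching keeps exactly one control per treated unit, while kernel matching gives the first treated unit a weighted average of the two controls that lie within the bandwidth.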
Overall, I agree that MDM has advantages over PSM, but it also has some disadvantages. In applied research the choice may not be that clear.
Advantages of MDM:
- MDM leaves less scope for post-matching modeling decision biases.
- Theoretical results (see, e.g., Frölich 2007) suggest that MDM will generally tend to outperform PSM in terms of efficiency (but differences are likely to be small).
- Fewer restrictions in terms of possible post-matching analyses.
Disadvantages of MDM:
- The choice of scaling matrix is largely arbitrary; there are various suggestions in the literature (it is somewhat unclear, e.g., how categorical variables should be treated).
- Computational complexity.
One clear conclusion we can draw, however, is:
Do not use propensity scores for pair matching! (But don't use pair matching anyhow.)
Some conclusions from the simulation:
- For PSM, application of regression adjustment seems like a great idea (reduction of bias and variance); for MDM, the advantages of regression adjustment are less clear.
- Bootstrap standard error/confidence interval estimation seems to be mostly OK for kernel/ridge matching; this is in contrast to nearest-neighbor matching, where bootstrap standard errors are clearly biased.
To do:
- Run some simulations comparable to the ones by King and Nielsen using various matching algorithms.
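For readers unfamiliar with the bootstrap standard errors discussed above, here is a minimal Python sketch of the general idea: resample the data, re-run the estimator on each resample, and take the standard deviation of the results. It is an illustration under stated assumptions, not the procedure used by kmatch or in the simulation. The names `bootstrap_se` and `diff_in_means` are hypothetical, resampling is done separately within the treated and control groups (an assumption), and a plain difference in means stands in for a matching estimator.

```python
import random
import statistics


def diff_in_means(treated_y, control_y):
    """Placeholder estimator: raw difference in mean outcomes.

    In a matching application this function would re-run the whole
    matching procedure on the resampled data.
    """
    return statistics.fmean(treated_y) - statistics.fmean(control_y)


def bootstrap_se(treated_y, control_y, estimator, reps=200, seed=12345):
    """Bootstrap standard error of estimator(treated_y, control_y).

    ASSUMPTION: units are resampled with replacement separately within
    the treated and the control group, so both groups keep their size.
    """
    rng = random.Random(seed)
    replicates = []
    for _ in range(reps):
        t_star = [rng.choice(treated_y) for _ in treated_y]
        c_star = [rng.choice(control_y) for _ in control_y]
        replicates.append(estimator(t_star, c_star))
    # The spread of the replicate estimates approximates the sampling
    # variability of the estimator.
    return statistics.stdev(replicates)


se = bootstrap_se(
    treated_y=[1.0, 2.0, 3.0, 4.0],
    control_y=[0.0, 1.0, 2.0, 3.0],
    estimator=diff_in_means,
)
```

The sketch makes no claim about when such standard errors are reliable; the slides report that they look mostly fine for kernel/ridge matching but are clearly biased for nearest-neighbor matching.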
References I
Cochran, W.G. 1968. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies. Biometrics 24(2):295–313.
Frölich, M. 2004. Finite-sample properties of propensity-score matching and weighting estimators. The Review of Economics and Statistics 86(1):77–90.
Frölich, M. 2007. On the inefficiency of propensity score matching. AStA 91:279–290.
Hendrickx, J. 2002. ISKO: Stata module to recode 4 digit ISCO-88 occupational codes. Statistical Software Components S425802, Boston College Department of Economics.
King, G., R. Nielsen. 2016. Why Propensity Scores Should Not Be Used for Matching. Working Paper. Available from http://j.mp/1sexgVw.
Mill, J.S. 2002. A System of Logic. Reprinted from the 1981 edition (first published 1843). Honolulu, Hawaii: University Press of the Pacific.
References II
Neyman, J. 1990 [1923]. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9. (Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original.) Statistical Science 5(4):465–472.
Rosenbaum, P.R., D.B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70:41–55.
Rubin, D.B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5):688–701.
Rubin, D.B. 1990. Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies. Statistical Science 5(4):472–480.