Difference-in-Difference: Development Workshop

Typical problem in proving causal effects

Using differences to estimate causal effects in experimental data (treatment+control groups)

Wish: the ‘treatment’ and ‘control’ groups can be assumed to be similar in every way except receipt of treatment

This may be very difficult to do

A weaker assumption is…

In absence of treatment, difference between ‘treatment’ and ‘control’ group is constant over time

With this assumption we can use observations on the treatment and control groups, pre- and post-treatment, to estimate the causal effect

Idea
– Difference pre-treatment is ‘normal’ difference
– Difference post-treatment is ‘normal’ difference + causal effect
– Difference-in-difference is causal effect
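The idea above can be sketched in a few lines of Python. This is a minimal illustration, not part of the workshop material: the function name `did_estimate` and the four group means are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def did_estimate(treat_pre, treat_post, control_pre, control_post):
    """Difference-in-difference from four group means."""
    # 'Normal' difference between the groups, measured before treatment.
    normal_difference = treat_pre - control_pre
    # Post-treatment difference = 'normal' difference + causal effect.
    post_difference = treat_post - control_post
    # Subtracting the 'normal' difference leaves the causal effect.
    return post_difference - normal_difference


# Hypothetical group means (not from any dataset in this workshop).
effect = did_estimate(treat_pre=10.0, treat_post=15.0,
                      control_pre=8.0, control_post=9.0)
print(effect)  # (15 - 9) - (10 - 8) = 4.0
```

The same number is obtained by taking each group's change over time first and then differencing across groups; the order of the two differences does not matter.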

Graphically…

[Figure: outcome y plotted against time for the treatment and control groups, pre- and post-treatment, with points A, B and C marked]

What is D-in-D estimate?

– Standard differences estimator is AB
– But ‘normal’ difference is estimated as CB
– Hence D-in-D estimate is AC
– Note: assumes trends in outcome variables are the same for treatment and control groups
– This is not testable
– Two periods (before and after) are crucial

The Grand Experiment (Snow)

Water supplied to households by competing private companies

Sometimes different companies supplied households in same street

In south London two main companies:
– Lambeth Company (water supply from Thames Ditton, 22 miles upstream)
– Southwark and Vauxhall Company (water supply from Thames)

In 1853/54 cholera outbreak

Death rates per 10,000 people by water company:
– Lambeth: 10
– Southwark and Vauxhall: 150

Might be water, but perhaps other factors

Snow compared death rates in the 1849 epidemic:
– Lambeth: 150
– Southwark and Vauxhall: 125

In 1852 the Lambeth Company had moved its supply from Hungerford Bridge to Thames Ditton

What would be good estimate of effect of clean water?

                                        1849    1853/54    Difference
Lambeth                                  150         10          -140
Southwark and Vauxhall                   125        150            25
Difference (Lambeth − S. and V.)          25       -140          -165
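The arithmetic behind the table can be checked with a short Python sketch. It uses only the four death rates quoted above. The variable names are illustrative, and the sign convention (Lambeth minus Southwark and Vauxhall) is an assumption made here so that a fall in deaths for Lambeth shows up as a negative number.

```python
# Death rates per 10,000 people, as quoted in the slides.
rates = {
    "Lambeth": {"1849": 150, "1853/54": 10},
    "Southwark and Vauxhall": {"1849": 125, "1853/54": 150},
}


def change(company):
    """Change in the death rate between the two epidemics."""
    return rates[company]["1853/54"] - rates[company]["1849"]


treated_change = change("Lambeth")                  # 10 - 150
control_change = change("Southwark and Vauxhall")   # 150 - 125
did = treated_change - control_change

print(treated_change, control_change, did)  # -140 25 -165
```

Under the constant-difference assumption, the estimate says the cleaner supply lowered the death rate by about 165 per 10,000 people.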

Card and Krueger (1994)

Basic microeconomic theory of the firm: factor demand curves slope downwards.

Hence, if minimum wages are binding, we would expect employment to fall if minimum wage is raised.

Natural experiment: New Jersey raising its minimum wage from $4.25 to $5.05 on 1 April 1992 while the minimum wage in neighbouring Pennsylvania remained unchanged.

Data: wages and employment in 65 fast-food restaurants in Pennsylvania and 284 in New Jersey in Feb/March 1992 (i.e. before the rise in the NJ minimum wage) and in Nov/Dec 1992 (i.e. after the rise).

Difference-in-difference design to investigate the impact of minimum wages on employment.

What data do we have?

698 observations
– sheet: an identifier for each restaurant (each has two observations, pre- and post-)
– nj: dummy for whether a NJ restaurant
– after: dummy for whether post- observation
– njafter: nj*after
– fte: full-time equivalent employment
– dfte: change in full-time equivalent employment

Tabulate command

Tabulate in Stata:
– tabulate var (or tab var): just a simple table
– tab var, g(newvar): generates a set of dummy variables, one per category of var
– tab var, su(othervar): summarises some other variable within each category of var

Let’s get our first DinD estimator

tabulate nj after, su(fte) means

         Before    After    Diff
PA         20.3     18.3    -2.0
NJ         17.3     17.5    +0.2
Diff       +3.0     +0.8      ??
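One way to fill in the missing cell is to difference the two 'Diff' entries by hand. The Python sketch below does this with the rounded means from the table; the variable names are illustrative, and because the inputs are rounded to one decimal place the result may differ slightly from an estimate computed on the underlying data.

```python
# Mean full-time equivalent employment, rounded as in the table above.
means = {
    "PA": {"before": 20.3, "after": 18.3},
    "NJ": {"before": 17.3, "after": 17.5},
}

# Change over time within each state.
pa_change = means["PA"]["after"] - means["PA"]["before"]
nj_change = means["NJ"]["after"] - means["NJ"]["before"]

# Difference-in-difference: change in NJ (treated) minus change in PA (control).
did = nj_change - pa_change

print(round(pa_change, 1), round(nj_change, 1), round(did, 1))
```

Note the sign: relative to Pennsylvania, employment in New Jersey restaurants rose rather than fell in these means.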

Going from means to statistics

reg dfte nj

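      Source |       SS       df       MS              Number of obs =     349
-------------+------------------------------           F(  1,   347) =    3.91
       Model |  286.841779     1  286.841779           Prob > F      =  0.0489
    Residual |  25485.8728   347  73.4463192           R-squared     =  0.0111
-------------+------------------------------           Adj R-squared =  0.0083
       Total |  25772.7145   348  74.0595245           Root MSE      =  8.5701

------------------------------------------------------------------------------
        dfte |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          nj |   2.328724   1.178371     1.98   0.049     .0110768    4.646372
       _cons |  -2.046154   1.062988    -1.92   0.055    -4.136864    .0445564
------------------------------------------------------------------------------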

… and with robust standard errors

              Command                Coeff     SE      P-value
OLS           reg dfte nj            2.329     1.18    0.049
Robust OLS    reg dfte nj, robust    2.329     1.47    0.114

An alternative specification …

reg fte nj after njafter, robust

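Linear regression                                      Number of obs =     698
                                                       F(  3,   694) =    1.32
                                                       Prob > F      =  0.2682
                                                       R-squared     =  0.0089
                                                       Root MSE      =  8.9641

------------------------------------------------------------------------------
             |               Robust
         fte |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          nj |  -2.998944   1.591452    -1.88   0.060    -6.123581    .1256939
       after |  -2.046154   1.788875    -1.14   0.253     -5.55841    1.466103
     njafter |   2.328724   1.930761     1.21   0.228     -1.46211    6.119558
       _cons |       20.3   1.501537    13.52   0.000      17.3519     23.2481
------------------------------------------------------------------------------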
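To see how this specification relates to the table of means, the four coefficients can be recombined into the four cell means. The Python sketch below does so using the point estimates reported in the output; the variable names are illustrative. With a full set of dummies (nj, after and their interaction) the fitted values are the cell means, so the interaction coefficient is the difference-in-difference.

```python
# Point estimates from: reg fte nj after njafter, robust
cons = 20.3
nj = -2.998944
after = -2.046154
njafter = 2.328724

# Fitted mean employment in each state/period cell.
pa_before = cons
pa_after = cons + after
nj_before = cons + nj
nj_after = cons + nj + after + njafter

# Difference-in-difference rebuilt from the cell means.
did = (nj_after - nj_before) - (pa_after - pa_before)

print(round(pa_before, 2), round(pa_after, 2))
print(round(nj_before, 2), round(nj_after, 2))
print(round(did, 6))  # equals the njafter coefficient
```

The njafter coefficient here matches the nj coefficient from the regression of dfte on nj; only the standard errors differ between the two specifications.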

Alternative specifications…

reg fte nj after njafter, cl(sheet)

xtreg fte nj after njafter, fe i(sheet)

Any key differences? Should there be any?

Suppose we’d like to observe many estimations

Stata commands for results sets:
– estimates store
– outreg (works mostly with regressions)
– parmest/parmby (written by Roger Newson)

Summary

– A very useful and widespread approach
– Validity does depend on the assumption that trends would have been the same in the absence of treatment
– Can use other periods to see if this assumption is plausible or not
– Uses 2 observations on the same individual: the most rudimentary form of panel data
