Natural experiments: the basics
Class: Data Mining Technology for Business and Society
Program: M. Sc. Data Science
University: Sapienza University of Rome
Semester: Spring 2016
Lecturer: Carlos Castillo http://chato.cl/
Sources:
● Thad Dunning: Natural Experiments in the Social Sciences. Cambridge University Press, 2012.
Yulia Tyshchuk, Cindy Hui, Martha Grabowski, William A. Wallace: “Social Media and Warning Response Impacts in Extreme Events: Results from a Naturally Occurring Experiment”, HICSS 2012
● “On April 6th, 2010, at 8:15 a.m., an armed perpetrator robbed Regina Check Cashing Corporation, located at 450 Hoosick Street in Troy, N.Y., which is about one mile away from the Rensselaer Polytechnic Institute (RPI) campus. Later on, the perpetrator was seen on campus, specifically, in the East Campus Athletic Village. The RPIAlert system was activated and the first ‘stay in shelter’ warning, via on campus loudspeakers, emails, phone calls, voice mails and text messages, was issued at 9:30 a.m. Two more ‘stay in shelter’ warnings were issued at 10:48 a.m. and 11:48 a.m. that day, before the ‘all clear’ message was issued at 12:52 p.m.”
● The paper describes how the Twitter network, keywords, etc. evolved after the event
Natural experiments
● Start with a population (= “study group”)
● Separate control and treatment groups at random
● Apply treatment to treatment group
● Measure outcomes in both groups
● Compare outcomes
● Profit!
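The steps above can be sketched as a small simulation. This is a minimal illustration, not part of the original slides: the population, the outcome distribution, and the `true_effect` value are all made-up assumptions, and the function name `run_randomized_experiment` is hypothetical.

```python
import random
import statistics

def run_randomized_experiment(n=10_000, true_effect=2.0, seed=0):
    """Simulate a randomized controlled experiment on synthetic subjects."""
    rng = random.Random(seed)
    # 1. Start with a study group; each subject has a baseline outcome
    #    (synthetic values, assumed for illustration only).
    baseline = [rng.gauss(10.0, 3.0) for _ in range(n)]
    # 2. Separate treatment and control groups at random.
    indices = list(range(n))
    rng.shuffle(indices)
    treated = set(indices[: n // 2])
    # 3. Apply the treatment to the treatment group only,
    # 4. and measure outcomes in both groups.
    treatment_outcomes = [baseline[i] + true_effect for i in range(n) if i in treated]
    control_outcomes = [baseline[i] for i in range(n) if i not in treated]
    # 5. Compare outcomes: the difference in means estimates the effect.
    return statistics.mean(treatment_outcomes) - statistics.mean(control_outcomes)

estimate = run_randomized_experiment()
print(f"estimated effect: {estimate:.2f}")
```

Because assignment is random, the two groups have similar baselines on average, so the difference in means should land close to the assumed `true_effect`.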
Key elements of randomized controlled experiments
1. Randomized: assignment of subjects to treatment/control groups is done at random
2. Controlled: response of subjects assigned to treatment is compared to response of subjects assigned to control
3. Experiment: treatment received by treatment group is under the control of a researcher
Why natural experiments?
● In some contexts, direct manipulation is
– Expensive
– Impractical
– Unethical
● Most results from social sciences and computational social science in large populations are observational
Example: John Snow's cholera research (1855)
Do not confuse John Snow with Jon Snow.
[Figure: map; red = cholera death, blue = water pump]
Prevalent hypothesis: cholera was caused by miasma in air.
● Two companies
– (1) Southwark & Vauxhall
– (2) Lambeth
● In 1852, Lambeth moved its intake pipes upstream on the River Thames, above the city's sewage discharge, but Southwark & Vauxhall did not
Snow's observations (1853-1854)
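As a rough illustration of the comparison Snow made between the two companies, the sketch below computes death rates per 10,000 houses. The house and death counts are not in these slides; they are the figures commonly quoted from Snow's 1855 table (for example in Dunning's book) and should be checked against that source. The helper name `deaths_per_10k_houses` is hypothetical.

```python
# Houses served and cholera deaths per water company, as commonly quoted
# from Snow's 1855 table (assumption: these numbers are not in the slides).
snow_table = {
    "Southwark & Vauxhall": {"houses": 40_046, "deaths": 1_263},
    "Lambeth": {"houses": 26_107, "deaths": 98},
}

def deaths_per_10k_houses(row):
    """Death rate normalized by the number of houses served."""
    return 10_000 * row["deaths"] / row["houses"]

rates = {company: deaths_per_10k_houses(row) for company, row in snow_table.items()}
rate_ratio = rates["Southwark & Vauxhall"] / rates["Lambeth"]

for company, rate in rates.items():
    print(f"{company}: {rate:.1f} deaths per 10,000 houses")
print(f"rate ratio: {rate_ratio:.1f}")
```

Normalizing by houses served matters here because the two companies had different numbers of customers, so raw death counts alone would not be comparable.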
Snow's words
Comparison
Randomized controlled experiment
1. Response of subjects assigned to treatment compared to response of subjects assigned to control
2. Assignment of subjects to groups is done using a randomization device
3. Treatment is under the control of a researcher
Natural experiment
1. Response of subjects assigned to treatment compared to response of subjects assigned to control
2. Assignment of subjects to groups is as-if random, or as good as random
3. Treatment was not under the control of a researcher
Types of natural experiment
1. Standard natural experiment
2. Instrumental variables
3. Regression discontinuity
“Perfect” natural experiment
● Doherty, Gerber, Green 2006
– Compare political attitudes of lottery winners vs lottery non-winners
– Found that “lottery-induced affluence increases hostility toward estate taxes, marginally increases hostility towards government redistribution, but has little effect on broader attitudes concerning economic stratification or the role of government as a provider of social insurance”
Weakness: study group = lottery players. Are they representative of the whole society?
● After: difference in differences of 0.66 posts: 75% more.
Example: economic output and war [Miguel, Satyanath, Sergenti 2004]
● Economic shocks push countries to war?
– Big methodological problem is reciprocal causation: poverty creates the conditions for war, which creates more poverty
● External variable for economic output: rainfall
● Study across countries with high/low relative rainfall
● Low rainfall in one year increases chance of war
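The idea of using rainfall as an external variable can be sketched with synthetic data. This is a minimal illustration under stated assumptions, not the authors' analysis: all numbers are made up, an unobserved confounder (`hidden`) stands in for the reciprocal causation problem, and the instrumental-variable estimate is computed as the simple ratio cov(instrument, outcome) / cov(instrument, treatment).

```python
import random

def covariance(a, b):
    """Sample covariance of two equally long sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)

rng = random.Random(42)
n = 20_000
TRUE_EFFECT = -1.0  # hypothetical effect of economic growth on conflict risk

rainfall = [rng.gauss(0, 1) for _ in range(n)]  # instrument: as-if random
hidden = [rng.gauss(0, 1) for _ in range(n)]    # unobserved confounder
# Growth depends on rainfall and on the confounder.
growth = [r + h + rng.gauss(0, 1) for r, h in zip(rainfall, hidden)]
# Conflict depends on growth and, separately, on the confounder.
conflict = [TRUE_EFFECT * g - 2.0 * h + rng.gauss(0, 1)
            for g, h in zip(growth, hidden)]

# Naive regression slope of conflict on growth: biased by the confounder.
naive_slope = covariance(growth, conflict) / covariance(growth, growth)
# Instrumental-variable slope: uses only the variation driven by rainfall.
iv_slope = covariance(rainfall, conflict) / covariance(rainfall, growth)

print(f"naive slope: {naive_slope:.2f}")
print(f"IV slope:    {iv_slope:.2f}")
```

In this setup the naive slope overstates the effect, while the IV slope should land near the assumed `TRUE_EFFECT`, because rainfall influences conflict only through growth by construction. Whether that last condition holds in real data is an assumption that has to be argued, not tested by this code.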
● Setting: we want to study whether going to the Vietnam war affected the future income of people in the US
– E.g. lost experience or years of career caused drop in future salary, trauma of war caused decline in productivity, etc.
– Very important to apportion stipends, pensions, etc.
● Instrumental variable: eligibility for military draft
– Date of birth yields a number from 1 to 365
– All whose number is at or below a cutoff X are draft-eligible (lower numbers were called first)
– Not all drafted go to war, not all that go to war were drafted
Example: military and future income
● Study group: white men of military age in 1971
● This is called intention-to-treat analysis because the study group is divided by the intention to send people to war (draft eligibility), not by who actually went
● Why? Because going to war is not a random process, while day of birth is as-if random
● More on this later ...
Instrumental variable | 1984 earnings, adjusted for inflation
Eligible by day of birth | $16,172
Not eligible by day of birth | $15,813 (about 2.2% less)
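The "about 2.2% less" figure follows directly from the two averages. The short check below assumes the dot in the original figures is a thousands separator, so the amounts are read as 16,172 and 15,813 dollars; the variable names are hypothetical.

```python
# Average 1984 earnings by draft eligibility, as given in the table
# (assumption: the dot in the original figures is a thousands separator).
eligible_earnings = 16_172
not_eligible_earnings = 15_813

# Intention-to-treat comparison: difference between the groups
# defined by eligibility, regardless of who actually served.
itt_difference = eligible_earnings - not_eligible_earnings
relative_gap = itt_difference / eligible_earnings

print(itt_difference)         # 359
print(f"{relative_gap:.1%}")  # 2.2%
```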
Regression discontinuity design
● A scalar variable is used to decide who receives treatment and who does not
● An arbitrary threshold is used
● Observe outcomes just below and just above the threshold
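One common way to compare outcomes just below and just above the threshold is to fit a straight line on each side and measure the gap between the two fits at the threshold. The sketch below does this on synthetic data; the threshold, bandwidth, jump size, and the helper names `fit_line` and `rd_estimate` are assumptions made for illustration.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def rd_estimate(scores, outcomes, threshold, bandwidth):
    """Gap at the threshold between linear fits on each side of it."""
    # Center scores on the threshold so each intercept is the fitted
    # outcome exactly at the threshold.
    below = [(s - threshold, y) for s, y in zip(scores, outcomes)
             if threshold - bandwidth <= s < threshold]
    above = [(s - threshold, y) for s, y in zip(scores, outcomes)
             if threshold <= s < threshold + bandwidth]
    intercept_below, _ = fit_line(*zip(*below))
    intercept_above, _ = fit_line(*zip(*above))
    return intercept_above - intercept_below

# Synthetic data: the outcome rises smoothly with the score and jumps
# by TRUE_JUMP for units at or above the threshold (hypothetical values).
rng = random.Random(7)
THRESHOLD, TRUE_JUMP = 11.0, 3.0
scores = [rng.uniform(0, 20) for _ in range(5_000)]
outcomes = [0.5 * s + (TRUE_JUMP if s >= THRESHOLD else 0.0) + rng.gauss(0, 1)
            for s in scores]

jump = rd_estimate(scores, outcomes, THRESHOLD, bandwidth=3.0)
print(f"estimated jump at threshold: {jump:.2f}")
```

Fitting a line on each side, rather than comparing raw means, keeps the smooth trend in the score from being mistaken for part of the jump.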
Regression discontinuity designs
Students with a score above 11 are given a “Certificate of Merit”
Hypothesis: getting a certificate of merit increases chances of scholarship
Regression discontinuity designs: three hypothetical series (A, B, C)
Electronic voting in Brazil [Fujiwara 2015]
● Literally hundreds of candidates per ballot
● Municipalities with 40,500 voters or more → electronic ballot
● Municipalities with fewer than 40,500 voters → paper ballot
● Create two bands and compare, using h = 20,000: [40,500 − h, 40,500) and [40,500, 40,500 + h)
● Jump from 75% to 90% valid votes, particularly in municipalities with lower literacy
Band size: if it's too narrow we question sample size, if it's too wide we question randomness
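The band-size trade-off can be seen in a small simulation. The data below are synthetic and are not Fujiwara's: the municipality sizes, the mild upward trend in valid votes, the noise level, and the function name `band_comparison` are all assumptions, chosen only so that the jump at the cutoff is roughly 15 percentage points.

```python
import random

CUTOFF = 40_500

def band_comparison(voters, valid_share, h, cutoff=CUTOFF):
    """Difference in mean valid-vote share between [cutoff, cutoff + h)
    and [cutoff - h, cutoff), plus the sample size of each band."""
    below = [y for v, y in zip(voters, valid_share) if cutoff - h <= v < cutoff]
    above = [y for v, y in zip(voters, valid_share) if cutoff <= v < cutoff + h]
    difference = sum(above) / len(above) - sum(below) / len(below)
    return difference, len(below), len(above)

# Synthetic municipalities: valid-vote share drifts up slowly with size
# and jumps by 0.15 at the cutoff (hypothetical numbers).
rng = random.Random(3)
voters = [rng.randint(5_000, 100_000) for _ in range(3_000)]
valid_share = [0.71 + 1e-6 * v + (0.15 if v >= CUTOFF else 0.0) + rng.gauss(0, 0.05)
               for v in voters]

results = {h: band_comparison(voters, valid_share, h) for h in (2_000, 20_000, 35_000)}
for h, (difference, n_below, n_above) in results.items():
    print(f"h={h:>6}: difference={difference:.3f}  n_below={n_below}  n_above={n_above}")
```

With a narrow band, few municipalities remain and the estimate is noisy. With a wide band, the two groups differ in size by more than the cutoff alone, so the underlying trend leaks into the difference.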