Understanding and misunderstanding randomized controlled trials

Angus Deaton and Nancy Cartwright

Princeton University
Durham University and UC San Diego

This version, August 2016

We acknowledge helpful discussions with many people over the many years this paper has been in preparation. We would particularly like to note comments from seminar participants at Princeton, Columbia and Chicago, the CHESS research group at Durham, as well as discussions with Orley Ashenfelter, Anne Case, Nick Cowen, Hank Farber, Bo Honoré, and Julian Reiss. Ulrich Mueller had a major influence on shaping Section 1 of the paper. We have benefited from generous comments on an earlier version by Tim Besley, Chris Blattman, Sylvain Chassang, Steven Durlauf, Jean Drèze, William Easterly, Jonathan Fuller, Lars Hansen, Jim Heckman, Jeff Hammer, Macartan Humphreys, Helen Milner, Suresh Naidu, Lant Pritchett, Dani Rodrik, Burt Singer, Richard Zeckhauser, and Steve Ziliak. Cartwright’s research for this paper has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program (grant agreement No 667526 K4U). Deaton acknowledges financial support through the National Bureau of Economic Research, Grants 5R01AG040629-02 and P01 AG05842-14 and through Princeton University’s Roybal Center, Grant P30 AG024928.


ABSTRACT

RCTs are valuable tools whose use is spreading in economics and in other social sciences. They are seen as desirable aids in scientific discovery and for generating evidence for policy. Yet some of the enthusiasm for RCTs appears to be based on misunderstandings: that randomization provides a fair test by equalizing everything but the treatment and so allows a precise estimate of the treatment alone; that randomization is required to solve selection problems; that lack of blinding does little to compromise inference; and that statistical inference in RCTs is straightforward, because it requires only the comparison of two means. None of these statements is true. RCTs do indeed require minimal assumptions and can operate with little prior knowledge, an advantage when persuading distrustful audiences, but a crucial disadvantage for cumulative scientific progress, where randomization adds noise and undermines precision. The lack of connection between RCTs and other scientific knowledge makes it hard to use them outside of the exact context in which they are conducted. Yet, once they are seen as part of a cumulative program, they can play a role in building general knowledge and useful predictions, provided they are combined with other methods, including conceptual and theoretical development, to discover not “what works,” but why things work. Unless we are prepared to make assumptions, and to stand on what we know, making statements that will be incredible to some, all the credibility of RCTs is for naught.

Introduction

Randomized trials are currently much used in economics and are widely considered to be a desirable method of empirical analysis and discovery. There is a long history of such trials in the subject. There were four large federally sponsored negative income tax trials in the 1960s and 1970s. In the mid-1970s, there was a famous, and still frequently cited, trial on health insurance, the Rand health experiment. There was then a period during which randomized controlled trials (RCTs) received less attention from academic economists; even so, randomized trials on welfare, social policy, labor markets, and education have continued since the mid-1970s, some with substantial involvement and discussion by academic economists, see Greenberg and Shroder (2004).

Recent randomized trials in economic development have attracted attention, and the idea that such trials can discover “what works” has been widely adopted in economics, as well as in political science, education, and social policy. Among both researchers and the general public, RCTs are perceived to yield causal inferences and parameter estimates that are more credible than those from other empirical methods that do not involve the comparison of randomly selected treatment and control groups. RCTs are seen as largely exempt from many of the econometric problems that characterize observational studies. When RCTs are not feasible, researchers often mimic randomized designs by using observational data to construct two groups that, as far as possible, are identical and differ only in their exposure to treatment.

The preference for randomized trials has spread beyond trialists to the general public and the media, which typically reports favorably on them. They are seen as accurate, objective, and largely independent of “expert” knowledge that is often regarded as manipulable, politically biased, or otherwise suspect. There are now “What Works” centers using and recommending RCTs in a huge range of areas of social concern across Europe and the Anglophone world, such as the US Department of Education’s What Works Clearinghouse, The Campbell Collaboration (parallel to the Cochrane Collaboration in health), the Scottish Intercollegiate Guidelines Network (SIGN), the US Department of Health and Human Services Child Welfare Information Gateway, the US Social and Behavioral Sciences Team, and others. The British government has established eight new (well-financed) What Works Centers similar to the National Institute for Health and Care Excellence (NICE), with more planned. They extend NICE’s evaluation of health treatment into aging, early intervention, education, crime, local economic growth, Scottish service delivery, poverty, and wellbeing. These centers see randomized controlled trials as their preferred tool. There is a widespread desire for careful evaluation (to support what is sometimes called the “audit society”), and everyone assents to the idea that policy should be based on evidence of effectiveness, for which randomized trials appear to be ideally suited. Trials are easily, if not very precisely, explained along the lines that random selection generates two otherwise identical groups, one treated and one not; results are easy to compute, since all we need is the comparison of two averages; and, unlike other methods, the approach seems to require no specialized understanding of the subject matter. It seems a truly general tool that (nominally) works in the same way in agriculture, medicine, sociology, economics, politics, and education. It is supposed to require no prior knowledge, whether suspect or not, which is seen as a great advantage.

In this paper, we present two sets of arguments, one on conducting RCTs and on how to interpret the results, and one on how to use the results once we have them. Although we do not care for the terms (for reasons that will become apparent), the two sections correspond roughly to internal and external validity.

Randomized controlled trials are often useful, and have been important sources of empirical evidence for causal claims and evaluation of effectiveness in many fields. Yet many of the popular interpretations (not only among the general public, but also among trialists) are incomplete and sometimes misleading, and these misunderstandings can lead to unwarranted trust in the impregnability of results from RCTs, to a lack of understanding of their limitations, and to mistaken claims about how widely their results can be used. All these, in turn, can lead to flawed policy recommendations.

Among the misunderstandings are the following: (a) randomization ensures a fair trial by ensuring that, at least with high probability, treatment and control groups differ only in the treatment; (b) RCTs provide not only unbiased estimates of average treatment effects, but also precise estimates; (c) randomization is necessary to solve the selection problem; (d) lack of blinding, which is common in social science experiments, does not seriously compromise inference; (e) statistical inference in RCTs, which requires only the simple comparison of means, is straightforward, so that standard significance tests are reliable.

While many of the problems of RCTs are shared with observational studies, some are unique, for example the fact that randomizing itself can change outcomes independently of treatment. More generally, it is almost never the case that an RCT can be judged superior to a well-conducted observational study simply by virtue of being an RCT. The idea that all methods have their flaws, but RCTs always have fewest, is one of the deepest and most pernicious misunderstandings.

In the second part of the paper, we discuss the uses and limitations of results from RCTs for making policy. The non-parametric and theory-free nature of RCTs, which is arguably an advantage in estimation, is a serious disadvantage when we try to use the results outside of the context in which they were obtained. Much of the literature, in economic development and elsewhere, perhaps inspired by Campbell and Stanley’s (1963) famous “primacy of internal validity,” assumes that internal validity is enough to guarantee the usefulness of the estimates in different contexts. Without understanding RCTs within the context of the knowledge that we already possess about the world, much of it obtained by other methods, we do not know how to use trial results. But once the commitment has been made to seeing RCTs within this broader structure of knowledge and inference, and when they are designed to fit within it, they can play a useful role in building general knowledge and policy predictions; for example, an RCT can be a good way of estimating a key policy magnitude. The broader context within which RCTs need to be set includes not only models of economic structure, but also the previous experience that policymakers have accumulated about local settings and implementation. Most importantly for economic development, the use of RCT results should be sensitive to what people want, both individually and collectively. RCTs should not become yet another technical fix that is imposed on people by bureaucrats or foreigners; RCT results need to be incorporated into a democratic process of public reasoning, Sen (2011). Greenberg, Shroder, and Onstott (1999) document that, even before the recent wave of RCTs in development, most RCTs in economics have been carried out by rich people on poor people, and this fact should make us especially careful to avoid charges of paternalism.

Section 1: Interpreting the results of RCTs

1.1 Prolog

RCTs were first popularized by Fisher’s agricultural trials in the 1930s and are today often described by the Rubin counterfactual causal model, which itself traces back to Neyman in 1923; see Freedman (2006) for a description of the history. Each unit $i$ (a person, a pupil, a school, an agricultural plot) is assumed to have two possible outcomes, $Y_{i0}$ and $Y_{i1}$, the former occurring if there is no treatment at the time in question, the latter if the unit is treated. The difference between the two outcomes, $Y_{i1} - Y_{i0}$, is the individual treatment effect, which we shall denote $\beta_i$. Treatment effects are typically different for different units. No unit can be both treated and untreated at the same time, so only one or other of the outcomes occurs; the other is counterfactual so that individual treatment effects are in principle unobservable.

We note parenthetically that while we use the counterfactual framework here, we do not endorse it, nor argue against other approaches that do not use it, such as the Cowles Commission econometric framework where the causal relations are coded as structural equations, see also Pearl (2009). Imbens and Wooldridge (2009, Introduction) provide an eloquent defense of the Rubin formulation, emphasizing the credibility that comes from a theory-free specification with unlimited heterogeneity in treatment effects. Heckman and Vytlacil (2007, Introduction) make an equally eloquent case against, noting that the treatments in RCTs are often unclearly specified and that the treatment effects are hard to link to invariant parameters that would be useful elsewhere.

The basic theorem governing RCTs is a remarkable one. It states that, under random assignment, the average outcome in the treatment group minus the average outcome in the control group is equal in expectation to the average treatment effect. While we cannot observe the individual treatment effects, we can estimate their mean. The estimate of the average treatment effect (ATE) is simply the difference between the means in the two groups, and it has a standard error that can be estimated and used to make significance statements according to the statistical theory that applies to the difference of two means, on which more below in Section 1.3. The difference in means is an unbiased estimator of the mean treatment effect.
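The unbiasedness claim can be checked numerically. The following is a minimal sketch in Python, not part of the original analysis: the population size, the distributions of the potential outcomes, and the function name `simulate_trials` are all illustrative assumptions. It fixes a finite study population with heterogeneous individual treatment effects, re-randomizes the assignment many times, and compares the average of the difference-in-means estimates with the true ATE of that population.

```python
import random
import statistics


def simulate_trials(n_units=200, n_trials=2000, seed=0):
    """Re-randomize one fixed population many times (hypothetical numbers)."""
    rng = random.Random(seed)
    # Potential outcomes for every unit; in a real trial only one is observed.
    y0 = [rng.gauss(10.0, 3.0) for _ in range(n_units)]
    beta = [rng.gauss(1.0, 2.0) for _ in range(n_units)]  # heterogeneous effects
    y1 = [a + b for a, b in zip(y0, beta)]
    true_ate = statistics.fmean(beta)

    units = list(range(n_units))
    estimates = []
    for _ in range(n_trials):
        treated = set(rng.sample(units, n_units // 2))  # random assignment
        mean_treated = statistics.fmean(y1[i] for i in treated)
        mean_control = statistics.fmean(y0[i] for i in units if i not in treated)
        estimates.append(mean_treated - mean_control)
    return true_ate, estimates


true_ate, estimates = simulate_trials()
print("true ATE of the study population:", round(true_ate, 3))
print("average estimate over trials:    ", round(statistics.fmean(estimates), 3))
print("spread of single-trial estimates:", round(statistics.stdev(estimates), 3))
```

Under these assumed settings the average of the estimates should sit close to the true ATE, while the spread of the single-trial estimates indicates how far any one trial can be from it.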

The theorem is remarkable because it requires so few assumptions; no model is required, no assumptions about covariates are needed, the treatment effects can be heterogeneous, and nothing is assumed about the shapes of statistical distributions beyond the existence of the mean of the counterfactual outcome values. In terms of one of our running themes, it requires no expert knowledge, and no acceptance of priors, expert or otherwise. The theorem also has its limitations; the proof uses the fact that the difference in two means is the mean of the individual differences, i.e. the treatment effects. This is not true for the median (the difference in two medians is not the median of the differences, which is the median treatment effect). It also does not allow us to estimate any percentile of the distribution of treatment effects, or its variance. (Quantile estimates of treatment effects are not the quantiles of the distribution of treatment effects, but the differences in the quantiles of the two marginal distributions of treatments and controls; the two measures coincide if the experiment has no effect on ranks, an assumption that would be convenient but is hard to justify, at least in general.) All of these statistics can be of interest for policy but RCTs are not informative about them, or at least not without further assumptions, for example on the distribution of treatment effects, see Heckman, Smith, and Clements (1997), and much of the attraction of RCTs is the absence of such assumptions.
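A three-unit example makes the contrast between means and medians concrete. The sketch below is illustrative only: the potential outcomes are invented, and both outcomes are written down for every unit, which no real trial can do. It compares the statistic of the individual differences with the difference of the statistics of the two marginal distributions.

```python
import statistics

# Hypothetical potential outcomes for three units (never jointly observable
# in practice; written out here only to make the arithmetic visible).
y0 = [1, 2, 3]  # outcomes without treatment
y1 = [1, 5, 3]  # outcomes with treatment
effects = [b - a for a, b in zip(y0, y1)]  # individual treatment effects


def compare(stat):
    """Return (difference of the statistic across groups, statistic of the differences)."""
    return stat(y1) - stat(y0), stat(effects)


mean_pair = compare(statistics.mean)
median_pair = compare(statistics.median)
print("mean:  ", mean_pair)    # the two entries agree
print("median:", median_pair)  # the two entries differ
```

The difference in means equals the mean treatment effect, but the difference in medians does not equal the median treatment effect, because the treatment changes the ranks of the units.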

The basic theorem tells us that the difference in means is an unbiased estimator of the average treatment effect but says nothing about the variance of this estimator. In general, a biased estimator that is typically closer to the truth will often be better than an unbiased estimator that is typically wide of the truth. There is nothing to say that a non-RCT estimator, in spite of bias, might not have a lower mean squared error (MSE), one measure of the distance of the estimate from the truth, or a lower value of a “loss function” that defines the loss to the experimenter of missing the target.

It is useful to think of the average treatment effect estimated from an RCT in terms of sampling from a finite population, as when the Bureau of the Census estimates average income of the US population in 2013. For the RCT, the population is the population of units whose average treatment effect is of interest; note the importance of defining the population of interest because, given the heterogeneity of treatment effects, the average treatment effect will vary across different populations, just as average incomes differ across different subpopulations of the US. Finite population sampling theory tells us how to get accurate estimates of means from samples; in the RCT case, the sample is the study sample, both treatments and controls. In principle, the study sample could be a random sample of the parent population of interest, in which case it is representative of it, but that is seldom the case. Because the estimate is population specific, it is not (or need not be) thought of as the parameter of a super-population, or otherwise generalizable in any way. Average income in the US in 2013 may be of interest in its own right; but it will not be the same as average income in 2014, nor will it be the same as average income of whites, or of the populations of Wyoming or New York. Exactly the same is true of the estimate of an average treatment effect; it applies to the study sample in which the trial was done, at the time when it was done, and its use outside of those confines, though often possible, requires argument and justification. Without such an argument, we cannot claim that an ATE is “the” mean treatment effect any more than that average income in the US in 2013 is “the” average income of the US in any other year. Of course, knowing average income in 2013 can be useful for making other calculations, such as an estimate of income in 2014, or of a subpopulation that we know is richer or poorer; the fact that an estimate does not universally generalize does not make it useless. We shall return to these issues in Section 2.

1.2 Precision, balance, and randomization

1.2.1 Precision and bias

We should like our estimate of the average treatment effect to be as close to the truth as possible. One way to assess closeness is the mean square error (MSE), defined as

$$\mathrm{MSE} = E\big(\hat{\theta} - \theta\big)^2 \qquad (1)$$

where $\theta$ is the true average treatment effect, and $\hat{\theta}$ is its estimate from a particular trial. The expectation is taken over repeated randomizations of treatments and controls using the same study population. It is also standard to rewrite (1) as

$$\mathrm{MSE} = E\big(\hat{\theta} - E(\hat{\theta})\big)^2 + \big(E(\hat{\theta}) - \theta\big)^2 = \operatorname{var}(\hat{\theta}) + \operatorname{bias}(\hat{\theta}, \theta)^2 \qquad (2)$$

so that mean square error is the sum of the variance of the estimator (which we typically know something about from the estimated standard error) and the square of the bias (which in the case of an ideal randomized controlled trial is zero). The elementary, but crucial point is that, while it is certainly good that the bias is zero, that fact does nothing to make the distance from the truth as small as it might be, which is what we really care about. An unbiased estimator that is nearly always wide of the target is not as useful as one that is always near to it, even if, on average, it is off center. More generally, it will often be desirable to trade in some unbiasedness for greater precision. Experiments are often expensive, so we cannot always rely on large samples to bring the estimate close to the truth and resolve these issues for us. Much of this Section is concerned with how to design experiments to maximize precision.

Unbiasedness alone cannot therefore justify the often-expressed preference for RCTs over other estimators. The minimalist assumptions required for an RCT to be unbiased are an attraction although, as we shall see in this Section, this advantage usually comes at the cost of lowered precision and, as we shall see in Section 2, of difficulties in knowing how to use the result. Yet there is an often expressed belief that RCTs are somehow guaranteed to be precise, simply because they are RCTs. Occasionally bias and precision are explicitly confused; the JPAL website, in its explanation of why it is good to randomize, says that RCTs “are generally considered the most rigorous and, all else equal, produce the most accurate (i.e. unbiased) results.” Shadish, Cook, and Campbell (2002), in what is (rightly) considered one of the bibles of causal inference in social science, state without qualification that “randomized experiments provide a precise answer about whether a treatment worked” (p. 276) and “The randomized experiment is often the preferred method for obtaining a precise and statistically unbiased estimate of the effects of an intervention,” (p. 277) our italics.

Contrast this with Cronbach et al (1980), who quote Kendall’s (1957) pastiche of Longfellow, “Hiawatha designs an experiment,” where Hiawatha’s insistence on unbiasedness leads to his never hitting the target and to his eventual banishment.
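For readers who want the intermediate step behind the decomposition of mean square error into variance and squared bias, the derivation below adds and subtracts $E(\hat{\theta})$ inside the square. It uses only the definitions of MSE, variance, and bias from Section 1.2.1; the cross term vanishes because $E\big[\hat{\theta} - E(\hat{\theta})\big] = 0$ and $E(\hat{\theta}) - \theta$ is a constant.

```latex
\begin{align*}
\mathrm{MSE}
  &= E\big[(\hat{\theta} - \theta)^2\big] \\
  &= E\Big[\big((\hat{\theta} - E(\hat{\theta})) + (E(\hat{\theta}) - \theta)\big)^2\Big] \\
  &= E\big[(\hat{\theta} - E(\hat{\theta}))^2\big]
     + 2\,\big(E(\hat{\theta}) - \theta\big)\,E\big[\hat{\theta} - E(\hat{\theta})\big]
     + \big(E(\hat{\theta}) - \theta\big)^2 \\
  &= \operatorname{var}(\hat{\theta}) + \operatorname{bias}(\hat{\theta}, \theta)^2 .
\end{align*}
```

As a purely illustrative case with invented numbers, an unbiased estimator with variance 4 has an MSE of 4, whereas an estimator with bias 0.5 and variance 1 has an MSE of $1 + 0.25 = 1.25$, and so is typically closer to the truth in spite of its bias.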

1.2.2 Balance and precision in a linear all-cause model

A useful way to think about precision and what an RCT does and does not do is to use a schematic linear causal model of the form:

$$Y_i = \beta_i T_i + \sum_{j=1}^{J} \gamma_j x_{ij} \qquad (3)$$

where, as before, $Y_i$ is the outcome for unit $i$, $T_i$ is a dichotomous (1,0) treatment dummy indicating whether or not $i$ is treated, and $\beta_i$ is the individual treatment effect of the treatment on $i$. The $x$’s are the observed or unobserved other causes of the outcome, and we suppose that (3) captures all the causes of $Y_i$. $J$ may be very large. Because the heterogeneity of the individual treatment effects $\beta_i$ is unrestricted, we allow the possibility that the treatment interacts with the $x$’s or other variables, so that the effects of $T$ can depend on any other variables, and we shall have occasion to make this explicit below. An obvious and important example is when the treatment is effective only in the presence of a particular value of one of the $x$’s.

We do not need $i$ subscripts on the $\gamma$’s that control the effects of the other causes; if their effects differ across individuals, we include the interactions of individual characteristics with the original $x$’s as new $x$’s. Given that the $x$’s can be unobservable, this is not restrictive. Because the $\beta$’s can depend on the $x$’s, the effects of the $x$’s on the outcome can depend on $T_i$, or, equivalently, the effects of treatment can depend on covariates.

In an experiment, with or without randomization, we can represent the treatment group as having $T_i = 1$ and the control group as having $T_i = 0$. So when we subtract the average outcomes among the controls from the average outcomes among the treatments, we will get

$$\bar{Y}_1 - \bar{Y}_0 = \bar{\beta}_1 + \sum_{j=1}^{J} \gamma_j \big(\bar{x}_{ij}^{1} - \bar{x}_{ij}^{0}\big) = \bar{\beta}_1 + \big(\bar{S}_1 - \bar{S}_0\big) \qquad (4)$$

Here bars denote means within a group, with 1 indicating the treatment group and 0 the control group. The first term on the far right hand side, which is the average treatment effect (among the treated), is what we want, but the second term or error term, which is the sum of the net average balances of other causes across the two groups, will generally be non-zero (because of selection or many other reasons) and needs to be dealt with somehow. We get what we want when the means of all the other causes are identical in the two groups, or more precisely when the sum of their net differences $\bar{S}_1 - \bar{S}_0$ is zero; this is the case of perfect balance. With perfect balance, the difference between the two means is exactly equal to the average of the treatment effect among the treated, so that we have the ultimate precision and we know the answer exactly, at least in this linear case.
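The decomposition in equation (4) is an accounting identity for any assignment, and it can be verified on simulated data. The sketch below is a hypothetical illustration: the number of units and causes, the distributions, and the function name `decompose_difference` are assumptions made for the example. It generates outcomes from the linear all-cause model (3), assigns treatment once at random, and splits the difference in group means into the average effect among the treated and the net imbalance in other causes.

```python
import random
import statistics


def decompose_difference(n_units=100, n_causes=5, seed=1):
    """Split the difference in group means into the two terms of equation (4)."""
    rng = random.Random(seed)
    gamma = [rng.uniform(-2.0, 2.0) for _ in range(n_causes)]  # effects of other causes
    x = [[rng.gauss(0.0, 1.0) for _ in range(n_causes)] for _ in range(n_units)]
    beta = [rng.gauss(1.0, 1.0) for _ in range(n_units)]  # individual treatment effects
    treated = set(rng.sample(range(n_units), n_units // 2))  # one random assignment
    controls = [i for i in range(n_units) if i not in treated]

    def outcome(i):
        # Equation (3): treatment effect (if treated) plus all other causes.
        t_i = 1 if i in treated else 0
        return beta[i] * t_i + sum(g * xij for g, xij in zip(gamma, x[i]))

    diff_in_means = (statistics.fmean(outcome(i) for i in treated)
                     - statistics.fmean(outcome(i) for i in controls))
    beta_bar_1 = statistics.fmean(beta[i] for i in treated)

    def net_effect(group):
        # Sum over causes of gamma_j times the group mean of cause j.
        return sum(g * statistics.fmean(x[i][j] for i in group)
                   for j, g in enumerate(gamma))

    imbalance = net_effect(treated) - net_effect(controls)
    return diff_in_means, beta_bar_1, imbalance


diff, beta_bar_1, imbalance = decompose_difference()
print("difference in means:         ", round(diff, 4))
print("average effect among treated:", round(beta_bar_1, 4))
print("net imbalance (S1 - S0):     ", round(imbalance, 4))
```

In this sketch the difference in means equals the sum of the other two quantities up to floating-point error, and the imbalance term is not zero in the single trial even though assignment was random.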

1.2.3 Balancing acts: real and magical

How do we get balance, or something close to it? What, exactly, is the role of randomization? In a laboratory experiment, where there is good background knowledge of the other causes, the experimenter has a good chance of controlling all of the other causes, aiming to ensure that the last term in (4) is close to zero. Failing such knowledge and control, an alternative is matching, frequently used in statistical, medical, and econometric work. For each treated unit, a match is found that is as close as possible on all suspected causes, so that, once again, the last term in (4) can be kept small. Again, when we have a good idea of the causes, matching may also deliver a precise estimate. Of course, when there are important unknown or unobservable causes, neither laboratory control nor matching offers protection.

What does randomization do? Because the treatments and controls come from the same underlying distribution, randomization guarantees, by construction, that the last term on the right in (4) is zero in expectation at baseline (much can happen to disturb this beyond baseline). This is true whether or not the causes are observed. If the RCT is repeated many times on the same trial population, then the last term will be zero when averaged over an infinite number of (entirely hypothetical) trials. Of course, this does nothing to make it zero in any one trial, where the difference in means will be equal to the average treatment effect among those treated plus a term that reflects the imbalance in the net effects of the other causes. We do not know the size of this error term, and there is nothing in the randomization that limits its size; by chance, there can be one (or more) important excluded cause(s) that is very unequally distributed between treatment and controls. This imbalance will vary over replications of the trial, and its average size will ideally be captured by the standard error of the estimated ATE, which gives us some idea of how likely we are to be away from the truth. Getting the standard error and associated significance statements right is therefore of great importance.

Exactly what randomization does is frequently lost in the practical literature, and there is often a confusion between perfect control, on the one hand (as in a laboratory experiment or perfect matching with no unobservable causes), and control in expectation, which is what RCTs do. We suspect that at least some of the popular and professional enthusiasm for RCTs, as well as the belief that they are precise by construction, comes from misunderstandings about balance. These misunderstandings are not so much among the trialists who, when pressed, will give a correct account, but come from imprecise statements by trialists that are taken as gospel by the lay audience that the trialists are keen to reach.

Such a misunderstanding is well captured by the following quote from the World Bank’s online manual on impact evaluation:

“We can be very confident that our estimated average impact, given as the difference between the outcome under treatment (the mean outcome of the randomly assigned treatment group) and our estimate of the counterfactual (the mean outcome of the randomly assigned comparison group) constitute the true impact of the program, since by construction we have eliminated all observed and unobserved factors that might otherwise plausibly explain the difference in outcomes.” Gertler et al (2011) (our italics.)

This statement confuses actual balance in any single trial with balance in expectation over many entirely hypothetical trials. If the statement above were true, and if all factors were indeed controlled (and no imbalances were introduced post randomization), the difference would be an exact measure of the average treatment effect, at least in the absence of measurement error. We should not only be confident of our estimate; we would know the truth, as the quote says.

A similar quote comes from John List, one of the most imaginative and successful scholars who use RCTs:

“complications that are difficult to understand and control represent key reasons to conduct experiments, not a point of skepticism. This is because randomization acts as an instrumental variable, balancing unobservables across control and treatment groups.” Al-Ubaydli and List (2013) (italics in the original.)

And from Dean Karlan, founder and President of Yale’s Innovations for Poverty Action, which runs development RCTs around the world:

“As in medical trials, we isolate the impact of an intervention by randomly assigning subjects to treatments and control groups. This makes it so that all those other factors which could influence the outcome are present in treatment and control, and thus any difference in outcome can be confidently attributed to the intervention.” Karlan, Goldberg and Copestake (2009)

And from the medical literature, from a distinguished psychiatrist who is deeply skeptical of RCTs,

“The beauty of a randomized trial is that the researcher does not need to understand all the factors that influence outcomes. Say that an undiscovered genetic variation makes certain people unresponsive to medication. The randomizing process will ensure, or make it highly probable, that the arms of the trial contain equal numbers of subjects with that variation. The result will be a fair test.” (Kramer, 2016, p. 18)

Claims are even made that RCTs reveal knowledge without possibility of error. Judy Gueron, the long-time president of MDRC, which has been running RCTs on US government policy for 45 years, asks why federal and state officials were prepared to support randomization in spite of frequent difficulties and in spite of the availability of other methods, and concludes that it was because “they wanted to learn the truth,” Gueron and Rolston (2013, 429). There are many statements of the form “We know that [project X] worked because it was evaluated with a randomized trial,” Dynarski (2015).

Many writers are more cautious, and modify statements about treatment and control groups being identical with terms such as “statistically identical,” “reasonably similar” or do not differ “systematically.” And we have no doubt that all of the authors quoted above understand the need for these qualifications. But to the uninformed reader, the qualified statements are unlikely to be differentiated from the unqualified statements quoted above. Nor is it always clear what some of these terms mean. For example, if two people are selected at random from a population, and it so happens that one is female and one male, in what sense are they statistically identical? While it is true that they were randomly selected from the same parent distribution, which provides the basis for inference, the calculation of standard errors, and significance statements, it does nothing to help with balance or precision in any given trial.

1.2.4 Sample size and statistical inference in unbalanced trials

Is a single trial more likely to be balanced, and thus more precise, when the sample size is large? Indeed, as the sample size tends to infinity, the means of the $x$’s in the treatment and control groups will become arbitrarily close. Yet this is of little help in finite samples as Fisher (1926) noted: “Most experimenters on carrying out a random assignment will be shocked to find how far from equally the plots distribute themselves,” quoted in Morgan and Rubin (2012). Even with very large sample sizes, if there are a large number of causes, balance on each cause may be infeasible. Vandenbroucke (2004) notes that there are three million base pairs in the human genome, many or all of which could be relevant prognostic factors for the biological outcome that we are seeking to influence.

However, as (4) makes clear, we do not need balance on all causes, only on their net effect, the term $\bar{S}_1 - \bar{S}_0$, which does not require balance on each cause individually. Yet there is no guarantee that even the net effect will be small. For example, there may be only one omitted unobserved cause whose effect is large, one single base pair say, so that if that one cause is unbalanced across treatments and controls, the fact that there is individual or even net balance on other less important causes is not going to help.

Statements about large samples guaranteeing balance are not useful without guidelines about how large is large enough, and such statements cannot be made without knowledge of other causes and how they affect outcomes.

A simple case illustrates. Suppose that there is one hidden cause in (3), a binary variable $x$ that is unity with probability $p$ and 0 otherwise. With $n$ controls and $n$ treatments, the difference in fractions with $x = 1$ in the two groups has mean 0 and variance $2p(1-p)/n$, the sum of the variances $p(1-p)/n$ of the two independent group fractions. With $n = 100$ and $p = 0.5$, the standard error around 0 is about 0.07 so that, if this unobserved confounder has a large effect on the outcome, the imbalance could easily mask the effect of treatment, or be mistaken as evidence for the effectiveness of a truly ineffective treatment.
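The size of the chance imbalance in a hidden binary cause can be checked by simulation. The sketch below rests on stated assumptions: each unit’s value of the hidden cause is an independent draw with probability $p$, so the difference between the two group fractions has variance $2p(1-p)/n$; the function name `imbalance_sd` and the number of simulated trials are illustrative choices.

```python
import math
import random
import statistics


def imbalance_sd(n=100, p=0.5, n_trials=5000, seed=0):
    """Simulated and theoretical SD of the treated-minus-control fraction with x = 1."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_trials):
        # Fraction of units with the hidden cause present in each arm of the trial.
        frac_treated = sum(rng.random() < p for _ in range(n)) / n
        frac_control = sum(rng.random() < p for _ in range(n)) / n
        diffs.append(frac_treated - frac_control)
    simulated = statistics.pstdev(diffs)
    theoretical = math.sqrt(2 * p * (1 - p) / n)
    return simulated, theoretical


simulated, theoretical = imbalance_sd()
print("simulated SD of imbalance:  ", round(simulated, 4))
print("theoretical SD of imbalance:", round(theoretical, 4))
```

With 100 units per arm and $p = 0.5$, the theoretical standard deviation is about 0.07, so imbalances of ten percentage points or more between the arms are not rare in a single trial.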

Lack of balance in the above example or in the net effect of either observables or non-observables in (4) does not compromise the inference in an RCT in the sense of obtaining a standard error for the unbiased ATE, see Senn (2013) for a particularly clear statement. The randomization does not guarantee balance but it provides the basis for making probability statements about the various possible outcomes, which is also clear in the example in the previous paragraph. This was also Fisher’s argument for randomization. Senn writes “the probability calculation applied to a clinical trial automatically makes an allowance for the fact that the groups will almost certainly be unbalanced.” (italics in the original.) If the design is such that, even with perfect randomization, successive replications tend to generate large imbalances, the resulting imprecision of the ATE will show up in its standard error. Of course, the usefulness of this requires that the calculated standard errors permit correct significance statements, which, as we shall see in the next subsection, is often far from straightforward. In the example above, an extreme, but entirely possible, case occurs when, by chance, the unobserved confounder is perfectly correlated with the treatment; unless there are actual replications, the false certainty that such an experiment provides will be reinforced by false significance tests.

1.2.4 Testing for balance

In practice, trialists in economics (and in some other disciplines) usually carry out a statistical test for balance after randomization but before analysis, presumably with the aim of taking some appropriate action if balance fails. The first table of the paper typically presents the sample means of observable covariates (the observable x’s in (3), which are either causes in their own right or interact with the $\beta$’s) for the control and treatment groups, together with their differences, and tests for whether or not they are significantly different from zero, either variable by variable, or jointly. These tests are appropriate if we are concerned that the random number generator might have failed (because we are drawing playing cards, rolling dice, or spinning bottle tops, though presumably not if the randomization is done by a random number generator, always supposing that there is such a thing as randomness, Singer and Pincus (1998)), or if we are worried that the randomization is undermined by non-blinded subjects or trialists systematically undermining the allocation. Otherwise, as the next paragraph shows, the test makes no sense and is not informative, which does not seem to stop it being routinely used.

If we write $\mu_0$ and $\mu_1$ for the (vectors of) population means (i.e. the means over all possible randomizations) of the observed x’s in the control and treatment groups at the point of assignment, the null hypothesis is (presumably, as judged by the typical balance test) that the two vectors are identical, with the alternative being that they are not. But if the randomization has been correctly done, the null hypothesis is true by construction, see e.g. Altman (1985) and Senn (1994), which may help explain why it so rarely fails in practice. Indeed, although we cannot “test” it, we know that the null hypothesis is also true for the unobservable components of x. Note the contrast with the statements quoted above claiming that RCTs guarantee balance on causes across treatment and control groups. Those statements refer to balance of causes at the point of assignment in any single trial, which is not guaranteed by randomization, whereas the balance tests are about the balance of causes at the point of assignment in expectation over many trials, which is guaranteed by randomization. The confusion is perhaps understandable, but it is confusion nevertheless. Of course, it makes sense to look for balance between observed covariates using some more appropriate distance measure, for example the normalized difference in means, Imbens and Wooldridge (2009, equation 3).
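A normalized difference of this kind is simple to compute. The sketch below assumes NumPy and assumes the scaling by the square root of the sum of the two sample variances; that scaling is our reading of the cited equation, and some authors average the two variances instead, which rescales the measure by a constant. The function name `normalized_difference` and the toy data are illustrative.

```python
import numpy as np

def normalized_difference(x_treat, x_control):
    """Difference in covariate means scaled by sqrt(s_treat^2 + s_control^2).

    Unlike a t-statistic, this measure does not grow mechanically with the
    sample size, so it describes imbalance rather than testing for it.
    """
    x_treat = np.asarray(x_treat, dtype=float)
    x_control = np.asarray(x_control, dtype=float)
    mean_gap = x_treat.mean(axis=0) - x_control.mean(axis=0)
    scale = np.sqrt(x_treat.var(axis=0, ddof=1) + x_control.var(axis=0, ddof=1))
    return mean_gap / scale

# Toy covariate (hypothetical data): means 3 and 4, sample variance 2.5 in each arm.
nd = normalized_difference([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(nd)
```

Passing two-dimensional arrays (units by covariates) returns one normalized difference per covariate.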

1.2.5 Methods for balancing

One procedure to improve balance is to adapt the design before randomization, for example by stratification. Fisher, who as the quote above illustrates was well aware of the loss of precision from randomization, argued for “blocking” (stratification) in agricultural trials or for using Latin Squares, both of which restrict the amount of imbalance. Stratification, to be useful, requires some prior understanding of the factors that are likely to be important, and so it takes us away from the “no knowledge required,” or “no priors accepted” appeal of RCTs. But as Scriven (1974, 103) notes: “cause hunting, like lion hunting, is only likely to be successful if we have a considerable amount of relevant background knowledge,” or even more strongly, “no causes in, no causes out,” Cartwright (1994, Chapter 2). Stratification in RCTs, as in other forms of sampling, is a standard method for using background knowledge to increase the precision of an estimator. It has the further advantage that it allows for the exploration of different average treatment effects in different strata which can be useful in adapting or transporting the results to other locations, see Section 2.

Stratification is not possible when there are too many covariates, or if each has many values, so that there are more cells than can be filled given the sample size. An alternative is to re-randomize, repeating the randomization until the distance between the observed covariates in the two groups is less than some predetermined criterion. Morgan and Rubin (2012) suggest the Mahalanobis D-statistic, and use Fisher’s randomization inference (to be discussed further below) to calculate standard errors that take the re-randomization into account. Another alternative, widely adopted in practice, is to adjust for covariates by running a regression (or covariance) analysis, with the outcome on the left-hand side and the treatment dummy and the covariates as explanatory variables, including possible interactions between covariates and treatment dummies.
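Re-randomization of this kind can be sketched as follows, assuming NumPy, equal-sized arms, and an acceptance threshold chosen in advance by the analyst. The function names, the threshold of 1.0, and the simulated covariates are all illustrative and are not taken from Morgan and Rubin's own code.

```python
import numpy as np

def mahalanobis_imbalance(x, treated):
    """Mahalanobis distance between treatment and control covariate means."""
    x1, x0 = x[treated], x[~treated]
    gap = x1.mean(axis=0) - x0.mean(axis=0)
    cov = np.atleast_2d(np.cov(x, rowvar=False))
    # Covariance of the difference in means under complete randomization.
    scale = 1.0 / len(x1) + 1.0 / len(x0)
    return float(gap @ np.linalg.solve(cov * scale, gap))

def rerandomize(x, threshold, rng, max_draws=10000):
    """Redraw the allocation until the imbalance is at or below the threshold."""
    n_units = x.shape[0]
    n_treat = n_units // 2
    for _ in range(max_draws):
        treated = np.zeros(n_units, dtype=bool)
        treated[rng.choice(n_units, size=n_treat, replace=False)] = True
        if mahalanobis_imbalance(x, treated) <= threshold:
            return treated
    raise RuntimeError("no acceptable allocation found; loosen the threshold")

rng = np.random.default_rng(1)
covariates = rng.normal(size=(60, 3))  # hypothetical baseline covariates
assignment = rerandomize(covariates, threshold=1.0, rng=rng)
print(assignment.sum(), mahalanobis_imbalance(covariates, assignment))
```

Because allocations that fail the criterion are discarded, standard errors computed as if the allocation had been a single unrestricted draw are no longer appropriate; any randomization inference has to redraw allocations under the same acceptance rule.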

Freedman (2008) has analyzed this method and argues “if adjustment made a substantial difference, we would suggest much caution when interpreting the results.” But a substantial difference is exactly what we would like to see, at least some of the time, if the adjustment moves the estimate closer to the truth. Freedman shows that the adjusted estimate of the ATE is biased in finite samples, with the bias depending on the correlation between the squared treatment effect and the covariates. There is also no general guarantee that the regression adjustment will generate a more precise estimate, although it will do so if there are equal numbers of treatments and controls or if the treatment effects are constant over units (in which case there will also be no bias). Even with bias, the regression adjustment is attractive if it does indeed trade off bias for precision, though presumably not to RCT purists for whom unbiasedness is the sine qua non. Note again that the increased precision, when it exists, comes from using prior knowledge about the variables that are likely to be important for the outcome. That the background knowledge or theory is widely shared and understood will also provide some protection against data mining by searching through covariates in the search for (perhaps falsely) estimated precision.

1.2.6 Should we randomize?

The tension between randomization and precision goes back to the early debate between Fisher and Student (Gosset), who never accepted Fisher’s arguments for randomization, see also Ziliak (2014). In his debate with Fisher about agricultural trials, Student argued that randomization ignored relevant prior information, for example about how likely confounders would be distributed across the test plots, so that randomization wasted resources and led to unnecessarily poor estimates. This general question of whether randomization is desirable has been reopened in recent papers by Kasy (2016), Banerjee, Chassang, and Snowberg (2016) and Banerjee, Chassang, Montero, and Snowberg (2016).

Refer back to the MSE introduced above, and consider designing an experiment that will make this as small as possible. Unfortunately, this is not generally possible; for example, the “estimator” of 3, say, for the ATE has the lowest possible mean-squared error if the true ATE is actually 3. Instead, we need to average the MSE over a distribution of possible ATEs. This leads to a decision theory approach to estimation whereby a Bayesian econometrician will estimate the ATE by choosing the allocation of treatment and controls so as to minimize the expected value of a loss function, the MSE being one example. Such an approach requires us to specify a prior on the ATE, or more generally, on the expectation of outcomes conditional on the covariates. These priors are formal versions of the issue that has already come up repeatedly, that to get good estimators, we need to know something about how the covariates affect the outcome. Kasy (2016) solves this problem for the case of expected MSE and shows that randomization is undesirable; it simply adds noise and makes the MSE larger. He uses a non-parametric prior that has proved useful in a number of other applications (we could presumably do even better if we were prepared to commit further), and he provides code to implement his method, which shows a 20 percent reduction in MSE compared with randomization (14 percent for stratified randomization) for the well-known Tennessee STAR class-size experiment.

Banerjee et al propose a more general loss function and prove the comparable theorem, that randomization leads to larger losses than the optimal non-random purposive assignment. These authors recommend randomization on other grounds, which we will discuss below, but agree that, for standard statistical efficiency or maximization of expected utility, randomization should not be used in experimental design. Student was right.

Several points should be noted. First, the anti-randomization theorem is not a justification of any non-experimental design, for example one that compares outcomes of those who do or do not self-select into treatment. Selection effects are real enough, and if selection is based on unobservable causes, comparison of treated and controls will be biased. One acceptable non-random scheme is to use the observable covariates to divide the study sample into cells within which all observations have the same value and then divide each cell into treatments and controls. Within each cell, or for those units on which we have no information, we can choose any way we like, including randomly, though randomization has no advantage or disadvantage. Such allocations rule out self-selection (or doctor or program administrator selection) where the individual (doctor, or administrator) has information not visible to the person assigning treatments and controls. The key is that the person who makes the assignment (the analyst) uses all of the information that he or she possesses, and that once this has been taken into account, all units are interchangeable conditional on that information, so that assignment beyond that does not matter. Of course, the program administrators must enforce the analyst’s assignment, so that private information that they or the units possess is not allowed to affect the assignment, conditional on the information used by the analyst. Given this, selection on unobservables is ruled out, and does not affect the results. Randomization is not required to eliminate selection bias.
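The cell-based scheme can be illustrated with a short sketch in plain Python. It is only one of many rules consistent with the description above: it assumes discrete covariates, and within each cell it alternates treatment and control in the order in which units are listed, which presumes that the listing order carries no information about outcomes. The function name and the toy units are hypothetical.

```python
from collections import defaultdict

def assign_within_cells(units):
    """Split each covariate cell between treatment and control.

    units: list of (unit_id, covariates) pairs, where covariates is a tuple of
    observed values; units with identical tuples fall in the same cell.
    """
    cells = defaultdict(list)
    for unit_id, covariates in units:
        cells[covariates].append(unit_id)
    assignment = {}
    for covariates in sorted(cells):
        # Alternate within the cell; cells of odd size get one extra treated unit.
        for position, unit_id in enumerate(cells[covariates]):
            assignment[unit_id] = "treatment" if position % 2 == 0 else "control"
    return assignment

units = [
    ("a", ("F", "urban")), ("b", ("F", "urban")), ("c", ("M", "rural")),
    ("d", ("M", "rural")), ("e", ("F", "urban")), ("f", ("M", "rural")),
]
assignment = assign_within_cells(units)
print(assignment)
```

The rule is non-discretionary: neither the units nor the administrators choose, which is the property the argument relies on.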

Whether it is really possible for the analyst to assign arbitrarily is an open question, as is whether “randomization” from a random-number generator will do so. Even machine-generated sequences have causes, and even if the analyst has only a set of uninformative labels for the units, those too must come from somewhere, so that it is possible that those causes are linked to the unobserved causes in the experiment. We do not attempt to deal here with these deep issues on the meaning of randomization, but see Singer and Pincus (1998).

According to Chalmers (2001) and Bothwell and Podolsky (2016), the development of randomization in medicine originated with Bradford-Hill, who used randomization in the first RCT in medicine (the streptomycin trial) because it prevented doctors selecting patients on the basis of perceived need (or against perceived need, leaning over backward as it were), an argument more recently echoed by Worrall (2007). Randomization serves this purpose, but so do other non-discretionary schemes; what is required is that the hidden information not affect the allocation. While it is true that doctors cannot be allowed to make the assignment, it is not true that randomization is the only scheme that can be enforced.

Second, the ideal rules by which units are allocated to treatment or control depend on the covariates, and on the investigators’ priors about how the covariates affect the outcomes. This opens up all sorts of methods of inference that are excluded by pure randomization. For example, the hypothetico-deductive method works by using theory to make a prediction that can be taken to the data; here the predictions would be of the form that a unit with characteristics x will respond in a particular way to treatment, falsification of which can be tested by an appropriate allocation of units to treatment. Banerjee, Chassang and Snowberg (2016) provide such examples.

Third, randomization, by running roughshod over prior information from theory and from the covariates, is wasteful and even unethical when it unnecessarily exposes people, or unnecessarily many people, to possible harm in a risky experiment, see Worrall (2002) for an egregious case of how an unthinking demand for randomization and the refusal to accept prior information put children’s lives directly at risk.

Fourth, the non-random methods use prior information, which is why they do better than randomization. This is both an advantage and a disadvantage, depending on one’s perspective. If prior information is not widely accepted, or is seen as non-credible by those we are seeking to persuade, we will generate more credible estimates if we do not use those priors. Indeed, this is why Banerjee, Chassang and Snowberg (2016) recommend randomized designs, including in medicine and in development economics. They develop a theory of an investigator who is facing an adversarial audience that will challenge any prior information and can even potentially veto results that are based on it (think administrative agencies or journal referees). The experimenter trades off his or her own desire for precision (and preventing possible harm to subjects), which uses prior information, against the wishes of the audience, who want nothing of the priors. Even then, the approval of this audience is only ex ante; once the fully randomized experiment has been done, nothing stops critics arguing that, in fact, the randomization did not offer a fair test. Among doctors who use RCTs, and especially meta-analysis, such arguments are (appropriately) common; see again Kramer (2016).

As we noted in the Introduction, much of the public has come to question expert prior knowledge, and Banerjee, Chassang, Montero and Snowberg (2016) have provided an elegant (positive) account of why RCTs will flourish in such an environment. In cases where there is good reason to doubt the good faith of experimenters, as in some pharmaceutical trials, randomization will indeed be the appropriate response. But we believe such arguments are deeply destructive for scientific endeavor and should be resisted as a general prescription for scientific research. Economists and other social scientists know a great deal, and there are many areas of theory and prior knowledge that are jointly endorsed by large numbers of knowledgeable researchers. Such information needs to be built on and incorporated into new knowledge, not discarded in the face of aggressive know-nothing ignorance. The systematic refusal to use prior knowledge and the associated preference for RCTs are recipes for preventing cumulative scientific progress. In the end, it is also self-defeating; to quote Rodrik (2016) “the promise of RCTs as theory-free learning machines is a false one.”

1.3 Statistical inference in RCTs

If we are to interpret the results of an RCT as demonstrating the causal effect of the treatment in the trial population, we must be able to tell whether the difference between the control and treatment means could have come about by chance. Any conclusion about causality is hostage to our ability to calculate standard errors and accurate p-values. But this is not generally possible without assumptions that go beyond those needed to support the basic theorem of RCTs. In particular, it has long been known that the mean (and a fortiori the difference between two means) is a statistic that is sensitive to outliers. Indeed Bahadur and Savage (1956) demonstrate that, without restrictions on the parent distributions, standard t-tests are inherently unreliable.

The key problem here is skewness; standard t-tests break down in distributions with large skewness, see Lehmann and Romano (2005, p. 466–8). In consequence, RCTs will not work well when the distribution of the individual treatment effects is strongly asymmetric, at least if the standard two-sample t-statistics (or equivalently White’s (1980) heteroskedastic robust regression t-values) are used. While we may be willing to assume that treatment effects are symmetric in some cases, the need for such an assumption (which requires prior knowledge about the specific process being studied) undermines the argument that RCTs are largely assumption free and do not depend on such knowledge. There is a deep irony here. In the search for robustness and the desire to do away with unnecessary assumptions, the RCT can deliver the mean treatment effect (the ATE), yet the mean (as opposed to the median, which cannot be estimated by an RCT) does not permit robust probability statements about the estimates of the ATE.

How difficult is it to maintain symmetry? And how badly is inference affected when the distribution of treatment effects is not symmetric? In economics, many trials have outcomes valued in money. Does an anti-poverty innovation (for example microfinance) increase the incomes of the participants? Income itself is not symmetrically distributed, and this might be true of the treatment effects too, if there are a few people who are talented but credit-constrained entrepreneurs and who have treatment effects that are large and positive, while the vast majority of borrowers fritter away their loans, or at best make positive but modest profits. Another important example is expenditures on health care. Most people have zero expenditure in any given period, but among those who do incur expenditures, a few individuals spend huge amounts that account for a large share of the total. Indeed, in the famous Rand health experiment, Manning, Newhouse et al. (1987, 1988), there is a single very large outlier. The authors realize that the comparison of means across treatment arms is fragile, and, although they do not see their problem exactly as described here, they obtain their preferred estimates using a structural approach that is designed to explicitly model the skewness of expenditures.

In some cases, it will be appropriate to deal with outliers by trimming, eliminating observations that have large effects on the estimates. But if the experiment is a project evaluation designed to estimate the net benefits of a policy, the elimination of genuine outliers, as in the Rand Health Experiment, will vitiate the analysis. It is precisely the outliers that make or break the program.

1.3.1 Spurious statistical significance: an illustrative example

We consider an example that illustrates what can happen in a realistic but simplified case. There is a parent population, or population of interest, defined as the collection of units for which we would like to estimate an average treatment effect. It might be all villages in India, or all recipients of food subsidies, or all users of health care in the US. From this population we have a sample that is available for randomization, the trial or experimental sample; in a randomized controlled trial, this will subsequently be randomly divided into treatments and controls. Ideally, the trial sample would be randomly selected from the parent population, so that the sample average treatment effect would be an unbiased estimator of the population average treatment effect; indeed in some cases the complete population of interest is available for the trial. Clearly, in these ideal cases, it is straightforward to use standard sampling theory to generalize the trial results from the sample to the population. However, for a number of practical and conceptual reasons, the trial sample is rarely either the whole population or a randomly selected subset, see Shadish et al (2002, pp. 341–8) for a good discussion of both practical and theoretical obstacles.

In our illustrative example, there is a parent population each member of which has his or her own treatment effect; these are continuously distributed with a shifted lognormal distribution with zero mean so that the population average treatment effect is zero. The individual treatment effects $\beta$ are distributed so that $\beta + e^{0.5} \sim \Lambda(0,1)$, for the standardized lognormal distribution $\Lambda$. We have something like a microfinance trial in mind, where there is a long positive tail of rare individuals who can do amazing things with credit, while most people cannot use it effectively. A trial (experimental) sample of 2n individuals is randomly drawn from the parent and is randomly split between n treatments and n controls. In the absence of treatment, everyone in the sample records zero, so the estimated average treatment effect in any one trial is simply the mean outcome among the n treatments. For values of n equal to 25, 50, 100, 200, and 500 we draw 100 trial/experimental samples each of size 2n; with five values of n, this gives us 500 trial/experimental samples in all. For each of these 500 samples, we randomize into n controls and n treatments, estimate the ATE and its estimated t-value (using the standard two-sample t-value, or equivalently, by running a regression with robust t-values), and then repeat 1,000 times, so we have 1,000 ATE estimates and t-values for each of the 500 trial samples; these allow us to assess the distribution of ATE estimates and their nominal t-values for each trial.

Table 1: RCTs with skewed treatment effects

Sample size   Mean of ATE estimates   Mean of nominal t-values   Fraction null rejected (percent)
 25                 0.0268                   -0.4274                      13.54
 50                 0.0266                   -0.2952                      11.20
100                -0.0018                   -0.2600                       8.71
200                 0.0184                   -0.1748                       7.09
500                -0.0024                   -0.1362                       6.06

Note: 1,000 randomizations on each of 100 draws of the trial sample randomly drawn from a lognormal distribution of treatment effects shifted to have a zero mean.

The results are shown in Table 1. Each row corresponds to a sample size. In each row, we show the results of 100,000 individual trials, composed of 1,000 replications on each of the 100 trial (experimental) samples. The columns are averaged over all 100,000 trials.

The last column shows the fractions of times the true null is rejected and is the key result. When there are only 50 treatments and 50 controls (row 2), the (true) null is rejected 11.2 percent of the time, instead of the 5 percent that we would like and expect if we were unaware of the problem. When there are 500 units in each arm, the rejection rate is 6.06 percent, much closer to the nominal 5 percent.
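The design of this experiment can be reproduced in miniature. The sketch below is not the authors' code: it assumes NumPy, uses fewer trial samples and randomizations than Table 1, and takes 1.96 as the two-sided 5 percent critical value, so its output will only approximate the table. The function name `rejection_rate` and its parameters are illustrative. Because every control records zero, the control mean and variance are both zero and the two-sample t-value reduces to the treated mean divided by its own standard error.

```python
import numpy as np

def rejection_rate(n, n_samples=200, n_randomizations=100, seed=0, critical=1.96):
    """Share of trials rejecting the (true) null that the ATE is zero."""
    rng = np.random.default_rng(seed)
    shift = np.exp(0.5)  # mean of a standard lognormal, so effects have mean zero
    rejections = 0
    for _ in range(n_samples):
        # Trial sample of 2n individual treatment effects.
        effects = rng.lognormal(mean=0.0, sigma=1.0, size=2 * n) - shift
        for _ in range(n_randomizations):
            treated = rng.permutation(2 * n)[:n]
            outcomes = effects[treated]  # controls all record zero
            ate_hat = outcomes.mean()
            std_err = np.sqrt(outcomes.var(ddof=1) / n)
            rejections += abs(ate_hat / std_err) > critical
    return rejections / (n_samples * n_randomizations)

for n in (50, 500):
    print(n, rejection_rate(n))
```

Raising `n_samples` and `n_randomizations` toward 100 and 1,000 brings the design closer to the one described, at the cost of a longer run time.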

Why does the standard application of the t-distribution give such strange results when all we are doing is estimating a mean? The problem cases are when the trial sample happens to contain one or more outliers, something that is always a risk given the long positive tail of the parent distribution. When this happens, everything depends on whether the outlier is among the treatments or the controls; in effect the outliers become the sample, reducing the effective number of degrees of freedom.

Figure 1: Estimates of an ATE with an outlier in the trial sample
[Histogram. Horizontal axis: 1,000 estimates of average treatment effect; vertical axis: density.]

Figure 1 illustrates the estimated average treatment effects from an extreme case from the simulations with 100 observations in total, the second row of Table 1; the histogram shows the 1,000 estimates of the ATE. The trial sample has a single large outlying treatment effect of 48.3; the mean (s.d.) of the other 99 observations is -0.51 (2.1); when the outlier is in the treatment group, we get the right-hand side of the figure, when it is not, we get the left-hand side. On the right-hand side, when the outlier is among the treatment group, the dispersion across outcomes is large, as is the estimated standard error, and so those outcomes rarely reject the null using the standard table of t-values. The over-rejections come from the left-hand side of the figure when the outlier is in the control group, the outcomes are not so dispersed, and the t-values can be large, negative, and significant. While these cases of bimodal distributions may not be common, and depend on large outliers, they illustrate the process that generates the over-rejections and spurious significance.

We could escape these problems if we could calculate the median treatment effect, but RCTs cannot (without further assumption) identify the median, only the mean, and it is the mean that is at risk because of the Bahadur-Savage theorem. Note too that there is only moderate comfort to be taken in large sample sizes. While the last row is certainly better than the others, there are still many trial samples that are going to give sample average effects that are significant, even when the number we want is zero. The proof of the Bahadur-Savage theorem works by noting that for any sample size, it is always possible to find an outlier that will give a misleading t-value. Nor is there an escape here by using the Fisher exact method for inference; the Fisher method tests the null hypothesis that all of the treatment effects are zero whereas what we are interested in here, at least if we want to do project evaluation or cost-benefit analysis, is that the average treatment effect is zero.

The problems illustrated above, that stem from the Bahadur-Savage theorem, are certainly not confined to RCTs, and occur more generally in econometric and statistical work. However, the analysis here illustrates that the simplicity of ideal RCTs, subtracting one mean from another, brings no exemption from troublesome problems of inference. Escape from these issues, as in the Rand Health Experiment, requires explicit modeling, or might be best handled by estimating quantiles of the treatment distribution, which again requires additional assumptions.

Our reading of the literature on RCTs in development suggests that they are not exempt from these concerns. Many development trials are run on (sometimes very) small samples, they have treatment effects where asymmetry is hard to rule out (especially when the outcomes are in money) and they often give results that are puzzling, or at least not easily interpreted in terms of economic theory. Neither Banerjee and Duflo (2012) nor Karlan and Appel (2011), who cite many RCTs, raise concerns about misleading inference, treating all results as solid. No doubt there are behaviors in the world that are inconsistent with standard economics, and some can be explained by standard biases in behavioral economics, but it would also be good to be suspicious of the significance tests before accepting that an unexpected finding is well supported and theory should be revised. Replication of results in different settings may be helpful, if they are the right kind of places (see our discussion in Section 2), but it hardly solves the problem given that the asymmetry may be in the same direction in different settings (and seems likely to be so in just those settings that are sufficiently like the original trial setting to be of use for inference about the trial population), and that the “significant” t-values will show departures from the null in the same direction, thus replicating spurious findings.

1.3.2 Significance tests: Fisher-Behrens, robust inference, and multiple hypotheses

Skewness of treatment effects is not the only threat to accurate significance tests. The two-sample t-statistic is computed by dividing the ATE by the estimated standard error whose square is given by

$$\hat{\sigma}^2 = \frac{(n_1-1)^{-1}\sum_{i \in 1}(Y_i-\hat{\mu}_1)^2}{n_1} + \frac{(n_0-1)^{-1}\sum_{i \in 0}(Y_i-\hat{\mu}_0)^2}{n_0} \qquad (5)$$

where 0 refers to controls and 1 to treatments, so that there are $n_1$ treatments and $n_0$ controls, and $\hat{\mu}_1$ and $\hat{\mu}_0$ are the two means. As has been long known, this t-statistic is not distributed as Student’s t if the two variances (treatment and control) are not identical; this is known as the Behrens-Fisher problem. In extreme cases, when one of the variances is zero, the t-statistic has effective degrees of freedom half of that of the nominal degrees of freedom, so that the test-statistic has thicker tails than allowed for, and there will be too many rejections when the null is true.
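The halving of the degrees of freedom can be seen in a small calculation. The sketch below assumes NumPy and uses the Welch-Satterthwaite approximation to the effective degrees of freedom, a standard device that the text does not itself name; the function name `welch_t_and_df` and the simulated data are illustrative. With 50 units in each arm and a control group with zero variance, the nominal degrees of freedom are 98 while the approximation gives 49.

```python
import numpy as np

def welch_t_and_df(y_treat, y_control):
    """Unequal-variance t-statistic and Welch-Satterthwaite degrees of freedom.

    Assumes at least one arm has positive sample variance.
    """
    y1 = np.asarray(y_treat, dtype=float)
    y0 = np.asarray(y_control, dtype=float)
    n1, n0 = len(y1), len(y0)
    v1 = y1.var(ddof=1) / n1  # treated-arm term of the squared standard error
    v0 = y0.var(ddof=1) / n0  # control-arm term of the squared standard error
    t_stat = (y1.mean() - y0.mean()) / np.sqrt(v1 + v0)
    df = (v1 + v0) ** 2 / (v1 ** 2 / (n1 - 1) + v0 ** 2 / (n0 - 1))
    return t_stat, df

rng = np.random.default_rng(0)
y_treat = rng.normal(size=50)  # hypothetical treated outcomes
y_control = np.zeros(50)       # controls with zero variance, the extreme case
t_stat, df = welch_t_and_df(y_treat, y_control)
nominal_df = len(y_treat) + len(y_control) - 2
print(t_stat, df, nominal_df)
```

Comparing the t-statistic with critical values based on the nominal rather than the effective degrees of freedom is what produces too many rejections.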

In a remarkable recent paper, Young (2016) argues that this problem gets much worse when the trial results are analyzed by regressing outcomes not only on the treatment dummy, but also on additional controls, some of which might interact with the treatment dummy. Again the problem concerns outliers in combination with the use of clustered or robust standard errors. When the design matrix is such that the maximal influence is large, so that for some observations outcomes have large influence on their own predicted values, there is a reduction in the effective degrees of freedom for the t-value(s) of the average treatment effect(s) leading to spurious findings of significance.

Young looks at 2,003 regressions reported in 53 RCT papers in the American Economic Association journals and recalculates the significance of the estimates using Fisher’s randomization inference applied to the authors’ original data; see again Imbens and Wooldridge (2009) for a good modern account of Fisher’s method. In 30 to 40 percent of the estimated treatment effects in individual equations with coefficients that are reported as significant, he cannot reject the null of no effect; the fraction of spuriously significant results increases further when he simultaneously tests for all results in each paper. These spurious findings come in part from the well-known problem of multiple-hypothesis testing, both within regressions with several treatments and across regressions. Within regressions, treatments are largely orthogonal, but authors tend to emphasize significant t-values even when the corresponding F-tests are insignificant. Across equations, results are often strongly correlated, so that, at worst, different regressions are reporting variants of the same result, thus spuriously adding to the “kill count” of significant effects. At the same time, the pervasiveness of observations with high influence generates spurious significance on its own.
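For readers unfamiliar with randomization inference, the following is a minimal sketch for a single treatment and a difference in means, assuming NumPy. It is not Young's procedure, which works with the regressions reported in each paper; the function name `randomization_p_value` and the simulated data are illustrative. Under the sharp null that the treatment has no effect on anyone, the outcomes are fixed and only the labels are random, so the observed difference can be compared with its distribution over re-drawn allocations.

```python
import numpy as np

def randomization_p_value(outcomes, treated, n_draws=5000, seed=0):
    """Two-sided p-value for the sharp null of no effect on any unit."""
    rng = np.random.default_rng(seed)
    outcomes = np.asarray(outcomes, dtype=float)
    treated = np.asarray(treated, dtype=bool)

    def mean_gap(assign):
        return outcomes[assign].mean() - outcomes[~assign].mean()

    observed = abs(mean_gap(treated))
    exceed = 0
    for _ in range(n_draws):
        # Re-draw the allocation, keeping the number of treated units fixed.
        shuffled = rng.permutation(treated)
        exceed += abs(mean_gap(shuffled)) >= observed
    # Count the observed allocation itself so the p-value is never zero.
    return (exceed + 1) / (n_draws + 1)

rng = np.random.default_rng(1)
treated = np.arange(40) < 20
outcomes = rng.normal(size=40) + 3.0 * treated  # hypothetical data with a large effect
p_value = randomization_p_value(outcomes, treated)
print(p_value)
```

The p-value refers only to the sharp null; it says nothing directly about whether the average treatment effect is zero.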

Our sense is that these issues are being taken more seriously in recent work, especially as concerns multiple hypothesis testing. Young himself is a strong proponent of RCTs in general and believes that randomization inference will yield correct inferences. Yet randomization inference can only test the null that all treatment effects are zero, that the experiment does nothing to anyone, whereas many investigators are interested in the weaker hypothesis that the average treatment effect is zero. This simply makes matters worse since the stronger hypothesis implies the weaker hypothesis and there are presumably undiscovered cases where the ATE is spuriously significant, even when the Fisher test rejects that all treatment effects are zero. Note that testing does not always match logic; it is possible to reject the null that the ATE is zero even when we can simultaneously accept the (joint) hypothesis that all treatment effects are zero; this is familiar from OLS regression, where an F-test can show joint insignificance, even when a t-test of some linear combination is significant.

It is clear that, as of now, all reported significance levels from RCT results in economics should be treated with considerable caution. Greater care about skewness and outliers would help, as would greater use of the Fisher method and of procedures that deal correctly with multiple hypothesis testing. Yet if the null hypothesis is that the average treatment effect is zero, as in most project evaluation, the Fisher test is not available, so that we currently do not have a reliable set of procedures. Robust or clustered standard errors are necessary to allow for the possibility that treatment changes variances, and the inclusion of covariates is necessary to control for imbalance in finite samples.

1.4 Blinding

Blinding is rarely possible in economics or social science trials, and this is one of the major differences from most (although not all) RCTs in medicine, where blinding is standard, both for those receiving the treatment and those administering it. Indeed, the ability to blind has been one of the key arguments in favor of randomization, from Bradford-Hill in the 1950s, see Chalmers (2003), to welfare trials today, Gueron and Rolston (2013). Consider first the blinding of subjects. Subjects in social RCTs usually know whether they are receiving the treatment or not and so can react to their assignment in ways that can affect the outcome other than through the operation of the treatment; in econometric language, this is akin to a violation of exclusion restrictions, or a failure of exogeneity. In terms of (1), there is a pathway from the treatment assignment to another unobserved cause, which will result in a biased ATE. This is not to argue in favor of instrumental variables over RCTs, or vice versa, but simply to note that, without blinding, RCTs do not automatically solve the selection problem any more than IV estimation automatically solves the selection problem. In both cases, the exogeneity (exclusion restriction) argument needs to be explicitly made and justified. Yet the literature in economics gives great attention to the validity of exclusion restrictions in IV estimation, while tending to shrug off the essentially identical problems with lack of blinding in RCTs.

Note also that knowledge of their assignment may cause people to want to cross over from treatment to control, or vice versa, to drop out of the program, or to change their behavior in the trial depending on their assignment. In extreme cases, only those members of the trial sample who expect to benefit from the treatment will accept treatment. Consider, for example, a trial in which children are randomly allocated to two schools that teach in different languages, Russian or English, as happened during the breakup of the former Yugoslavia. The children (and their parents) know their allocation, and the more educated, wealthier, and less ideologically committed parents whose children are assigned to the Russian-medium schools can (and did) remove their children to private English-medium schools. In a comparison of those who accepted their assignments, the effects of the language of instruction will be distorted in favor of the English schools by differences in family characteristics. This is a case where, even if the random number generator is fully functional, a later balance test will show systematic differences in observable background characteristics between the treatment and control groups; even if the balance test is passed, there may still be selection on unobservables for which we cannot test.

More generally, when people know their allocation, when they have a stake in the outcome, and when the treatment effect is different for different people, there are incentives and opportunities for selection in response to the randomization, and that selection can contaminate the estimated average treatment effect, see Heckman (1997) who makes the same point in the context of instrumental variables. Those who were randomized by a lottery into going to Vietnam will have different treatment effects depending on their labor market prospects, and those with better prospects are more likely to resist the draft. As we shall see in the next subsection, various statistical corrections are available for a few of the selection problems non-blinding presents, but all rely on the kind of assumptions that, while common in observational studies, RCTs are designed to avoid. Our own view is that assumptions and the use of prior knowledge are what we need to make progress in any kind of analysis, including RCTs whose promise of assumption-free learning is always likely to be illusory.

There may be a tendency in economics to focus on the selection bias effects of non-blinding because some solutions are available, but selection bias is not the only serious source of bias in social and medical trials. Concerns about the placebo, Pygmalion, Hawthorne, John Henry, and 'teacher/therapist' effects are widespread across studies of medical and social interventions. This literature argues that double blinding should be replaced by quadruple blinding; blinding should extend beyond participants and investigators and include those who measure outcomes and those who analyze the data, all of whom may be affected by both conscious and unconscious bias. The need for blinding in those who assess outcomes is particularly important in any cases where outcomes are not determined by strictly prescribed procedures whose application is transparent and checkable but instead require elements of judgment; a good example is therapists who are asked to assess the extent of depression in clinical trials of anti-depressants, see Kramer (2016).

The lesson here is that blinding matters and is very often missing. There is no reason to suppose that a poorly blinded trial with random assignment trumps better blinded studies with alternative allocation mechanisms, or matched studies.

1.13 What do RCTs do in practice?

The execution of an RCT will often deviate from its design. People may not accept their assignment, controls may manage to get treatment, and vice versa, and people may accept their assignment, but drop out before the completion of the study. In some designs, the trial works by giving people incentives to participate, for example by mailing them a voucher that gives them subsidized access to a school or to a savings product. If the aim is to evaluate the voucher scheme itself, no new issue arises. However, if the aim is to find out what the education or savings program does, and the voucher is simply a device to induce variation, much depends on whether or not people decide to use the voucher which, like attrition and crossover, is subject to purposive decisions by the subjects inducing differences between treatments and controls.

Everything depends on the purpose of the trial. In the example above, we may want to evaluate the voucher program, or we may want to find out what the saving product does for people. We are sometimes interested in establishing causality, and sometimes in estimating an average treatment effect; in the economics literature, some writers define internal validity as getting the ATE right, while others, following the original definition of the term, define internal validity as getting causality right. Sometimes the trial limits itself to establishing causality (or to estimating an ATE) in only the trial sample, but some trials are more ambitious, and try to establish causality (or estimate an ATE) for a broader population of interest. When, as is common in economics trials, no limits are placed on the heterogeneity of treatment responses, different trial samples and different populations will generally have different ATEs and may have different causal outcomes, e.g. if the treatment has an effect in one population but none or the opposite effect in another. Our view is that the target of the trial, including the population of interest, needs to be defined in advance. Otherwise, almost any estimated number can be interpreted as a valid ATE for some population, we allow deviations from the design to define our target, and we have no way of knowing whether apparently contradictory results are really contradictory or are correct for the population on which they were derived. Differences in results, between different RCTs and between RCTs and observational studies, may owe less to the selection effects that RCTs are designed to remove, than to the fact that we are comparing non-comparable people, Heckman, Lalonde, and Smith (1999, p. 2082). Without a clear idea of how to characterize the population of individuals in the trial, whether we are looking for an ATE or to identify causality, and for which groups enrolled in the trial the results are supposed to hold, we have no basis for thinking about how to use the trial results in other contexts.

To illustrate some of the issues, consider a simple RCT in which a treatment $T$ is administered to a trial sample that is split between a treatment group of size $n$ and a control group of size $n$, but in which only a fraction $p$ of the treatment group accepts their assignment, with fraction $(1 - p)$ receiving no treatment. Suppose that the parameter of interest is the ATE in the original population, from which the trial sample was drawn randomly. Denote by $\beta$ the hypothetical ideal ATE estimate that would have been calculated if everyone had accepted assignment; as we have seen, this is an unbiased estimator of the parameter of interest for both the trial sample and the parent population. $\beta$ cannot be calculated, but there are various options.

Option one is to ignore the original assignment and calculate the difference in means between those who received the treatment and those who did not, including among the latter those who were intended to receive it but did not. Denote this (“as treated”) estimate $\beta_1$. Alternatively, option two is to compare the average outcome among those who were intended to be treated and those who were intended to be controls. Denote this estimate, the “intent to treat” (ITT) estimator, $\beta_2$. It is easy to show that one set of conditions for $\beta_1 = \beta$ is that those who were treated have the same ATE as those who were intended to be treated, and that those who broke their assignment have the same untreated mean as those who were assigned to be controls, conditions that may hold in some applications, for example where the treatment effects are identical.

The ITT estimator, $\beta_2$, will typically be closer to zero than is $\beta$, and it will certainly be so if the average treatment effect among those who break their assignment is the same as the overall ATE, in which case $\beta_2 = p\beta$. For these reasons, the ITT is often described as yielding a conservative estimate and is routinely advocated in medical trials even though it is an attenuated estimator of the ATE. A third estimator, $\beta_3$, the local average treatment estimator (LATE), is computed by running a regression of outcomes on an (actual) treatment dummy using the treatment assignment as an instrumental variable. In this case, the LATE is simply the ITT, scaled up by the reciprocal of $p$, so that $\beta_3 = \beta_2 / p$. From the above, the LATE is $\beta$ if the average treatment effect of those who break their assignment is the same as the average treatment effect in general, so that the ITT estimator is biased down by counting those who should have been treated as if they were controls. More generally, and with additional assumptions, Imbens and Angrist (1994) show that the LATE is the average treatment effect among those who were induced to accept the treatment by their assignment to treatment status, which can be a very different object from the original target of investigation. These various estimators, the ATE, the ITT, and the LATE, are all averages over different groups; more formally, Heckman and Vytlacil (2005) define a marginal treatment effect (MTE) as the ATE for those on the margin of treatment (whatever the assignment mechanism) and show that the other estimators can be thought of as averages of the MTEs over different populations.
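A small numerical illustration may help fix ideas. The sketch below is not taken from the paper: it builds a hypothetical trial sample with two invented types of people (those who accept treatment when assigned to it, and those who refuse it), and computes the as-treated, ITT, and LATE estimates described above. The function names `estimators` and `build_trial`, all the numbers, and the assumption that no control manages to obtain treatment are illustrative choices only.

```python
from statistics import mean


def estimators(records):
    """Compute as-treated, ITT, and LATE estimates.

    records: list of (assigned, treated, outcome) tuples, where assigned
    and treated are 0/1 flags.
    """
    treated = [y for z, d, y in records if d == 1]
    untreated = [y for z, d, y in records if d == 0]
    assigned = [y for z, d, y in records if z == 1]
    controls = [y for z, d, y in records if z == 0]

    as_treated = mean(treated) - mean(untreated)
    itt = mean(assigned) - mean(controls)

    # First stage: effect of assignment on actual treatment. When no control
    # obtains treatment, this is just p, the share accepting assignment.
    take_up_assigned = mean([d for z, d, y in records if z == 1])
    take_up_controls = mean([d for z, d, y in records if z == 0])
    first_stage = take_up_assigned - take_up_controls

    late = itt / first_stage  # ITT scaled up by the reciprocal of p
    return {"as_treated": as_treated, "itt": itt, "p": first_stage, "late": late}


def build_trial():
    """Hypothetical trial: 100 people per arm, identical mix in each arm.

    Accepters (60 per arm): untreated outcome 1.0, treatment effect 2.0.
    Refusers  (40 per arm): untreated outcome 3.0, treatment effect 0.5,
    but they never take the treatment, so that effect is never observed.
    """
    records = []
    for z in (0, 1):
        records += [(z, z, 1.0 + 2.0 * z)] * 60  # accepters follow assignment
        records += [(z, 0, 3.0)] * 40            # refusers stay untreated
    return records


result = estimators(build_trial())
true_ate = 0.6 * 2.0 + 0.4 * 0.5  # ATE in the full sample, by construction
print(result, true_ate)
```

In this invented sample the three estimates all differ from the ATE of 1.4: the ITT is 1.2, the LATE is 2.0 (the effect among accepters only), and the as-treated comparison is about 0.86 because refusers have higher untreated outcomes. Changing the refusers' effect to 2.0 makes the LATE coincide with the ATE, which is the equal-effects condition discussed in the text.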

In general, and unless we are prepared to say more about the heterogeneity in the treatment effects, the three estimators will give different results because they are averages over different populations. Economists tend to believe that people act in their own interest, at least in part, so it is not attractive to believe that those who break their assignments have the same distribution of treatment effects as do those who accept them. In Heckman’s (1992) analogy, people are not like agricultural plots, which are in no position to evade the treatment when they see it coming. Such purposive behavior will generally also affect the composition of the trial sample compared with the parent population, with those who agree to participate different from those who do not. For example, people may dislike randomization because of the risks it entails, or people may seek to enter trials in the hope that they will receive a beneficial treatment that is otherwise unavailable. A famous example in economics is the Ashenfelter (1978) pre-program “dip,” where those who enter trials of training programs tend to be those whose earnings have fallen immediately prior to enrolment, see also Heckman and Smith (1999). People who participate in drug trials are more likely to be sick than those who do not, or are likely to be those who have failed on standard medication. Another example is Chyn’s (2016) evidence that those who applied for vouchers in the Moving to Opportunity experiment and were thus eligible for randomization (and only a quarter of those who were eligible actually did so) were those who were already making unusual efforts on their children’s behalf. These parents had effectively substituted for part of the better environment, so that the ATE from the trial understates the benefits to the average child of moving. Similar phenomena occur in medicine. In the 1954 trials of the Salk polio vaccine in the US, the rates of infection, while lowest among the treated children, were higher in the control children than in the general population at risk, so that the parents of those who selected into the trial presumably had some idea that they might have been exposed, Hausman and Wise (1985, p. 193–4). In this case, the average treatment effect in the trial sample exaggerates the ATE in the general population, which is what we want to know for public policy.

Given the non-parametric spirit of RCTs, and the unwillingness of many trialists to make assumptions or to incorporate prior information, the only way forward is to be very clear about the purpose of the trial and, in particular, which average we are trying to estimate. For those who focus on internal validity in terms of establishing causality by finding an ATE significantly different from zero, the definition of the population seems to be a secondary concern. The idea seems to be that if causality is established in some population, that finding is important in itself, with the task of exploring its applicability to other populations left as a secondary matter. For the many economic or cost–benefit analyses where the ATE is the parameter of interest, the population of interest is definitional, and the inference needs to focus on a path from the results of the trial to the parameter of interest. This is often difficult or even impossible without additional assumptions and/or modeling of behavior, including the decision to participate in the trial, and among participants, the decision not to drop out. Manski (1990, 1995, 2003) has shown that, without additional evidence, the population ATE is not (point) identified from the trial results, and has developed non-parametric bounds (an interval estimate) for the ATE. As with the ITT, these bounds are sometimes tight enough to be informative, though the interval defined by the bounds will often contain zero, see Manski (2013) for a discussion aimed at a broad audience. Faced with this, many scholars are prepared to make assumptions or to build models that give more precise results.
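To give a feel for how such bounds behave, here is a stripped-down sketch of worst-case reasoning. It is our illustration, not a reproduction of any specific result in Manski's work. It assumes that the only problem is that some outcomes in each arm are unobserved (for example because of drop-out), and that outcomes are known to lie in a bounded range. The function names and all the numbers are hypothetical.

```python
def worst_case_bounds(obs_mean, obs_share, y_min, y_max):
    """Bounds on a mean when a share (1 - obs_share) of outcomes is missing.

    The missing outcomes are assumed only to lie in [y_min, y_max].
    """
    missing = 1.0 - obs_share
    lower = obs_mean * obs_share + y_min * missing
    upper = obs_mean * obs_share + y_max * missing
    return lower, upper


def ate_bounds(treat, control, y_min=0.0, y_max=1.0):
    """Interval for the ATE from per-arm (observed mean, observed share)."""
    t_lo, t_hi = worst_case_bounds(*treat, y_min, y_max)
    c_lo, c_hi = worst_case_bounds(*control, y_min, y_max)
    # Smallest possible effect: lowest treated mean minus highest control mean.
    return t_lo - c_hi, t_hi - c_lo


# Hypothetical binary outcome; each arm is (observed mean, share observed).
low_attrition = ate_bounds(treat=(0.70, 0.95), control=(0.50, 0.95))
high_attrition = ate_bounds(treat=(0.70, 0.70), control=(0.50, 0.80))
print(low_attrition, high_attrition)
```

With 5 percent missing in each arm the interval is roughly (0.14, 0.24) and excludes zero; with 30 and 20 percent missing it widens to roughly (-0.11, 0.39) and contains zero, even though the observed means are unchanged. The width of the interval depends only on the outcome range and the shares missing, which is why it can be narrowed only by adding assumptions or evidence.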

RCTs may tell us about causality, even when they do not deliver a good estimate of the ATE. For example, if the ITT estimate is significantly different from zero, the treatment has a causal effect for at least some individuals in the population. The same is true if the LATE is significantly different from zero; again the treatment is causal for some sub-population, even if we may have difficulty characterizing it or accepting it as the population of interest. From this, we also learn that, provided we had a population with the right distribution of $\beta_i$’s and governed by the same potential outcome equation, the treatment would produce the effect in at least some individuals there.

Section 2: Using the results of randomized controlled trials

2.1 Introduction

Suppose we have the results of a well-conducted RCT. We have estimated an average treatment effect, and our standard error gives us reason to believe that the effect did not come about by chance. We thus have good warrant that the treatment causes the effect in our sample population, up to the limits of statistical inference. What are such findings good for? How should we use them?

The literature in economics, as indeed in medicine and in social policy, has paid more attention to obtaining results than to whether and how they should be adapted for use, often assuming that findings can be used “as is.” Much effort is devoted to demonstrating causality and estimating effect sizes in study populations, both in empirical work (more and better RCTs, or substitutes for RCTs, such as instrumental variables or regression discontinuity models) and in theoretical statistical work, for example on the conditions under which we can estimate an average treatment effect, or a local average treatment effect, and what these estimates mean. There is less theoretical or empirical work to guide us on how and for what purposes to use the findings of RCTs, such as the conditions under which the same results hold outside of the original settings, how they might be adapted for use elsewhere, or how they might be used for formulating, testing, understanding, or probing hypotheses beyond the immediate relation between the treatment and the outcome investigated in the study.

Yet it cannot be that knowing how to use results is less important than knowing how to demonstrate them. Any chain of evidence is only as strong as its weakest link, so that a rigorously established effect whose applicability is justified by a loose declaration of simile warrants little more than an estimate that was plucked out of thin air. If trials are to be useful, we need paths to their use that are as carefully constructed as are the trials themselves.

It is sometimes assumed that a parameter, once well established, is invariant across settings. The parameter may be difficult to estimate, because of selection or other issues, and it may be that only a well-conducted RCT can provide a credible estimate of it. If so, internal validity is all that is required, and debate about using the results becomes a debate about the conduct of the study. The argument for the “primacy of internal validity,” Shadish, Cook, and Campbell (2002), is reasonable as a warning that bad RCTs are unlikely to generalize, but it is sometimes incorrectly taken to imply that results of an internally valid trial will automatically or often apply ‘as is’ elsewhere, or that this is the default assumption failing arguments to the contrary. An invariance argument is often made in medicine, where it is sometimes plausible that a particular procedure or drug works the same way everywhere, though see Horton (2000) for a strong dissent and Rothwell (2005) for examples on both sides of the question. We should also note the recent movement to ensure that testing of drugs includes women and minorities because members of those groups suppose that the results of trials on mostly healthy young white males do not apply to them.

2.2 Using results, transportability, and external validity

Suppose a trial has established a result in a specific setting, and we are interested in using the result outside the original context. If “the same” result holds elsewhere, we say we have external validity, otherwise not. External validity may refer just to the transportability of the causal connection, or go further and require replication of the magnitude of the average treatment effect. Either way, the result holds (everywhere, or widely, or in some specific elsewhere) or it does not.

This binary concept of external validity is often unhelpful; it both overstates and understates the value of the results from an RCT. It directs us toward simple extrapolation (whether the same result will hold elsewhere) or simple generalization (whether it holds universally or at least widely) and away from possibly more complex but more useful applications of the evidence. Just as internal validity says nothing about whether or not a trial result will hold elsewhere, the failure of external validity interpreted as simple generalization or extrapolation says little about the value of the trial.

First, there are several uses of RCTs that do not require transportability beyond the original context; we discuss these in the next subsection. Second, there are often good reasons to expect that the results from a well-conducted, informative, and potentially useful RCT will not apply elsewhere in any simple way. Even successful replication by itself tells us little either for or against simple generalization or extrapolation. Without further understanding and analysis, even multiple replications cannot provide much support for, let alone guarantee, the conclusion that the next will work in the same way. Nor do failures of replication make the original result useless. We can often learn much from coming to understand why replication failed and use that knowledge to make appropriate use of the original findings, not by expecting replication, but by looking for how the factors that caused the original result might be expected to operate differently in different settings. Third, and particularly important for scientific progress, the RCT result can be incorporated into a network of evidence and hypotheses that test or explore claims that look very different from the results reported from the RCT. We shall give examples below of extremely useful RCTs that are not externally valid in the (usual) sense that their results do not hold elsewhere, whether in a specific target setting or in the more sweeping sense of holding everywhere.

Bertrand Russell’s chicken provides an excellent example of the limitations to straightforward extrapolation from repeated successful replication. The bird infers, based on multiply repeated evidence, that when the farmer comes in the morning, he feeds her. The inference serves her well until Christmas morning, when he wrings her neck and serves her for Christmas dinner. Of course, our chicken did not base her inference on an RCT. But had we constructed one for her, we would have obtained exactly the same result that she did. Her problem was not her methodology, but rather that she was studying surface relations, and that she did not understand the social and economic structure that gave rise to the causal relations that she observed. So she did not know how widely or how long they would obtain. Russell notes, “more refined views as to the uniformity of nature would have been useful to the chicken” (1912, p. 44). We often act as if the methods of investigation that served the chicken so badly will do perfectly well for us.

Establishing causality does nothing in and of itself to guarantee generalizability. Nor does the ability of an ideal RCT to eliminate bias from selection or from omitted variables mean that the resulting ATE will apply anywhere else. The issue is worth mentioning only because of the enormous weight that is currently attached in economics to the discovery and labeling of causal relations, a weight that is hard to justify for effects that may have only local applicability, what might (perhaps provocatively) be labeled ‘anecdotal causality’. The operation of a cause generally requires the presence of support or helping factors, without which a cause that produces the targeted effect in one place, even though it may be present and have the capacity to operate elsewhere, will remain latent and inoperative. What Mackie (1974) called INUS causality (Insufficient but Non-redundant parts of a condition that is itself Unnecessary but Sufficient for a contribution to the outcome) is often the kind of causality we see; a standard example is a house burning down because the television was left on, although televisions do not operate in this way without helping factors, such as wiring faults, the presence of tinder, and so on. This is standard fare in epidemiology, which uses the term “causal pie” to refer to the case where a set of causes are jointly but not separately sufficient for an effect. We can rewrite (3) in the form

$$
Y_i = \beta_i T_i + \sum_{j=1}^{J} \gamma_j x_{ij} = \left( \sum_{k=1}^{K} \theta_k w_{ik} \right) T_i + \sum_{j=1}^{J} \gamma_j x_{ij} \tag{6}
$$

where $\theta_k$ controls how $w_{ik}$ affects individual $i$’s treatment effect $\beta_i$. The “helping” or “support” factors for the treatment are represented by the interactive variables $w_{ik}$, among which may be included some $x$’s. Since the ATE is the average of the $\beta_i$’s, two populations will have the same ATE only if, except by accident, they have the same average for the support factors necessary for the treatment to work. These are however just the kind of factors that are likely to be differently distributed in different populations, and indeed we do generally find different ATEs in different development (and other social policy) RCTs in different places even in the cases where (unusually) they all point in the same direction.
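The dependence of the ATE on the distribution of support factors in (6) can be made concrete with a few lines of code. The sketch below is purely illustrative: the coefficients, the two sites, and the interpretation of the second support factor as the presence of a functioning clinic are all hypothetical, and the function name `ate_from_support` is ours.

```python
def ate_from_support(theta, population):
    """ATE implied by beta_i = sum_k theta_k * w_ik, averaged over people.

    theta: list of K coefficients, shared across populations.
    population: list of rows, each row holding the K support values w_ik.
    """
    betas = [sum(t * w for t, w in zip(theta, row)) for row in population]
    return sum(betas) / len(betas)


# Hypothetical coefficients: a baseline effect of 0.5 for everyone, plus
# an extra 2.0 where the second support factor is present.
theta = [0.5, 2.0]

# w_i1 = 1 for everyone; w_i2 = 1 if a functioning clinic is nearby.
site_a = [[1, 1]] * 9 + [[1, 0]] * 1  # clinics near 90% of households
site_b = [[1, 1]] * 2 + [[1, 0]] * 8  # clinics near 20% of households

ate_a = ate_from_support(theta, site_a)
ate_b = ate_from_support(theta, site_b)
print(ate_a, ate_b)
```

The causal structure (the $\theta_k$) is identical in the two invented sites, yet the ATE is 2.3 in one and 0.9 in the other, solely because the support factor is distributed differently. A trial in site A would estimate its own ATE without bias and still be a poor guide to site B.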

Causal processes often require highly specialized economic, cultural, or social structures to enable them to work. Consider the Rube Goldberg machine that is rigged up so that flying a kite sharpens a pencil, Cartwright and Hardie (2012, p. 77), or another where a long chain of ropes and pulleys causes the insertion of food into the mouth to activate a face-wiping napkin. These are causal machines, but they are specially constructed to give a kind of causality that operates extremely locally and has no general applicability. The underlying structure affords a very specific form of (6) that will not describe causal processes elsewhere. Neither the same ATE nor the same qualitative causal relations can be expected to hold where the specific form for (6) is different.

Indeed, we continually attempt to design systems that will generate causal relations that we like and that will rule out causal relations that we do not like. Healthcare systems are designed to prevent nurses and doctors making errors; cars are designed so that drivers cannot start them in reverse; work schedules for pilots are designed so they do not fly too many consecutive hours without rest because alertness and performance are compromised.

As in the Rube Goldberg machines and in the design of cars and work schedules, the economic structure and equilibrium may differ in ways that support different kinds of causal relations and thus render a trial in one setting useless in another. For example, a trial that relies on providing incentives for personal promotion is of no use in a state in which a political system locks people into their social and economic positions. Conditional cash transfers cannot improve child health in the absence of functioning clinics. Policies targeted at men may not work for women. We use a lever to toast our bread, but levers only operate to toast bread in a toaster; we cannot brown toast by pressing an accelerator, even if the principle of the lever is the same in both a toaster and a car. If we misunderstand the setting, if we do not understand why the treatment in our RCT works, we run the same risks as Russell’s chicken.

2.3 When RCTs speak for themselves: no transportability required

For some things we want to learn, an RCT is enough by itself. An RCT may disprove a general theoretical proposition to which it provides a counterexample. The test might be of the general proposition itself (a simple refutation test), or of some consequence of it that is susceptible to testing using an RCT (a complex refutation test). Of course, counterexamples are often challenged (for example, it is not the general proposition that caused the rejection, but a special feature of the trial), but here we are on familiar inferential turf. An RCT may also confirm a prediction of a theory, and although this does not confirm the theory, it is evidence in its favor, especially if the prediction seems inherently unlikely in advance. Once again, this is familiar territory, and there is nothing unique about an RCT; it is simply one among many possible testing procedures. Even when there is no theory, or very weak theory, an RCT, by demonstrating causality in some population, can be thought of as proof of concept, that the treatment is capable of working somewhere. This is one of the arguments for the importance of internal validity.

Another case where no transportation is called for is when an RCT is used for evaluation, for example to satisfy donors that the project they funded actually achieved its aims in the population in which it was conducted. Even so, for such evaluations, say by the World Bank, to be global public goods requires the development of arguments and guidelines that justify using the results in some way elsewhere; the global public good is not an automatic by-product of the Bank fulfilling its fiduciary responsibility. When the components of treatments change across studies, evaluations need not lead to cumulative knowledge. Or as Heckman et al (1999, p. 1934) note, “the data produced from them [social experiments] are far from ideal for estimating the structural parameters of behavioral models. This makes it difficult to generalize findings across experiments or to use experiments to identify the policy-invariant structural parameters that are required for econometric policy evaluation.” Of course, when we ask exactly what those invariant structural parameters are, whether they exist, and how they should be modeled, we open up major fault lines in modern applied economics. For example, we do not intend to endorse intertemporal dynamic models of behavior as the only way of recovering the parameters that we need. We also recognize that the usefulness of simple price theory is not as universally accepted as it once was. But the point remains that we need something, some regularity, and that the something needed can rarely be recovered by simply generalizing across trials.

A third non-problematic and important use of an RCT is when the parameter of interest is the average treatment effect in a well-defined population from which the sample trial population (from which treatments and controls are randomly assigned) is itself a random sample. In this case the sample average treatment effect (SATE) is an unbiased estimator of the population average treatment effect (PATE) that, by assumption, is our target, see Imbens (2004) for these terms. We refer to this as the “public health” case; like many public health interventions, the target is the average, “population health,” not the health of individuals. One major (and widely recognized) danger of the public-health-style uses of RCTs is that the scaling up from (even a random) sample to the population will not go through in any simple way if the outcomes of individuals or groups of individuals change the behavior of others, which will be common in economic examples but perhaps less common in health. There is also an issue of timing if the results are to be implemented sometime after the trial.

In economics, a ‘public-health-style’ example is the imposition of a commodity tax, where the total tax revenue is of interest and we do not care who pays the tax. Indeed, theory can often identify a specific, well-defined magnitude whose measurement is key for the policy; see Deaton and Ng (1998) for an example of what Chetty (2009) calls a “sufficient” statistic. In this case, the behavior of a random sample of individuals might well provide a good guide to the tax revenue that can be expected. Another case comes from work on poverty programs where the interest of the sponsors is in the consequences for the budget of the state responsible for the program; we discuss these cases at the end of this Section. Even here, it is easy to imagine behavioral effects coming into play that drive a wedge between the trial and its full scale implementation, for example if compliance is higher when the scheme is widely publicized, or if government agencies implement the scheme differently from trialists.

2.4 Transporting results laterally and globally

The program of RCTs in development economics, as in other areas of social science, has the broader goal of finding out “what works.” At its most ambitious, this aims for universal reach, and the development literature frequently argues that “credible impact evaluations are global public goods in the sense that they can offer reliable guidance to international organizations, governments, donors, and nongovernmental organizations (NGOs) beyond national borders,” Kremer and Duflo (2008, p. 93). Sometimes the results of a single RCT are advocated as having wide applicability, with especially strong endorsement when there is at least one replication. For example, Kremer and Holla (2009) use a Kenyan trial as the basis for a blanket statement without context restriction, “Provision of free school uniforms, for example, leads to 10%-15% reductions in teen pregnancy and dropout rates.” Kremer and Duflo (2008), writing about another trial, are more cautious, citing two evaluations, and restricting themselves to India: “One can be relatively confident about recommending the scaling-up of this program, at least in India, on the basis of these estimates, since the program was continued for a period of time, was evaluated in two different contexts, and has shown its ability to be rolled out on a large scale.”

Of course, the problem of generalization extends beyond RCTs, to both “fully controlled” laboratory experiments and to most non-experimental findings. For example, ever since Alfred Marshall thought of it while sunbathing, economists have used the concept of an elasticity (as in the income elasticity of the demand for food, or the price elasticity of the supply of cotton) and have transported elasticities, which are conveniently dimensionless, from one context to another, as numerical estimates, or in ranges, such as high, medium, or low. Articles that collect such estimates are widely cited even though, as has long been known, the invariance of elasticities is not guaranteed in practice and is sometimes inconsistent with choice theory. Our argument here is that evidence from RCTs, like evidence on elasticities, is not automatically simply generalizable, and that its internal validity, when it exists, does not provide it with any unique invariance across context. We shall also argue that specific features of RCTs, such as their freedom from parametric assumptions, although advantageous in estimation, can be a serious handicap in use.

Most advocates of RCTs understand that “what works” needs to be qualified to “what works under which circumstances,” and try to say something about what those circumstances might be, for example, by replicating RCTs in different places, and thinking intelligently about the differences in outcomes when they find them. Sometimes this is done in a systematic way, for example by having multiple treatments within the same trial so that it is possible to estimate a “response surface,” that links outcomes to various combinations of treatments, see Greenberg and Schroder (2004) or Shadish et al (2002). For example, the RAND health experiment had multiple treatments, allowing investigation, not only of whether health insurance increased expenditures, but how much it did so under different circumstances. Some of the negative income tax experiments (NITs) in the 1960s and 1970s were designed to estimate response surfaces, with the number of treatments and controls in each arm optimized to maximize precision of estimated response functions subject to an overall cost limit, Conlisk (1973). Experiments on time-of-day pricing for electricity had a similar structure, see Aigner (1985).

The MDRC experiments have also been analyzed across cities in an effort to link city features to the results of the RCTs within them, Bloom, Hill, and Riccio (2005). Unlike the RAND and NIT examples, these are ex post analyses of completed trials; the same is true of Vivalt (2015), who assembles evidence on a large number of trials and finds, for the collection of trials she studied, that development-related RCTs run by government agencies typically find smaller (standardized) effect sizes than RCTs run by academics or by NGOs. Bold et al (2013), who ran parallel RCTs on an intervention implemented either by an NGO or by the government of Kenya, found similar results there. Note that these analyses have a different purpose from those meta-analyses that assume that different trials estimate the same parameter up to noise and average in order to increase precision.

Although there are issues with all of these methods of investigating differences across trials, without some discipline it is too easy to come up with “just-so” or fairy stories that account for almost any differences. We risk a procedure that, if a result is replicated in full or in part in at least two places, puts that treatment into the “it works” box and, if the result does not replicate, causally interprets the difference in a way that allows at least some of the findings to survive.

How can we think about this more seriously? How can we do better than simple generalization and simple extrapolation? Many writers have emphasized the role of theory in transporting and using the results of trials, and we shall discuss this further in the next subsection. But statistical approaches are also widely used; these are designed to deal with the possibility that treatment effects vary systematically with other variables. Referring back to (6), suppose that the $\beta_i$’s, the individual treatment effects, are functions of a set of K observable or unobservable support variables, $w_{ik}$, and that the non-vacuous w’s may even represent different features in different places. It is then clear that, provided the distribution of the w values is the same in the new circumstances as in the old, the ATE in the original trial will hold in the new circumstances. In general, of course, this condition will not hold, nor do we have any obvious way of checking it unless we know what the support factors are in both places.

One procedure to deal with interactions is post-experimental stratification, which parallels post-survey stratification in sample surveys. The trial is broken up into subgroups that have the same combination of known, observable w’s, the ATEs within each of the subgroups are calculated, and the results are then reassembled according to the configuration of w’s in the new context. For example, if the treatment effects vary with age, the age-specific ATEs can be estimated, and the age distribution in the new context used to reweight the age-specific ATEs to give a new, overall, ATE. This can be used to estimate the ATE in a new context, or to correct estimates to the parent population when the trial sample is not a random sample of the parent. Of course, this method will only work in special cases; for example, if we only know some of the w’s, there is no reason to suppose that reweighting for those alone will give a useful correction.
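The age example can be made concrete with a minimal sketch. The stratum labels, the stratum-specific ATEs, and the population shares below are hypothetical numbers chosen for illustration, and the function name `reweighted_ate` is ours; the sketch also assumes that age is the only relevant w, which is exactly the special case the text warns about.

```python
def reweighted_ate(stratum_ates, stratum_shares):
    """Average stratum-specific ATEs using a population's stratum shares."""
    if set(stratum_ates) != set(stratum_shares):
        # A stratum missing on either side cannot be reweighted.
        raise ValueError("strata must match between the trial and the population")
    if abs(sum(stratum_shares.values()) - 1.0) > 1e-9:
        raise ValueError("stratum shares must sum to one")
    return sum(stratum_ates[s] * stratum_shares[s] for s in stratum_ates)


# Hypothetical age-specific ATEs estimated within the trial.
age_ates = {"young": 2.0, "middle": 1.0, "old": 0.0}

# Hypothetical age distributions: the trial sample and the new context.
trial_shares = {"young": 0.5, "middle": 0.3, "old": 0.2}
target_shares = {"young": 0.2, "middle": 0.3, "old": 0.5}

trial_ate = reweighted_ate(age_ates, trial_shares)
target_ate = reweighted_ate(age_ates, target_shares)
print(f"trial ATE: {trial_ate:.2f}, reweighted ATE for new context: {target_ate:.2f}")
```

With these made-up numbers the older target population has a smaller overall ATE than the trial, even though every age-specific effect is unchanged.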

Other methods also work when there are too many w’s for stratification, for example by estimating the probability of each observation in the population being included in the trial sample as a function of the w’s, then weighting each observation by the inverse of these propensity scores. A good reference for these methods is Stuart et al (2011), or in economics, Angrist (2004) and Hotz, Imbens, and Mortimer (2005).
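A minimal sketch of the weighting step follows. It assumes the inclusion probabilities have already been estimated (in practice from a model of trial participation on the w’s, which is not shown), and the data and the function name `inverse_probability_weighted_ate` are hypothetical.

```python
def inverse_probability_weighted_ate(outcomes, treated, inclusion_probs):
    """Difference in weighted mean outcomes, treated minus control.

    Each unit is weighted by the inverse of its (assumed known) probability
    of being included in the trial sample.
    """
    if not (len(outcomes) == len(treated) == len(inclusion_probs)):
        raise ValueError("inputs must have the same length")
    if any(p <= 0.0 or p > 1.0 for p in inclusion_probs):
        raise ValueError("inclusion probabilities must lie in (0, 1]")

    def weighted_mean(arm):
        pairs = [(1.0 / p, y)
                 for y, t, p in zip(outcomes, treated, inclusion_probs)
                 if t == arm]
        if not pairs:
            raise ValueError("each arm needs at least one unit")
        total_weight = sum(w for w, _ in pairs)
        return sum(w * y for w, y in pairs) / total_weight

    return weighted_mean(True) - weighted_mean(False)


# Hypothetical trial: type A units (true effect 2) are over-represented,
# type B units (true effect 0) are under-represented.
outcomes = [3.0] * 4 + [1.0] * 4 + [1.0] + [1.0]
treated = [True] * 4 + [False] * 4 + [True] + [False]
inclusion_probs = [0.8] * 8 + [0.2] * 2

unweighted = inverse_probability_weighted_ate(outcomes, treated, [1.0] * 10)
weighted = inverse_probability_weighted_ate(outcomes, treated, inclusion_probs)
print(f"unweighted ATE: {unweighted:.2f}, weighted ATE: {weighted:.2f}")
```

In this constructed example the two types are equally common in the population, so the weighted estimate moves from the trial-sample average toward the population average of the two type-specific effects.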

There are yet further reasons why these methods do not always work. As with any form of reweighting, the variables used to construct the weights must be present in both the original and new context. If treatment effects vary by sex, we cannot predict the outcomes for men using a trial sample that is entirely female. If we are to carry a result forward in time, we may not be able to extrapolate from a period of low inflation to a period of high inflation; as Hotz et al (2005) note, it will typically be necessary to rule out such “macro” effects, whether over time, or over locations. Reweighting also depends on assuming that the same governing equation (6) covers the trial and the target population. If they differ not only by what causal factors are present in what proportions but also in how (if at all) the causes contribute to the effects, reweighting the effect sizes that occur in trial sub-populations will not produce good predictions about target population outcomes.

It should be clear from this that reweighting works only when the observable factors used for reweighting include all and only genuine interactive causes; we need data on all the relevant interactive factors. But as Muller (2015) notes, this takes us back to the situation that RCTs are designed to avoid, where we need to start from a complete and correct specification of the causal structure. RCTs can avoid this in estimation (which is one of their strengths, supporting their credibility) but the benefit vanishes as soon as we try to carry their results to a new context.

Pearl and Bareinboim (2014) use Pearl’s do-calculus to provide a fuller formal analysis for transportability of causal empirical findings across populations. They define transportability as “a license to transfer causal effects learned in RCTs to a new population, in which only observational studies can be conducted,” Pearl and Bareinboim (2015, p. 1). They consider both qualitative causal relations, which they represent in directed acyclic graphs, and probabilistic facts, such as the probability of the outcome given a treatment, conditional on some third factor. They then provide theorems about what the relationship between the causal and probabilistic facts in two populations must be if it is to be possible to infer a particular causal fact, such as the ATE, about population 2 from causal and probabilistic information about population 1 coupled with purely probabilistic information about population 2. Not surprisingly, for many things we should like to know about population 2, knowledge of even the full structure on population 1 will not suffice. Inferences to facts about a new population require not only that the facts we suppose about population 1 (like an ATE) are well grounded, that the RCT was well conducted, and that the statistical inference is sound, but also that we have equally good grounding for other assumptions we need about the relation between the two populations. For example, using the result described above for directly transporting the ATE from a trial population to some other (simple extrapolation), we need good grounds to suppose both that the average of the net effect of the interactive factors is the same in both populations and also that the same governing equation describes both populations.

This discussion leads to a number of points. First, we cannot get to general claims by simple generalization; there is no warrant for the convenient assumption that the ATE estimated in a specific RCT is an invariant parameter. We need to think through the causal chain that has generated the RCT result, and the underlying structures that support this causal chain, whether that causal chain might operate in a new setting and how it would do so with different joint distributions of the causal variables; we need to know why and whether that why will apply elsewhere. While it is true that there exist general causal claims (the force of gravity, or that people respond to incentives), they use relatively abstract concepts and operate at a much higher level than the claims that can be reasonably inferred from a typical RCT, and cannot, by themselves, guarantee the outcomes that we are considering here. That transportation is far from automatic also tells us why (even ideal) RCTs of similar interventions can be expected to give different answers in different settings. Such differences do not necessarily reflect methodological failings and will hold across perfectly executed RCTs just as they do across observational studies.

Second, thoughtful pre-experimental stratification in RCTs is likely to be valuable, or failing that, subgroup analysis, because it can provide information that may be useful for generalization or transportation. For example, Kremer and Holla (2009) note that, in their trials, school attendance is surprisingly sensitive to small subsidies, which they suggest is because there are a large number of students and parents who are on the (financial) margin between attending and not attending school; if this is indeed the mechanism for their results, a good variable for stratification would be the fraction of people near the relevant cutoff. We also need to know that the same mechanism works in any new setting where we consider using small subsidies to increase school attendance.

Third, we need to be explicit about causal structure, even if that means more model building and more (or different) assumptions than advocates of RCTs are often comfortable with. To be clear, modeling causal structure does not necessarily commit us to the elaborate and often incredible assumptions that characterize some structural modeling in economics, but there is no escape from thinking about the way things work, the why as well as the what.

Fourth, we will typically need to know more than the results of the RCT itself, for example about differences in social, economic, and cultural structures and about the joint distributions of causal variables, knowledge that will often only be available through a range of empirical strategies including observational studies. We will also need to be able to characterize the population to which the original RCT and its ATE applied because how the population is described is commonly taken to be some indication of which other populations the results are likely to be exportable to and which not. Many medical and psychological journals are explicit about this. For instance, the rules for submission recommended by the International Committee of Medical Journal Editors, ICMJE (2015, p14) insist that article abstracts “Clearly describe the selection of observational or experimental participants (healthy individuals or patients, including controls), including eligibility and exclusion criteria and a description of the source population.” The problems of characterizing the population here go beyond those we faced in considering a LATE. An RCT is conducted on a population of specific individuals. The results obtained, whether we think in terms of an ATE or in terms of establishing causality, are features of that population, of those very individuals at that very time, not any other population with any different individuals that might, for example, satisfy one of the infinite set of descriptions that the trial population satisfies. How is the description of the population that is used in reporting the results to be chosen? For choose we must: the alternative to describing is naming, identifying each individual in the study by name, which is cumbersome and unhelpful and often unethical.

This same issue is confronted already in study design. Apart from special cases, like post hoc evaluation for payment-for-results, we are not especially concerned to learn about the very population enrolled in the trial. Most experiments are, and should be, conducted with an eye to what the results can help us learn about other populations. This cannot be done without substantial assumptions about what might be and what might not be relevant to the production of the outcome studied. (For example, the ICMJE guidelines go on to say: “Because the relevance of such variables as age, sex, or ethnicity is not always known at the time of study design, researchers should aim for inclusion of representative populations into all study types and at a minimum provide descriptive data for these and other relevant demographic variables,” p14.) So both intelligent study design and responsible reporting of study results involve substantial background assumptions. Of course this is true for all studies, not just RCTs. But RCTs require special conditions if they are to be conducted at all and especially if they are to be conducted successfully (local agreements, compliant subjects, affordable administrators, people competent to measure and record outcomes reliably, a setting where random allocation is morally and politically acceptable, etc.), whereas observational data are often more readily and widely available. In the case of RCTs, there is danger that these kinds of considerations have too much effect. This is especially worrisome where the features the study population should have are not justified, made explicit, or subjected to serious critical review. Careful description of the study population is uncommon in economics, whether in RCTs or many observational studies.

The need for observational knowledge is one of many reasons why it is counterproductive to insist that RCTs are the unique gold standard, or that some categories of evidence should be prioritized over others; these strategies leave us helpless in using RCTs beyond their original context. The results of RCTs must be integrated with other knowledge, including the practical wisdom of policymakers, if they are to be useable outside the context in which they were constructed. Contrary to much practice in medicine as well as in economics, conflicts between RCTs and observational results need to be explained, for example by reference to the different populations in each, a process that will sometimes yield important evidence, including on the range of applicability of the RCT itself. While the validity of the RCT will sometimes provide an understanding of why the observational study found a different answer, there is no basis (or excuse) for the common practice of dismissing the observational study simply because it was not an RCT and therefore must be invalid. It is a basic tenet of scientific advance that new findings must be able to explain previous results, even results that are now thought to be invalid; methodological prejudice is not an explanation.

These considerations can be seen in practice in the range of randomized controlled trials in economics, which we shall explore in the final subsection below.

2.5 Using theory for generalization

Economists have been combining theory and randomized controlled trials since the early experiments. Orcutt and Orcutt (1968) laid out the inspiration for the income tax trials using a simple, static theory of labor supply. According to this, people choose how to divide their time between work and leisure in an environment in which they receive a minimum G if they do not work, and where they receive an additional amount $(1 - t)w$ for each hour they work, where w is the wage rate, and t is a tax rate. The trials assigned different combinations of G and t to different trial groups, so that the results traced out the labor supply function, allowing estimation of the parameters of preferences, which could then be used in a wide range of policy calculations, for example to raise revenue at minimum utility loss to workers.

Following these early trials, there has been a long and continuing tradition of using trial results, together with the baseline data collected for the trial, to fit structural models that are to be used more generally. Early examples include Moffitt (1979) on labor supply and Wise (1985) on housing; a more recent example is Heckman, Pinto and Savelyev (2013) for the Perry preschool program. Development economics examples include Attanasio, Meghir and Santiago (2012), Attanasio et al (2015), Todd and Wolpin (2006) and Duflo, Hanna and Ryan (2012). These structural models sometimes require formidable auxiliary assumptions on functional forms or the distributions of unobservables, which makes many economists reluctant to embrace them, but they have compensating advantages, including the ability to integrate theory and evidence, to make out-of-sample predictions, and to analyze welfare (which always requires some understanding of why things happen), and the use of RCT evidence allows the relaxation of at least some of the assumptions that are needed for identification. In this way, the structural models borrow credibility from the RCTs and in return help set the RCT results within a coherent framework. Without some such interpretation, the welfare implications of RCT results can be problematic; knowing how people in general (let alone just people in the trial population, which is what, as we keep repeating, the trial results tell us about) respond to some policy is rarely enough to tell whether or not they are made better off. What works is not equivalent to what should be.

In many papers, Heckman has developed ways to model how the beliefs and interests of participants affect their participation in, behavior during, and their outcomes in trials, for example using a Roy model of choice; see e.g. Heckman and Smith (1995), and more recently Chassang, Padró i Miguel, and Snowberg (2012) and Chassang et al (2015). The modeling of beliefs and behavior allows predictions about the results of trials that differ from the base trial, or where the risk and reward structures are different. Beyond that, and in line with a running theme of this Section, thinking about how to handle new situations can be incorporated into the design of the original trial so as to provide the information needed for transportation.

Light touch theory can do much to extend and to use RCT results. In both the RAND Health Experiment and negative income tax experiments, an immediate issue concerned the difference between short and long-run responses; indeed, differences between immediate and ultimate effects occur in a wide range of RCTs. Both health and tax RCTs aimed to discover what would happen if consumers/workers were permanently faced with higher or lower prices/wages, but the trials could only run for a limited period. A temporarily high tax rate on earnings was effectively a “fire sale” on leisure, so that the experiment provided an opportunity to take a vacation and make up the earnings later, an incentive that would be absent in a permanent scheme. How do we get from the short-run responses that come from the trial to the long-run responses that we want to know? Metcalf (1973) and Ashenfelter (1978) provided answers for the income tax experiments, as did Arrow (1975) for the RAND Health Experiment.

Arrow’s analysis illustrates how to use both structure and observational data to transport and adapt results from one setting to another. He models the health experiment as a two-period model, in which the price of medical care is lowered in the first period only, and shows how to derive what we want, which is the response in the first period if prices were lowered by the same proportion in both periods. The magnitude that we want is S, the compensated price derivative of medical care in period 1 in the face of identical increases in the prices $p_1$ and $p_2$ in periods 1 and 2, and this is equal to $s_{11} + s_{12}$, the sum of the derivatives of period 1’s demand with respect to the two prices. The trial gives only $s_{11}$. But if we have post-trial data on medical services for both treatments and controls, we can infer $s_{21}$, the effect of the experimental price manipulation on post-experimental care. Choice theory, in the form of Slutsky symmetry, allows Arrow to use this to infer $s_{12}$ and thus S. He contrasts this with Metcalf’s alternative solution, which makes a different assumption: that two-period preferences are intertemporally additive, in which case the long-run elasticity can be obtained from knowledge of the income elasticity of post-experimental medical care, which would have to come from an observational analysis. These two alternative approaches show how we can choose, based on our willingness to make assumptions and on the data we have, a suitable combination of (elementary and transparent) theoretical assumptions and observational data in order to adapt and use the trial results. Such analysis can also help design the original trial by clarifying what we need to know in order to be able to use the results of a temporary treatment to estimate the permanent effects that we need. Ashenfelter provides a third solution, noting that the two-period model is formally identical to a two-person model, so that we can use information on two-person labor supply to tell us about the dynamics.
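The steps in Arrow’s argument can be laid out in symbols. The notation $x_i$ for compensated demand for medical care in period $i$ is our own shorthand, introduced only for this sketch; the $s_{ij}$ and $S$ are as in the text.

```latex
\begin{align*}
s_{ij} &\equiv \left.\frac{\partial x_i}{\partial p_j}\right|_{u\ \text{constant}}
  && \text{(compensated derivative of period-$i$ care with respect to $p_j$)} \\
S &= s_{11} + s_{12}
  && \text{(period-1 response to an identical change in $p_1$ and $p_2$)} \\
s_{12} &= s_{21}
  && \text{(Slutsky symmetry)} \\
S &= s_{11} + s_{21}
  && \text{($s_{11}$ from the trial, $s_{21}$ from post-trial data)}
\end{align*}
```

The last line shows why post-trial data on both treatments and controls matter: they supply the one term that the trial alone cannot.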

Theory can often allow us to reclassify new or unknown situations as analogous to situations where we already have background knowledge. One frequently useful way of doing this is when the new policy can be recast as equivalent to a change in the budget constraint that respondents face. The consequences of a new policy may be easier to predict if we can reduce it to equivalent changes in income and prices, whose effects are often well understood and well studied. Todd and Wolpin (2008) make this point and provide examples. In the labor supply case, an increase in the tax rate t has the same effect as a decrease in the wage rate w, so that we can rely on previous literature to predict what will happen when tax rates are changed. In the case of Mexico’s PROGRESA conditional cash transfer program, Todd and Wolpin note that the subsidies paid to parents if their children go to school can be thought of as a combination of a reduction in children’s wage rates and an increase in parents’ income, which allows them to predict the results of the conditional cash experiment with limited additional assumptions. If this works, as it partially does in their analysis, the trial helps consolidate previous knowledge and contributes to an evolving body of theory and empirical, including trial, evidence.

The program of thinking about policy changes as equivalent to price and income changes has a long history in economics; much of rational choice theory can be so interpreted, see Deaton and Muellbauer (1980) for many examples. When this conversion is credible, and when a trial on some apparently unrelated topic can be modeled as equivalent to a change in prices and incomes, and when we can assume that people in different settings respond relevantly similarly to changes in prices and incomes, we have a ready-made framework for incorporating the trial results into previous knowledge, as well as for extending the trial results and using them elsewhere. Of course, all depends on the validity and credibility of the theory; people may not in fact think of a tax increase as a decrease in the price of leisure, and behavioral economics is full of examples where apparently equivalent stimuli generate non-equivalent outcomes. The embrace of behavioral economics by many of the current generation of trialists may account for their limited willingness to use conventional choice theory in this way; unfortunately, behavioral economics does not yet offer a replacement for the general framework of choice theory that is so useful in this regard.

Theory can also help with the problem we raised of delineating the population to which the trial results immediately apply and for thinking about moving from this population to the population of interest. Ashenfelter’s (1978) analysis is again a good illustration and predates much similar work in later literature. The income tax experiments offered participation in the trial to a random sample of the population of interest. Because there was no blinding and no compulsion, people who were randomized into the treatment group were free to choose to refuse treatment. As in many subsequent analyses, Ashenfelter supposes that people choose to participate if it is in their interest to do so, depending on what has become known in the RCT and Instrumental Variables literature as their own idiosyncratic “gain.” The simple labor supply model gives an approximate condition: if the treatment increases the tax rate from $t_0$ to $t_1$ with an offsetting increase in G, then an individual assigned to the experimental group will decline to participate if

$$(t_1 - t_0)\, w_0 h_0 + \tfrac{1}{2}\, s_{00}\, (t_1 - t_0) > G_1 - G_0 \tag{7}$$

where subscript 1 refers to the treatment situation, 0 to the control, $h_0$ is hours worked, and $s_{00}$ is the (negative) utility-constant response of hours worked to the tax rate. If there is no substitution, the second term on the left-hand side is zero, and people will accept treatment if the increase in G more than makes up for the increases in taxes payable, the “breakeven” condition. In consequence, those with higher earnings are less likely to accept treatment. Some better-off people with high substitution effects will also accept treatment if the opportunity to buy more cheap leisure is sufficient enticement.
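The logic of the participation condition can be illustrated with a small sketch. It treats the second left-hand term of (7) as a single non-positive number supplied by the user rather than computing it from $s_{00}$, and every numerical value is hypothetical; the function name `declines_treatment` is ours.

```python
def declines_treatment(t0, t1, wage, hours, g0, g1, substitution_term=0.0):
    """Return True if the left-hand side of condition (7) exceeds G1 - G0.

    substitution_term stands in for the second term on the left-hand side
    of (7); it is zero with no substitution and negative otherwise.
    """
    extra_tax = (t1 - t0) * wage * hours  # increase in taxes payable
    return extra_tax + substitution_term > g1 - g0


# Hypothetical treatment: tax rate rises from 0 to 0.5, guarantee rises by 150.
high_earner = declines_treatment(0.0, 0.5, wage=10.0, hours=40.0, g0=0.0, g1=150.0)
low_earner = declines_treatment(0.0, 0.5, wage=10.0, hours=20.0, g0=0.0, g1=150.0)

# The same high earner with a large (negative) substitution term accepts.
high_earner_with_substitution = declines_treatment(
    0.0, 0.5, wage=10.0, hours=40.0, g0=0.0, g1=150.0, substitution_term=-60.0
)
print(high_earner, low_earner, high_earner_with_substitution)
```

With no substitution the rule reduces to the breakeven condition, so the higher earner declines and the lower earner accepts; a sufficiently negative substitution term reverses the higher earner’s decision.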

The selective acceptance of treatment limits the analyst’s ability to learn about the better-off or low-substitution people who decline treatment but who would have to accept it if the policy were actually implemented. Both the ITT estimator and the “as treated” estimator that compares the treated and the untreated are affected, not just by the labor supply effects that the trial is designed to induce, but by the kind of selection effects that randomization is designed to eliminate. Of course, the analysis that leads to (3) can perhaps help us say something about this and help us adjust the trial estimates back to what we would like to know. Yet this is no easy matter because selection depends, not only on observables, such as pre-experimental earnings and hours worked, but on (much harder to observe) labor supply responses that likely vary across individuals. Paraphrasing Ashenfelter, we cannot estimate the effects of a permanent compulsory negative income tax program from a transitory voluntary trial without strong assumptions or additional evidence.

Much of the modern literature, for example on training programs, wrestles with the issue of exactly who is represented by the RCT results, see again Heckman, Lalonde and Smith (1999). When people are allowed to reject their randomly assigned treatment according to their own (real or perceived) individual advantage, we have come a long way away from the random allocation in the standard conception of a randomized controlled trial. Moreover, the absence of blinding is common in social and economic RCTs, and while there are trials, such as welfare trials, that effectively compel people to accept their assignments, and some where the treatment is generous enough to do so, there are trials where subjects have much freedom and, in those cases, it is less than obvious to us what role, if any, randomization plays in warranting the results.

2.6 Scaling up: using the average for populations

A typical RCT, especially in the development context, is small-scale and local, for example in a few schools, clinics, or farms in a particular geographic, cultural, socio-economic setting. If successful according to a cost-effectiveness criterion, for example, it is a candidate for scaling-up, applying the same intervention for a much larger area, often a whole country, or sometimes even beyond, as when some treatment is considered for all relevant World Bank projects. The fact that the intervention might work differently at scale has long been noted in the economics literature, e.g. Garfinkel and Manski (1992), Heckman (1992), and Moffitt (1992), and is recognized in the recent review by Banerjee and Duflo (2009). We want here to emphasize the pervasiveness of such effects (that failure of the trial results to replicate at a larger scale is likely to be the rule rather than the exception) as well as to note once again that, as in failures of transportability, this should not be taken as an argument against using RCTs, but only against the idea that effects at scale are likely to be the same as in the trial. Using RCT results is not the same as assuming the same results hold in all circumstances.

An example of what are often called general equilibrium effects comes from agriculture. Suppose an RCT demonstrates that in the study population a new way of using fertilizer or insecticide had a substantial positive effect on, say, cocoa yields, so that farmers who used the new methods saw increases in production and in incomes compared to those in the control group. If the procedure is scaled up to the whole country, or to all cocoa farmers worldwide, the price will drop, and if the demand for cocoa is price inelastic (as is usually thought to be the case, at least in the short run) cocoa farmers’ incomes will fall. Indeed, the conventional wisdom for many crops is that farmers do best when the harvest is small, not large. Of course, these considerations might not be decisive in deciding whether or not to promote the innovation, and there may still be long term gains if, for example, some farmers find something better to do than growing cocoa. But the basic point is that the scaled-up effect in this case is opposite in sign to the trial effect. The problem here is not with the trial results, which can be usefully incorporated into a more comprehensive market model that incorporates the responses estimated by the trial. The problem is only if we assume that the aggregate looks like the individual. That other ingredients of the aggregate model must come from observational studies should not be a criticism, even for those who favor RCTs; it is simply the price of doing serious analysis.
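The sign reversal in the cocoa example can be illustrated with a deliberately simple market model. The sketch assumes a constant-elasticity demand curve and ignores production costs, so that revenue stands in for income; the 20 percent yield gain and the elasticity values are hypothetical, as are the function names.

```python
def revenue_ratio_in_trial(output_ratio):
    """Revenue change for a few treated farmers who face an unchanged price."""
    return output_ratio


def revenue_ratio_at_scale(output_ratio, demand_elasticity):
    """Revenue change when all farmers raise output and the price adjusts.

    Assumes constant-elasticity demand, Q = A * P ** (-demand_elasticity),
    so the price ratio is output_ratio ** (-1 / demand_elasticity).
    """
    if demand_elasticity <= 0:
        raise ValueError("demand_elasticity is the absolute value and must be positive")
    price_ratio = output_ratio ** (-1.0 / demand_elasticity)
    return price_ratio * output_ratio


yield_gain = 1.2  # hypothetical 20 percent increase in output
trial = revenue_ratio_in_trial(yield_gain)
scaled_inelastic = revenue_ratio_at_scale(yield_gain, demand_elasticity=0.5)
scaled_elastic = revenue_ratio_at_scale(yield_gain, demand_elasticity=2.0)
print(f"trial: {trial:.3f}, at scale (inelastic): {scaled_inelastic:.3f}, "
      f"at scale (elastic): {scaled_elastic:.3f}")
```

Under these assumptions revenue rises in the trial but falls at scale when demand is inelastic, and rises at scale only when demand is elastic; the elasticity itself is an ingredient that must come from outside the trial.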

There are many possible interventions that alter supply or demand whose effect, in aggregate, will change a price or a wage that is held constant in the original RCT. Education will change the supplies of skilled versus unskilled labor, with implications for relative wage rates. Conditional cash transfers increase the demand for (and perhaps supply of) schools and clinics, which will change prices or waiting lines, or both. There are interactions between people that will operate only at scale. Giving one child a voucher to go to private school might improve her future, but doing so for everyone can decrease the quality of education for those children who are left in the public schools, see the contrasting studies of Angrist et al (1999) and Hsieh and Urquiola (2002). Educational or training programs may benefit those who are treated, but harm those left behind; if the control group is selected from the latter, the RCT may generate a positive result in spite of hurting some and helping none; Crépon et al (2014) recognize the issue and show how to adapt an RCT to deal with it.

Scaling up can also disturb the political equilibrium. An exploitative government may not allow the mass transfer of money from abroad to a powerless segment of the population, though it may permit a small-scale RCT of cash transfers. Provision of healthcare by foreign NGOs may be successful in trials, but have unintended negative consequences at scale because of general equilibrium effects on the supply of healthcare personnel, or because it disturbs the nature of the contract between the people and a government that is using tax revenue to provide services. In India, the government spends large sums on food subsidies through a system (the PDS) that is both corrupt and inefficient, with much of the grain that is procured failing to find its way to the intended beneficiaries. Localized RCTs on whether or not families are better off with cash transfers are not informative about how politicians would change the amount of the transfer if faced with unanticipated inflation, and at least as important, whether the government could cut procurement from relatively wealthy and politically powerful farmers. Without a political and general equilibrium analysis, it is impossible to think about the effects of replacing food subsidies with cash transfers, see e.g. Basu (2010).

Even in medicine, where biological interactions between people are less common than are social interactions in social science, interactions can be important; infectious diseases are an example, and immunization programs affect the dynamics of disease transmission through herd immunity, so that the effects on an individual depend on how many others are vaccinated, Fine and Clarkson (1986), Manski (2013, p52). The usual, if seldom correct, conception of an RCT in medicine is of a biological process (for example, the administration of aspirin after a heart attack) where the effect is thought to be similar across individuals, and where there are no interactions. Yet even here, the social and economic setting affects how drugs are actually used and the same issues can arise; the distinction between efficacy and effectiveness in clinical trials is in part recognition of the fact.

2.7 Drilling down: using the average for individuals

Just as there are issues with scaling-up, it is not obvious how to use the results from RCTs at the level of individual units, even individual units that were actually (or potentially) included in the trial. A well-conducted RCT delivers an average treatment effect for a well-defined population but, in general, that average does not apply to everyone. It is not true, for example, as argued in JAMA’s “Users’ guide to the medical literature” that “if the patient would have been enrolled in the study had she been there, that is she meets all of the inclusion criteria and doesn’t violate any of the exclusion criteria, there is little question that the results are applicable,” Guyatt et al (1994). Even more misleading are the often-heard statements that an RCT with an average treatment effect insignificantly different from zero has shown that the treatment works for no one, though such a conclusion would be better supported by a Fisher randomization test.

These issues are familiar to physicians practicing evidence-based medicine whose guidelines require “integrating individual clinical expertise with the best available external clinical evidence from systematic research,” Sackett et al (1996). Exactly what this means is unclear; physicians know much more about their patients than is allowed for in the ATE from the RCT (though, once again, stratification in the trial is likely to be helpful) and they often have intuitive expertise from long practice that they rely on to help them identify features in a particular patient that are likely to affect the effectiveness of a given treatment for that patient. But there is an odd balance being struck here. These judgments are deemed admissible in dealing with the individual patient, at least for discussion with the patient as possible considerations, but they don’t add up to evidence to be made publicly available, with the usual cautions about credibility, by the standards adopted by most EBM sites. It is also true that physicians can have prejudices and “knowledge” that might be anything but. Clearly, there are situations where forcing practitioners to follow the average will do better, even for individual patients, and others where the opposite is true, see Kahneman and Klein (2009).

Whether or not averages are useful to individuals raises the same issue in social science research. Imagine two schools, St Joseph’s and St Mary’s, both of which were included in an RCT of a classroom innovation, or at least were eligible to be so. The innovation is successful on average, but should the schools adopt it? Should St Mary’s be influenced by a previous attempt in St Joseph’s that was judged a failure? Many would dismiss this experience as anecdotal and ask how St Joseph’s could have known that it was a failure without benefit of “rigorous” evidence. Yet if St Mary’s is like St Joseph’s, with a similar mix of pupils, a similar curriculum, and similar academic standing, might not St Joseph’s experience be more relevant to what might happen at St Mary’s than is the positive average from the RCT? And might it not be a good idea for the teachers and governors of St Mary’s to go to St Joseph’s and find out what happened and why? They may be able to observe the mechanism of the failure, if such it was, and figure out whether the same problems would apply for them, or whether they might be able to adapt the innovation to make it work for them, perhaps even more successfully than the positive average in the trial.

Once again, these questions are unlikely to be simply answered in practice; but, as with transportability, there is no serious alternative to trying. Assuming that the average works for you will often be wrong, and it will at least sometimes be possible to do better. As in the medical case, the advice to individual schools often lacks specificity. For example, the US Institute of Education Sciences has provided a “user-friendly” guide to practices supported by rigorous evidence, US Department of Education (2003). The advice, which is very similar to recommendations in development economics, is that the intervention be demonstrated effective through well-designed RCTs in more than one site of implementation, and that “the trials should demonstrate the intervention’s effectiveness in school settings similar to yours” (2003, p. 17). No operational definition of “similar” is provided.

We note finally that these caveats, which apply to individuals (or schools) even if they were in the trial, provide another reason why the concept of “external” validity is unhelpful. The real issue is how to use the findings of a trial in new settings, including settings included in the trial; external validity in the sense of invariance of the ATE emphasizes simple replication, which guarantees nothing, while ignoring the possibility that lack of replication can be a key to understanding.

2.8 Examples and illustrations from economics

Our arguments in this Section should not be controversial, yet we believe that they represent an approach that is different from most current practice. To document this and to fill out the arguments, we provide some examples. While these are occasionally critical, our purpose is constructive; indeed, we believe that misunderstandings about how to use RCTs have artificially limited their usefulness, as well as alienated some who would otherwise use them.

Conditional cash transfers (CCTs) are interventions that have been tested using RCTs (and other RCT-like methods) and are often cited as a leading example of how an evaluation with strong internal validity leads to a rapid spread of the policy, e.g. Angrist and Pischke (2010) among many others. Think through the causal chain that is required for CCTs to be successful: people must like money, they must like (or at least not object too much to) their children being educated and vaccinated, there must exist schools and clinics that are close enough and well enough staffed to do their job, and the government or agency that is running the scheme must care about the wellbeing of families and their children. That such conditions hold in a wide range of (although certainly not all) countries makes it unsurprising that CCTs “work” in many replications, though they certainly will not work in places where the schools and clinics do not exist, Levy (2001), nor in places where people strongly oppose education or vaccination.

Similarly, given that the helping factors will operate with different strengths and effectiveness in different places, it is also not surprising that the size of the ATE differs from place to place; for example, Vivalt’s AidGrade website lists 29 estimates from a range of countries of the standardized (divided by local standard deviation of the outcome) effects of conditional cash transfers on school attendance; all but four show the expected positive effect, and the range runs from –8 to +38 percentage points. Even in this leading case, where we might reasonably conclude that CCTs “work” in getting children into school, it would be hard to calculate credible cost-effectiveness numbers, or to come to a general conclusion about whether CCTs are more or less cost effective than other possible policies. Both costs and effect sizes can be expected to differ in new settings, just as they have in observed ones, making these predictions difficult.

The range of estimates illustrates that the simple view of external validity—that the ATE should transport from one place to another—is not well defined. AidGrade uses standardized measures of effect size divided by standard deviation of outcome at baseline, as does the major multi-country study by Banerjee et al (2015). But we might prefer measures that have an economic interpretation, such as additional months of schooling per $100 spent (for example if a donor is trying to decide where to spend, see below). Nutrition might be measured by height, or by the log of height. Even if the ATE by one measure carries across, it will only do so using another measure if the relationship between the two measures is the same in both situations. This is exactly the sort of thing that a formal analysis of transportability forces us to think about. (Note also that the ATE in the original RCT can differ depending on whether the outcome is measured in levels or in logs; the two ATEs could even have different signs.)
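
The possibility of opposite signs in levels and in logs can be checked with a small arithmetic sketch in Python. The outcomes are hypothetical numbers chosen for illustration, not data from any trial: treatment leaves one unit much worse off and another much better off, so it raises the mean while spreading outcomes out.

```python
import math
from statistics import mean

# Hypothetical outcomes (say, incomes) under control and under treatment.
# These values are assumptions chosen only to illustrate the point.
control = [10.0, 10.0]
treated = [1.0, 20.0]

# ATE measured in levels: difference in mean outcomes.
ate_levels = mean(treated) - mean(control)

# ATE measured in logs: difference in mean log outcomes.
ate_logs = mean([math.log(y) for y in treated]) - mean([math.log(y) for y in control])

print(f"ATE in levels: {ate_levels:+.3f}")
print(f"ATE in logs:   {ate_logs:+.3f}")
```

Because the logarithm is concave, a treatment that increases dispersion can raise the mean in levels while lowering the mean of logs, which is what happens with these numbers.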

Deworming is surely more complicated than conditional cash transfers though not because anyone disputes the desirability of removing parasitical worms or the biological efficacy of the medicines, at least if they are repeatedly and effectively administered; that is the part of the causal process that is transportable from one place to another. Yet nutritional or school attendance outcomes depend on reinfection from one person to another—which depends on local customs about defecation (which vary from place to place and are subject to religious and cultural factors), particularly on the extent of open defecation and the density of population, on whether or not children wear shoes, and on the availability and use of public and private sanitation; this last was crucial in the elimination of hookworm in the southern states of the U.S. according to Stiles (1939). Temperature may also be important; indeed, such “macro” variables are likely to be important in a wide range of medical, employment, and production trials, Rosenzweig and Udry (2016). There are two prominent positive studies in the economics literature, one in Kenya, Kremer and Miguel (2000) and one in India, Bobonis, Miguel and Puri-Sharma (2006); these are often cited as examples of the power of RCTs to come up with the “right” answer, for example by Karlan and Appel (2008). Yet the Cochrane Collaboration review of deworming and schooling, Taylor-Robinson et al (2015), which reviews one trial (from India) covering more than a million participants, and 44 others covering 67,672 participants, including Kremer and Miguel (2004), concludes that there is “substantial evidence” that deworming shows no benefit in nutritional status, hemoglobin, cognition, school performance or death. The validity of this meta-analysis is disputed by Croke et al (2016). A replication, Aiken et al (2015) and re-analysis (using different methods) of Miguel and Kremer’s original data by Davey et al (2015) concluded that the study “provided some evidence, but with high risk of bias,” provoking a lengthy exchange, Hicks et al (2015) and Hargreaves et al (2015). Most of the differences in results come from different methodological choices, themselves largely based on disciplinary traditions, rather than from the effects of mistakes or errors. In an impressive and clear reanalysis, Humphreys (2015) argues that one puzzling feature of Miguel and Kremer’s results is the absence of any clear effect of deworming on health, as was the case in the large Indian RCT. Yet the effects of deworming on education, which are the main target of the paper, presumably work through health, so that the absence of health effects—a failure of expected mediators—is a puzzle, see also Miguel, Kremer and Hicks (2015), and Ahuja et al (2015). Recall too our earlier discussion of the difficulty of interpreting the standard errors of the original study in the absence of randomization.

It is not our purpose here to try to adjudicate these competing claims but rather to relate this work to our general argument. First, it is not clear that there is a right answer to be discovered; given the causal chains involved, deworming might be helpful in one place but unhelpful in another. Yet the focus of the debate is almost entirely on internal validity, on whether the original studies were correctly done. The Cochrane review, in line with this, and in line with much meta-analysis of trials, seems to suppose that there is a single effect to be uncovered that, once established, will be invariant to local and environmental differences. External validity, it seems, is implied by internal validity. Indeed, Chalmers, one of the founders of the Cochrane Collaboration, has explicitly argued (in response to one of us) that, in the absence of strong reasons to the contrary, results should be taken as applicable everywhere, Pettigrew and Chalmers (2011).

Second, the debate makes it clear that the practice of RCTs in economic development has done little to fulfill the original promise that their simplicity—how hard is it to subtract one mean from another?—would dispose of the methodological and econometric disputes that characterize so many observational studies and were thought to be one of their main flaws. While RCTs tend to take some contentious issues of identification off the table, they leave much to be disputed, including the handling of factors that interact with treatment effects, the appropriate level of randomization, the calculation of standard errors, the choice of outcome measure, the inclusion criteria for the sample, placebo and Hawthorne effects, and much more. The claim that RCTs cut through the usual econometric disputes to deliver to policymakers a simple, convincing, and easily understood answer is simply false. The deworming debates are perhaps the leading illustration.

Much of the development literature, like the medical literature, works with the view of external validity that, unless there is evidence to the contrary, the direction and size of treatment effects can be transported from one place to another. The J-PAL website reports its findings under a general head of policy relevance, subdivided by a selection of topics. Under each topic, there is a list of relevant RCTs from a range of different settings around the world. These are conveniently converted into a common cost-effectiveness measure so that, for example, under ‘education’, subhead ‘student participation’, there are four studies from Africa: on informing parents about the returns to education in Madagascar, on deworming, on school uniforms, and on merit scholarships, all from Kenya. The units of measurement are additional years of student education per $100, and among these four studies, the average effect sizes of spending $100 are 20.7 years, 13.9 years, 0.71 years and 0.27 years respectively. (Note that this is a different—and superior—standardization from the effect size standardization discussed above.)

What can we conclude from such comparisons? For a philanthropic donor interested in education, and if marginal and average effects are the same, they might indicate that the best place to devote a marginal dollar is in Madagascar, where it would be used to inform parents about the value of education. This is certainly useful, but it is not as useful as statements that information or deworming programs are everywhere more cost-effective than programs involving school uniforms or scholarships, or if not everywhere, at least over some domain, and it is these second kinds of comparison that would genuinely fulfill the promise of “finding out what works.” But such comparisons only make sense if we can transport the results from one place to another, if the Kenyan results also hold in Madagascar, Mali, or Namibia, or some other list of African or non-African places. J-PAL’s manual for cost-effectiveness, Dhaliwal et al (2012), explains in (entirely appropriate) detail how to handle variation in costs across sites, noting variable factors such as population density, prices, exchange rates, discount rates, inflation, and bulk discounts. But it gives short shrift to cross-site variation in the size of average treatment effects, which play an equal part in the calculations of cost effectiveness. The manual briefly notes that diminishing returns (or the “last-mile” problem) might be important in theory, but argues that the baseline levels of outcomes are likely to be similar in the pilot and replication areas, so that the average treatment effect can be safely transported as is. All of this lacks a justification for transportability, some understanding of when results transport, when they do not, or better still, how they should be modified to make them transportable.
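
The equal role of costs and effects in such a calculation can be sketched in a few lines of Python. This is an illustrative sketch only: the function name, the program labels, and every number are assumptions made up here, and the ratio used is simply effect divided by cost, not the procedure in the J-PAL manual.

```python
def years_per_100_dollars(ate_years, cost_per_child):
    """Additional years of schooling bought by $100 (effect divided by cost)."""
    return 100.0 * ate_years / cost_per_child

def rank(site):
    """Program names ordered from most to least cost-effective at a site."""
    return sorted(site, key=lambda name: years_per_100_dollars(*site[name]),
                  reverse=True)

# Hypothetical (ATE in years per child, cost in dollars per child) at a pilot site.
pilot = {"information": (0.10, 2.0), "scholarships": (0.05, 5.0)}

# Same costs at a new site, but suppose parents there are already well informed,
# so the information program has a much smaller effect (an assumption).
new_site = {"information": (0.01, 2.0), "scholarships": (0.05, 5.0)}

print("pilot ranking:   ", rank(pilot))
print("new-site ranking:", rank(new_site))
```

Holding costs fixed, a change in the treatment effect alone is enough to reverse the ranking, so adjusting only the cost side across sites does not make the comparison transportable.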

One of the largest and most technically impressive of the development RCTs is by Banerjee et al (2015), which tests a “graduation” program designed to permanently lift extremely poor people from poverty by providing them with a gift of a productive asset (from guinea-pigs, (regular-) pigs, sheep, goats, or chickens depending on locale), training and support, life skills coaching, as well as support for consumption, saving, and health services; the idea is that this package of aid can help people break out of poverty traps in a way that would not be possible with one intervention at a time. Comparable versions of the program were tested in Ethiopia, Ghana, Honduras, India, Pakistan, and Peru and, excepting Honduras (where the chickens died), found largely positive and persistent effects—with similar (standardized) effect sizes—for a range of outcomes (economic, mental and physical health, and female empowerment). One site apart, essentially everyone accepted their assignment, so that many of the familiar caveats do not apply. Replication of positive ATEs over such a wide range of places certainly provides proof of concept for such a scheme. Yet Bauchet, Morduch, and Ravi (2015) fail to replicate the result in South India, where the control group got access to much the same benefits, what Heckman, Hohman, and Smith (2000) call ‘substitution bias’. Even so, the results are important because, although there is a longstanding interest in poverty traps, many economists have long been skeptical of their existence or that they could be sprung by such aid-based policies. In this sense, the study is an important contribution to the theory of economic development; it tests a theoretical proposition and will (or should) change minds about it.

A number of difficulties remain. As the authors note, such trials cannot tell us which component of the treatment accounted for the results, or which might be dispensable—a much more expensive multifactorial trial would be required—though it seems likely in practice that the costliest component—the repeated visits for training and support—will be the first to be cut by cash-strapped politicians or administrators. And as noted, it is unclear what should count as (simple) replication in international comparisons; it is hard to think of the uses of standardized effect sizes, except to document that effects exist everywhere and that they are similarly large relative to local variation in such things.

The effect size—the average treatment effect expressed in numbers of standard deviations of the original outcome—though conveniently dimensionless, has little to recommend it. As with much of RCT practice, it strips out any economic content—no rates of return, or benefits minus costs—and it removes any discipline on what is being compared. Apples and oranges become immediately comparable, as do treatments whose inclusion in a meta-analysis is limited only by the imagination of the analysts in claiming similarity. In psychology, where the concept originated, there are endless disputes about what should and should not be pooled in a meta-analysis. Beyond that, as argued by Simpson (2016), restrictions on the trial sample—often good practice to reduce background noise and to help detect an effect—will reduce the baseline standard deviation and inflate the effect size. More generally, effect sizes are open to manipulation by exclusion rules. It makes no sense to claim replicability on the basis of effect sizes, let alone to use them to rank projects.
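
The inflation of effect sizes by sample restriction can be illustrated with a short Python sketch. The baseline outcomes, the constant treatment effect, and the inclusion rule are all hypothetical values assumed for the example; the helper `effect_size` is simply the average treatment effect divided by the baseline standard deviation.

```python
from statistics import pstdev

ATE = 5.0  # assumed constant treatment effect, in the units of the outcome

# Hypothetical baseline outcomes for the full eligible population.
full_sample = [20, 30, 40, 50, 60, 70, 80]

# A stricter (made-up) inclusion rule keeps only the middle of the distribution.
restricted_sample = [y for y in full_sample if 40 <= y <= 60]

def effect_size(ate, baseline):
    """Standardized effect: ATE divided by the baseline standard deviation."""
    return ate / pstdev(baseline)

es_full = effect_size(ATE, full_sample)
es_restricted = effect_size(ATE, restricted_sample)

print(f"effect size, full sample:       {es_full:.3f}")
print(f"effect size, restricted sample: {es_restricted:.3f}")
```

The treatment does exactly the same thing to every unit in both cases, yet the standardized effect is larger in the restricted sample simply because its baseline standard deviation is smaller.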

The graduation study can be taken as the closest to fulfilling the “finding out what works” aim of the RCT movement in development. Yet it is silent on perhaps the crucial aspect for policy, which is that the trial was run entirely in partnership with NGOs, whereas what we would like to know is whether it could be replicated by governments, including those governments that are incapable of getting doctors, nurses, and teachers to show up to clinics or schools, Chaudhury et al (2005), Banerjee, Deaton and Duflo (2004), or of regulating the quality of medical care in either the public or private sectors, Filmer, Hammer and Pritchett (2000) or Das and Hammer (2005). In fact, we already know a great deal about “what works.” Vaccinations work, maternal and child healthcare services work, and classroom teaching works. Yet knowing this does not get those things done. Adding another program that works under ideal conditions is useful only where such conditions exist, and that would likely be unnecessary when they exist. Finding out what works is not the magic key to economic development. Technical knowledge, though always worth having, requires suitable institutions if it is to do any good.

A similar point is documented in the contrast between a successful trial that used cameras and threats of wage reductions to incentivize attendance of teachers in schools run by an NGO in Rajasthan in India, Duflo, Hanna, and Ryan (2012), and the subsequent failure of a follow-up program in the same state to tackle mass absenteeism of health workers, Banerjee, Duflo, and Glennerster (2008). In the schools, the cameras and timekeeping worked as intended, and teacher attendance increased. In the clinics, there was a short-run effect on nurse attendance, but it was quickly eliminated. (The ability of agents eventually to undermine policies that are initially effective is common enough and not easily handled within an RCT.) In both trials, there were incentives to improve attendance, and there were incentives to find a way to sabotage the monitoring and restore workers to their accustomed positions; the force of these incentives is a “high-level” cause, like gravity, or the principle of the lever, that works in much the same way everywhere. For the clinics, some sabotage was direct—the smashing of cameras—and some was subtler, when government supervisors provided official, though essentially specious, reasons for missing work. We can only conjecture why the causality was switched in the move from NGO to government; we suspect that working for a highly-respected local NGO is a different contract from working for the government, where not showing up for work is widely (if informally) understood to be part of the deal. The incentive lever works when it is wired up right, as with the NGOs, but not when the wiring cuts it out, as with the government. Knowing “what works” in the sense of the treatment effect on the trial population is of limited value without understanding the political and institutional environment in which it is set. This underlines the need to understand the underlying social, economic, and cultural structures—including the incentives and agency problems that inhibit service delivery—that are required to support the causal pathways that we should like to see at work.

Trials in economic development are susceptible to the critique that they take place in artificial environments. Drèze (2016) notes, based on extensive experience in India, “when a foreign agency comes in with its heavy boots and suitcases of dollars to administer a ‘treatment,’ whether through a local NGO or government or whatever, there is a lot going on other than the treatment.” There is also the suspicion that a treatment that works does so because of the presence of the “treators,” often from abroad, rather than because of the people who will be called to work it in reality.

There is also much to be learned from many years of economic trials in the United States, particularly from the work of the Manpower Demonstration Research Corporation (now known by its initials MDRC), from the early income tax trials, as well as from the RAND Health Experiment. Following the income tax trials, MDRC has run many randomized trials since the 1970s, mostly for the Federal government but also for individual states and for Canada, see the thorough and informative account by Gueron and Rolston (2011) for the factual information underlying the following discussion. MDRC’s program, like that of J-PAL in development, is intended to find out “what works” in the state and federal welfare programs. These programs are conditional cash transfers in which poor recipients are given cash provided they satisfy certain conditions, which are often the subject of the trial. Should there be work requirements? Should there be remedial education before work requirements? What are the benefits and costs of various alternatives, both to the recipients and to the local and federal taxpayers? All of these programs are deeply politicized, with sharply different views over both facts and desirability. Many engaged in these disputes feel certain of what should be done and what its consequences will be so that, by their lights, control groups are unethical because they deprive some people of what the advocates “know” will be certain benefits. Given this, it is perhaps surprising that RCTs have become the accepted norm for this kind of policy evaluation in the US.

The reasons owe much to political institutions, as well as to the common faith that RCTs can reveal the truth. At the Federal level, prospective policies are vetted by the non-partisan Congressional Budget Office, which makes its own estimates of the budgetary implications of the program. Ideologues whose programs score poorly by the CBO have an incentive to support an RCT, not to convince themselves, but to convince their opponents; once again, RCTs are especially valuable when your opponents do not share your prior. And control groups are easier to put in place when there are insufficient funds to cover the whole population. There was also a widespread and largely uncritical belief that RCTs always give the right answer, at least for the budgetary implications, which, rather than the wellbeing of the recipients, were often the primary (and indeed sometimes the only) concern; note that all of these trials are on poor people by rich people who are typically more concerned with cost than with the wellbeing of the poor, Greenberg, Schroder and Onstott (1999). MDRC’s trials could therefore be effective dispute reconciliation mechanisms both for those who saw the need for evidence and for those who did not (except instrumentally). Note that the outcome here fits with our “public health” case; what the politicians need to know is not the outcomes for individuals, or even how the outcomes in one state might transport to another, but the average budgetary cost in a specific place for each poor person treated, something that a good RCT conducted on a representative sample of the target population is equipped to deliver, at least in the absence of general equilibrium effects, timing effects, etc.

These RCTs by MDRC and other contractors deserve much credit. They have demonstrated both the feasibility of large-scale social trials including the possibility of randomization in these settings (where many participants were hostile to the idea), as well as their usefulness to policymakers. They also seem to have changed beliefs, for example in favor of the desirability of work requirements as a condition of welfare, even among many of those who were originally opposed. There are also limitations; the trials appear to have had at best a limited influence on scientific thinking about behavior in labor markets. The results of similar programs have often been different across different sites, and there has to date been no firm understanding of why; indeed, the trials are not designed to reveal this, Moffitt (2004). Finally, and perhaps crucially for the potential contribution to economic science, there has been little success in understanding either the underlying structures or chains of causation, in spite of a determined effort from the very beginning to peer into the black boxes. Without such mechanisms, transportability is always in doubt, it is impossible for policymakers or academics to purposively improve the policies, and the contributions to cumulative science are severely limited.

The RAND health experiment, Manning et al (1975a, b), provides a different but equally instructive story if only because its results have permeated the academic and policy discussions about healthcare ever since. It was originally designed to test the question of whether more generous insurance would cause people to use more medical care and, if so, by how much. The incentive effects are hardly in doubt today; the immortality of the study comes rather from the fact that its multi-arm (response surface) design allowed the calculation of an elasticity for the study population, that medical expenditures decreased by 0.1 to 0.2 percent for every one percent increase in the copayment (an elasticity of –0.1 to –0.2). According to Aron-Dine, Einav, and Finkelstein (2013), it is this dimensionless and thus apparently transportable number that has been used ever since to discuss the design of healthcare policy; the elasticity has come to be treated as a universal constant. Ironically, they argue that the estimate cannot be replicated in recent studies, and it is even unclear that it is firmly based on the original evidence. This account points, once again, to the central importance of transportability for the usefulness, and particularly the long-term usefulness, of a trial. Here, the simple direct transportability of the result seems to have been largely illusory though, as we have argued, this does not mean that more complex constructions based on the results of the trial would not have done better.

Conclusions

RCTs are the ultimate in credible estimation of average treatment effects in the population being studied because they make so few assumptions about heterogeneity, causal structure, choice of variables, and functional form. They are truly nonparametric. And indeed, this is sometimes just what we want, particularly where we have little credible prior information. RCTs are often convenient ways to introduce experimenter-controlled variance—if you want to see what happens, then kick it and see, twist the lion’s tail—but note that many experiments, including many of the most important (and Nobel Prize winning) experiments in economics, do not and did not use randomization, Harrison (2013), Svorencik (2015). But the credibility of the results, even internally, can be undermined by excessive heterogeneity in responses, and especially when the distribution of effects is asymmetric, where inference on means can be hazardous. Ironically, the price of the credibility in RCTs is that all we get are means. Yet, in the presence of outliers, means themselves do not provide the basis for reliable inference. And randomization in and of itself does nothing unless the details are right; purposive selection into the experimental population, like purposive selection into and out of assignment, undermines inference in just the same way as does selection in observational studies. Lack of blinding, whether of participants, trialists, data collectors, or analysts, undermines inference by permitting factors other than the treatment to affect the outcome, akin to a failure of exclusion restrictions in instrumental variable analysis.

The lack of structure can become seriously disabling when we try to use RCT results, outside of a few contexts, such as program evaluation, hypothesis testing, or establishing proof of concept. Beyond that, we are in trouble. We cannot use the results to help make predictions elsewhere without more structure, without more prior information, and without having some idea of what makes treatment effects vary from place to place, or time to time. There is no option but to commit to some causal structure if we are to know how to use RCT evidence elsewhere, or to use the estimates out of the original context. Simple generalization and simple extrapolation just do not cut the mustard. This is true of any study, experimental or observational. But observational studies are familiar with, and routinely work with, the sort of assumptions that RCTs claim to avoid, so that if the aim is to use empirical evidence, any credibility advantage that RCTs have in estimation is no longer operative.

Yet once that commitment has been made, RCT evidence can be extremely useful, pinning down part of a structure, helping to build stronger understanding and knowledge, and helping to assess welfare consequences. As our examples show, this can often be done without committing to the full complexity of what are often thought of as structural models. Yet without the structure that allows us to place RCT results in context, or to understand the mechanisms behind those results, not only can we not transport whether “it works” elsewhere, but we cannot do the standard stuff of economics, which is to say whether or not the intervention is actually welfare improving, see Harrison (2014) for a vivid account that sharply identifies this and other issues. Without knowing why things happen and why people do things, we run the risk of worthless casual (“fairy story”) causal theorizing and have essentially given up on one of the central tasks of economics.

We must back away from the refusal to theorize, from the exultation in our ability to handle unlimited heterogeneity, and actually SAY something. Perhaps paradoxically, unless we are prepared to make assumptions, and to say what we know, making statements that will be incredible to some, all the credibility of the RCT is for naught.

In the specific context of development that has concerned us here, RCTs have proven their worth in providing proofs of concept and in testing predictions that some policies must always work or can never work. But, as elsewhere in economics, we cannot find out why something works by simply demonstrating that it does work, no matter how often, which leaves us uninformed as to whether the policy should be implemented. Beyond that, small scale, demonstration RCTs are not capable of telling us what would happen if these policies were implemented to scale, of capturing unintended consequences that typically cannot be included in the protocols, or of modeling what will happen if schemes are implemented by governments, whose motives and operating principles are different from those of the NGOs who typically run trials. While it is true that abstract knowledge is always likely to be beneficial to economic development, successful development depends on institutions and on politics, matters on which RCTs have little to say. In the end, RCTs are one of the many external technical fixes that have meandered off and on the development stage since the Second World War, including building infrastructure, getting prices right, and service delivery, none of which have faced up to the essential domestic political foundations for development.

Citations

Ahuja, Amrita, Sarah Baird, Joan Hamory Hicks, Michael Kremer, Edward Miguel, and Shawn Powers, 2015, “When should governments subsidize health? The case of mass deworming,” World Bank Economic Review, 29, S9–S24.

Aigner, Dennis J., 1985, “The residential electricity time-of-use pricing experiments. What have we learned?” in David A. Wise and Jerry A. Hausman, Social experimentation, Chicago, Il. Chicago University Press for National Bureau of Economic Research, 11–54.

Aiken, Alexander M., Calum Davey, James R. Hargreaves and Richard J. Hayes, 2015, “Re-analysis of health and educational impacts of a school-based deworming programme in western Kenya: a pure replication,” International Journal of Epidemiology, 0(0), 1–9.

Al-Ubaydli, Omar, and John A. List, 2013, “On the generalizability of experimental results in economics,” in G. Frechette and A. Schotter, Methods of modern experimental economics, Oxford University Press.

Altman, Douglas G., 1985, “Comparability of randomized groups,” Journal of the Royal Statistical Society, Series D (The Statistician), 34(1), Statistics in health, 125–36.

Angrist, Joshua D., 2004, “Treatment effect heterogeneity in theory and practice,” Economic Journal, 114, C52–C83.

Angrist, Joshua D., Eric Bettinger, Erik Bloom, Elizabeth King and Michael Kremer, 2002, “Vouchers for private schooling in Colombia: evidence from a randomized natural experiment,” American Economic Review, 92(5), 1535–58.

Angrist, Joshua D., and Jörn-Steffen Pischke, 2010, “The credibility revolution in empirical economics: how better research design is taking the con out of econometrics,” Journal of Economic Perspectives, 24(2), 3–30.

Aron-Dine, Aviva, Liran Einav, and Amy Finkelstein, 2013, “The RAND health insurance experiment, three decades later,” Journal of Economic Perspectives, 27(1), 197–222.

Page 63: Nancy Cartwright & Angus Deaton - Princeton Universitydeaton/downloads/Deaton_Cartwright_RCTs_wit… · Understanding and misunderstanding randomized controlled trials Angus Deaton

62

Arrow,KennethJ.,1975,“Twonotesoninferringlongrunbehaviorfromsocialexperiments,”DocumentNo.P-5546,SantaMonica,CA.RandCorporation.

Ashenfelter,Orley,1978,“Estimatingtheeffectoftrainingprogramsonearnings,”ReviewofEconomicsandStatistics,60(1),47–57.

Ashenfelter,Orley,1978,“Thelaborsupplyresponseofwageearners,”inJohnL.PalmerandJosephA.Pechman,eds.,Welfareinruralareas:theNorthCarolina–IowaIncomeMainte-nanceExperiment,Washington,DC.TheBrookingsInstitution.109–38.

Attanasio,Orazio,CostasMeghir,andAnaSantiago,2012,“EducationchoicesinMexico:usingastructuralmodelandarandomizedexperimenttoevaluatePROGRESA,”ReviewofEconomicStudies,79(1),37–66.

Attanasio,Orazio,SarahCattan,EmlaFitzsimons,CostasMeghir,andMartaRubioCodina,2015,“Estimatingtheproductionfunctionforhumancapital:resultsfromarandomizedcontrolledtrialinColumbia,”London.InstituteforFiscalStudies,WorkingPapernoW15/06.

Bahadur,R.R.,andLeonardJ.Savage,1956,“Thenon-existenceofcertainstatisticalproceduresinnonparametricproblems,”AnnalsofMathematicalStatistics,25:1115–22.

Banerjee,Abhijit,SylvainChassang,SergioMontero,andErikSnowberg,2016,“Atheoryofex-perimenters,”processed,July2016.

Banerjee,Abhijit,SylvainChassang,andErikSnowberg,2016,“Decisiontheoreticapproachestoexperimentdesignandexternalvalidity,”Cambridge,MA.NBERWorkingPaperno22167,April.

Banerjee,Abhijit,AngusDeaton,andEstherDuflo,2004,“HealthcaredeliveryinruralRaja-sthan,”EconomicandPoliticalWeekly,39(9),944–9.

Banerjee,Abhijit,andEstherDuflo,2012,Pooreconomics:aradicalrethinkingofthewaytofightglobalpoverty,PublicAffairs.

Banerjee,Abhijit,EstherDuflo,NathanaelGoldberg,DeanKarlan,RobertOsei,WilliamParienté,JeremyShapiro,BramThuysbaert,andChristopherUdry,2015,“Amultifacetedprogramcauseslastingprogressfortheverypoor:evidencefromsixcountries,”Science,348(6236),1260799.

Banerjee,Abhijit,EstherDuflo,andRachelGlennerster,2008,“Puttingaband-aidonacorpse:incentivesfornursesintheIndianpublichealthcaresystem,”JournaloftheEuropeanEco-nomicAssociation,6(2–3),487–500.

Banerjee,AbhijitV.,andRuiminHe,2003,“TheWorldBankofthefuture,”AmericanEconomicReview,93(2),39–44.

Bauchet,Jonathan,JonathanMorduchandShamikaRavi,2015,“Failurevsdisplacement:whyaninnovativeanti-povertyprogramshowednonetimpactinSouthIndia,”JournalofDevel-opmentEconomics,116,1–16.

Basu,Kaushik,2010,“TheeconomicsoffoodgrainmanagementinIndia,”MinistryofFinance,Delhi.http://finmin.nic.in/workingpaper/Foodgrain.pdf

Bloom,HowardS.,CarolynJ.Hill,andJamesA.Riccio,2005,“Modelingcross-siteexperimentaldifferencestofindoutwhyprogrameffectivenessvaries,”inHowardS.Bloom,ed.,Learningmorefromsocialexperiments:evolvinganalyticalapproaches,NewYork,NY.RussellSage.

Bobonis,Gustavo,EdwardMiguel,andCharuPuri-Sharma,2006,“Anemiaandschoolparticipa-tion,”JournalofHumanResources,41(4),692–721.

Bold,Tessa,MwangiKimenyi,,GermanoMwabu,AliceNg’ang’aandJustinSandefur,2013,“Scalingupwhatworks:experimentalevidenceonexternalvalidityinKenyaneducation,”Washington,DC.CenterforGlobalDevelopment,WorkingPaper321.

Bothwell,LauraE.,andScottH.Podolsky,2016,“Theemergenceoftherandomized,controlledtrial,”NewEnglandJournalofMedicine,375(6),501–4.doi:10.1056/NEJMp1604635

Page 64: Nancy Cartwright & Angus Deaton - Princeton Universitydeaton/downloads/Deaton_Cartwright_RCTs_wit… · Understanding and misunderstanding randomized controlled trials Angus Deaton

63

Campbell,D.T.,andJ.C.Stanley,1963,Experimentalandquasi-experimentaldesignsforre-search.Chicago.RandMcNally.

Cartwright,Nancy,1994,Nature’scapacitiesandtheirmeasurement.Oxford.ClarendonPress.Cartwright,Nancy,andJeremyHardie,2012,Evidencebasedpolicy:apracticalguidetodoingit

better,Oxford.OxfordUniversityPress.Chalmers,Iain,2001,“Comparinglikewithlike:somehistoricalmilestonesintheevolutionof

methodstocreateunbiasedcomparisongroupsintherapeuticexperiments,”InternationalJournalofEpidemiology,30,1156–64.

Chalmers,Iain,2003,“FisherandBradfordHill:theoryandpragmatism?”InternationalJournalofEpidemiology,32,922–24.

Chassang,Sylvain,GerardPadróIMiguel,andErikSnowberg,2012,“Selectivetrials:aprincipal–agentapproachtorandomizedcontrolledexperiments,”AmericanEconomicReview,102(4),1279–1309.

Chassang,Sylvain,ErikSnowberg,BenSeymour,andCayleyBowles,2015,“Accountingforbe-haviorintreatmenteffects:newapplicationsforblindtrials,”PLoSOne,10(6),e0127227.doi:10:1371/journal.pone.0127227.

Chaudhury,Nazmul,JeffreyHammer,MichaelKremer,KarthikMuralidharanandF.HalseyRog-ers,2005,“Missinginaction:teacherandhealthworkerabsenceindevelopingcountries,”JournalofEconomicPerspectives,19(4),91–116.Chyn,Eric,2016,“Movedtoopportunity:thelong-runeffectofpublichousingdemolitiononlabormarketoutcomesofchildren,”Uni-versityofMichigan.http://www-personal.umich.edu/~ericchyn/Chyn_Moved_to_Opportunity.pdf

Conlisk,John,1973,“Choiceofresponsefunctionalformindesigningsubsidyexperiments,”Econometrica,41(4),643–56.

Crépon,Bruno,EstherDuflo,MarcGurgand,RolandRathelot,andPhilippeZamora,2014,“Dolabormarketpolicieshavedisplacementeffects?evidencefromaclusteredrandomizedex-periment,”QuarterlyJournalofEconomics,128(2),531–80.

Croke,Kevin,JoanHamoryHicks,EricHsu,MichaelKremer,andEdwardMiguel,2016,“Doesmassdewormingaffectchildren’snutrition?Metaanalysis,costeffectiveness,andstatisticalpower,”Cambridge,MA.NBERWorkingPaperNo.22382(July.)

Cronbach,LeeJ.,S.R.Ambron,S.M.Dornbusch,R.D.Hess,R.C.Hornick,D.C.Phillips,D.F.Walker,andS.S.Weiner,1980,Towardsreformofprogramevaluation,SanFrancisco,Jossey-Bass.

Das,JishnuandJeffreyHammer,2005,”’Whichdoctor?Combiningvignettesanditemresponsetomeasureclinicalcompetence,”JournalofDevelopmentEconomics,78,348–83.

Davey,Calum,AlexanderM.Aitken,RichardJ.Hayes,andJamesR.Hargreaves,2015,“Re-analysisofhealthandeducationalimpactsofaschool-baseddewormingprogrammeinwesternKenya:astatisticalreplicationofaclusterquasi-randomizedsteppedwedgetrial,”InternationalJournalofEpidemiology,0(0),1–12.

Deaton,Angus,andJohnMuellbauer,1980,Economicsandconsumerbehavior,NewYork.Cam-bridgeUniversityPress.

Dhaliwal,Iqbal,EstherDuflo,RachelGlennerster,andCaitlinTulloch,2012,“Comparativecost-effectivenessanalysistoinformpolicyindevelopingcountries:ageneralframeworkwithap-plicationsforeducation,”J–PAL,MIT,December3rd.http://www.povertyactionlab.org/publication/cost-effectiveness

Drèze,Jean,2016,Personalemailcommunication.Duflo,Esther,RemaHanna,andStephenP.Ryan,2012,“Incentiveswork:gettingteachersto

cometoschool,”AmericanEconomicReview,102(4),1241–78.

Page 65: Nancy Cartwright & Angus Deaton - Princeton Universitydeaton/downloads/Deaton_Cartwright_RCTs_wit… · Understanding and misunderstanding randomized controlled trials Angus Deaton

64

Duflo,Esther,andMichaelKremer,2008,“Useofrandomizationintheevaluationofdevelop-menteffectiveness,”inWilliamEasterly,ed.,Reinventingforeignaid.Washington,DC.Brook-ings,93–120.

Dynarski,Susan,2015,”Helpingthepoorineducation:thepowerofasimplenudge,”NewYorkTimes,Jan17,2015.

Fine,PaulE.M.,andJacquelineA.Clarkson,1986,“Individualversuspublicprioritiesinthede-terminationofoptimalvaccinationpolicies,”AmericanJournalofEpidemiology,124(6),1012–20.

Fisher,RonaldA.,1926,“Thearrangementoffieldexperiments,”JournaloftheMinistryofAgri-cultureofGreatBritain,33,503–13.

Filmer,Deon,JeffreyHammer,andLantPritchett,2000,“Weaklinksinthechain:adiagnosisofhealthpolicyinpoorcountries,”WorldBankResearchObserver,15(2),199–204.

Freedman,DavidA.,2006,“Statisticalmodelsforcausation:whatinferentialleveragedotheyprovide?”EvaluationReview,30:691−713.

Freedman,DavidA.,2008,“Onregressionadjustmentstoexperimentaldata,”AdvancesinAp-pliedMathematics,40,180–93.

Garfinkel,Irwin,andCharlesF.Manski,1992,“Introduction,”inIrwinGarfinkelandCharlesF.Manski,eds.,Evaluatingwelfareandtrainingprograms,Cambridge,MA.HarvardUniversityPress.1–22.

Gertler,PaulJ.,SebastianMartinez,PatrickPremand,LauraB.Rawlings,andChristelM.J.Ver-meersch,Impactevaluationinpractice,Washington,DC.TheWorldBank.

Glewwe,Paul,MichaelKremer,SylvieMoulin,andEricZitzewitz,2004,“Retrospectivevs.pro-spectiveanalysesofschoolinputs:thecaseofflip-chartsinKenya,”JournalofDevelopmentEconomics,74,251–68.

Greenberg,DavidandMarkShroder,2004,Thedigestofsocialexperiments(3rded.),Washing-ton,DC.UrbanInstitutePress.

Greenberg,David,MarkShroder,andMatthewOnstott,1999,“Thesocialexperimentmarket,”JournalofEconomicPerspectives,13(3),157–72.

Gueron,JudithM.,andHowardRolston,2013,Fightingforreliableevidence,NewYork,RussellSage.

Guyatt,Gordon,DavidL.SackettandDeborahJ.CookfortheEvidence-BasedMedicineWorkingGroup,1994,“Users’guidestothemedicalliteratureII:howtouseanarticleabouttherapyorprevention.B.Whatweretheresultsandwilltheyhelpmeincaringformypatients?”JournaloftheAmericanMedicalAssociation,271(1),59–63.

Hargreaves,JamesR.,AlexanderM.Aiken,CalumDavey,andRichardJ.Hayes,2015,“Authors’responseto:dewormingexternalitiesandschoolimpactsinKenya,”InternationalJournalofEpidemiology,0(0),1–3.

Harrison,GlennW.,2013,“Fieldexperimentsandmethodologicalintolerance,”JournalofEco-nomicMethodology,20(2),103–17.

Harrison,GlennW.,2014,“Impactevaluationandwelfareevaluation,”EuropeanJournalofDe-velopmentResearch,26,39–45.

Hausman,JerryA.,andDavidA.Wise,1985,“Technicalproblemsinsocialexperimentation:costversuseaseofanalysis,”inJerryA.HausmanandDavidA.Wise,eds.,SocialExperimentation,Chicago,IL.ChicagoUniversityPress.187–220.

Heckman,JamesJ.,1992,“Randomizationandsocialpolicyevaluation,”inCharlesF.ManskiandIrwinGarfinkel,eds.,Evaluatingwelfareandtrainingprograms,Cambridge,MA.HarvardUniversityPress.547–70.

Page 66: Nancy Cartwright & Angus Deaton - Princeton Universitydeaton/downloads/Deaton_Cartwright_RCTs_wit… · Understanding and misunderstanding randomized controlled trials Angus Deaton

65

Heckman,JamesJ.,1997,“Instrumentalvariables:astudyofimplicitbehavioralassumptionsusedinmakingprogramevaluations,”JournalofHumanResources,32(3),441–62.

Heckman,JamesJ.,NeilHohman,andJeffreySmith,withtheassistanceofMichaelKhoo,2000,“Substitutionanddropoutbiasinsocialexperiments:astudyofaninfluentialsocialexperi-ment,”QuarterlyJournalofEconomics,115(2),651–94.

Heckman,JamesJ.,RobertJ.Lalonde,andJeffreyA.Smith,1999,“Theeconomicsandecono-metricsofactivelabormarkets,”Chapter31inAshenfelter,OrleyandDavidCard,eds.Handbookoflaboreconomics,Amsterdam.North-Holland,3(A),1866–2097.

Heckman,JamesJ,,RodrigoPinto,andPeterSavelyev,2013,“Understandingthemechanismsthroughwhichaninfluentialearlychildhoodprogramboostedadultoutcomes,”AmericanEconomicReview,103(6),2052–86.

Heckman,JamesJ.,JeffreySmith,andNancyClements,1997,“Makingthemostoutofpro-grammeevaluationsandsocialexperiments:accountingforheterogeneityinprogrammeimpacts,”ReviewofEconomicStudies,64(4),487–535.

Heckman,JamesJ,andEdwardVytlacil,2005,“Structuralequations,treatmenteffects,andeconometricpolicyevaluation,”Econometrica,73(3),669–738.

Heckman,JamesJ.andEdwardJ.Vytlacil,2007,“Econometricevaluationofsocialprograms,Part1:causalmodels,structuralmodels,andeconometricpolicyevaluation,”Chapter70inJamesJ.HeckmanandEdwardE.Leamer,eds.,HandbookofEconometrics,6B,4779–874.

Hicks,JoanHamory,MichaelKremer,andEdwardMiguel,2015,“Commentary:dewormingex-ternalitiesandschoolingimpactsinKenya:acommentonAikenetal(2015)andDaveyetal.(2015),”InternationalJournalofEpidemiology,0(0),1–4.

Horton,Richard,2000,“Commonsenseandfigures:therhetoricofvalidityinmedicine:Brad-fordHillmemoriallecture1999,”Statisticsinmedicine,19,3149–64.

Hotz,V.Joseph,GuidoW.ImbensandJulieH.Mortimer,2005,“Predictingtheefficacyoffuturetrainingprogramsusingpastexperienceatotherlocations,”JournalofEconometrics,125,241–70.

Hsieh,Chang-taiandMiguelUrquiola,2006,“Theeffectsofgeneralizedschoolchoiceonachievementandstratification:evidencefromChile’svoucherprogram,”JournalofPublicEconomics,90,1477–1503.

Humphreys,Macartan,2015,“Whathasbeenlearnedfromthedewormingreplications:anon-partisanview,”ColumbiaUniversity,Aug.http://www.columbia.edu/~mh2245/w/worms.html

Imbens,GuidoW.,2004,“Nonparametricestimationofaveragetreatmenteffectsunderexoge-neity:areview,”ReviewofEconomicsandStatistics,86(1),4–29.

Imbens,GuidoW.,2010,“BetterLATEthannothing:somecommentsonDeaton(2009)andHeckmanandUrzua,”JournalofEconomicLiterature,48(2),399–423.

Imbens,GuidoW.andJoshuaD.Angrist,1994,“Identificationandestimationoflocalaveragetreatmenteffects,”Econometrica,62(2),467–75.

Imbens,GuidoW.,andJeffreyM.Wooldridge,2009,“Recentdevelopmentsintheeconometricsofprogramevaluation,”JournalofEconomicLiterature,47(1),5–86.

InternationalCommitteeofMedicalJournalEditors,2015,Recommendationsfortheconduct,reporting,editing,andpublicationofscholarlyworkinmedicaljournals,http://www.icmje.org/icmje-recommendations.pdf(accessed,August20,2016.)

Kahneman,DanielandGaryKlein,2009,“Conditionsforintuitiveexpertise:afailuretodisa-gree,”AmericanPsychologist,64(6),515–26.

Karlan,DeanandJacobAppel,2011,Morethangoodintentions:howaneweconomicsishelp-ingtosolveglobalpoverty,Dutton.

Page 67: Nancy Cartwright & Angus Deaton - Princeton Universitydeaton/downloads/Deaton_Cartwright_RCTs_wit… · Understanding and misunderstanding randomized controlled trials Angus Deaton

66

Karlan,Dean,NathanealGoldbergandJamesCopestake,2009,“Randomizedcontrolledtrialsarethebestwaytomeasureimpactofmicrofinanceprogramsandimprovemicrofinanceproductdesigns,”EnterpriseDevelopmentandMicrofinance,20(3),167–76.

Kasy,Maximilian,2016,“Whyexperimentersmightnotwanttorandomize,andwhattheycoulddoinstead,”PoliticalAnalysis,1–15doi:10.1093/pan/mpw012

Kendall,MauriceG.,1959,“Hiawathadesignsanexperiment,”AmericanStatistician,13(5),23–4.

Kramer,Peter,2016,Ordinarilywell:thecaseforantidepressants,Farrar,Straus,andGiroux.Kremer,Michael,andAlakaHolla,2009,“Improvingeducationinthedevelopingworld:what

havewelearnedfromrandomizedevaluations?”AnnualReviewofEconomics,1,513–42. Lehman,Erich.L.,andJosephP.Romano,2005,Testingstatisticalhypotheses(thirdedition),

NewYork.Springer.Levy,Santiago,2006,Progressagainstpoverty:sustainingMexico’sProgresa-Oportunidades

program,Washington,DC.Brookings.Mackie,JohnL.,1974,Thecementoftheuniverse:astudyofcausation,Oxford.OxfordUniversi-

tyPress.Manning,WillardG.,JosephP.Newhouse,NaihuaDuan,EmmettKeelerandArleenLeibowitz,

1988a,“Healthinsuranceandthedemandformedicalcare:evidencefromarandomizedex-periment,”AmericanEconomicReview,77(3),251–77.

Manning,WillardG.,JosephP.Newhouse,NaihuaDuan,EmmettKeeler,BernadetteBenjamin,ArleenLeibowitz,M.SusanMarquis,andJackZwanziger,1988b,Healthinsuranceandthedemandformedicalcare:evidencefromarandomizedexperiment,SantaMonica,CA.RAND.

Manski,CharlesF.,1990,“Nonparametricboundsontreatmenteffects”AmericanEconomicReview,80(2),319–23.

Manski,CharlesF.,1995,Identificationproblemsinthesocialsciences,Cambridge,MA.HarvardUniversityPress.

Manski,CharlesF.,2003,Partialidentificationofprobabilitydistributions,NewYork.Springer.Manski,CharlesF.,2013,Publicpolicyinanuncertainworld:analysisanddecisions,Cambridge,

MA.HarvardUniversityPress.Metcalfe,CharlesE.,1973,“Makinginferencesfromcontrolledincomemaintenanceexperi-

ments,”AmericanEconomicReview,63(3),478–83.Miguel,Edward,andMichaelKremer,2004,“Worms:identifyingimpactsoneducationand

healthinthepresenceoftreatmentexternalities,”Econometrica,72(1),159–217.Miguel,Edward,MichaelKremer,andJoanHamoryHicks,2015,“CommentonMacartanHum-

phreys’andotherrecentdiscussionsoftheMiguelandKremer(2004)study,”Berkeley,Dec.http://emiguel.econ.berkeley.edu/assets/miguel_research/63/Worms-Comment_2015-12-21.pdf

Moffitt,Robert,1979,“ThelaborsupplyresponseintheGaryexperiment,”JournalofHumanResources,14(4),477–87.

Moffitt,Robert,1992,“Evaluationmethodsforprogramentryeffects,”Chapter6inCharlesManskiandIrwinGarfinkel,Evaluatingwelfareandtrainingprograms,Cambridge,MA.Har-vardUniversityPress,231–52.

Moffitt,Robert,2004,“Theroleofrandomizedfieldtrialsinsocialscienceresearch:aperspec-tivefromevaluationsofreformsofsocialwelfareprograms,”AmericanBehavioralScientist,47(5),506–40

Morgan,KariLock,andDonaldB.Rubin,2012,“Rerandomizationtoimprovecovariatebalanceinexperiments,”AnnalsofStatistics,40(2),1263–82.

Page 68: Nancy Cartwright & Angus Deaton - Princeton Universitydeaton/downloads/Deaton_Cartwright_RCTs_wit… · Understanding and misunderstanding randomized controlled trials Angus Deaton

67

Muller,SeánM.,2015,“Causalinteractionandexternalvalidity:obstaclestothepolicyrele-vanceofrandomizedevaluations,”WorldBankEconomicReview,29,S217–S225.

Orcutt,GuyH.,andAliceG.Orcutt,1968,“Incentiveanddisincentiveexperimentationforin-comemaintenancepolicypurposes,”AmericanEconomicReview,58(4),754–72.

Pearl,Judea,2009,Causality:models,reasoning,andinference,2ndedition,Cambridge.Cam-bridgeUniversityPress.

Pettigrew,Mark,andIainChalmers,2011,“Useofresearchevidenceinpractice,”Lancet,378(9804),1696.

Rodrik,Dani,2006,personalemailcommunication.Rosenzweig,MarkandChristopherUdry,2016,“Externalvalidityinastochasticworld,”Cam-

bridge,MA.NBERWorkingPaper22449(July).Rothwell,PeterM.,2005,“Externalvalidityofrandomizedcontrolledtrials:‘towhomdothe

resultsofthetrialapply’”,Lancet,365,82–93.Russell,Bertrand,2008[1912],Theproblemsofphilosophy,Rockville,MD.ArcManor.Sackett,DavidL.,WilliamM.C.Rosenberg,J.A.MuirGray,R.BrianHaynesandW.ScottRich-

ardson,1996,“Evidencebasedmedicine:whatitisandwhatitisn’t,”BritishMedicalJournal,312(January13),71–2.

Scriven,Michael,1974,“Evaluationperspectivesandprocedures,”inW.JamesPopham,ed.,Evaluationineducation—currentapplications,Berkeley,CA.McCutchanPublishingCorpora-tion.

Sen,AmartyaK.,2011,Theideaofjustice,Cambridge,MA.HarvardUniversityPress.Senn,Stephen,1994,“Testingforbaselinebalanceinclinicaltrials,”StatisticsinMedicine,13,

1715–26.Senn,Stephen,2013,“Sevenmythsofrandomizationinclinicaltrials,”StatisticsinMedicine32,

1439–50.Shadish,WilliamR.,ThomasD.Cook,andDonaldT.Campbell,2002,Experimentalandquasi-

experimentaldesignsforgeneralizedcausalinference,Boston,MA.HoughtonMifflin.Simpson,Adrian,2016,“Comparingandcombiningstandardizedeffectsizes:themisdirectionof

publicpolicy,”WorkingPaper,UniversityofDurham(July).Singer,BurtonH.,andStevePincus,1998,“Irregulararraysandrandomization,”Proceedingsof

theNationalAcademyofSciencesoftheUSA,”95,1363–8.Stiles,CharlesWardell,1939,“Earlyhistory,inpartesoteric,ofthehookworm(uncinariasis)

campaigninoursouthernUnitedStates,”JournalofParasitology,25(4),283–308.Stuart,ElizabethA.,StephenR.Cole,andCatharineP.BradshawandPhilipJ.Leaf,2011,“The

useofpropensityscorestoassessthegeneralizabilityofresultsfromrandomizedtrials,”JournaloftheRoyalStatisticalSocietyA,174(2)369–86.

Svorencik,Andrej,2015,Theexperimentalturnineconomics:ahistoryofexperimentaleconom-ics,UtrechtSchoolofEconomics,DissertationSeries#29,http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2560026

Taylor-Robinson,DavidC.,NicolaMaayan,KarlaSoares-Weiser,SarahDonegan,andPaulGar-ner,2015,“Dewormingdrugsforsoil-transmittedintestinalwormsinchildren:effectsonnu-tritionalindicators,haemoglobin,andschoolperformance(review),”TheCochraneCollabo-ration.Wiley.http://onlinelibrary.wiley.com/doi/10.1002/14651858.CD000371.pub6/abstract

Todd,PetraE.,andKennethJ.Wolpin,2006,“AssessingtheimpactofaschoolsubsidyprograminMexico:usingasocialexperimenttovalidateadynamicbehavioralmodelofchildschool-ingandfertility,”AmericanEconomicReview,96(5),1384–1417.

Page 69: Nancy Cartwright & Angus Deaton - Princeton Universitydeaton/downloads/Deaton_Cartwright_RCTs_wit… · Understanding and misunderstanding randomized controlled trials Angus Deaton

68

Todd,PetraE.,andKennethJ.Wolpin,2008,“Exanteevaluationofsocialprograms,”Annalesd’EconomieetdelaStatistique,91/92,263–91.

U.S.DepartmentofEducation,InstituteofEducationSciences,NationalCenterforEducationEvaluationandRegionalAssistance,2003,Identifyingandimplementingeducationalpractic-essupportedbyrigorousevidence:auserfriendlyguide,Washington,DC.InstituteofEduca-tionSciences.

Vandenbroucke,JanP.,2004,“Whenareobservationalstudiesascredibleasrandomizedcon-trolledtrials?”TheLancet,363:1728–31.

Vivalt,Eva,2015,“Howmuchcanwegeneralizefromimpactevaluations?”NYU,unpublished.http://evavivalt.com/wp-content/uploads/2014/10/Vivalt-JMP-10.27.14.pdf

White,Halbert,1980,“Aheteroskedasticity-consistentcovariancematrixestimatorandadirecttestforheteroskedasticity,”Econometrica,50(1),1–25.

Wise,DavidA.,1985,“Abehavioralmodelversusexperimentation:theeffectsofhousingsubsi-diesonrent,”inP.BruckerandR.Pauly,eds..MethodsofOperationsResearch,50,VerlagAnonHain.441–89.

Worrall,John,2002,“WhatEvidenceinEvidence-BasedMedicine?”PhilosophyofScience69,S316-S330.

Worrall,John,2007,“Evidenceinmedicineandevidence-basedmedicine,”PhilosophyCompass,2/6,981–1022.

Young,Alwyn,2016,“ChannelingFisher:randomizationtestsandthestatisticalinsignificanceofseeminglysignificantexperimentalresults,”LondonSchoolofEconomics,WorkingPaper,Feb.

Ziliak,StephenT.,2014,“Balancedversusrandomizedfieldexperimentsineconomics:whyW.S.Gossetaka‘Student’matters,”ReviewofBehavioralEconomics,1,167–208.