Big Data and Big Cities: The Promises and Limitations of Improved Measures of City Life
Ed Glaeser, Scott Kominers, Mike Luca and Nikhil Naik
Outline
• Part 1: Big Data and Urban Questions
  – Too much big think on big data
• Part 2: Measuring City Life when data is missing
  – Look forward to my lecture later
  – Measuring the impact of water in Zambia
• Part 3: Using Big Data to Improve City Services
  – Modest model on tournaments vs. consultants
  – Report on a hygiene tournament in Boston
Big Data and Big Questions about Cities
• How does urban development impact the economy?
  – Shocks to people vs. shocks to place
• How does the physical city interact with social outcomes?
  – Measuring the physical city with big data
• How much do people value urban amenities?
  – Measuring amenities and better contingent valuation
• How can public policy improve the quality of urban space?
  – Merging government actions with physical measures
Examples of Big Data
• Much finer geographic records (the IRS data)
• Similar data from private providers (CoreLogic)
• Novel datasets on traditional outcomes (Zoona)
• Novel datasets on relatively new things (Yelp)
• Completely different data on things we had barely thought about before (Google Street View)
What’s It Good For?
• Big data does not intrinsically solve any of the causal inference issues that we have long worried about.
• It does make it possible to measure more things (hygiene, streetscapes) in more places in more ways.
• IRS records provide the mother of all panel datasets, which is particularly useful for spatial interventions
  – The right way to judge empowerment zones, for example, would be to use the panel structure
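One standard way to use a panel for a spatial program such as empowerment zones is a difference-in-differences comparison: the change in treated areas minus the change in comparison areas over the same period. The sketch below is illustrative only; the field names (`treated`, `pre`, `post`) and the toy numbers are hypothetical, not IRS data, and a real evaluation would add area and year fixed effects plus standard errors.

```python
from statistics import mean


def diff_in_diff(panel):
    """Two-period difference-in-differences estimate.

    `panel` is a list of dicts with hypothetical keys:
      "treated": bool   (area inside the program, e.g. an empowerment zone)
      "pre":     float  (outcome before designation)
      "post":    float  (outcome after designation)
    """
    treated_change = mean(r["post"] - r["pre"] for r in panel if r["treated"])
    control_change = mean(r["post"] - r["pre"] for r in panel if not r["treated"])
    # The comparison areas' change stands in for what would have happened to
    # treated areas without the program (the parallel-trends assumption).
    return treated_change - control_change


# Toy panel (hypothetical numbers, for illustration only).
toy_panel = [
    {"treated": True, "pre": 10.0, "post": 14.0},
    {"treated": True, "pre": 12.0, "post": 15.0},
    {"treated": False, "pre": 11.0, "post": 12.0},
    {"treated": False, "pre": 9.0, "post": 11.0},
]
print(diff_in_diff(toy_panel))  # 2.0
```

Treated areas grew by 3.5 on average and comparison areas by 1.5, so the estimated program effect is 2.0. A single cross-section, however large, cannot separate these two changes.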
Led Astray by “Bigger” Data (.3)
Measuring the Previously Unmeasurable
• We are used to having public sources for data on the most basic economic outcome: income
• This is typically not true in the developing world, especially sub-Saharan Africa.
• Especially false for a usable panel
• Example #1: Zoona data, water and health
• Example #2: Google Street View: essentially night lights on steroids
Zoona in Zambia
Measuring Streetscapes (with Nikhil Naik)
Crowdsourcing City Government: Using Tournaments to Improve Inspection Accuracy
Edward L. Glaeser, Andrew Hillis, Scott Duke Kominers, Michael Luca
Big Data: Consumer review websites
• Part of the crowdsourcing movement.
Yelp
• Luca 2011: high ratings increase revenue for independent restaurants
• Chevalier and Mayzlin 2006: Barnes & Noble, Amazon and online book orders
• Ghose et al. 2011: TripAdvisor and hotel reservations
Yelp Search
Restaurant’s Yelp Page
Some background
• Los Angeles in 1997…
  – Posting → higher scores and lower rates of foodborne illness (Jin and Leslie 2003)
  – Major success story of disclosure
• NYC in 2010
• Yet, a lot has changed…
The Rise of Tournaments
• Now, organizations can outsource large-scale prediction problems via open tournaments!
  – e.g., 210 prediction tournaments on Kaggle, with prizes ranging from $0 to $500,000.
  – Returns are not just cash; they also include recognition, job interviews, certification, satisfaction, and learning.
An Economic Design Question
(When) can open tournaments help solve public problems?
Theory
• Tournaments make sense when:
  – the probability of a breakthrough, $\varphi$, is high;
  – the baseline low-skilled outcome, $q$, is not that bad;
  – and the best outcome, $q_{\max}$, is particularly good.
• Wage inequality makes tournaments more appealing.
• Tournaments are unattractive for ensuring $\bar{q}$.
• Tournaments may be becoming more attractive!
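A stylized way to read the three conditions, which is our assumption rather than the model behind these slides, is to treat a tournament as a lottery: with probability $\varphi$ some entrant delivers $q_{\max}$, and otherwise the city is left with $q$. The expected value $\varphi q_{\max} + (1-\varphi) q$ then rises with each of the three quantities. The consultant's sure outcome `q_consultant` and the `prize` and `fee` costs below are hypothetical parameters added only to make the comparison concrete.

```python
def expected_tournament_outcome(phi: float, q_base: float, q_max: float) -> float:
    """Stylized expected quality from an open tournament (hypothetical model).

    With probability phi an entrant makes a breakthrough worth q_max;
    otherwise the city gets the baseline low-skilled outcome q_base.
    """
    if not 0.0 <= phi <= 1.0:
        raise ValueError("phi must be a probability in [0, 1]")
    return phi * q_max + (1.0 - phi) * q_base


def prefer_tournament(phi, q_base, q_max, q_consultant, prize=0.0, fee=0.0):
    """True if the tournament's expected net value beats a consultant's sure
    outcome. q_consultant, prize and fee are illustrative assumptions."""
    return expected_tournament_outcome(phi, q_base, q_max) - prize > q_consultant - fee


print(prefer_tournament(0.5, 40, 100, q_consultant=60))  # True  (70 > 60)
print(prefer_tournament(0.1, 40, 100, q_consultant=60))  # False (46 < 60)
```

Lowering the breakthrough probability from 0.5 to 0.1 flips the choice toward the consultant, which matches the first bullet; raising `q_base` or `q_max` moves it back toward the tournament.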
Theory → Practice
• We ran one!
Conjecture
• Inspection and disclosure policies can be enhanced by working with social media:
  – Social media is a potential platform for disclosure
  – Optimal disclosure is a function of what people are saying on social media
• Design disclosure of hygiene violations through the Yelp platform
• Use Yelp review text to guide inspections.
  – Inspections are fairly random, but they don’t have to be!
Why restaurant hygiene inspections?
• Data and technology have changed
  – Policy has remained the same
• Disclosure side
  – Market with very little information
  – Early success story of disclosure (Jin and Leslie 2003), so the potential impact is known
• Ideal setting for information design questions
  – What conditions cause posting to work?
  – What are the behavioral factors underlying customer response?
• Scope for improving policy
  – Dai and Luca 2016
Hygiene Inspections
• Process and scoring vary (sometimes a lot) by city
• In SF:
  – Restaurants inspected roughly twice per year
  – Violations classified as major (lots of rats) and minor (a rat)
  – Final score between 0 and 100
• In Boston:
  – Restaurants inspected at least once per year
  – Violations classified as minor, major, and severe
  – Until now, no grades
• Goal:
  – Identify risks
  – Shut down worst offenders, enforce cleanup
Essentially a prediction problem
• Which restaurant is most likely to have a violation?
• By targeting inspections, the city can be more efficient:
  – Identify more risks, or
  – Reduce the number of inspections
• E.g., one random annual inspection for each restaurant, plus targeted inspections
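The hybrid schedule in the last bullet can be sketched as follows: every restaurant keeps one routine visit, and a fixed number of extra visits go to the restaurants with the highest predicted risk. The function name, the `extra_slots` budget, and the toy scores are hypothetical, and the random timing of routine visits is not modeled here.

```python
def plan_inspections(predicted, extra_slots):
    """Build a yearly inspection plan (hypothetical helper).

    `predicted` maps a restaurant id to a model's predicted violation score.
    Returns (routine, targeted): every restaurant gets one routine visit, and
    the `extra_slots` highest-risk restaurants get an additional targeted visit.
    """
    routine = sorted(predicted)  # one baseline visit per restaurant
    # Rank by predicted risk, highest first; break ties by id so output is stable.
    ranked = sorted(predicted, key=lambda r: (-predicted[r], r))
    targeted = ranked[:extra_slots]
    return routine, targeted


scores = {"A": 0.2, "B": 3.1, "C": 1.4, "D": 2.7}  # toy predicted scores
routine, targeted = plan_inspections(scores, extra_slots=2)
print(routine)   # ['A', 'B', 'C', 'D']
print(targeted)  # ['B', 'D']
```

Keeping the routine visit for everyone preserves deterrence and produces fresh labels for retraining, while the targeted slots carry the efficiency gain.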
Treatment: Inspection Results on Yelp
Are hygiene scores predictable?
• Yelp reviewers provide lots of new information, but…
• Potential pitfalls:
  – Fake reviews
  – Selection
  – Hygiene may not factor into reviews
Distribution of Hygiene Scores
Hygiene Scores by Restaurant Price
Yelp Ratings Predict Hygiene Scores
Updating the Inspection Process
• Adding review text, one can correctly place roughly 85% of restaurants in the top or bottom half of hygiene scores (Kang, Kuznetsova, Luca, and Choi 2013)
• Related pilots
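The "top or bottom half" figure is a classification accuracy. The sketch below shows only how such an accuracy can be computed from actual and predicted scores with a median split; it is not the text model Kang et al. used, and the inputs are toy numbers.

```python
from statistics import median


def half_split_accuracy(actual_scores, predicted_scores):
    """Share of restaurants placed in the correct half of hygiene scores.

    A restaurant is "top half" if its score is above the median of its list.
    Inputs are parallel lists of the same length (illustrative data only).
    """
    cut_actual = median(actual_scores)
    cut_pred = median(predicted_scores)
    hits = sum(
        (a > cut_actual) == (p > cut_pred)
        for a, p in zip(actual_scores, predicted_scores)
    )
    return hits / len(actual_scores)


print(half_split_accuracy([90, 85, 70, 60], [88, 72, 80, 55]))  # 0.5
```

Because a coin flip already scores about 0.5 on this measure, an accuracy near 0.85 indicates substantial signal in the reviews.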
Tournament:
• Cosponsored with Yelp
• Supported by the City of Boston
• Combined Yelp data with Boston inspection results:
  – Objective: predict violations.
  – Weights chosen by the city (minor = 1, major = 2, severe = 5).
  – Evaluated using RMSLE (root mean squared logarithmic error)
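A minimal sketch of the scoring ingredients follows. The weights are the ones on the slide, and RMSLE is the standard $\sqrt{\tfrac{1}{n}\sum_i (\log(1+p_i) - \log(1+a_i))^2}$. Applying the weights to a per-restaurant total before scoring is our assumption; the contest's exact scoring formula may have combined them differently.

```python
import math

# Violation weights reported on the slide.
WEIGHTS = {"minor": 1, "major": 2, "severe": 5}


def weighted_violations(counts):
    """Collapse per-severity violation counts into one weighted total."""
    return sum(WEIGHTS[level] * n for level, n in counts.items())


def rmsle(actual, predicted):
    """Root mean squared logarithmic error for non-negative values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    # log1p(x) = log(1 + x), so zero counts are handled without log(0).
    sq = [(math.log1p(p) - math.log1p(a)) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(sq) / len(sq))


print(weighted_violations({"minor": 3, "major": 1, "severe": 1}))  # 10
```

The logarithm makes the metric penalize relative rather than absolute errors, so missing 5 violations at a clean restaurant costs more than missing 5 at a restaurant with 50.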
Tournament: Rewards
Place   Prize Amount
1st     $3,000
2nd     $1,000
3rd     $1,000
Prize money provided by Yelp
Competition Process
Target: Inspection Violations
Results
• >500 signups
• Development phase:
  – ~55 completed at least one entry
  – ~450 sets of predictions
• Evaluation phase:
  – 23 submitted final algorithms
  – During this time, Boston inspected 364 restaurants
The Winner
The Runner-Up
Gains for Boston: ~40%
To catch 3,604 weighted violations, inspect this many restaurants:
Gains for Boston
If choosing the 364 restaurants with the highest predicted violations, expect to find this many total violations:
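Both "Gains for Boston" comparisons can be computed from held-out predicted and realized weighted violations: how many violations the top-k predicted restaurants actually had, and how many inspections in predicted-risk order are needed to reach a target. The sketch below uses toy numbers, not Boston's 364 inspections, and the function names are hypothetical.

```python
def _risk_order(predicted):
    """Indices sorted by predicted violations, highest first (stable on ties)."""
    return sorted(range(len(predicted)), key=lambda i: predicted[i], reverse=True)


def violations_caught(predicted, actual, k):
    """Actual weighted violations found by inspecting the k highest-risk restaurants."""
    return sum(actual[i] for i in _risk_order(predicted)[:k])


def inspections_needed(predicted, actual, target):
    """Fewest inspections, in predicted-risk order, whose actual violations
    reach `target`; None if the target is never reached."""
    total = 0
    for n, i in enumerate(_risk_order(predicted), start=1):
        total += actual[i]
        if total >= target:
            return n
    return None


pred = [5, 1, 4, 0, 3]  # toy model predictions
act = [6, 0, 5, 1, 2]   # toy realized weighted violations
print(violations_caught(pred, act, k=2))         # 11
print(inspections_needed(pred, act, target=13))  # 3
```

For comparison, inspecting 2 of these 5 restaurants at random would find 2 × 14 / 5 = 5.6 violations in expectation, so the ranking roughly doubles the yield in this toy case.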
Ongoing work
• Launching a trial
  – Starts this month
• Incorporating into day-to-day inspections
• Ongoing challenges:
  – Other city goals?
  – Gamability?
  – Transferability?
Epilogue
• Results of the algorithm were given to inspectors to improve accuracy.
• We then compared how inspectors did using their own best practices vs. the fancy algorithm vs. a really simple algorithm.
• The fancy algorithm does help, but the simple algorithm gets most of the way there.
• For some problems, getting the basics right matters far more than fancy math.