Big Data and Big Cities: The Promises and Limitations of Improved Measures of City Life
Ed Glaeser, Scott Kominers, Mike Luca and Nikhil Naik
Big Data and Big Cities

Jan 25, 2017

Page 1: Big Data and Big Cities

Big Data and Big Cities: The Promises and Limitations of Improved Measures of City Life

Ed Glaeser, Scott Kominers, Mike Luca and Nikhil Naik

Page 2: Big Data and Big Cities

Outline

•  Part 1: Big Data and Urban Questions
    –  Too much big think on big data
•  Part 2: Measuring City Life when data is missing
    –  Look forward to my lecture later
    –  Measuring the impact of water in Zambia
•  Part 3: Using Big Data to Improve City Services
    –  Modest model on tournaments vs. consultants
    –  Report on a hygiene tournament in Boston

Page 3: Big Data and Big Cities

Big Data and Big Questions about Cities

•  How does urban development impact the economy?
    –  Shocks to people vs. shocks to place
•  How does the physical city interact with social outcomes?
    –  Measuring the physical city with big data
•  How much do people value urban amenities?
    –  Measuring amenities and better contingent valuation
•  How can public policy improve the quality of urban space?
    –  Merging government actions with physical measures

Page 4: Big Data and Big Cities

Examples of Big Data

•  Much finer geographic records (the IRS data)
•  Similar data from private providers (CoreLogic)
•  Novel data sets on traditional outcomes (Zoona)
•  Novel data sets on relatively new things (Yelp)
•  Completely different data on things we had barely thought about before (Google Street View)

Page 5: Big Data and Big Cities

What's It Good For

•  Big data does not intrinsically solve any of the causal inference issues that we have long worried about.
•  It does make it possible to measure more things (hygiene, streetscapes) in more places in more ways.
•  IRS records provide the mother-of-all panel sets, which is particularly useful for spatial interventions
    –  The right way to judge empowerment zones, for example, would be to use the panel structure

Page 6: Big Data and Big Cities

Led Astray By "Bigger" Data (.3)

Page 7: Big Data and Big Cities

Measuring the Previously Unmeasurable

•  We are used to having public sources for data on the most basic economic outcome: income
•  This is typically not true in the developing world, especially sub-Saharan Africa.
•  Especially false for a usable panel
•  Example #1: Zoona data, water and health
•  Example #2: Google Street View: essentially night lights on steroids

Page 8: Big Data and Big Cities

Zoona in Zambia

Page 9: Big Data and Big Cities
Page 10: Big Data and Big Cities
Page 11: Big Data and Big Cities
Page 12: Big Data and Big Cities
Page 13: Big Data and Big Cities

Measuring Streetscapes (with Nikhil Naik)

Page 14: Big Data and Big Cities

Crowdsourcing City Government: Using Tournaments to Improve Inspection Accuracy

Edward L. Glaeser, Andrew Hillis, Scott Duke Kominers, Michael Luca

Page 15: Big Data and Big Cities

Big Data: Consumer review websites

•  Part of crowdsourcing movement.

Page 16: Big Data and Big Cities

Yelp

•  Luca 2011: high ratings increase revenue for independent restaurants
•  Chevalier and Mayzlin 2006: Barnes & Noble, Amazon and online book orders
•  Ghose et al 2011: TripAdvisor and hotel reservations

Page 17: Big Data and Big Cities
Page 18: Big Data and Big Cities

Yelp Search

Page 19: Big Data and Big Cities

Restaurant’s Yelp Page

Page 20: Big Data and Big Cities

Some background

•  Los Angeles in 1997…
    –  Posting →
        •  higher scores
        •  lower rates of foodborne illness
        •  Jin and Leslie (2003)
    –  Major success story of disclosure
•  NYC in 2010
•  Yet, a lot has changed…

Page 21: Big Data and Big Cities

The Rise of Tournaments

•  Now, organizations can outsource large-scale prediction problems via open tournaments!
    –  e.g., 210 prediction tournaments on Kaggle, with prizes ranging from $0 to $500,000.

Page 22: Big Data and Big Cities

The Rise of Tournaments

•  Now, organizations can outsource large-scale prediction problems via open tournaments!
    –  Returns are not just cash: also recognition, job interviews, certification, satisfaction, and learning.

Page 23: Big Data and Big Cities

An Economic Design Question

(When) can open tournaments help solve public problems?

Page 24: Big Data and Big Cities

Theory

•  Tournaments make sense when:
    –  the probability of a breakthrough, φ, is high;
    –  the baseline low-skilled outcome, q, is not that bad;
    –  and the best outcome, q_max, is particularly good.
•  Wage inequality makes tournaments more appealing.
•  Tournaments are unattractive for ensuring q̄.

⇒ Tournaments may be becoming more attractive!
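
One stylized way to read the conditions on this slide is as an expected-quality comparison. The sketch below is an illustration only, not the authors' model: it assumes a tournament delivers q_max with probability φ and the baseline q otherwise, and that the alternative (a consultant) delivers a guaranteed level q̄. The function names and the risk-neutral comparison are assumptions made for the example.

```python
def tournament_expected_quality(phi, q_low, q_max):
    """Expected outcome of a tournament under the assumed two-outcome setup:
    a breakthrough (q_max) with probability phi, else the baseline (q_low)."""
    if not 0.0 <= phi <= 1.0:
        raise ValueError("phi must be a probability between 0 and 1")
    return phi * q_max + (1.0 - phi) * q_low


def prefer_tournament(phi, q_low, q_max, q_bar):
    """True if the expected tournament outcome beats a guaranteed q_bar.
    This ignores costs and risk aversion (assumptions of the sketch)."""
    return tournament_expected_quality(phi, q_low, q_max) > q_bar
```

Under these assumptions, raising φ, q, or q_max each raises the tournament's expected outcome, which matches the direction of the three conditions listed above. A requirement to guarantee q̄ in every case is not captured by an expected-value comparison at all.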

Page 25: Big Data and Big Cities

Theory → Practice

⇒ Tournaments may be becoming more attractive!

⇒ We ran one!

Page 26: Big Data and Big Cities

Conjecture

•  Inspection and disclosure policies can be enhanced by working with social media:
    –  Social media is a potential platform for disclosure
    –  Optimal disclosure is a function of what people are saying on social media
•  Design disclosure of hygiene violations through Yelp platform
•  Use Yelp review text to guide inspections.
    –  Inspections are fairly random, but they don't have to be!

Page 27: Big Data and Big Cities

Why restaurant hygiene inspections?

•  Data and technology have changed
    –  Policy has remained the same
•  Disclosure side
    –  Market with very little information
    –  Early success story of disclosure (Jin and Leslie 2003), so known potential impact
•  Ideal setting for information design questions
    –  What conditions cause posting to work?
    –  What are the behavioral factors underlying customer response?
•  Scope for improving policy
    –  Dai and Luca 2016

Page 28: Big Data and Big Cities

Hygiene Inspections

•  Process and scoring varies (sometimes a lot) by city
•  In SF:
    –  Restaurants inspected roughly twice per year.
    –  Violations classified as major (lots of rats) and minor (a rat)
    –  Final score between 0 and 100
•  In Boston:
    –  Restaurants inspected at least once per year
    –  Violations classified as minor, major, and severe
    –  Until now, no grades
•  Goal:
    –  Identify risks
    –  Shut down worst offenders, enforce clean up

Page 29: Big Data and Big Cities

Essentially a prediction problem

•  Which restaurant is most likely to have a violation?
•  By targeting inspections, can be more efficient:
    –  Identify more risks, or,
    –  Reduce number of inspections
•  E.g.: 1 random annual inspection for each restaurant, plus targeted
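
The "one random annual inspection plus targeted" idea can be sketched as a simple allocation rule. This is a minimal illustration under stated assumptions: `predicted_risk` is a hypothetical mapping from restaurant name to a predicted violation score, `extra_inspections` is a hypothetical budget of additional visits, and the timing of the baseline visit (which the slide describes as random) is not modeled.

```python
def plan_inspections(predicted_risk, extra_inspections):
    """Assign every restaurant one baseline inspection, then give one
    additional targeted inspection to the highest-risk restaurants.

    predicted_risk: dict mapping restaurant name -> predicted violation score
    extra_inspections: number of additional targeted visits available
    """
    # Baseline: each restaurant is inspected once per year.
    plan = {name: 1 for name in predicted_risk}
    # Rank restaurants from highest to lowest predicted risk.
    ranked = sorted(predicted_risk, key=predicted_risk.get, reverse=True)
    # Spend the extra budget on the top of the ranking.
    for name in ranked[:max(extra_inspections, 0)]:
        plan[name] += 1
    return plan
```

For example, with scores `{"a": 0.2, "b": 5.0, "c": 1.5}` and one extra visit, restaurant "b" would receive two inspections and the others one each.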

Page 30: Big Data and Big Cities

Treatment: Inspection Results on Yelp

Page 31: Big Data and Big Cities

Are hygiene scores predictable?

•  Yelp reviewers provide lots of new information, but…
•  Potential pitfalls:
    –  Fake reviews
    –  Selection
    –  Hygiene may not factor into reviews

Page 32: Big Data and Big Cities

Distribution of Hygiene Scores

Page 33: Big Data and Big Cities

Hygiene Scores by Restaurant Price

Page 34: Big Data and Big Cities

Yelp Ratings Predict Hygiene Scores

Page 35: Big Data and Big Cities

Updating the Inspection Process

•  Layering on use of text, can predict roughly 85% of restaurants into top/bottom half of scores (Kang, Kuznetsova, Luca, and Choi 2013)
•  Related pilots

Page 36: Big Data and Big Cities

Tournament:

•  Cosponsored with Yelp
•  Supported by City of Boston
•  Combined Yelp data with Boston inspection results:
    –  Objective to predict violations.
    –  Weights chosen by city (minor = 1, major = 2, severe = 5).
    –  Evaluated using RMSLE
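
A minimal sketch of the scoring ingredients named on this slide: the city's severity weights and root mean squared logarithmic error (RMSLE). The slide does not say how the weights enter the metric, so this example assumes, for illustration only, that each restaurant's violations are first collapsed into one weighted score and RMSLE is then computed on those scores. The function names are hypothetical.

```python
import math

# Severity weights stated on the slide: minor = 1, major = 2, severe = 5.
WEIGHTS = {"minor": 1, "major": 2, "severe": 5}


def weighted_violations(counts):
    """Collapse per-severity violation counts into a single weighted score.
    Missing severity levels are treated as zero."""
    return sum(WEIGHTS[level] * counts.get(level, 0) for level in WEIGHTS)


def rmsle(predicted, actual):
    """Root mean squared logarithmic error:
    sqrt(mean((log(1 + p) - log(1 + a))^2)) over paired values."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    squared_errors = [
        (math.log1p(p) - math.log1p(a)) ** 2
        for p, a in zip(predicted, actual)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

Because the error is taken on a log scale, missing by 5 on a restaurant with 50 weighted violations is penalized much less than missing by 5 on a restaurant with none.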

Page 37: Big Data and Big Cities

Tournament: Rewards

Place    Prize Amount
1st      $3,000
2nd      $1,000
3rd      $1,000

Prize money provided by Yelp

Page 38: Big Data and Big Cities
Page 39: Big Data and Big Cities

Competition Process

Page 40: Big Data and Big Cities

Target: Inspection Violations

Page 41: Big Data and Big Cities

Target: Inspection Violations

Page 42: Big Data and Big Cities

Target: Inspection Violations

Page 43: Big Data and Big Cities

Target: Inspection Violations

Page 44: Big Data and Big Cities

Target: Inspection Violations

Page 45: Big Data and Big Cities

Results

•  >500 signups
•  Development phase:
    –  ~55 completed at least one entry
    –  ~450 sets of predictions
•  Evaluation phase:
    –  23 submitted final algorithms
    –  During this time, Boston inspected 364 restaurants

Page 46: Big Data and Big Cities

The Winner

Page 47: Big Data and Big Cities

The Runner Up

Page 48: Big Data and Big Cities

Gains for Boston: ~40%

To catch 3,604 weighted violations, inspect this many restaurants:

Page 49: Big Data and Big Cities

Gains for Boston

If choosing the 364 restaurants with the highest predicted violations, expect to obtain total violations:
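
The two comparisons on these slides (how many inspections are needed to reach a target number of weighted violations, and how many violations are found in a fixed number of inspections) can be sketched as follows. This is an illustration under assumptions: `predicted` and `actual` are hypothetical parallel lists of predicted and observed weighted violations per restaurant, and the function names are not from the original study.

```python
def rank_by_prediction(predicted, actual):
    """Return the actual weighted violations, ordered from the
    highest-predicted restaurant to the lowest-predicted one."""
    order = sorted(range(len(predicted)), key=lambda i: predicted[i], reverse=True)
    return [actual[i] for i in order]


def violations_caught(predicted, actual, k):
    """Total actual weighted violations found by inspecting the k
    restaurants with the highest predicted violations."""
    return sum(rank_by_prediction(predicted, actual)[:k])


def inspections_needed(predicted, actual, target):
    """Number of inspections, taken in predicted order, needed to reach
    the target total. Returns None if the target is never reached."""
    running_total = 0
    for count, found in enumerate(rank_by_prediction(predicted, actual), start=1):
        running_total += found
        if running_total >= target:
            return count
    return None
```

Running the same two functions with a random ordering in place of `predicted` gives the baseline against which a gain such as the ~40% figure would be measured.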

Page 50: Big Data and Big Cities

Ongoing work

•  Launching a trial
    –  Starts this month
•  Incorporating into day-to-day inspections
•  Ongoing challenges:
    –  Other city goals?
    –  Gamability?
    –  Transferability?

Page 51: Big Data and Big Cities

Epilogue

•  Results of the algorithm were given to inspectors to improve accuracy.
•  Then we looked at how they did using their own best practices vs. the fancy algorithm vs. a really simple algorithm.
•  The fancy algorithm does help, but the simple algorithm gets most of the way there.
•  In some things, getting the basics right is far more important than too much fancy math.