Big Data and Big Cities Ed Glaeser (joint work with Nikhil Naik, Mike Luca and Scott Kominers)
Detroit
Houston
Las Vegas
New York
Phoenix
DC
[Figure: Change in Housing Prices, 2001-2006 vs. 2006-2011. y-axis: Change in FHFA Price, 2006-2011; x-axis: Change in FHFA Price, 2001-2006.]
[Figure: Change in FHFA, 1996-2012 by Quintile of Population Density, 2010. y-axis: Average Change in FHFA, 1996-2012; x-axis: population-density quintile (1-5). Note: For MSAs with populations greater than 250,000 in 2010.]
[Figure: Single Family and Multi-Family Permits Over Time, monthly, Jan 1960 to Nov 2014. Series: Multi Family Permits, Single Family Permits; y-axis range 0 to 2,000.]
[Figure: Per Capita Income, 2000 and Population Change, 2000-2010, by Population Density. y-axes: Average Per Capita Income, 2000 and Average Population Change, 2000-2010; x-axis: Population Density (0 to 10). Source: U.S. Census.]
Will the last person to leave Seattle (and Milan) please turn out the lights?
Photo by Postdil; Artist’s Impression by Daniel Libeskind Studio
[Figure: Average Population Growth by Share with BA in 2000 (Quintiles). y-axis: Average Population Growth by County, 2000-2010; x-axis: quintile of share of adults with a BA (1-5).]
[Figure: Per Capita GDP 2010 (y-axis, 20,000 to 100,000) vs. Share of Adults with B.A.s 2000 (x-axis, .1 to .5), by metropolitan area. Labeled points include Bakersfield, Las Vegas, Detroit, New York, Atlanta, Boston, San Jose and San Francisco.]
Chinitz: Contrasts in Agglomeration: New York and Pittsburgh
[Figure: MSA Employment Growth (1977-2010) by Average Firm Size (1977) Quintiles. y-axis: Average Percent Growth in Employment, 1977-2010; x-axis: firm-size quintile (1-5); smallest firms are in Quintile 1.]
Economic Growth and Firm Size
[Map labels: South Boston Waterfront, Innovation District, Logan Airport, Downtown]
Rich and Poor Innovation Districts
Local Regulation: Chamber of Commerce Red Tape (Higher Score = Less Red Tape)
Subjective Well-Being Across Space
Subjective Well-Being and Population Growth
[Figure: Happiness after exogenous demographic controls, 2005-2010 (y-axis) vs. Change in Log Population, 1950-2000 (x-axis).]
Using Big Data To Solve City Problems
• The Economic Agenda: Education, Entrepreneurship and Incomes.
• The Demons of Density: Contagious Disease, Crime, Congestion and High Housing Prices.
• The Forms of Big Data
– Much finer geographic records (the IRS data)
– Similar data from private providers (CoreLogic)
– Novel data sets on traditional outcomes (Zoona)
– Novel data sets on relatively new things (Yelp)
– Completely different data on things we had barely thought about before (Google Street View)
What’s It Good For
• Big data does not intrinsically solve any of the causal inference issues that we have long worried about.
• It does make it possible to measure more things (hygiene, streetscapes) in more places in more ways.
• IRS records provide the mother of all panel data sets, which are particularly useful for evaluating spatial interventions
– The right way to judge empowerment zones, for example, would be to use the panel structure
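The slide does not name an estimator. One common way to exploit a panel for a place-based program is a difference-in-differences comparison: the change in outcomes for tracts inside the zone minus the change for comparable tracts outside it. The sketch below assumes a hypothetical panel of tract-level mean incomes before (period 0) and after (period 1) designation; the tracts and numbers are invented, and a real evaluation would add tract and year fixed effects and check for parallel pre-trends.

```python
from statistics import mean

# Hypothetical panel: (tract_id, in_zone, period, mean_income).
# Period 0 = before zone designation, period 1 = after. Numbers are invented.
panel = [
    ("A", True, 0, 30_000), ("A", True, 1, 34_000),
    ("B", True, 0, 28_000), ("B", True, 1, 31_000),
    ("C", False, 0, 31_000), ("C", False, 1, 33_000),
    ("D", False, 0, 29_000), ("D", False, 1, 30_000),
]

def did_estimate(rows):
    """Two-period difference-in-differences: treated change minus control change."""
    def avg(in_zone, period):
        return mean(y for _, z, p, y in rows if z == in_zone and p == period)
    treated_change = avg(True, 1) - avg(True, 0)
    control_change = avg(False, 1) - avg(False, 0)
    return treated_change - control_change

print(did_estimate(panel))  # → 2000
```

With these made-up numbers, zone tracts gain 3,500 on average and control tracts 1,500, so the estimated effect is 2,000. The panel matters because the same tracts are observed before and after, which differences out fixed tract characteristics.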
Led Astray By “Bigger” Data (.3)
Big Data and Education in the US
• Early Childhood Interventions (Heckman)
• Teacher Quality (Chetty, Friedman, Rockoff)
• Charter Schools (Angrist, Pathak, Walters)
• Science and Math (Joshua Goodman)
Zoona in Zambia
Restaurant hygiene inspections
• Data and technology have changed
– Policy has remained the same
• Disclosure side
– Market with very little information
– Early success story of disclosure (Jin and Leslie 2003), so the potential impact is known
• Ideal setting for information design questions
– What conditions cause posting to work?
– What are the behavioral factors underlying customer response?
• Scope for improving policy
– Dai and Luca 2016
Tournaments and Hygiene Inspections
• Process and scoring vary (sometimes a lot) by city
• In SF:
– Restaurants inspected roughly twice per year
– Violations classified as major (lots of rats) and minor (a rat)
– Final score between 0 and 100
• In Boston:
– Restaurants inspected at least once per year
– Violations classified as minor, major, and severe
– Until now, no grades
• Goal:
– Identify risks
– Shut down worst offenders, enforce cleanup
Essentially a prediction problem
• Which restaurant is most likely to have a violation?
• By targeting inspections, the city can be more efficient:
– Identify more risks, or
– Reduce the number of inspections
• E.g., one random annual inspection for each restaurant, plus targeted inspections
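The "one random inspection plus targeted ones" idea can be sketched as a simple budget rule. This assumes a hypothetical risk score per restaurant produced by some predictive model; the restaurant names, scores, and the `build_schedule` helper are illustrative, not the city's actual procedure.

```python
def build_schedule(predicted_risk, extra_budget):
    """Return {restaurant: inspections this year}.

    Every restaurant gets one baseline visit (its timing would be random);
    the `extra_budget` highest-risk restaurants get one targeted follow-up.
    """
    schedule = {name: 1 for name in predicted_risk}
    ranked = sorted(predicted_risk, key=predicted_risk.get, reverse=True)
    for name in ranked[:extra_budget]:
        schedule[name] += 1
    return schedule

# Hypothetical model scores (probability of a violation); invented values.
risk = {"Cafe A": 0.9, "Diner B": 0.2, "Grill C": 0.7, "Bistro D": 0.1}
print(build_schedule(risk, extra_budget=2))
# → {'Cafe A': 2, 'Diner B': 1, 'Grill C': 2, 'Bistro D': 1}
```

Keeping a random baseline visit preserves deterrence for every restaurant and also yields unbiased outcome data for retraining the model, while the extra budget goes where predicted risk is highest.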
Tournament:
• Cosponsored with Yelp
• Supported by City of Boston
• Combined Yelp data with Boston inspection results:
– Objective: predict violations.
– Weights chosen by city (minor = 1, major = 2, severe = 5).
– Evaluated using RMSLE
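The city's weights collapse the three violation counts into one number, and RMSLE (root mean squared logarithmic error) compares predictions with outcomes on a log scale, so a miss of 2 vs. 4 violations counts more than a miss of 20 vs. 22. The slide does not say exactly how the weights enter the metric; this sketch assumes the error is computed on the weighted total per inspection, and the helper names are hypothetical.

```python
import math

# Weights from the slide: minor = 1, major = 2, severe = 5.
WEIGHTS = {"minor": 1, "major": 2, "severe": 5}

def weighted_violations(counts):
    """Collapse per-category violation counts into one weighted total."""
    return sum(WEIGHTS[level] * n for level, n in counts.items())

def rmsle(predicted, actual):
    """Root mean squared logarithmic error; all values must be >= 0."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(math.log1p(p) - math.log1p(a)) ** 2
               for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical inspection: 3 minor, 1 major, 1 severe violation.
actual = [weighted_violations({"minor": 3, "major": 1, "severe": 1})]
print(actual)               # → [10]
print(rmsle([10], actual))  # → 0.0
print(rmsle([4], actual))   # error of log(11/5), about 0.79
```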
Yelp Ratings Predict Hygiene Scores
Tournament: Rewards
Place   Prize Amount
1st     $3,000
2nd     $1,000
3rd     $1,000
Prize money provided by Yelp
Competition Process
Results
• More than 500 signups
• Development phase:
– ~55 completed at least one entry
– ~450 sets of predictions
• Evaluation phase:
– 23 submitted final algorithms
– During this time, Boston inspected 364 restaurants
Gains for Boston: ~40%
To catch 3,604 weighted violations, inspect this many restaurants:
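The comparison on this slide (how many inspections it takes to catch a fixed number of weighted violations) can be computed by walking down an inspection order until the target is reached. The sketch compares an order sorted by a hypothetical model score with a random order; the `inspections_needed` helper and all data are invented, and it does not reproduce the ~40% figure.

```python
import random

def inspections_needed(order, violations, target):
    """Count inspections, taken in `order`, until `target` weighted violations
    have been found; return None if the target is never reached."""
    found = 0
    for i, name in enumerate(order, start=1):
        found += violations[name]
        if found >= target:
            return i
    return None

# Invented data: weighted violations each restaurant would show if inspected,
# and a hypothetical model's risk score for each.
violations = {"A": 8, "B": 0, "C": 5, "D": 1, "E": 0, "F": 6}
score = {"A": 0.8, "B": 0.1, "C": 0.6, "D": 0.3, "E": 0.2, "F": 0.7}

targeted = sorted(violations, key=score.get, reverse=True)
print(inspections_needed(targeted, violations, target=19))  # → 3

shuffled = list(violations)
random.Random(0).shuffle(shuffled)  # baseline: random inspection order
print(inspections_needed(shuffled, violations, target=19))
```

Averaging the random-order count over many shuffles and comparing it with the targeted count gives one way to express a gain like the one on the slide.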
Crime: NYC Homicides per 100,000
Ray Kelly vs. Ed Davis: Technology and Community Policing
Ed Davis by Michael Cummo; Ray Kelly by David Shankbone
Engineering vs. Economics
The Curitiba Innovation
Picture by Mariordo
Photo by Mario Roberto Duran Ortiz
Bottom Up Innovation: Zipcar
The Physical City: NIMBYism vs. Monumentalism
Astana by ChelseaFunNumberOne
Training Sample – New York Income
R^2 = 0.85
Testing Sample – New York Income
R^2 = 0.81
Image by QuarterCircleS
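The training and testing R^2 values on the New York income slides follow the usual holdout logic: fit on one sample, then measure fit on data the model never saw. This sketch uses a one-variable least-squares line in pure Python; the feature (imagined as an image-derived streetscape score) and the incomes are invented, and the actual model behind the slides is not described here.

```python
def fit_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b

def r_squared(x, y, a, b):
    """1 - SSE/SST for predictions a + b*x evaluated on the data (x, y)."""
    my = sum(y) / len(y)
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - my) ** 2 for yi in y)
    return 1 - sse / sst

# Invented data: x = hypothetical image-based score, y = income (scaled).
x_train, y_train = [1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1]
x_test, y_test = [1.5, 2.5, 3.5, 4.5], [3.2, 4.8, 7.3, 8.7]

a, b = fit_ols(x_train, y_train)  # fit on the training sample only
print(round(r_squared(x_train, y_train, a, b), 3))
print(round(r_squared(x_test, y_test, a, b), 3))  # lower here, as is typical
```

A test R^2 close to the training R^2, as on the slides, suggests the model is not simply memorizing the training neighborhoods.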