Accenture MIT Data Science Challenge
Abbas Keshvani
Accenture Analytics Innovation Centre, Singapore
Jan 26, 2015
Chicago skyline
Problem
Chicagoans expect better service
• City of Chicago provides heavy duty carts to homes
• Some homes lack a cart – new home, theft, damage
• Chicagoans request new carts by calling 311:
1. Some requests are not completed, leaving residents without refuse facilities
2. Other requests are resolved very slowly
1. Completing requests
Problem: requests are not completed
• 4,757 cases are still open
• 1,372 of them have been open for a long time (120 days or more)
• This leaves residents without carts to dispose of garbage
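The two counts above can be reproduced with a simple filter. The following is a minimal sketch on made-up rows, not the Chicago 311 data; the column names `status` and `creation` and the snapshot date `as_of` are assumptions for illustration only.

```r
# Toy data: made-up requests, NOT the Chicago 311 data.
# Column names (status, creation) are hypothetical.
requests <- data.frame(
  status   = c("Open", "Completed", "Open", "Open", "Completed"),
  creation = as.Date(c("2014-01-10", "2014-03-02", "2014-06-15",
                       "2014-02-01", "2014-05-20"))
)
as_of <- as.Date("2014-07-01")  # hypothetical snapshot date

# Keep only open cases and measure how long each has been open
open_cases <- requests[requests$status == "Open", ]
days_open  <- as.numeric(as_of - open_cases$creation)

n_open      <- nrow(open_cases)       # all open cases
n_long_open <- sum(days_open >= 120)  # open for 120 days or more
```

On the real data, the same two expressions would be expected to give the open and long-open counts quoted on the slide, provided the status and creation-date columns are mapped correctly.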
Investigation: where are the open cases?
• Plot map of unserved requests
• Red areas have a high concentration of open cases
• Found mainly in the western interior of the city
Solution
• Improve coverage of areas in red
o Oak Park
o West Side
o Dolton
2. Resolving requests efficiently
Problem: resolution time is slow
• Mean time to resolve a single request shows seasonality
• Peaks in June/July and troughs in December/January
• Same June/December seasonality seen in
1. Total number of requests
2. Total time to resolve all requests
• But the magnitude of the seasonality is less in (1) than in (2), shown by shallower valleys
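One simple way to put a number on "magnitude of the seasonality" is the peak-to-trough ratio of each series. This measure is an assumption chosen for illustration, not necessarily what the slides used, and the monthly totals below are made up.

```r
# Toy monthly totals: made-up numbers, NOT the Chicago data.
monthly <- data.frame(
  month      = month.abb,
  requests   = c(100, 110, 130, 160, 190, 220, 220, 190, 160, 130, 110, 100),
  total_time = c(500, 600, 900, 1500, 2400, 3500, 3500, 2400, 1500, 900, 600, 500)
)

# Peak-to-trough ratio: an assumed, simple measure of seasonal magnitude
peak_to_trough <- function(x) max(x) / min(x)

ratio_requests <- peak_to_trough(monthly$requests)
ratio_time     <- peak_to_trough(monthly$total_time)

# A smaller ratio for requests than for total time matches the
# "shallower valleys" pattern described above
ratio_requests < ratio_time
```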
Investigation: cause of slow resolution time
[Figure: two panels, "Number of requests" and "Total time"]
• Total time increases disproportionately in response to an increase in the number of requests
• This indicates that the City of Chicago is operating at full capacity in the summer months
• The problem can be resolved by increasing capacity in summer, i.e. hiring more staff
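A "disproportionate" increase can be checked with a log-log regression: a slope above 1 means total time grows faster than the number of requests. This is a swapped-in technique for illustration, not the method used in the slides, and the numbers below are made up.

```r
# Toy data: made-up values, NOT the Chicago data.
requests   <- c(50, 100, 150, 200, 250)
total_time <- 0.5 * requests^2   # assumed relationship, for illustration only

# Slope of log(total_time) on log(requests) is the elasticity:
# the % change in total time for a 1% change in requests
fit        <- lm(log(total_time) ~ log(requests))
elasticity <- unname(coef(fit)[2])

# TRUE means total time rises more than proportionally with requests
elasticity > 1
```

With real daily or monthly aggregates, the estimated slope would carry noise, so it is worth inspecting its confidence interval (for example with `confint(fit)`) before drawing conclusions.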
Solution
[Figure: two panels, "Number of requests" and "Total time"]
Data:
#daily aggregates for time taken to resolve
daily<-matrix(NA,16126-15001+1,1)
for(i in 15001:16126)
{
series.i<-c(garbage7[garbage7$creation==i,12])
day.i<-sum(series.i)
daily[i-15000]<-day.i
}
#daily aggregates for number of requests
no.of.req<-matrix(NA,16126-15001+1,1)
for(i in 15001:16126)
{
series.i<-nrow(garbage7[garbage7$creation==i,])
no.of.req[i-15000]<-series.i
}
#consolidate data
ts<-cbind(15001:16126,daily,no.of.req)
ts<-data.frame(ts)
ts[,4]<-as.Date(ts[,1],origin="1970-01-01")
colnames(ts)<-c("Day","Lag","Number of requests","Date")
ts[,"Mean Lag"]<-ts$Lag/ts$"Number of requests"
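The two loops above can also be written without explicit iteration. The following is a minimal sketch on a made-up stand-in for `garbage7`; the column name `lag` is hypothetical (the original code refers to the resolution time by position, as column 12).

```r
# Toy stand-in for garbage7: made-up rows, NOT the Chicago data.
# creation = days since 1970-01-01; lag (hypothetical name) = days to resolve.
garbage_toy <- data.frame(
  creation = c(15001, 15001, 15002, 15004),
  lag      = c(10, 20, 5, 7)
)
days <- 15001:15004

# A factor with one level per day keeps days that have no requests
day_factor <- factor(garbage_toy$creation, levels = days)

no_of_req <- as.vector(table(day_factor))                       # requests per day
daily_lag <- as.vector(tapply(garbage_toy$lag, day_factor, sum)) # total lag per day
daily_lag[is.na(daily_lag)] <- 0                                # days without requests

daily_summary <- data.frame(
  Date     = as.Date(days, origin = "1970-01-01"),
  Lag      = daily_lag,
  Requests = no_of_req,
  MeanLag  = daily_lag / no_of_req   # NaN on days with no requests
)
```

As in the loop version, days with no requests get a total lag of 0 and a mean lag of `NaN`, which ggplot2 drops with a warning when plotting.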
Map:
#load plotting packages (stat_binhex also needs the hexbin package installed)
library(ggmap)
library(ggplot2)
#get map from google maps
chicago<-get_map(location = "chicago",
zoom = 11, scale = "auto",
maptype = "terrain",
source = "google",
color = "color") #prepare chicago map
m<-ggmap(chicago)
m + geom_point(data=garbage3,aes(x=lon,y=lat),alpha=0) + #add points
ggtitle("Heatmap of Open cases") + #add a title
stat_binhex(bins = 60, mapping=NULL, data=trash, alpha=0.7) + #cluster data points into hexagons
scale_fill_gradient(low="blue",high="red",limits=c(0,300), na.value="red") #choose colours for binning
Plots:
#plot total time taken to resolve each day's requests
p<- ggplot(ts, aes(x=Date, y=Lag))
p + #you get an error if not for this step
geom_point(size=1.2) +
geom_smooth() +
ylim(-1000,20000) + #points outside this range are dropped
ggtitle("Total lag to resolve requests")
#plot number of daily requests
p<- ggplot(ts, aes(x=Date, y=`Number of requests`))
p + #you get an error if not for this step
geom_point(size=1.2) +
geom_smooth() +
ylab("Number of requests") +
ggtitle("Number of requests")
#plot mean time to resolve a request
p<- ggplot(ts, aes(x=Date, y=`Mean Lag`))
p + #you get an error if not for this step
geom_point(size=1.2) +
geom_smooth() +
ylab("Mean lag") +
ggtitle("Mean lag to resolve a request")
R code used