DEEP LEARNING FOR PUBLIC SAFETY: FIGHTING CRIME WITH OPEN CITY DATA Alex Tellez, Michal Malohlava, and H2O.ai team
Page 1: Deep Learning for Public Safety in Chicago and San Francisco

DEEP LEARNING FOR PUBLIC SAFETY: FIGHTING CRIME WITH OPEN CITY DATA

Alex Tellez, Michal Malohlava, and H2O.ai team

Page 2: Deep Learning for Public Safety in Chicago and San Francisco

OPEN CITIES

Many major cities around the world provide easily accessible public data sets with years of historical data

Currently this data is underused

Page 3: Deep Learning for Public Safety in Chicago and San Francisco

CHICAGO

Page 4: Deep Learning for Public Safety in Chicago and San Francisco

OPEN CRIME DATA

Crime Dataset: Crimes from 2001 - Present Day

~4.6 million crimes

Page 5: Deep Learning for Public Safety in Chicago and San Francisco

THE WINDY CITY

Harvest Chicago Weather data since 2001

Page 6: Deep Learning for Public Safety in Chicago and San Francisco

SOCIOECONOMIC FACTORS

Crimes segmented into Community Area IDs

Percent of households below poverty, unemployed, etc.

Page 7: Deep Learning for Public Safety in Chicago and San Francisco

SPARK + H2O

Inputs: Crimes, Weather, Census

Data munging

Spark SQL join

Deep Learning

Evaluate models

GOAL: For a given crime, predict if an arrest is more / less likely to be made!

Page 8: Deep Learning for Public Safety in Chicago and San Francisco

LOAD DATA INTO H2O

Weather Data: 5k rows

Census Data: 78 rows

Crime Data: ~4.5 million rows

Page 9: Deep Learning for Public Safety in Chicago and San Francisco

JOIN DATASETS

crimedata

weatherdata

censusdata

Using Spark, we join 3 datasets together to make one mega dataset!
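
The join above can be illustrated with a small stand-in. The deck uses a Spark SQL join; the sketch below is plain Python, not the authors' code, and only shows the shape of the operation. All field names (`date`, `community_area`, `mean_temp_f`, `pct_below_poverty`) and all values are made up for illustration, and the assumption that weather is keyed by date and census by community area is ours.

```python
from datetime import date

# Hypothetical miniature tables; field names and values are invented.
crimes = [
    {"id": 1, "date": date(2015, 1, 5), "community_area": 1, "primary_type": "THEFT", "arrest": False},
    {"id": 2, "date": date(2015, 1, 5), "community_area": 2, "primary_type": "NARCOTICS", "arrest": True},
    {"id": 3, "date": date(2015, 1, 6), "community_area": 99, "primary_type": "BATTERY", "arrest": False},
]
weather = {  # assumed key: calendar date
    date(2015, 1, 5): {"mean_temp_f": 12.0},
    date(2015, 1, 6): {"mean_temp_f": 3.0},
}
census = {  # assumed key: community area ID
    1: {"pct_below_poverty": 10.0},
    2: {"pct_below_poverty": 20.0},
}


def join_crime_rows(crimes, weather, census):
    """Inner-join each crime to its day's weather and its area's census row."""
    joined = []
    for crime in crimes:
        weather_row = weather.get(crime["date"])
        census_row = census.get(crime["community_area"])
        if weather_row is None or census_row is None:
            continue  # inner join: drop crimes lacking a match in either table
        joined.append({**crime, **weather_row, **census_row})
    return joined


mega = join_crime_rows(crimes, weather, census)
```

Here crime 3 is dropped because community area 99 has no census row; a left join would keep it with missing census fields instead.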

Page 10: Deep Learning for Public Safety in Chicago and San Francisco

CHICAGO VISUALIZATIONS

Chart labels: arrest rate; season of crime; temperature during crime; community crime is committed in

Page 11: Deep Learning for Public Safety in Chicago and San Francisco

ARREST RATE BY TYPES OF CRIME

Page 12: Deep Learning for Public Safety in Chicago and San Francisco

ARREST RATE VS % OF TOTAL CRIMES

Axes: arrest rate vs. % of all crimes recorded

A large proportion of crimes are thefts

Unfortunately, there is a much lower arrest rate for thefts than for less prevalent crimes like gambling
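
The two quantities compared on this slide, each crime type's share of all crimes and its arrest rate, can be computed as in the sketch below. The field names `primary_type` and `arrest` are assumptions, and the ten rows are synthetic, chosen only to mimic the pattern described (common thefts with few arrests, rare gambling with many).

```python
from collections import Counter


def summarize_by_type(crimes):
    """Return each crime type's share of all crimes and its arrest rate."""
    totals = Counter(c["primary_type"] for c in crimes)
    arrests = Counter(c["primary_type"] for c in crimes if c["arrest"])
    n = len(crimes)
    return {
        crime_type: {
            "share_of_crimes": totals[crime_type] / n,
            "arrest_rate": arrests[crime_type] / totals[crime_type],
        }
        for crime_type in totals
    }


# Synthetic rows: 8 thefts (1 arrest) and 2 gambling incidents (2 arrests).
sample = (
    [{"primary_type": "THEFT", "arrest": i == 0} for i in range(8)]
    + [{"primary_type": "GAMBLING", "arrest": True} for _ in range(2)]
)
summary = summarize_by_type(sample)
```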

Page 13: Deep Learning for Public Safety in Chicago and San Francisco

SPLIT DATA INTO TEST/TRAIN SETS

Charts: training set arrest rate; test set arrest rate

train model on this segment, 80% of data

validate the model on this segment (remaining 20%)

~40% of crimes lead to arrest
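
A minimal sketch of an 80/20 split, with a check that both segments keep a similar arrest rate. It does not use the H2O or Spark API; the helper names, the fixed seed, and the synthetic rows (built to have a 40% arrest rate) are assumptions for illustration.

```python
import random


def split_train_test(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of the rows and cut it into train and test segments."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def arrest_rate(rows):
    """Fraction of rows whose 'arrest' flag is true."""
    return sum(1 for r in rows if r["arrest"]) / len(rows)


# Synthetic data: 1000 rows, exactly 40% of them flagged as arrests.
rows = [{"id": i, "arrest": i % 5 < 2} for i in range(1000)]
train, test = split_train_test(rows)
train_rate, test_rate = arrest_rate(train), arrest_rate(test)
```

Comparing `train_rate` and `test_rate` is a quick sanity check that the random split did not skew the target.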

Page 14: Deep Learning for Public Safety in Chicago and San Francisco

DEEP LEARNING

Problem:

For a given crime, is an arrest more / less likely?

Deep Learning:

A multi-layer feed-forward neural network that starts w/ an input layer (crime + weather data) followed by multiple layers of non-linear transformations
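
To make the definition concrete, here is a sketch of the forward pass of such a network: an input vector, hidden layers that each apply a weighted sum followed by a non-linearity, and a single sigmoid output read as an arrest probability. The tanh activation, the layer sizes, and the random untrained weights are all assumptions; the slides do not state them, and this is not the H2O implementation.

```python
import math
import random


def dense(inputs, weights, biases, activation):
    """One layer: weighted sum of the inputs per unit, then a non-linearity."""
    return [
        activation(sum(w * x for w, x in zip(unit_weights, inputs)) + b)
        for unit_weights, b in zip(weights, biases)
    ]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def random_layer(n_in, n_out, rng):
    """Untrained layer with small random weights and zero biases."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def predict_arrest_probability(features, layers):
    """Feed the features through tanh hidden layers and a sigmoid output unit."""
    hidden = features
    for weights, biases in layers[:-1]:
        hidden = dense(hidden, weights, biases, math.tanh)
    out_weights, out_biases = layers[-1]
    return dense(hidden, out_weights, out_biases, sigmoid)[0]


rng = random.Random(0)
# 4 hypothetical input features -> two hidden layers of 8 units -> 1 output.
layers = [random_layer(4, 8, rng), random_layer(8, 8, rng), random_layer(8, 1, rng)]
p = predict_arrest_probability([0.3, -1.2, 0.0, 1.0], layers)
```

Training would adjust the weights to reduce prediction error; with random weights `p` is only a value between 0 and 1, not a meaningful prediction.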

Page 15: Deep Learning for Public Safety in Chicago and San Francisco

DEEP LEARNING MODEL

Deep Neural Network w/ 2 layers of non-linear transformations

Binomial prediction: Is an arrest made? Yes/No

AUC on Training Data ~ 0.91!

~3.5 million crimes

Page 16: Deep Learning for Public Safety in Chicago and San Francisco

HOW’D WE DO?

Train AUC ~ 0.91

Test AUC ~ 0.91
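
For readers unfamiliar with the metric: AUC is the probability that a randomly chosen positive case (arrest) receives a higher score than a randomly chosen negative case, with ties counted as half. The sketch below computes it directly from that definition. It is quadratic in the number of rows, so it is suitable only for small examples, and the labels and scores are made up.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for pos_score in positives:
        for neg_score in negatives:
            if pos_score > neg_score:
                wins += 1.0
            elif pos_score == neg_score:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Made-up example: 3 arrests, 2 non-arrests.
example_auc = auc([1, 0, 1, 0, 1], [0.9, 0.2, 0.7, 0.8, 0.6])
```

In this example four of the six positive/negative pairs are ordered correctly, so `example_auc` is 4/6. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking.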

Page 17: Deep Learning for Public Safety in Chicago and San Francisco

GEO-MAPPED PREDICTIONS

Because each reported crime comes with latitude-longitude coordinates, we scored our hold-out data using the trained model and plotted the predictions on a map of Chicago - specifically, the Downtown district.
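
One way to prepare scored points for such a map is to attach a color to each predicted probability, in the spirit of the green/orange scheme mentioned on the deck's closing slide. The cut-offs (0.9 and 0.5), the red bucket, the field names, and the coordinates below are all assumptions for illustration; the slides do not specify them.

```python
def prediction_color(probability, green_at=0.9, orange_at=0.5):
    """Map a predicted arrest probability to a marker color (assumed cut-offs)."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability >= green_at:
        return "green"   # arrest likely
    if probability >= orange_at:
        return "orange"  # arrest uncertain
    return "red"         # arrest unlikely (bucket not in the slides)


# Made-up scored hold-out points.
scored_points = [
    {"lat": 41.880, "lon": -87.630, "p_arrest": 0.999},
    {"lat": 41.882, "lon": -87.628, "p_arrest": 0.67},
    {"lat": 41.884, "lon": -87.626, "p_arrest": 0.10},
]
colored = [{**pt, "color": prediction_color(pt["p_arrest"])} for pt in scored_points]
```

The resulting `colored` list can then be handed to any mapping library as marker data.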

Page 18: Deep Learning for Public Safety in Chicago and San Francisco

SAN FRANCISCO

Page 19: Deep Learning for Public Safety in Chicago and San Francisco

OPEN CITY, OPEN DATA

Crime Dataset: SFPD Incidents from 1/1/2003 - Present

~1 million crimes

Page 20: Deep Learning for Public Safety in Chicago and San Francisco

WEATHER ANYONE?

Harvest weather data from 1/1/2003

Page 21: Deep Learning for Public Safety in Chicago and San Francisco

DATA INGESTION

Weather Data: Temp, Visibility, Precipitation, Cloud Cover

Crime Data: Category, Description, Weekend, Arrest, etc.

Page 22: Deep Learning for Public Safety in Chicago and San Francisco

SF VISUALIZATIONS

Most common crimes?

When is crime happening most? …midnight, noon, 6 PM
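
The "when is crime happening most" question comes down to counting incidents per hour of day. A minimal sketch, assuming each incident carries a timestamp; the six timestamps are invented and do not come from the SFPD data.

```python
from collections import Counter
from datetime import datetime


def incidents_by_hour(timestamps):
    """Return a 24-element list of incident counts, indexed by hour of day."""
    counts = Counter(ts.hour for ts in timestamps)
    return [counts.get(hour, 0) for hour in range(24)]


# Invented timestamps clustered around midnight, noon, and 6 PM.
timestamps = [
    datetime(2015, 1, 1, 0, 1),
    datetime(2015, 1, 1, 12, 0),
    datetime(2015, 1, 2, 12, 30),
    datetime(2015, 1, 2, 18, 0),
    datetime(2015, 1, 3, 0, 15),
    datetime(2015, 1, 3, 12, 5),
]
by_hour = incidents_by_hour(timestamps)
busiest_hour = max(range(24), key=lambda hour: by_hour[hour])
```

Peaks at exactly midnight and noon in real incident data can also reflect default times entered when the true time is unknown, which is worth checking before reading too much into them.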

Page 23: Deep Learning for Public Safety in Chicago and San Francisco

DEEP LEARNING MODEL

Deep Neural Network w/ 3 layers of non-linear transformations

Total Run Time: 6 mins. 42 sec.

AUC ~ 0.95 on Training Data

Page 24: Deep Learning for Public Safety in Chicago and San Francisco

VALIDATION TEST

Model ‘trained’ on 80% of data, validated against remaining 20%

AUC = 0.95 on validation data

Page 25: Deep Learning for Public Safety in Chicago and San Francisco

WHAT’S NEXT?

Can deploy each model in real-time to increase public safety and help police departments.

Map of Model Accuracy - For each point on the map (place of crime) we can have different colors based on model prediction (0.999 = green, arrest likely vs. 0.67 = orange)

Run prediction for specific subsets of the data (e.g. the most dangerous area)

We plan on doing all of the above!

Ensemble - Average the predictions of the Chicago and San Francisco models, which may further increase accuracy?
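
The slide does not say how the two city models would be combined. One simple reading of "model average" is to score the same rows with each model and take the mean of the predicted probabilities, sketched below. The function name and the probabilities are hypothetical, and whether models trained on different cities can usefully score the same rows is an open question the slide itself raises.

```python
def average_predictions(per_model_probabilities):
    """Average row-wise probabilities from several models scoring the same rows."""
    if not per_model_probabilities:
        raise ValueError("need predictions from at least one model")
    n_rows = len(per_model_probabilities[0])
    if any(len(p) != n_rows for p in per_model_probabilities):
        raise ValueError("every model must score the same number of rows")
    n_models = len(per_model_probabilities)
    return [sum(row) / n_models for row in zip(*per_model_probabilities)]


# Hypothetical probabilities from two models for the same two incidents.
chicago_model = [0.9, 0.2]
sf_model = [0.7, 0.4]
ensemble = average_predictions([chicago_model, sf_model])
```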