crimeX: Real-time crime analysis and alert system, Tajinder Singh

Tajinder Presentation6

Jan 26, 2017

Transcript
Page 1

crimeX: Real-time crime analysis and alert system

Tajinder Singh

Page 2

Motivation

Page 3

Motivation

• How criminals operate

• The dynamics between criminals and anti-crime squads

Page 4

Demo

www.crimefighter.ninja

Page 5

Pipeline

Sources: crime data (real-time), user data (real-time), crime data (batch)

Stages: Ingestion, Batch Layer, Real-Time Layer, Serving Layer
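The stage names on this slide appear to follow the Lambda architecture pattern: a batch layer precomputes views over historical data, a real-time layer covers events that arrived since the last batch run, and the serving layer answers queries by combining the two. A minimal sketch of that combining step, assuming (as a simplification the slides do not state) that both views are per-crime-type counts; `merge_views` is a hypothetical name.

```python
from collections import Counter
from typing import Mapping


def merge_views(batch_view: Mapping[str, int],
                realtime_view: Mapping[str, int]) -> Counter:
    """Serving-layer sketch: add per-crime-type counts from the real-time
    view to the precomputed batch view (both are assumed to be counts)."""
    merged = Counter(batch_view)
    merged.update(realtime_view)  # Counter.update adds counts key by key
    return merged


# Hypothetical example counts, not figures from the project.
batch = {"robbery": 120, "burglary": 45}
realtime = {"robbery": 3, "assault": 1}
print(merge_views(batch, realtime))
```

Keeping the real-time view small and merging at query time is what lets the batch view be rebuilt periodically without losing recent events.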

Page 6

Data flow: data sources

• Seed: http://us-city.census.okfn.org/dataset/crime-stats

• Engineered data (600 GB)
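The slides do not say how the 600 GB engineered dataset was produced from the seed. One plausible approach, shown purely as a hypothetical sketch, is to multiply each real seed record into many synthetic records by jittering its location and timestamps. The function name `expand_seed`, the jitter ranges, and the ID scheme are all assumptions; the field names follow the sample record on the next slide.

```python
import random
from datetime import datetime, timedelta

TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def expand_seed(seed_records, copies_per_record, rng=None):
    """Hypothetical data-engineering step: create synthetic crime records by
    jittering the coordinates and timestamps of real seed records."""
    rng = rng or random.Random(0)  # fixed seed for reproducible output
    synthetic = []
    for record in seed_records:
        occurred = datetime.strptime(record["crime_occ_ts"], TS_FORMAT)
        for i in range(copies_per_record):
            # Shift the location by up to 0.01 degrees (roughly 1 km).
            lat = float(record["lat"]) + rng.uniform(-0.01, 0.01)
            lon = float(record["lon"]) + rng.uniform(-0.01, 0.01)
            # Move the occurrence time forward by up to 30 days.
            occ = occurred + timedelta(seconds=rng.randint(0, 30 * 86400))
            # Assumed reporting lag of 0 to 72 hours after occurrence.
            rptd = occ + timedelta(seconds=rng.randint(0, 72 * 3600))
            synthetic.append({
                "crime_id": f'{record["crime_id"]}-{i}',
                "crimetype": record["crimetype"],
                "crime_occ_ts": occ.strftime(TS_FORMAT),
                "crime_rptd_ts": rptd.strftime(TS_FORMAT),
                "lat": f"{lat:.4f}",
                "lon": f"{lon:.4f}",
            })
    return synthetic
```

Calling `expand_seed(seed, 1000)` on a seed file turns each real record into 1,000 synthetic ones that keep its crime type and rough location.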

Page 7

Data flow

Crime data (batch)

Batch Processing

{ "crime_id": "C786", "crimetype": "robbery", "crime_rptd_ts": "2015-05-06 11:34:43",
"crime_occ_ts": "2015-03-05 15:24:49", "lat": "34.5462", "lon": "-118.453", ... }


+ Python Script (Refining)

Page 10

Data flow

Crime data (batch)

Batch Processing

{ "crime_id": "C786", "crimetype": "robbery", "crime_rptd_ts": "2015-05-06 11:34:43",
"crime_occ_ts": "2015-03-05 15:24:49", "lat": "34.5462", "lon": "-118.453",
"zip": "90007", "city": "los angeles", "state": "california", "country": "usa" }

Index type: crimes
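The refining step adds zip, city, state, and country to each batch record. The slides do not show the Python script, so the following is only a minimal sketch of one way to do it: a reverse-geocoding lookup keyed by coordinates rounded to two decimals. The table `GRID_TO_PLACE` and the function `refine` are hypothetical; the single table entry simply copies the values from the sample record above.

```python
from datetime import datetime

TS_FORMAT = "%Y-%m-%d %H:%M:%S"

# Hypothetical reverse-geocoding table keyed by (lat, lon) rounded to two
# decimals, a grid cell of roughly 1 km. A real table would be built from a
# ZIP-code boundary or geocoding dataset; this entry mirrors the slide sample.
GRID_TO_PLACE = {
    (34.55, -118.45): {"zip": "90007", "city": "los angeles",
                       "state": "california", "country": "usa"},
}


def refine(raw: dict) -> dict:
    """Return a copy of one crime record with zip/city/state/country added."""
    record = dict(raw)  # don't mutate the caller's dict
    lat, lon = float(raw["lat"]), float(raw["lon"])
    cell = (round(lat, 2), round(lon, 2))
    place = GRID_TO_PLACE.get(cell)
    if place is None:
        raise KeyError(f"no place found for grid cell {cell}")
    record.update(place)
    # Validate the timestamp format (raises ValueError if malformed);
    # the values stay strings, as in the indexed document.
    for field in ("crime_occ_ts", "crime_rptd_ts"):
        datetime.strptime(raw[field], TS_FORMAT)
    return record
```

A grid lookup is cheap enough to run over hundreds of gigabytes; a nearest-neighbour search over ZIP centroids would be more accurate near cell borders.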

Page 11

Data flow

Real-Time Processing

Crime data: { "crimetype": "robbery", "lat": "34.5462", "lon": "-118.453" }

User data: { "user_id": "user453", "username": "Tajinder", "lat": "34.653356", "lon": "-118.53243" }

[ Processing ]

Page 13

Data flow: Real-Time Processing

Crime data: { "crimetype": "robbery", "crime_rptd_ts": "2015-05-06 11:34:43", "lat": "34.5462",
"lon": "-118.453", "zip": "90007", "city": "los angeles", "state": "california",
"country": "usa" }

User data: { "user_id": "user453", "username": "Tajinder", "lat": "34.653356", "lon": "-118.53243",
"zip": "90007", "city": "los angeles", "state": "california", "country": "usa" }

Index types: crimes_realtime and user-subscribe-crime
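After enrichment, crime events and user positions share the same location fields, which makes ZIP code a natural join key. The slides do not describe how user-subscribe-crime is queried, so this sketch assumes a user is "subscribed" to crimes in the ZIP code of their latest reported location, and it replaces the Elasticsearch index with an in-memory class. `ZipSubscriptions` and its methods are hypothetical names.

```python
from collections import defaultdict


class ZipSubscriptions:
    """In-memory stand-in for the user-subscribe-crime index: users are
    grouped by the ZIP code of their last reported location."""

    def __init__(self):
        self._zip_of_user = {}                   # user_id -> current zip
        self._users_in_zip = defaultdict(dict)   # zip -> {user_id: user}

    def update_user(self, user: dict) -> None:
        """Record (or move) a user whose location has been enriched."""
        old_zip = self._zip_of_user.get(user["user_id"])
        if old_zip is not None:
            self._users_in_zip[old_zip].pop(user["user_id"], None)
        self._zip_of_user[user["user_id"]] = user["zip"]
        self._users_in_zip[user["zip"]][user["user_id"]] = user

    def subscribers_for(self, crime: dict) -> list:
        """Return the users currently located in the crime's ZIP code."""
        return list(self._users_in_zip.get(crime["zip"], {}).values())
```

In a stream processor this state would live in the component that receives both streams: user messages call `update_user`, and each crime message calls `subscribers_for`.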

Page 14

Data flow use case 1 (batch)

Input: { "location": "2611 portland street, los angeles" }

Page 15

Data flow use case 1 (batch)

Output fields

• Distance covered (radius)

• Total crimes analyzed

• Average latency*

• Crime types

*Average latency: the average difference between the time a crime occurred and the time it was reported.


[output]
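The use-case-1 output fields can be computed from the crime documents around the input location. A minimal sketch, assuming the crimes have already been filtered to the search radius (for example by a geo-distance query) and that timestamps use the format of the sample record. `summarize` and the output key names are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta

TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def summarize(crimes: list, radius_km: float) -> dict:
    """Build the use-case-1 output fields from crimes that are assumed to
    lie within radius_km of the input location."""
    # Latency of each crime = reported timestamp - occurred timestamp.
    latencies = [
        datetime.strptime(c["crime_rptd_ts"], TS_FORMAT)
        - datetime.strptime(c["crime_occ_ts"], TS_FORMAT)
        for c in crimes
    ]
    average = (sum(latencies, timedelta()) / len(latencies)
               if latencies else None)
    return {
        "distance_covered_km": radius_km,
        "total_crimes_analyzed": len(crimes),
        "average_latency": average,
        "crime_types": Counter(c["crimetype"] for c in crimes),
    }


sample = {"crimetype": "robbery", "crime_rptd_ts": "2015-05-06 11:34:43",
          "crime_occ_ts": "2015-03-05 15:24:49"}
print(summarize([sample], radius_km=5.0)["average_latency"])  # 61 days, 20:09:54
```

For the sample record the crime was reported about two months after it occurred, which is the kind of gap the average latency field is meant to surface.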

Page 17

Data flow use case 2 (real-time)

Input (real-time): { "crimetype": "robbery", "lat": "34.2353", "lon": "-113.42534" }

Page 18

Data flow use case 2 (real-time)

Output fields

• Distance covered (radius)

• Total crimes analyzed

• Average latency*

• Crime types

• Alert nearby users (user phone number, user name, user latitude, user longitude)

[output]
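Alerting nearby users comes down to a distance test between the incoming crime and each user's last known position. A minimal sketch using the haversine formula on a spherical Earth (radius 6371 km). The `phone` field name, the radius, and the function names are assumptions: the user documents shown earlier carry no phone number.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def users_to_alert(crime, users, radius_km):
    """Return alert rows for users within radius_km of the crime,
    nearest first. Assumes each user dict has a hypothetical 'phone' key."""
    c_lat, c_lon = float(crime["lat"]), float(crime["lon"])
    alerts = []
    for user in users:
        d = haversine_km(c_lat, c_lon, float(user["lat"]), float(user["lon"]))
        if d <= radius_km:
            alerts.append({
                "user_name": user["username"],
                "phone": user["phone"],
                "lat": user["lat"],
                "lon": user["lon"],
                "distance_km": round(d, 3),
            })
    return sorted(alerts, key=lambda a: a["distance_km"])
```

Scanning every user is fine for a sketch; at scale the same test would be pushed into a geo-distance query on the index that stores user positions.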

Page 19

Challenges: Performance Optimization

Challenge: The front end took 5 seconds to display the results of each request

Reason:

• Heavy I/O: every crime document was fetched to the UI

• Business logic and query execution ran on the front end (Flask)

Solution:

• Query execution moved to the Elasticsearch cluster

• No bulk I/O: the raw crime documents no longer leave the cluster

• Dynamic scripting enabled on the ES cluster

• Used Groovy scripts rather than JavaScript, Python, MVEL (built-in), or expressions (built-in)

Challenge: Network latency

Solution: Co-locate the Storm and Elasticsearch cluster nodes to reduce network latency
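Moving query execution onto the Elasticsearch cluster means the front end asks for aggregates instead of documents. A minimal sketch of such a search request body, assuming a `geo_point` field named `location` (the sample documents store `lat` and `lon` as separate strings, so this mapping is an assumption) and that `crimetype` is indexed as an exact-value (keyword) field. `crime_summary_query` is a hypothetical name.

```python
def crime_summary_query(lat: float, lon: float, radius: str = "5km") -> dict:
    """Search body that runs on the cluster and returns only aggregates,
    not the crime documents themselves."""
    return {
        "size": 0,  # return no hits; hits.total is still reported
        "query": {
            "bool": {
                "filter": {
                    "geo_distance": {
                        "distance": radius,
                        # assumes a geo_point field named "location"
                        "location": {"lat": lat, "lon": lon},
                    }
                }
            }
        },
        "aggs": {
            # assumes crimetype is indexed as an exact-value (keyword) field
            "crime_types": {"terms": {"field": "crimetype"}}
        },
    }
```

The dictionary is sent as the body of a search request against the crimes index. Computing the average latency on the cluster would additionally need a scripted aggregation, which is where dynamic scripting comes in; that script is not shown because its syntax depends on the Elasticsearch version and scripting language.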

Page 20

Caveat: Vulnerable to outside attacks (security vulnerability)

Reason:

• Dynamic scripting is enabled

Solution:

• Don't run Elasticsearch as root

• Provide read-only access to the requisite directories

Page 21

About me

Tajinder Singh [University of Southern California]

5 years of experience in web development