Real time data driven applications (SQL vs NoSQL databases)

Aug 17, 2015

Data & Analytics

GoDataDriven
Transcript
Page 1: Real time data driven applications (SQL vs NoSQL databases)

GoDataDriven, proudly part of the Xebia Group

Real time data driven applications and SQL vs NoSQL databases

Giovanni Lanzani, Data Whisperer

Page 2: Real time data driven applications (SQL vs NoSQL databases)

Feedback

@gglanzani

Page 3: Real time data driven applications (SQL vs NoSQL databases)

Real-time, data driven app?

• Not just store and retrieve;

• Store, {transform, enrich, analyse} and retrieve;

• Real-time: retrieval is not a batch process;

• App: something your mother could use:

SELECT attendees FROM NoSQLMatters WHERE password = '1234';

Page 4: Real time data driven applications (SQL vs NoSQL databases)

Get insight about event impact

Page 9: Real time data driven applications (SQL vs NoSQL databases)

Challenges

1. Big Data;

2. Privacy;

3. Some real-time analysis;

4. Real-time retrieval.

Page 10: Real time data driven applications (SQL vs NoSQL databases)

Is it Big Data?

Page 11: Real time data driven applications (SQL vs NoSQL databases)

Is it Big Data?

Everybody talks about it.

Nobody knows how to do it.

Everyone thinks everyone else is doing it, so everyone claims they’re doing it…

(Dan Ariely)

Page 12: Real time data driven applications (SQL vs NoSQL databases)

2. Privacy

Page 14: Real time data driven applications (SQL vs NoSQL databases)

3. (Some) real-time analysis

Page 15: Real time data driven applications (SQL vs NoSQL databases)

4. Real-Time Retrieval

• Harder than it looks;

• Large data;

• Retrieval is by giving a date, a center location and a radius.
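
Retrieval by date, center location and radius first needs the set of postcodes that fall inside the circle. The slides do not say how that set is computed, so the following is only a minimal sketch under assumptions: each postcode is represented by a single centroid, distances use the haversine formula, and the names `haversine_km`, `postcodes_within` and `CENTROIDS` (with its made-up coordinates) are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def postcodes_within(centroids, center, radius_km):
    """Return the sorted postcodes whose centroid lies within radius_km of center."""
    lat0, lon0 = center
    return sorted(
        pc for pc, (lat, lon) in centroids.items()
        if haversine_km(lat0, lon0, lat, lon) <= radius_km
    )


# Hypothetical centroids: postcode -> (lat, lon). Coordinates are illustrative only.
CENTROIDS = {
    "1234AB": (52.370, 4.895),
    "5555ZB": (52.090, 5.120),
}

nearby = postcodes_within(CENTROIDS, center=(52.373, 4.893), radius_km=10)
```

A linear scan like this is fine for a sketch; with many postcodes, the resulting list is what ends up in the database query, which is where the trouble described later in the talk starts.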

Page 16: Real time data driven applications (SQL vs NoSQL databases)

Architecture

AngularJS (front-end) <-- REST / JSON --> Python app (back-end)
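
To make the front-end/back-end split concrete, here is a small sketch of a Python back-end that answers a REST-style GET request with JSON. It is not the application from the talk: the query parameters (`date`, `lat`, `lon`, `radius`), the `lookup` placeholder and its made-up return value are all assumptions, and plain WSGI from the standard library stands in for whatever framework was actually used.

```python
import json
from urllib.parse import parse_qs


def lookup(date, lat, lon, radius_km):
    """Hypothetical placeholder for the real data access; returns a made-up row."""
    return [{"date": date, "postcode": "1234AB", "hits": 35}]


def app(environ, start_response):
    """Minimal WSGI app: GET /?date=...&lat=...&lon=...&radius=... -> JSON."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    headers = [("Content-Type", "application/json")]
    try:
        date = params["date"][0]
        lat = float(params["lat"][0])
        lon = float(params["lon"][0])
        radius_km = float(params["radius"][0])
    except (KeyError, ValueError):
        # Missing or non-numeric parameters: answer with a JSON error body.
        body = json.dumps({"error": "expected date, lat, lon and radius"})
        start_response("400 Bad Request", headers)
        return [body.encode("utf-8")]
    body = json.dumps(lookup(date, lat, lon, radius_km))
    start_response("200 OK", headers)
    return [body.encode("utf-8")]
```

For local experiments the app can be served with `wsgiref.simple_server.make_server("", 8000, app).serve_forever()`; the AngularJS side would then fetch the JSON over HTTP.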

Page 17: Real time data driven applications (SQL vs NoSQL databases)

JS-1

Page 18: Real time data driven applications (SQL vs NoSQL databases)

JS-2

Page 19: Real time data driven applications (SQL vs NoSQL databases)

Data Example

date        hour  id_activity  postcode  hits  delta  sbi
2013-01-01  12    1234         1234AB    35    22     1
2013-01-08  12    1234         1234AB    45    35     1
2013-01-01  11    2345         5555ZB    2     1      2
2013-01-08  11    2345         5555ZB    55    2      2

Page 20: Real time data driven applications (SQL vs NoSQL databases)

Who has my data?

Page 21: Real time data driven applications (SQL vs NoSQL databases)

Who has my data?

• The first iteration was a (pre-)POC with less data (3 GB vs 500 GB);

• Time constraints;

• Oops: everything is a pandas df!
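
As an illustration of what "everything is a pandas df" can look like, the sketch below loads the rows from the data example into a DataFrame and answers a query by boolean masking. The `select` helper and the inline CSV are assumptions made for the example; the talk does not show the actual code.

```python
import io

import pandas as pd

# Rows from the data example; a real deployment would read a large CSV file.
CSV = """date,hour,id_activity,postcode,hits,delta,sbi
2013-01-01,12,1234,1234AB,35,22,1
2013-01-08,12,1234,1234AB,45,35,1
2013-01-01,11,2345,5555ZB,2,1,2
2013-01-08,11,2345,5555ZB,55,2,2
"""

# Everything is loaded into memory once, when the process starts.
df = pd.read_csv(io.StringIO(CSV), dtype={"postcode": str})


def select(df, dates, postcodes):
    """Rows whose date is in dates and whose postcode is in postcodes."""
    mask = df["date"].isin(dates) & df["postcode"].isin(postcodes)
    return df[mask]


result = select(df, dates=["2013-01-08"], postcodes=["1234AB", "5555ZB"])
```

On 3 GB this kind of in-memory filtering is quick to write and to run; the same approach on 500 GB needs all of the data to fit in memory and to be loaded before the first request can be served.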

Page 23: Real time data driven applications (SQL vs NoSQL databases)

Advantages of “everything is a df”

Pro:

• Fast!!

• Use what you know;

• No DBAs!

• We all love CSVs!

Contra:

• Doesn’t scale;

• Huge startup time;

• No DBAs!

• We all hate CSVs!

Page 24: Real time data driven applications (SQL vs NoSQL databases)

If you don’t

AngularJS (front-end) <-- REST / JSON --> Python app (back-end) <-- ? --> Database

Page 25: Real time data driven applications (SQL vs NoSQL databases)

Issues?!

• With a radius of 10 km in Amsterdam, you get 10k postcodes. You need to do this in your SQL:

SELECT * FROM datapoints
WHERE date IN date_array
  AND postcode IN postcode_array;

• There is an index on date and postcode, but single queries still run for more than 20 minutes.

Page 26: Real time data driven applications (SQL vs NoSQL databases)

Postgres + PostGIS (2.x)

PostGIS is a spatial database extender for PostgreSQL. It supports geographic objects, allowing location queries:

SELECT * FROM datapoints
WHERE ST_DWithin(geog, ST_MakePoint(lon, lat)::geography, 1500)
  AND dates IN ('2013-02-30', '2013-02-31');
-- every point within 1.5 km of (lon, lat) on imaginary dates;
-- geog is assumed to be a geography column holding each datapoint's location

Page 27: Real time data driven applications (SQL vs NoSQL databases)

Other DBs?

Page 28: Real time data driven applications (SQL vs NoSQL databases)

How we solved it

1. Align data on disk by date;

2. Use the temporary table trick:

CREATE TEMPORARY TABLE tmp (postcodes TEXT NOT NULL PRIMARY KEY);
INSERT INTO tmp (postcodes) VALUES postcode_array;

SELECT * FROM tmp
JOIN datapoints d ON d.postcode = tmp.postcodes
WHERE d.dt IN dates_array;

3. Lose precision: 1234AB → 1234.
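
The temporary table trick and the loss of precision can be tried end to end with the sketch below. It is an illustration under assumptions, not the production code: SQLite (via the standard `sqlite3` module) stands in for PostgreSQL so that the example is self-contained, the `datapoints` rows are made up, and `truncate` and `query` are hypothetical helper names.

```python
import sqlite3


def truncate(postcode):
    """Lose precision: keep the four digits, drop the letters (1234AB -> 1234)."""
    return postcode[:4]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE datapoints (dt TEXT, postcode TEXT, hits INTEGER)")
conn.executemany(
    "INSERT INTO datapoints VALUES (?, ?, ?)",
    [  # illustrative rows, already stored at four-digit precision
        ("2013-01-01", "1234", 35),
        ("2013-01-08", "1234", 45),
        ("2013-01-01", "5555", 2),
        ("2013-01-08", "5555", 55),
    ],
)


def query(conn, postcodes, dates):
    """Join against a temporary table of postcodes instead of a long IN list."""
    wanted = sorted({truncate(pc) for pc in postcodes})  # deduplicate after truncation
    conn.execute("CREATE TEMPORARY TABLE tmp (postcodes TEXT NOT NULL PRIMARY KEY)")
    try:
        conn.executemany("INSERT INTO tmp (postcodes) VALUES (?)",
                         [(pc,) for pc in wanted])
        placeholders = ", ".join("?" for _ in dates)
        sql = (
            "SELECT d.dt, d.postcode, d.hits FROM tmp "
            "JOIN datapoints d ON d.postcode = tmp.postcodes "
            f"WHERE d.dt IN ({placeholders}) ORDER BY d.dt, d.postcode"
        )
        return conn.execute(sql, list(dates)).fetchall()
    finally:
        conn.execute("DROP TABLE tmp")  # leave no temporary table behind


rows = query(conn, postcodes=["1234AB", "1234CD"], dates=["2013-01-08"])
```

Truncating before the insert also shrinks the temporary table: "1234AB" and "1234CD" collapse into the single key "1234", so the join has fewer rows to match.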

Page 29: Real time data driven applications (SQL vs NoSQL databases)

Take home messages

1. Geospatial problems are hard and queries can be really slow;

2. Not everybody has infinite resources: be smart and KISS!

3. SQL or NoSQL? (Size, schema)

Page 30: Real time data driven applications (SQL vs NoSQL databases)

We’re hiring / Questions? / Thank you!

@gglanzani

Giovanni Lanzani, Data Whisperer