Geospatial Big Data Dr. Fabio Petroni

Apr 10, 2017

Transcript
Page 1: Geospatial Big Data

Geospatial Big Data

Dr. Fabio Petroni

Page 2: Geospatial Big Data

2 Document Classification: KPMG Public

© 2016 KPMG LLP, a UK limited liability partnership and a member firm of the KPMG network of independent member firms affiliated with KPMG International Cooperative (“KPMG International”), a Swiss entity. All rights reserved.

Motivation

•  exponential growth in the volume of spatio-temporal data
•  e.g., total number of Foursquare check-ins: ∼8 billion

Page 3: Geospatial Big Data

Examples of Geospatial Big Data Analysis

Page 4: Geospatial Big Data

Case Study

Monitor how sentiment about Brexit evolves over time and across geographies.

Page 5: Geospatial Big Data

Challenges

•  Scalability
   -  storing, processing and visualizing large-scale spatio-temporal data
•  mapping three dimensions (latitude, longitude, time) onto one dimension (lexicographical ordering of keys in a table)

[Figure: B+ tree]
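To see why the mapping is hard, consider a naive key that simply concatenates the three fields. The sketch below is illustrative only: the points, the key layout and the helper `naive_key` are hypothetical and do not come from the slides. It shows that a table sorted on such a key can place a far-away point between two neighbors.

```python
# Hypothetical data points: (latitude, longitude, day of year).
points = {
    "london_a": (51.51, -0.12, 275),
    "london_b": (51.50, -0.10, 275),
    "far_east": (51.505, 100.00, 275),  # far away, but on almost the same latitude
}


def naive_key(lat, lon, day):
    """Concatenate zero-padded fields so that string order equals numeric order.

    Coordinates are shifted to be non-negative before formatting.
    """
    return f"{lat + 90:08.4f}|{lon + 180:08.4f}|{day:03d}"


# A sorted key-value table stores rows in lexicographical key order.
ordered = sorted(points, key=lambda name: naive_key(*points[name]))
print(ordered)
```

Because latitude dominates the key, the two London points end up separated by a point roughly 100 degrees of longitude away, so a range scan over "London" would have to skip unrelated rows.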

Page 6: Geospatial Big Data

Solution: Geohashes

•  binary string in which each character indicates alternating divisions of the global longitude-latitude rectangle

[Figure: the globe divided into two cells labeled with 1-bit geohashes]

0  1

Page 7: Geospatial Big Data

Solution: Geohashes

[Figure: the globe divided into four cells labeled with 2-bit geohashes]

00  10
01  11

Page 8: Geospatial Big Data

Solution: Geohashes

[Figure: the globe divided into a 4 × 4 grid of cells labeled with 4-bit geohashes]

0000  0010  1000  1010
0001  0011  1001  1011
0100  0110  1100  1110
0101  0111  1101  1111
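The alternating divisions can be sketched in a few lines of Python. This is a minimal sketch, assuming the usual geohash convention (longitude is split first, and a 1 bit selects the upper half of the current interval); the function name `binary_geohash` is hypothetical. Under that assumption the output for the London coordinates agrees with the prefix #01111010… shown for the London data point on the Brexit dataset slide.

```python
def binary_geohash(lat, lon, nbits):
    """Return the first `nbits` bits of the binary geohash of (lat, lon)."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    bits = []
    for i in range(nbits):
        if i % 2 == 0:
            # Even positions halve the longitude interval.
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits.append("1")
                lon_lo = mid
            else:
                bits.append("0")
                lon_hi = mid
        else:
            # Odd positions halve the latitude interval.
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits.append("1")
                lat_lo = mid
            else:
                bits.append("0")
                lat_hi = mid
    return "".join(bits)


london = binary_geohash(51.509865, -0.118092, 8)
print(london)  # 01111010
```

Nearby points share a long common prefix, which is what makes the string usable as a one-dimensional key.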

Page 9: Geospatial Big Data

Solution: Z-order Traversal

•  z-order traversal of the globe via 4-bit geohashes

[Figure: 4 × 4 grid of cells labeled with 4-bit geohashes, illustrating the z-order traversal]

0000  0010  1000  1010
0001  0011  1001  1011
0100  0110  1100  1110
0101  0111  1101  1111
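Visiting the cells in lexicographical order of their geohashes produces the Z-shaped path. The sketch below assumes that longitude bits sit at even positions and latitude bits at odd positions of the string; `cell_of` is a hypothetical helper that de-interleaves them into grid indices.

```python
def cell_of(geohash):
    """Decode a binary geohash into (column, row) grid indices."""
    col = int(geohash[0::2], 2)  # longitude bits: positions 0, 2, ...
    row = int(geohash[1::2], 2)  # latitude bits: positions 1, 3, ...
    return col, row


# All 4-bit geohashes, sorted the way a key-ordered table would store them.
hashes = sorted(format(i, "04b") for i in range(16))
path = [cell_of(h) for h in hashes]

for h, (col, row) in zip(hashes, path):
    print(h, "-> column", col, "row", row)
```

The first four keys cover one 2 × 2 block, the next four cover the adjacent block, and so on, so consecutive keys mostly stay close together on the grid.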

Page 10: Geospatial Big Data

Locality Aware Index

•  a cluster node holds neighboring data points

[Figure: regions labeled 1, 2, 3 and 4]
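One way to obtain this property is to give each node a contiguous range of the sorted geohash keys. The slides do not say how the ranges are chosen, so the split points and the function `node_for` below are assumptions made purely for illustration.

```python
from bisect import bisect_right

# Hypothetical split points over the sorted key space:
# node 1 owns keys below "01", node 2 owns ["01", "10"),
# node 3 owns ["10", "11"), node 4 owns keys from "11" upward.
SPLITS = ["01", "10", "11"]


def node_for(geohash):
    """Return the (1-based) node that owns the range containing `geohash`."""
    return bisect_right(SPLITS, geohash) + 1


# Two nearby points share a long prefix and land on the same node.
print(node_for("01111010"), node_for("01111011"))
# A point in another part of the globe lands on a different node.
print(node_for("1011"))
```

Since keys with a common prefix are adjacent in sorted order, each range corresponds to a contiguous region of the globe, and a spatial query touches only the nodes whose ranges overlap it.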

Page 11: Geospatial Big Data

KPMG open source pipeline / stack

[Figure: stack diagram with three layers labeled storage, processing and visualization. Named components: HDFS, Accumulo. Captions: large-scale data analysis; query and share data.]

Page 12: Geospatial Big Data

Experiments - GDELT Data

•  the GDELT Project monitors broadcast, print and web news from the entire world
•  GDELT Global Knowledge Graph (GKG)
   -  hyper-edges → represent news stories
   -  vertices → represent persons, organizations, locations, etc.

[Figure: example hypergraph with hyper-edges e1 and e2. Labels: Hillary Clinton, Donald Trump, Washington, D.C., New York City, London, http://www.bbc.co.uk/…, http://www.nytimes.com/…, Tone: -3.7, Tone: +5.1]
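A hypergraph of this kind can be held in a plain dictionary that maps each hyper-edge to the set of vertices it connects. In the sketch below the vertex names are taken from the figure, but which vertices and which tone belong to which hyper-edge is an assumption made for illustration, as is the helper `stories_mentioning`.

```python
# Hypothetical assignment of vertices and tones to the two hyper-edges.
hyperedges = {
    "e1": {
        "vertices": {"Hillary Clinton", "Donald Trump", "Washington, D.C."},
        "tone": 5.1,
    },
    "e2": {
        "vertices": {"Donald Trump", "New York City", "London"},
        "tone": -3.7,
    },
}


def stories_mentioning(vertex):
    """Return the hyper-edges (news stories) that contain `vertex`."""
    return sorted(
        name for name, edge in hyperedges.items() if vertex in edge["vertices"]
    )


print(stories_mentioning("Donald Trump"))
print(stories_mentioning("London"))
```

Unlike an ordinary edge, a hyper-edge can connect any number of vertices, which is why a set is used for the members of each story.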

Page 13: Geospatial Big Data

Experiments - Brexit Dataset

•  ∼200,000 data points (news stories)
•  fields of each data point:
   -  First Location
   -  Date and Time
   -  URL
   -  Average Tone

Example data point:

•  First Location: London (51.509865, -0.118092), geohash #01111010…
•  Date: 2 October 2016
•  Average Tone: -2.76
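The record layout can be written down as a small data class. This is only a sketch: the class name, the field names and the types are assumptions, and the URL is a placeholder because the slide does not show one for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DataPoint:
    """One news story, with the four fields listed on the slide."""

    first_location: str
    latitude: float
    longitude: float
    day: date  # the slide lists "Date and Time"; the example only shows a date
    url: str
    average_tone: float


example = DataPoint(
    first_location="London",
    latitude=51.509865,
    longitude=-0.118092,
    day=date(2016, 10, 2),
    url="(not shown on the slide)",  # placeholder, not a real value
    average_tone=-2.76,
)
print(example)
```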

Page 14: Geospatial Big Data

GeoServer / OpenLayers Data Visualization

Page 15: Geospatial Big Data

GeoServer / OpenLayers Heatmap - GeoMesa Plugin

Page 16: Geospatial Big Data

Shiny / Leaflet - Interactive Data Visualization

Page 17: Geospatial Big Data

Aggregating Data With Apache Spark

1.  project data points on a covering set of polygons
2.  calculate aggregate statistics

Are these points in Australia?

•  1010…
•  0010…
•  0000…
•  1000…
•  1011…
•  1001…
•  1100…
•  …

Page 18: Geospatial Big Data

Aggregating Data With Apache Spark

[Figure: covering set of cells labeled 0, 11, 1000, 1001, 1010 and 1011]
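The two aggregation steps can be sketched in plain Python, standing in for the map and reduce-by-key stages a Spark job would run. The covering cells are the labels from the figure; the data points, tones and function names are hypothetical. Under the usual geohash convention, cell 1011 spans longitudes 90 to 180 and latitudes -45 to 0, which contains mainland Australia, so a prefix test cheaply discards every point outside that cell.

```python
from collections import defaultdict

# Cell labels from the figure. No label is a prefix of another, and together
# they cover every geohash of four or more bits.
COVERING = ["0", "11", "1000", "1001", "1010", "1011"]


def project(geohash):
    """Step 1: return the covering cell whose label prefixes the geohash."""
    for cell in COVERING:
        if geohash.startswith(cell):
            return cell
    raise ValueError(f"no covering cell for {geohash!r}")


def average_tone_per_cell(points):
    """Step 2: average the tone of all points projected onto each cell."""
    totals = defaultdict(lambda: [0.0, 0])  # cell -> [sum of tones, count]
    for geohash, tone in points:
        entry = totals[project(geohash)]
        entry[0] += tone
        entry[1] += 1
    return {cell: total / count for cell, (total, count) in totals.items()}


# Hypothetical (geohash, average tone) pairs.
points = [
    ("01111010", -2.76),
    ("10110011", 1.0),
    ("10111100", 3.0),
    ("11000000", -1.0),
]
result = average_tone_per_cell(points)
print(result)
```

A prefix match only says that a point lies in the cell, not inside the country's border, so an exact point-in-polygon test would still be needed for points in cells that straddle a boundary.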

Page 19: Geospatial Big Data

Average Tone Per Country

[Figure: average tone per country, with panels labeled OVERALL, PRE-BREXIT and POST-BREXIT, and bar charts of news stories per country. Y-axis: number of news stories (0 to 60,000). Countries shown: UK, US, CH, GM, AS, JA, EI, BE, NO, PO, FR, CA, IN, MX, RS, NL, LG, IT, SZ, SP]

Page 20: Geospatial Big Data

Conclusions

•  We have presented an architecture for geospatial big data storage, processing and visualization

•  Completely open-source!

•  Fast and efficient: the aggregation takes a few minutes on Apache Spark with an AWS cluster of a few machines

Page 21: Geospatial Big Data

Thank you!

Dr. Fabio Petroni

Page 22: Geospatial Big Data

The KPMG name, logo and “cutting through complexity” are registered trademarks or trademarks of KPMG International. Designed by CREATE | CRT057939

The information contained herein is of a general nature and is not intended to address the circumstances of any particular individual or entity. Although we endeavour to provide accurate and timely information, there can be no guarantee that such information is accurate as of the date it is received or that it will continue to be accurate in the future. No one should act on such information without appropriate professional advice after a thorough examination of the particular situation.

kpmg.com/uk