Geospatial Big Data
Dr. Fabio Petroni
2 Document Classification: KPMG Public
© 2016 KPMG LLP, a UK limited liability partnership and a member firm of the KPMG network of independent member firms affiliated with KPMG International Cooperative (“KPMG International”), a Swiss entity. All rights reserved.
Motivation
• exponential growth in the volume of spatio-temporal data
• e.g., total number of Foursquare check-ins: ∼8 billion
Examples of Geospatial Big Data Analysis
Case Study
• monitor how sentiment about Brexit evolves over time and across geographies
Challenges
• Scalability: storing, processing and visualizing large-scale spatio-temporal data
• Mapping three dimensions (latitude, longitude, time) to one dimension (lexicographical ordering of keys in a table)
[Figure: B+ tree]
Solution: Geohashes
• a geohash is a binary string in which each character indicates one of the alternating divisions of the global longitude-latitude rectangle
[Figure: the globe divided into two cells, labeled 0 and 1]
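The alternating divisions can be sketched in a few lines of Python. This is a minimal illustration, not code from the presentation: the function name `geohash_bits` is hypothetical, and it assumes the standard geohash convention (longitude is split first, and a `1` selects the upper half of the current range). Under that assumption it reproduces the `01111010…` prefix shown for London on the Brexit dataset slide.

```python
def geohash_bits(lat: float, lon: float, n_bits: int) -> str:
    """Encode a point as a binary geohash of n_bits characters.

    Assumed convention: even positions halve the longitude range,
    odd positions halve the latitude range, and '1' means the upper half.
    """
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    bits = []
    for i in range(n_bits):
        if i % 2 == 0:  # longitude division
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits.append("1")
                lon_lo = mid
            else:
                bits.append("0")
                lon_hi = mid
        else:  # latitude division
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits.append("1")
                lat_lo = mid
            else:
                bits.append("0")
                lat_hi = mid
    return "".join(bits)


# London, using the coordinates given later in the deck
print(geohash_bits(51.509865, -0.118092, 8))  # 01111010
```

Each extra bit halves the cell along one axis, so a longer geohash always lies inside the cell named by any of its prefixes.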
Solution: Geohashes
[Figure: the globe divided into four cells, labeled 00, 01, 10 and 11]
Solution: Geohashes
[Figure: the globe divided into sixteen cells, labeled with the 4-bit geohashes 0000 to 1111]
Solution: Z-order Traversal
• z-order traversal of the globe via 4-bit geohashes
[Figure: z-order curve visiting the sixteen cells 0000 to 1111]
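To see why sorting geohashes yields a z-order traversal, the sketch below builds the sixteen 4-bit codes by interleaving the bits of a column index (longitude) and a row index (latitude), then sorts them. The helper names `interleave` and `z_order_cells` are hypothetical, and the sketch assumes column 0 is the westernmost column and row 0 the southernmost row.

```python
def interleave(col: int, row: int, bits_per_axis: int) -> str:
    """Interleave column and row bits, column bit first, most significant first."""
    out = []
    for i in reversed(range(bits_per_axis)):
        out.append(str((col >> i) & 1))
        out.append(str((row >> i) & 1))
    return "".join(out)


def z_order_cells(bits_per_axis: int = 2):
    """Return (code, col, row) for every grid cell, sorted by code."""
    side = 1 << bits_per_axis
    cells = [
        (interleave(col, row, bits_per_axis), col, row)
        for row in range(side)
        for col in range(side)
    ]
    return sorted(cells)  # lexicographic order of the codes is the z-order


for code, col, row in z_order_cells():
    print(code, (col, row))
```

Consecutive codes mostly name adjacent cells, which is what lets a one-dimensional sorted key preserve much of the two-dimensional neighborhood structure.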
Locality Aware Index
• a cluster node holds neighboring data points
[Figure: four regions numbered 1 to 4]
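One way such locality can arise is range partitioning of the sorted key space: each node owns a contiguous range of geohash keys, so points that share a prefix land on the same node. The sketch below is illustrative only. The split points in `SPLITS` and the function `node_for` are hypothetical and are not taken from the presentation or from any particular storage system.

```python
from bisect import bisect_right

# Hypothetical split points dividing the key space into four contiguous ranges:
# node 1: keys < "01", node 2: "01" <= keys < "10",
# node 3: "10" <= keys < "11", node 4: keys >= "11"
SPLITS = ["01", "10", "11"]


def node_for(key: str) -> int:
    """Return the 1-based number of the node whose key range contains key."""
    return bisect_right(SPLITS, key) + 1


for key in ["0000", "0011", "0110", "1001", "1011", "1110"]:
    print(key, "-> node", node_for(key))
```

With these splits every key starting with `10` goes to node 3, so a query over one region touches a single node instead of the whole cluster.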
KPMG open source pipeline / stack
[Figure: pipeline with three layers: storage (HDFS, Accumulo), processing, visualization; captions: "large-scale data analysis", "query and share data"]
Experiments – GDELT Data
• the GDELT Project monitors the world’s broadcast, print and web news
• GDELT Global Knowledge Graph (GKG)
  - hyper-edges → represent news stories
  - vertices → represent persons, organizations, locations, etc.
[Figure: example hypergraph with two hyper-edges e1 and e2 (news stories, URLs http://www.bbc.co.uk/… and http://www.nytimes.com/…, tones -3.7 and +5.1) over the vertices Hillary Clinton, Donald Trump, Washington, D.C., New York City and London]
Experiments - Brexit Dataset
• ∼200,000 data points (news stories)
• fields of a data point: First Location, Date and Time, URL, Average Tone
Example data point:
• First Location: London (51.509865, -0.118092), geohash 01111010…
• Date: 2 October 2016
• Average Tone: -2.76
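A geohash prefix names a rectangular cell, and the example above can be checked by decoding its first eight bits back into a bounding box. This is a small sketch under the standard geohash convention (longitude first, `1` selects the upper half), which is an assumption here. The function name `geohash_bbox` is hypothetical.

```python
def geohash_bbox(bits: str):
    """Decode a binary geohash into (lat_min, lat_max, lon_min, lon_max)."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    for i, bit in enumerate(bits):
        if i % 2 == 0:  # even positions refine longitude
            mid = (lon_lo + lon_hi) / 2
            if bit == "1":
                lon_lo = mid
            else:
                lon_hi = mid
        else:  # odd positions refine latitude
            mid = (lat_lo + lat_hi) / 2
            if bit == "1":
                lat_lo = mid
            else:
                lat_hi = mid
    return lat_lo, lat_hi, lon_lo, lon_hi


lat_min, lat_max, lon_min, lon_max = geohash_bbox("01111010")
print((lat_min, lat_max, lon_min, lon_max))  # (45.0, 56.25, -22.5, 0.0)

# The London coordinates from the slide fall inside this cell.
print(lat_min <= 51.509865 < lat_max and lon_min <= -0.118092 < lon_max)  # True
```

Eight bits give a cell 22.5 degrees wide and 11.25 degrees tall. The elided remainder of the geohash narrows it further.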
GeoServer / OpenLayers Data Visualization
GeoServer / OpenLayers Heatmap - GeoMesa Plugin
Shiny / Leaflet - Interactive Data Visualization
Aggregating Data With Apache Spark
1. project data points on a covering set of polygons
2. calculate aggregate statistics
Are these points in Australia?
• 1010…
• 0010…
• 0000…
• 1000…
• 1011…
• 1001…
• 1100…
• …
Aggregating Data With Apache Spark
Are these points in Australia?
• 1010…
• 0010…
• 0000…
• 1000…
• 1011…
• 1001…
• 1100…
• …
[Figure: covering set of geohash cells 0, 11, 1000, 1001, 1010 and 1011]
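The two steps, projecting points onto a covering set and then aggregating, can be sketched as a prefix match followed by a group-and-average. The sketch is written in plain Python rather than with the Spark API, so it shows the logic only. The covering cells are the six labels on this slide, the function names are hypothetical, and the sample points are made up for illustration.

```python
from collections import defaultdict

# Covering cells from the slide; no cell is a prefix of another.
COVERING = ["0", "11", "1000", "1001", "1010", "1011"]


def covering_cell(geohash: str):
    """Step 1: return the covering cell whose label is a prefix of geohash."""
    for cell in COVERING:
        if geohash.startswith(cell):
            return cell
    return None  # geohash too short to be assigned to a cell


def average_tone_per_cell(points):
    """Step 2: average the tone of all points that fall in each cell."""
    sums = defaultdict(lambda: [0.0, 0])  # cell -> [tone total, point count]
    for geohash, tone in points:
        cell = covering_cell(geohash)
        if cell is not None:
            sums[cell][0] += tone
            sums[cell][1] += 1
    return {cell: total / count for cell, (total, count) in sums.items()}


# Hypothetical (geohash, average tone) pairs, not data from the experiments
points = [("10110010", -2.0), ("10111100", 4.0), ("01111010", -2.76), ("11000000", 1.0)]
print(average_tone_per_cell(points))
```

Cell `1011` spans longitude 90 to 180 and latitude -45 to 0, so it contains Australia but also neighboring regions. A prefix match against it is a coarse filter, and a finer covering would be needed for an exact per-country assignment.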
Average Tone Per Country
[Figure: bar chart "News stories per country"; y-axis: number of news stories (0 to 60,000); x-axis: country codes UK, US, CH, GM, AS, JA, EI, BE, NO, PO, FR, CA, IN, MX, RS, NL, LG, IT, SZ, SP; labels: PRE-BREXIT, POST-BREXIT, OVERALL]
Conclusions
• We have presented an architecture for geospatial big data storage, processing and visualization
• Completely open-source!
• Fast and efficient: the aggregation takes a few minutes on Apache Spark, running on an AWS cluster of a few machines
The KPMG name, logo and “cutting through complexity” are registered trademarks or trademarks of KPMG International. Designed by CREATE | CRT057939
The information contained herein is of a general nature and is not intended to address the circumstances of any particular individual or entity. Although we endeavour to provide accurate and timely information, there can be no guarantee that such information is accurate as of the date it is received or that it will continue to be accurate in the future. No one should act on such information without appropriate professional advice after a thorough examination of the particular situation.
kpmg.com/uk