Spaten: a Spatio-Temporal and Textual Big Data Generator Thaleia Dimitra Doudali* Ioannis Konstantinou Nectarios Koziris *

Sep 07, 2020

Page 1

Spaten: a Spatio-Temporal and Textual Big Data Generator

Thaleia Dimitra Doudali, Ioannis Konstantinou, Nectarios Koziris

Page 2

Motivation

1. Geo-Social Networking Graph
2. Spatio-temporal and textual data

Page 3

Motivation

3. Daily routes with check-ins

× millions of daily users = part of Big Geo-Social Data

Page 4

Motivation

New or extended Big Data Engines for Spatial data.

Big Spatial Data Engine (SpatialHadoop)

Input dataset
● OpenStreetMap (60 GB - real)
● NASA (4.6 TB - real)
● SYNTH (128 GB - synthetic)

Performance Evaluation

Easy access to large spatial datasets (real or synthetic).

Page 5

Problem Statement

Big Data Engine

New or extended Big Data Engines for Geo-Social data.

Input dataset

Performance Evaluation

Type    Real   Synthetic
Small   ✔      ✔
Large   ❌      ✔

Can we create realistic (real source, synthetic combination) Geo-Social data at a large scale, for performance and scalability evaluations?

Page 6

Our Contributions

● Build Spaten: a Spatio-Temporal and Textual Big Data Generator.
  ○ Configurable, open source.
● Show how we can store and query the generated data using state-of-the-art NoSQL database systems.
● Successfully create a large, realistic Geo-Social dataset.

Page 7

Overview

Input
1. Social network graph
2. Points of Interest (POIs)
3. Configuration Parameters

Spaten
Creates daily routes with check-ins of users to POIs

Output
Geo-Social network

Page 8

Input Data

1. Social network graph

User - User

2. Points of Interest (POIs)

POI
● Latitude
● Longitude
● Name
● Address
● Review list

Review
● Rating
● Title
● Text
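The two inputs can be sketched as plain data structures. This is only an illustration of the fields listed on the slide, not Spaten's actual classes: the type names, the integer rating, the undirected adjacency map and the sample values are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Review:
    rating: int  # star rating; an integer scale is an assumption
    title: str
    text: str


@dataclass
class POI:
    latitude: float
    longitude: float
    name: str
    address: str
    reviews: List[Review] = field(default_factory=list)  # the "review list"


# Social network graph as an adjacency map: user id -> ids of connected users.
SocialGraph = Dict[int, Set[int]]


def add_friendship(graph: SocialGraph, a: int, b: int) -> None:
    """Record an undirected user-user edge (a directed graph would store one side)."""
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)


# Made-up sample values, for illustration only.
graph: SocialGraph = {}
add_friendship(graph, 1, 2)
cafe = POI(37.9838, 23.7275, "Example Cafe", "1 Example St",
           [Review(5, "Great coffee", "the cold brew was so refreshing!")])
```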

Page 9

Data Generation Process - Example

Generates the day of a user who walks near their home or hotel and checks into POIs.

Example check-ins:
● 9am - 4/5 stars - “you should try the french toast with homemade jam, it’s so tasty!”
● 11.05am - 5 stars - “the cold brew was so refreshing!”
● 12.17pm - 5 stars - “delicious food and excellent service”

Walks between check-ins:
● 0.1 miles, 3 min
● 0.8 miles, 15 min

The configuration parameters control:
● how many daily routes?
● when does the day start and end?
● how many check-ins in a day?
● how long will a check-in last?
● how far can the user walk?
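A minimal sketch of how such parameters could drive the generation of one daily route. It is not Spaten's implementation: the parameter names are hypothetical, the default values follow the use case figures given in these slides, and distances use the haversine formula with an assumed constant walking speed, whereas the slides mention the Google Maps API.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in degrees


@dataclass
class RouteConfig:
    """Hypothetical parameter names for the questions listed on the slide."""
    day_start_h: float = 9.0     # when does the day start?
    day_end_h: float = 23.0      # when does the day end?
    checkins_per_day: int = 5    # how many check-ins in a day?
    checkin_hours: float = 2.0   # how long will a check-in last?
    max_walk_miles: float = 0.5  # how far can the user walk?
    walk_mph: float = 3.0        # assumed walking speed, not from the slides


def haversine_miles(a: Point, b: Point) -> float:
    """Great-circle distance between two points, in miles."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 3958.8 * math.asin(math.sqrt(h))


def generate_day(home: Point, pois: List[Point], cfg: RouteConfig,
                 rng: random.Random) -> List[Tuple[float, int]]:
    """Return one daily route as (arrival hour, POI index) pairs."""
    route: List[Tuple[float, int]] = []
    here, clock, visited = home, cfg.day_start_h, set()
    while len(route) < cfg.checkins_per_day:
        # Candidates: not yet visited and within walking range of the current spot.
        nearby = [i for i, p in enumerate(pois)
                  if i not in visited
                  and haversine_miles(here, p) <= cfg.max_walk_miles]
        if not nearby:
            break
        i = rng.choice(nearby)
        arrival = clock + haversine_miles(here, pois[i]) / cfg.walk_mph
        if arrival + cfg.checkin_hours > cfg.day_end_h:
            break  # the check-in would run past the end of the day
        route.append((arrival, i))
        visited.add(i)
        here, clock = pois[i], arrival + cfg.checkin_hours
    return route


home = (37.9838, 23.7275)  # made-up coordinates
pois = [(home[0] + 0.001 * k, home[1]) for k in range(1, 7)]  # six POIs in a line
cfg = RouteConfig()
route = generate_day(home, pois, cfg, random.Random(0))
for arrival, i in route:
    print(f"{arrival:5.2f}h  check-in at POI {i}")
```

Picking the next POI relative to the current position, rather than relative to home, keeps every leg of the walk within the distance limit. The number of daily routes per user would be an outer loop around `generate_day`.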

Page 10

Output Data

Social network

User - User

Check-ins

User
Check-in
● POI
● Review
● Time - Date

GPS traces

User
GPS Trace
● Latitude
● Longitude
● Time - Date
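As an illustration of the GPS trace record (latitude, longitude, time and date), the sketch below samples points along a walk between two check-in locations. It assumes straight-line movement at constant speed and a one-minute sampling step. Both are simplifications chosen for the example, and the type and function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple


@dataclass
class GPSTrace:
    latitude: float
    longitude: float
    timestamp: datetime


def interpolate_trace(start: Tuple[float, float], end: Tuple[float, float],
                      depart: datetime, arrive: datetime,
                      step: timedelta = timedelta(minutes=1)) -> List[GPSTrace]:
    """Sample points on the straight line from start to end, one per step."""
    total = (arrive - depart).total_seconds()
    if total <= 0:
        raise ValueError("arrive must be later than depart")
    points: List[GPSTrace] = []
    t = depart
    while t < arrive:
        f = (t - depart).total_seconds() / total  # fraction of the walk completed
        points.append(GPSTrace(start[0] + f * (end[0] - start[0]),
                               start[1] + f * (end[1] - start[1]), t))
        t += step
    points.append(GPSTrace(end[0], end[1], arrive))  # always end exactly at the POI
    return points


# Made-up coordinates and date; a 3-minute walk sampled once per minute.
trace = interpolate_trace((37.9838, 23.7275), (37.9848, 23.7285),
                          datetime(2016, 1, 1, 11, 2), datetime(2016, 1, 1, 11, 5))
```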

Page 11

Storage - Queries

Database

Geo-Social Network, indexed by “user”

Queries

News Feed: Show all friend check-ins in chronological order.

For a random user:
● What are the most favorite places that his friends have visited?
● How many times have his friends been to their most favorite place?
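The three queries can be sketched over an in-memory stand-in for a store indexed by user. This is not the NoSQL schema used by the authors: the dictionaries, function names and sample data are made up, and "favorite" is interpreted here as "most frequently visited", which is an assumption.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

CheckIn = Tuple[str, str]  # (ISO-8601 timestamp, POI name); ISO strings sort by time

# Both maps are keyed by user, mirroring the "indexed by user" layout.
friends: Dict[str, Set[str]] = {"alice": {"bob", "carol"}}
checkins: Dict[str, List[CheckIn]] = {
    "bob": [("2016-01-01T09:00", "Cafe A"), ("2016-01-01T12:17", "Diner B")],
    "carol": [("2016-01-01T11:05", "Cafe A")],
}


def news_feed(user: str) -> List[Tuple[str, str, str]]:
    """All friend check-ins as (timestamp, friend, POI), in chronological order."""
    return sorted((ts, friend, poi)
                  for friend in friends.get(user, ())
                  for ts, poi in checkins.get(friend, ()))


def friends_favorite_places(user: str, top: int = 3) -> List[Tuple[str, int]]:
    """POIs most often visited by the user's friends, with visit counts."""
    counts = Counter(poi
                     for friend in friends.get(user, ())
                     for _, poi in checkins.get(friend, ()))
    return counts.most_common(top)


def visits_to_top_place(user: str) -> int:
    """How many times friends checked in at their single most visited POI."""
    top = friends_favorite_places(user, 1)
    return top[0][1] if top else 0


print(news_feed("alice"))
print(friends_favorite_places("alice"))
print(visits_to_top_place("alice"))
```

Each query first reads the user's friend list and then one row per friend, which is why keying the store by user suits this workload.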

Page 12

Use Case

Input
● Twitter Graph = 14 GB
● TripAdvisor restaurants = 13 GB
● Configuration parameters: 2 months, 9 am - 11 pm, ~5 check-ins / day, ~2 hours / check-in, <0.5 miles between check-ins

Spaten

Output
● Geo-Social Network: 14 + 3 = 17 GB, ~10,000 users (limited use of Google Maps API)

HBase cluster, 32 nodes

Concurrent Queries

Page 13

Summary

Geo-Social network

Code: https://github.com/Thaleia-DimitraDoudali/Spaten
Dataset: http://research.cslab.ece.ntua.gr/datasets/ikons/Spaten/

Spaten

Big Data Engine

Performance Evaluation