ST-Toolkit: a Framework for Trajectory Data Warehousing Authors AGILE 2011 Utrecht – 20/04/2011 Simone Campora Jose Fernandes De Macedo Laura Spinsanti

ST-Toolkit, a Framework for Trajectory Data Warehousing

Jan 27, 2015

Simone Campora

Presentation of "ST-Toolkit: a Framework for Trajectory Data Warehousing", a short paper published at AGILE 2011, Utrecht, 20 April 2011.
Transcript
Page 1: ST-Toolkit, a Framework for Trajectory Data Warehousing

ST-Toolkit: a Framework for Trajectory Data Warehousing

Authors

AGILE 2011, Utrecht, 20/04/2011

Simone Campora, Jose Fernandes De Macedo, Laura Spinsanti

Page 2: ST-Toolkit, a Framework for Trajectory Data Warehousing

Overview

Agenda

• Introduction
• Generic Data Warehouse Schema for Trajectories
• Generic Data Warehouse Architecture
• Some experiments
• Conclusions

Page 3: ST-Toolkit, a Framework for Trajectory Data Warehousing

Why Trajectory Data Warehousing

The motivation behind Trajectory Data Warehouses (TrDWs) is to transform the raw trajectories of moving objects into valuable information that can be exploited, in an OLAP (or STOLAP) fashion, for decision-making in ubiquitous applications such as location-based services and traffic control management.

Page 4: ST-Toolkit, a Framework for Trajectory Data Warehousing

Why Trajectory Data Warehousing

Problems in Trajectory Management

• Rapid access to a huge archive of data (e.g. our dataset counts 2 million records in one week only!)

• Knowledge discovery on trajectories (extract interesting patterns from raw coordinates)

• Knowledge presentation (how to deliver the extracted information?)

• Semantic integration (how to integrate semantic support data?)

Page 5: ST-Toolkit, a Framework for Trajectory Data Warehousing

Our Contribution

Our contribution is developed along two main axes:

• Theoretical
  • by providing a generic TrDW schema proposal that is robust, intuitive and fits the most common use cases
  • by providing a unified, non-fragmented overview of the main topics of trajectory data warehousing

• Architectural
  • by deploying a modular, cross-database, cross-platform middleware to support spatio-temporal data warehouse modeling

Page 6: ST-Toolkit, a Framework for Trajectory Data Warehousing

Trajectory Extraction

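[Figure: levels of trajectory extraction. Raw Data Level; Cleaned Data Level; Trajectory Level (Semantic Level); Episodes Level, where a trajectory is split into alternating "stop" and "move" episodes; Application Domain Level, where stops carry labels such as garage, workplace, lunch and meeting.]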
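The Episodes Level in the figure can be illustrated with a small sketch. The slides do not say which rule ST-Toolkit uses to tell stops from moves, so the speed threshold below is an assumption, and the names `EpisodeSketch`, `Fix` and `Episode` are hypothetical and not part of the toolkit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: these types are not part of ST-Toolkit.
final class EpisodeSketch {

    enum Kind { STOP, MOVE }

    // One cleaned GPS fix: timestamp in seconds and speed in m/s.
    static final class Fix {
        final long t;
        final double speed;
        Fix(long t, double speed) { this.t = t; this.speed = speed; }
    }

    // A maximal run of consecutive fixes of the same kind.
    static final class Episode {
        final Kind kind;
        final long start;
        final long end;
        Episode(Kind kind, long start, long end) {
            this.kind = kind; this.start = start; this.end = end;
        }
    }

    // Assumed rule: a fix belongs to a stop when its speed is below stopSpeed.
    static List<Episode> segment(List<Fix> fixes, double stopSpeed) {
        List<Episode> episodes = new ArrayList<>();
        Kind current = null;
        long start = 0;
        long end = 0;
        for (Fix f : fixes) {
            Kind kind = f.speed < stopSpeed ? Kind.STOP : Kind.MOVE;
            if (kind != current) {
                // Close the previous episode before opening a new one.
                if (current != null) {
                    episodes.add(new Episode(current, start, end));
                }
                current = kind;
                start = f.t;
            }
            end = f.t;
        }
        if (current != null) {
            episodes.add(new Episode(current, start, end));
        }
        return episodes;
    }
}
```

A production extractor would more likely combine distance and duration thresholds; the single speed test here only keeps the example short.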

Page 7: ST-Toolkit, a Framework for Trajectory Data Warehousing

Generic Data Warehouse Schema

• Solution built around “Episodes”

• Independent External Semantics

• Trajectory-based Pre-Grouping

(MultiDimER notation)

[Figure: generic TrDW schema in MultiDimER notation. Recoverable elements:]

Fact: Episode (Moving Entity, Type, Avg Speed, Var Speed, Lifetime, Shape)

TIME dimension: Day (Name, Timestamp), Month (name), Year (name)

EVENTS dimension: Events (Name, Category, Average Visit Time)

TRAJECTORY GROUP dimension: Trajectories (ID, Lifetime, Avg Travel Time, Avg Speed, Travel Type, Moving entity), Trajectory Groups and Main Trajectory Groups (Shape, Number of trajectories, Number of episodes)

SPACE dimension: Area (Shape, Surface, name), Region (Shape, name), Environment (Shape, name)

Page 8: ST-Toolkit, a Framework for Trajectory Data Warehousing

Data Warehouse Design Issues

Designing a Data Warehouse solution can be tricky because of:

Lack of standard interfaces: every commercial or academic solution takes a different approach to instantiating multidimensional models in databases.

• Longer learning curves

• Difficulties when migrating to different architectures (RDBMS, Distributed FS, … )

• Difficulties in replicating the same TDW on the same architecture

Page 9: ST-Toolkit, a Framework for Trajectory Data Warehousing

Generic Data Warehouse Architecture

[Figure: architecture of the Generic Data Warehouse Interfacing Middleware. Recoverable elements:]

J2EE Application Server, exposing: Remote Objects, JDBC Provider, MDX Endpoint, XML/A Web Service

Middleware modules: Data Loader (ETL), Data Warehouse Designer, Query Parser and Translator, Generic Object Interface for Data Warehouse Objects, ...

Back ends: RDBMS, Oracle OLAP (Analytical Workspaces containing Cubes), Mondrian (Mondrian Bridging over JDBC)

Page 10: ST-Toolkit, a Framework for Trajectory Data Warehousing

Example: Data Warehouse Design

Design Module - Mondrian

[Figure: data flow of the design module across four levels. Recoverable elements:]

Raw Data Level (input data): GPS Dataset, Events (e.g. POI), Space Areas

Run Time Memory Level (Java Object Translation): List<Event> (LifeSpanPoint), List<Geometry>, and the Trajectory Extractor (ETL Procedure) producing List<Trajectory>, List<Episode>, List<Geometry> and List<LifeSpan>

Database Storage Level (Relational Load Procedures, alternatively Hibernate Persistence): Event Table Loader, Space Table Loader, Trajectory Table Loader, Episodes Fact Table Loader, Time Table Loader

Multidimensional Mapping Level (Data Warehouse): Time Dimension, Space Dimension, Event Dimension, Trajectory Dimension, Episode Cube with Episode Facts

First Step: data are streamed from the raw datasets into primary memory.

Second Step: Java objects are instantiated, buffered and asynchronously sent to the RDBMS.

Third Step: Java objects are persisted into the RDBMS and properly indexed.

Fourth Step: the multidimensional model is instantiated from the RDBMS data sources plus the DW metadata definitions.
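The first three steps describe a producer/consumer pipeline: records are streamed in, turned into objects, buffered, and handed to a writer asynchronously. A minimal sketch under stated assumptions: an in-memory list stands in for the RDBMS table, strings stand in for the Java objects, and `AsyncLoadSketch` and its sentinel are hypothetical names, not ST-Toolkit classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical illustration only: not the ST-Toolkit loader.
final class AsyncLoadSketch {

    // Sentinel telling the writer thread that the stream has ended.
    private static final String END = "\u0000END";

    static List<String> load(Iterable<String> rawRecords, int bufferSize) {
        BlockingQueue<String> buffer = new ArrayBlockingQueue<>(bufferSize);
        List<String> table = new ArrayList<>(); // stand-in for an RDBMS table

        // Step 3 (stand-in): the writer drains the buffer and "persists" rows.
        Thread writer = new Thread(() -> {
            try {
                for (String row = buffer.take(); !row.equals(END); row = buffer.take()) {
                    table.add(row);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.start();

        try {
            // Steps 1 and 2: stream raw records, build objects, buffer them.
            for (String raw : rawRecords) {
                buffer.put(raw.trim()); // blocks while the buffer is full
            }
            buffer.put(END);
            writer.join(); // the table is safe to read after join()
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("load interrupted", e);
        }
        return table;
    }
}
```

The bounded queue is the design point: it keeps memory use flat when the raw stream is faster than the database writer.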

Page 11: ST-Toolkit, a Framework for Trajectory Data Warehousing

Some Experiments

The Milano Dataset

Our experiments aim to test:

• SOLAP queries
• STOLAP queries
• Validation of the custom-specified “Presence” measure

Feature        Value
Records        2075213
Trajectories   83134
Stops          464584
Moves          1527495
POIs           39776

What is the role of semantics in query complexity?

Page 12: ST-Toolkit, a Framework for Trajectory Data Warehousing

STOLAP Query

STOLAP Query With Semantics: Give the number of visits of a moving entity that occur within a range Ɵ of events of type “Restaurant”, where its own trajectory started close to a residential area (a residential area being a record of the Event dimension).

STOLAP Query Without Semantics: Give the number of visits of a moving entity that occur within a range Ɵ of regions with a high concentration Ω of trajectories at lunch time (12:00-13:00) or dinner time (19:00-20:00), where its own trajectory started near a residential area (defined as an area where a number ᵚ of trajectories start).

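// Spatial filter relating the event shape to the stop measure (threshold 1)
SpatialFilter filter = new DistanceFilter(eventDimension.getProperty("Event Shape"), stopMeasure, 1);

OlapQuery query = new OlapQuery();

// Presence measure on the columns, events on the rows
query.addSelection(presenceMeasure, OlapQuery.COLUMNS);
query.addSelection(eventDimension, OlapQuery.ROWS);
query.addFilter(filter);
// Restrict to one event member and to trajectory groups with more than 10 trajectories
query.addCondition("[Event].[Food Shop]");
query.addCondition("[Trajectory].[Trajectory Group].[Number of Trajectories > 10]");
query.setCube(stCube);
query.execute();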

N_VISITS   OBJET_ID
64640      89754
56055      78796
52015      70702
49995      76930
47470      79088
46460      82085

Page 13: ST-Toolkit, a Framework for Trajectory Data Warehousing

Presence Measure Validation

Presence Measure. Problem: how to aggregate the number of trajectories within a hierarchical, fully geometric dimension while avoiding the double-counting problem?

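[Figure: one trajectory crossing four adjacent cells A, B, C and D. Each cell reports a presence of 1, so summing the cells gives 4, while the true number of trajectories is 1 (1 != 4).]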
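The double-counting problem in the figure can be reproduced in a few lines. A minimal sketch, assuming each cell is represented by the set of trajectory ids that cross it; the class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of the double-counting problem.
final class DoubleCountingSketch {

    // Naive roll-up: add up the per-cell presence counts.
    static int sumOfCellCounts(Map<String, Set<Integer>> trajectoriesByCell) {
        int sum = 0;
        for (Set<Integer> ids : trajectoriesByCell.values()) {
            sum += ids.size();
        }
        return sum;
    }

    // Roll-up without double counting: each trajectory id counts once.
    static int distinctTrajectories(Map<String, Set<Integer>> trajectoriesByCell) {
        Set<Integer> all = new HashSet<>();
        for (Set<Integer> ids : trajectoriesByCell.values()) {
            all.addAll(ids);
        }
        return all.size();
    }
}
```

With one trajectory crossing cells A, B, C and D, the naive sum is 4 while the distinct count is 1. Plain SUM aggregation up a spatial hierarchy therefore overstates presence whenever a trajectory spans several cells.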

Page 14: ST-Toolkit, a Framework for Trajectory Data Warehousing

Presence Measure Validation

Solution: define an aggregation algorithm that can use spatial operators!

Our application lets the user inject custom SQL expressions for spatial aggregates:

String sqlExpression = "case when get_trj_space_area_intersections(trdw_episode_facts.geom) > 0 then ceil(1/get_trj_space_area_intersections(trdw_episode_facts.geom)) else 0 end ";

Measure presence = new VirtualMeasure("Trj Presence Measure", factTable, "presence", sqlExpression);
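To make the arithmetic of the `case` expression concrete, the sketch below mirrors it in Java. It assumes that `get_trj_space_area_intersections` returns the number of space areas the episode geometry intersects; the name suggests this, but the slides do not define the function. The slides also do not say whether `1/n` is evaluated as real or integer division, which depends on the SQL dialect and the function's return type, so both readings are shown. `PresenceExpressionSketch` is a hypothetical name.

```java
// Hypothetical illustration of the arithmetic only; the real measure runs in SQL.
final class PresenceExpressionSketch {

    // case when n > 0 then ceil(1/n) else 0 end, with real-valued division.
    static int presenceRealDivision(int intersections) {
        return intersections > 0 ? (int) Math.ceil(1.0 / intersections) : 0;
    }

    // The same expression if 1/n were evaluated with truncating integer division.
    static int presenceIntegerDivision(int intersections) {
        return intersections > 0 ? (int) Math.ceil(1 / intersections) : 0;
    }
}
```

Under real division the expression yields 1 for any positive intersection count. Under integer division it yields 1 only when the count is exactly 1, so it is worth checking which behaviour the target database gives.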

Page 15: ST-Toolkit, a Framework for Trajectory Data Warehousing

Presence Measure Validation

Results on a subset of 260 trajectories:

Milano - Arese: 2
Milano - Assago: 2
Milano - Bollate: 1
Milano - Bresso: 2
Milano - Buccinasco: 2
Milano - Cesano Boscone: 6
Milano - Cormano: 2
Milano - Corsico: 2
Milano - Cusano Milanino: 2
Milano - Gaggiano: 2
Milano - Locate di Triulzi: 2
Milano - Milano: 186
Milano - Novate: 2
Milano - Opera: 2
Milano - Pero: 2
Milano - Peschiera Borromeo: 2
Milano - Rho: 14
Milano - Rozzano: 2
Milano - San Donato Milanese: 1
Milano - San Giuliano Milanese: 6
Milano - Segrate: 2
Milano - Settimo Milanese: 8
Milano - Trezzano Rosa: 4
Milano - Zibido San Giacomo: 2
Monza and Brianza - Mezzano: 2

Milano: 258
Monza and Brianza: 2

Lombardia: 260
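The reported figures can be checked for consistency along the hierarchy. The sketch below simply re-adds the numbers from the slide; `PresenceRollupCheck` is a hypothetical name, and the check is plain arithmetic on the published results, not a rerun of the experiment.

```java
// Re-adds the counts reported on the slide; no new data is introduced.
final class PresenceRollupCheck {

    // Municipality counts for the province of Milano, in slide order
    // (Arese ... Zibido San Giacomo).
    static final int[] MILANO = {
        2, 2, 1, 2, 2, 6, 2, 2, 2, 2, 2, 186,
        2, 2, 2, 2, 14, 2, 1, 6, 2, 8, 4, 2
    };

    // Municipality counts for the province of Monza and Brianza.
    static final int[] MONZA_AND_BRIANZA = {2};

    static int sum(int[] counts) {
        int total = 0;
        for (int c : counts) {
            total += c;
        }
        return total;
    }
}
```

The 24 Milano municipality counts add up to 258 and the two province totals add up to 260, matching the province and region rows. This agrees with the aggregated results but does not by itself prove that no trajectory was counted twice.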

Page 16: ST-Toolkit, a Framework for Trajectory Data Warehousing

Conclusions

Summarizing: we are proposing

• a cross-database cross-platform generic middleware for spatio-temporal DW

• a modular architecture that can be enriched with user-defined aggregation functions

• a proposal for independent integration of Semantics for Trajectories

• the first (known) implementation of a semantically enriched Trajectory Data Warehouse

Page 17: ST-Toolkit, a Framework for Trajectory Data Warehousing

Thanks for your attention

Any questions? Suggestions? Comments?

For more information: http://st-toolkit.sourceforge.net/