Andriy Nikolov, fluid Operations AG, Germany. 2nd OpenCube Webinar, 15 September 2015. The OpenCube Toolkit: Overview

The OpenCube Toolkit - webinar 2

Jan 24, 2018


OpenCubeProject
Transcript
Page 1: The OpenCube Toolkit - webinar 2

Andriy Nikolov, fluid Operations AG, Germany

2nd OpenCube Webinar

15 September 2015

The OpenCube Toolkit: Overview

Page 2: The OpenCube Toolkit - webinar 2

Table of Contents

The OpenCube Project: Overview

The OpenCube Toolkit

Base platform

Components for processing statistical data

Conclusions

Page 3: The OpenCube Toolkit - webinar 2

Data Cube

Statistical data is often organized as data cubes, where each cell contains a measure described based on a number of dimensions.

OLAP operations: drill up/down, slicing, dicing, pivoting, etc. Data cubes are essential for Business Intelligence.

[Figure: a data cube, labelled with its dimensions, a dimension hierarchy, and the measure held in each cell]
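To make the OLAP vocabulary concrete, here is a minimal in-memory sketch, assuming a cube stored as a dict from a tuple of dimension values to a measure. The dimension names (`area`, `year`), the city-to-country hierarchy and the numbers are hypothetical illustration data, not taken from any OpenCube dataset. `dice` keeps a sub-cube; `roll_up` aggregates along a hierarchy (drill up).

```python
from collections import defaultdict

# Hypothetical cube: each cell maps (area, year) -> measure value.
DIMENSIONS = ("area", "year")
cube = {
    ("Athens", 2013): 10,
    ("Athens", 2014): 12,
    ("Thessaloniki", 2013): 5,
    ("Thessaloniki", 2014): 6,
    ("Berlin", 2013): 20,
    ("Berlin", 2014): 21,
}

# Hypothetical hierarchy for the "area" dimension: city -> country.
AREA_PARENT = {"Athens": "GR", "Thessaloniki": "GR", "Berlin": "DE"}


def dice(cube, **allowed):
    """Keep only cells whose dimension values are in the allowed sets."""
    idx = {d: i for i, d in enumerate(DIMENSIONS)}
    return {
        key: value
        for key, value in cube.items()
        if all(key[idx[d]] in values for d, values in allowed.items())
    }


def roll_up(cube, dimension, parent_of):
    """Drill up: replace a dimension value by its parent and sum the measures."""
    i = DIMENSIONS.index(dimension)
    result = defaultdict(int)
    for key, value in cube.items():
        new_key = key[:i] + (parent_of[key[i]],) + key[i + 1:]
        result[new_key] += value
    return dict(result)


print(dice(cube, area={"Athens", "Thessaloniki"}, year={2014}))
# {('Athens', 2014): 12, ('Thessaloniki', 2014): 6}
print(roll_up(cube, "area", AREA_PARENT)[("GR", 2013)])
# 15
```

Summing is only one possible aggregate; averages or other functions would be needed for non-additive measures such as rates.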


Page 4: The OpenCube Toolkit - webinar 2

Linked Data

Linked Data has the potential to enable combining and performing analytics on top of disparate and previously isolated statistical data.

The RDF Data Cube Vocabulary has been proposed for modelling multi-dimensional data as RDF graphs.

However, tools for handling linked data cubes:

• are few and scattered

• have not been tested under real-life conditions

The potential of using LOD in statistical data analysis remains unexploited.

Page 5: The OpenCube Toolkit - webinar 2

The OpenCube project

OpenCube is a 2-year project funded by the EU within FP7.

The project aims to develop and test processes and tools for managing statistical linked open data.

The results will:

• help data publishers create linked data cubes from legacy formats

• empower data users to browse, visualise, link, expand and analyse data cubes

• enable analysis that was not possible before (merging data cubes at Web scale)

Page 6: The OpenCube Toolkit - webinar 2

Linked Statistical Data Lifecycle

We propose a lifecycle for statistical linked data (LD).

The lifecycle is divided into three phases: create, expand and exploit (or consume).

The lifecycle prescribes the steps that raw data cubes* should go through in order to create value.

OpenCube also develops tools to support the whole lifecycle of linked statistical data.

E. Tambouris, E. Kalampokis, K. Tarabanis (2015). Processing Linked Open Data Cubes. In: Electronic Government, Lecture Notes in Computer Science, vol. 9248, pp. 130-143.

* We assume statistical data is organized as data cubes, where each cell contains a measure described based on a number of dimensions.

Page 7: The OpenCube Toolkit - webinar 2

More on OpenCube…

For more information: http://opencube-project.eu, http://opencube-toolkit.eu

Check out our free webinars! 1st webinar: Project overview & OLAP browser

Slides: http://www.slideshare.net/OpenCubeProject/opencube-project-webinar-1-sept-8-2015

Video: https://vimeo.com/138860345

Project coordinators: Konstantinos Tarabanis ([email protected]), Themis Tambouris ([email protected])

OpenCube consortium

Page 8: The OpenCube Toolkit - webinar 2

Table of Contents

The OpenCube Project: Overview

The OpenCube Toolkit

Base platform

Components for processing statistical data

Conclusions

Page 9: The OpenCube Toolkit - webinar 2

OpenCube Toolkit

Creating components:

• TARQL extension

• D2RQ extension

• JSON-stat

• Grafter

• R2RML-cube extension (commercial offering only)

Expanding components:

• OpenCube Expander

• OpenCube Linker

Exploiting components:

• Data catalogue solution

• OpenCube Browser

• OpenCube MapView

• R Analytics Integration

Developed using the open-source Information Workbench as the underlying linked data management platform.

License scheme: OpenCube components are provided under open-source licenses (check http://opencube-toolkit.eu), but commercial solutions are also offered by consortium members.

Page 10: The OpenCube Toolkit - webinar 2


Base platform: Information Workbench

Platform for development of linked data applications

Semantic Web Data

Semantics- & Linked Data-based Integration of Enterprise and Open Data Sources

Intelligent Data Access and Analytics

• Visual exploration

• Semantic search

• Dashboarding and reporting

Collaboration and Knowledge Management Platform

• Wiki-based curation & authoring of data

• Collaborative workflows

Source: http://www.fluidops.com/information-workbench/

Page 11: The OpenCube Toolkit - webinar 2


Platform Architecture

Data storage and management platform

Reusable UI and data integration components

Customized application solutions

External resources to reuse data and create mashups

Page 12: The OpenCube Toolkit - webinar 2

Ontology as a “Structural Backbone”

[Figure: the ontology (RDFS/OWL) defines the data structure of the RDF data graph, e.g. #BarackObama rdf:type foaf:Person and #WhiteHouse rdf:type vcard:Address; UI templates attached to the ontology classes (Template:foaf:Person, Template:vcard:Address, Template: …) define the UI structure of each resource page.]

Page 13: The OpenCube Toolkit - webinar 2

Data Storage & Access: Data Management based on Sesame framework

• Open source, written in Java

• Layered architecture for semantic data management

• Easy to plug in new data management components on demand

• Most of the existing triple stores support the Sesame API

Stable (yet extensible) APIs for data access, manipulation, ...: the Sesame Access API and the SAIL API

Stackable architecture of custom data management components, e.g. SAIL 1 (a query optimization layer) on top of SAIL 2 (a distributed query execution layer) on top of the data stores DB1, DB2, DB3

Easy integration by implementing a generic API
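The stackable idea can be illustrated with a small Python sketch of the decorator pattern: every layer implements the same generic interface and delegates to the layer(s) below it. The class names (`Store`, `MemoryStore`, `FederationLayer`, `CachingLayer`) are hypothetical and do not correspond to the actual Sesame SAIL API, which is Java. Here a federation layer stands in for the distributed query execution layer, and a simple caching layer stands in for an upper optimization layer.

```python
from abc import ABC, abstractmethod


class Store(ABC):
    """Generic interface that every layer implements (hypothetical, SAIL-like)."""

    @abstractmethod
    def match(self, s=None, p=None, o=None):
        """Return the triples matching the pattern (None acts as a wildcard)."""


class MemoryStore(Store):
    """Bottom layer: an in-memory triple set standing in for DB1, DB2, ..."""

    def __init__(self, triples):
        self.triples = set(triples)

    def match(self, s=None, p=None, o=None):
        return {
            t for t in self.triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        }


class FederationLayer(Store):
    """Sends each request to several underlying stores and unions the answers."""

    def __init__(self, members):
        self.members = members

    def match(self, s=None, p=None, o=None):
        result = set()
        for member in self.members:
            result |= member.match(s, p, o)
        return result


class CachingLayer(Store):
    """Sits on top of any Store and memoises repeated patterns."""

    def __init__(self, inner):
        self.inner = inner
        self.cache = {}
        self.misses = 0

    def match(self, s=None, p=None, o=None):
        key = (s, p, o)
        if key not in self.cache:
            self.misses += 1
            self.cache[key] = frozenset(self.inner.match(s, p, o))
        return self.cache[key]


db1 = MemoryStore([("ex:Athens", "rdf:type", "ex:City")])
db2 = MemoryStore([("ex:Berlin", "rdf:type", "ex:City")])
stack = CachingLayer(FederationLayer([db1, db2]))
cities = stack.match(p="rdf:type", o="ex:City")
stack.match(p="rdf:type", o="ex:City")  # second call is served from the cache
```

Because every layer only depends on the `Store` interface, layers can be reordered or removed without touching the others, which is the property the slide attributes to the stackable architecture.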

Page 14: The OpenCube Toolkit - webinar 2

Data Integration: Data Provider Concept

Data providers support the periodic extraction and integration of data from external sources into a central repository:

• Lifting from arbitrary data formats to RDF (e.g., relational, XML, CSV)

• Parameterizable (e.g., connection information, refresh interval, ...)

• Built-in UI for instantiating providers

• Intuitive interfaces and APIs for writing your own custom providers

Provider workflow: connect to data source → extract data from source → convert data into RDF → store RDF in repository

Examples: ScriptProvider, SOAP Provider, R2RML, XML2RDF, REST Provider
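A minimal sketch of the provider concept, assuming CSV as the external format: the provider is parameterized (source, base URI, refresh interval), and `run()` goes through the workflow steps above (connect, extract, convert to RDF, store). `CsvProvider`, its parameters and the `Repository` class are hypothetical names for illustration only; they are not the Information Workbench provider API.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class Repository:
    """Stand-in for the central RDF repository: just a set of triples."""
    triples: set = field(default_factory=set)

    def add(self, triples):
        self.triples.update(triples)


@dataclass
class CsvProvider:
    """Hypothetical provider that lifts CSV rows to RDF triples."""
    source: str                     # connection information (here: the CSV text itself)
    base_uri: str                   # namespace for generated resources
    key_column: str                 # column used to mint subject URIs
    refresh_interval_s: int = 3600  # how often a scheduler would re-run the provider

    def connect(self):
        return io.StringIO(self.source)

    def extract(self, handle):
        return list(csv.DictReader(handle))

    def convert(self, rows):
        # One triple per non-key cell; literals are not escaped (sketch only).
        triples = set()
        for row in rows:
            subject = f"<{self.base_uri}{row[self.key_column]}>"
            for column, value in row.items():
                if column != self.key_column:
                    triples.add((subject, f"<{self.base_uri}{column}>", f'"{value}"'))
        return triples

    def run(self, repository):
        rows = self.extract(self.connect())
        repository.add(self.convert(rows))
        return len(rows)


repo = Repository()
provider = CsvProvider(
    source="id,label\nGR,Greece\nDE,Germany\n",
    base_uri="http://example.org/",
    key_column="id",
)
provider.run(repo)
```

Keeping the steps as separate methods means a subclass for another format (XML, a REST endpoint) would only need to override `connect`, `extract` or `convert`.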

Page 15: The OpenCube Toolkit - webinar 2

Data source concept

Data integration:

• Data Source: low-level data access

• Mapper: translation into triples; extract and manipulate data

• Post Processor (optional): reconciliation (merging); improve data quality
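The three stages can be sketched as plain functions, assuming the data source yields Python dicts and the post processor reconciles resources using a given equivalence map (for instance one derived from `owl:sameAs` links). All function names and sample records are hypothetical; the point is only the separation of low-level access, triple mapping, and optional clean-up.

```python
def data_source():
    """Low-level data access: yield raw records (hard-coded here)."""
    yield {"id": "athens", "name": " Athens "}
    yield {"id": "athina", "name": "Athens"}


def mapper(records, base="http://example.org/"):
    """Translate each record into (subject, predicate, object) triples."""
    for rec in records:
        yield (base + rec["id"], "rdfs:label", rec["name"])


def post_processor(triples, same_as):
    """Optional step: merge equivalent URIs, trim literals, drop duplicates."""
    cleaned = set()
    for s, p, o in triples:
        cleaned.add((same_as.get(s, s), p, o.strip()))
    return cleaned


def integrate(source_fn, mapper_fn, post_fn=None, **options):
    """Run source -> mapper, then the post processor if one is configured."""
    triples = mapper_fn(source_fn())
    if post_fn is not None:
        return post_fn(triples, **options)
    return set(triples)


same_as = {"http://example.org/athina": "http://example.org/athens"}
merged = integrate(data_source, mapper, post_processor, same_as=same_as)
raw = integrate(data_source, mapper)
```

Without the post processor the two records stay separate resources with slightly different labels; with it, they collapse into a single cleaned triple.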

Page 16: The OpenCube Toolkit - webinar 2


User Interface

Page 17: The OpenCube Toolkit - webinar 2

User Interface: One Page per URI

[Figure: each resource in the RDF graph is shown on its own resource page]

Page 18: The OpenCube Toolkit - webinar 2

Wiki Concept

• The resource view is defined using the wiki-based UI

• Go to a new wiki page, e.g. …/resource/Widget123Page

• Change to the Edit View

Page 19: The OpenCube Toolkit - webinar 2

Configurable Widgets

Widget categories: Analytics and Reporting; Visualization and Exploration; Mashups with Social Media; Authoring and Content Creation

Widgets are not static and can be integrated into the UI using a wiki-style syntax.

Page 20: The OpenCube Toolkit - webinar 2

Instance Pages vs. Templates

Page content is composed based on a template concept. A request for dbpedia:Barack_Obama, whose rdf:type is foaf:Person, combines:

• the template definitions for foaf:Person: the wiki template Template:foaf:Person, the table, graph and pivot view configs for foaf:Person, and additional widget definitions for foaf:Person

+

• the instance definitions for dbpedia:Barack_Obama: its wiki, table, graph and pivot views, and additional widget definitions for dbpedia:Barack_Obama

The resource page combines the information from the template definition and the specific instance, giving the instance configuration priority.
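The composition rule above (template definitions looked up via the resource's rdf:type, then overridden by instance-specific definitions) can be sketched as a dictionary merge. The configuration keys and values below are hypothetical placeholders, not the Information Workbench's actual configuration format.

```python
# Template-level configuration, keyed by rdf:type (hypothetical values).
TEMPLATES = {
    "foaf:Person": {
        "wiki": "Template:foaf:Person",
        "table_view": "default person table",
        "graph_view": "default person graph",
        "pivot_view": "default person pivot",
    },
}

# Instance-level configuration, keyed by resource (hypothetical values).
INSTANCES = {
    "dbpedia:Barack_Obama": {
        "wiki": "Wiki page for dbpedia:Barack_Obama",
        "widgets": ["timeline"],
    },
}

# rdf:type statements for the resources (hypothetical data).
TYPES = {"dbpedia:Barack_Obama": ["foaf:Person"]}


def page_config(resource):
    """Merge template configs of all rdf:types, then let instance config win."""
    config = {}
    for rdf_type in TYPES.get(resource, []):
        # With several types, later types override earlier ones in this sketch.
        config.update(TEMPLATES.get(rdf_type, {}))
    config.update(INSTANCES.get(resource, {}))  # instance config has priority
    return config


cfg = page_config("dbpedia:Barack_Obama")
```

Here the page keeps the template's table, graph and pivot views but uses the instance's own wiki content and widget list.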

Page 21: The OpenCube Toolkit - webinar 2

More information

Download the open-source Information Workbench Community Edition: http://www.fluidops.com/en/company/training/open_source

Detailed documentation: http://help.fluidops.com

Page 22: The OpenCube Toolkit - webinar 2

Table of Contents

The OpenCube Project: Overview

The OpenCube Toolkit

Base platform

Components for processing statistical data

Creating linked data cubes

Exploiting statistical data

Conclusions

Page 23: The OpenCube Toolkit - webinar 2

Linked Statistical Data Lifecycle

We propose a lifecycle for statistical linked data (LD).

The lifecycle is divided into three phases: create, expand and exploit (or consume).

The lifecycle prescribes the steps that raw data cubes* should go through in order to create value.

OpenCube also develops tools to support the whole lifecycle of linked statistical data.

E. Tambouris, E. Kalampokis, K. Tarabanis (2015). Processing Linked Open Data Cubes. In: Electronic Government, Lecture Notes in Computer Science, vol. 9248, pp. 130-143.

* We assume statistical data is organized as data cubes, where each cell contains a measure described based on a number of dimensions.

Page 24: The OpenCube Toolkit - webinar 2


Data Creation Components


Page 25: The OpenCube Toolkit - webinar 2

Data Creation Components

Implemented as custom data providers
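As an illustration of what a creation component produces, here is a sketch that turns tabular rows into observations of the W3C RDF Data Cube (QB) vocabulary, serialized as N-Triples. `qb:Observation` and `qb:dataSet` are real QB terms; the dataset URI, the dimension and measure properties under `http://example.org/`, and the input rows are hypothetical. A real component (such as the TARQL extension) would also emit the data structure definition, which this sketch omits.

```python
import csv
import io

QB = "http://purl.org/linked-data/cube#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"
EX = "http://example.org/"  # hypothetical namespace for dataset, dimensions, measure


def rows_to_observations(csv_text, dataset, dimensions, measure):
    """Emit N-Triples lines: one qb:Observation per CSV row."""
    lines = []
    rows = csv.DictReader(io.StringIO(csv_text))
    for n, row in enumerate(rows, start=1):
        obs = f"<{dataset}/obs{n}>"
        lines.append(f"{obs} <{RDF_TYPE}> <{QB}Observation> .")
        lines.append(f"{obs} <{QB}dataSet> <{dataset}> .")
        for dim in dimensions:
            # Dimension values become URIs in a per-dimension namespace.
            lines.append(f"{obs} <{EX}{dim}> <{EX}{dim}/{row[dim]}> .")
        lines.append(f'{obs} <{EX}{measure}> "{row[measure]}"^^<{XSD_DECIMAL}> .')
    return lines


data = "area,year,value\nA,2014,1.5\nB,2014,2.0\n"
nt = rows_to_observations(data, EX + "dataset/demo", ["area", "year"], "value")
print("\n".join(nt))
```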

Page 26: The OpenCube Toolkit - webinar 2

Table of Contents

The OpenCube Project: Overview

The OpenCube Toolkit

Base platform

Components for processing statistical data

Creating linked data cubes

Exploiting statistical data

Conclusions

Page 27: The OpenCube Toolkit - webinar 2

Data Catalogue Management Solution

Managing metadata catalogues. The solution allows the user to:

• search for specific datasets by keyword/category/catalogue

• explore pre-defined relations between datasets within the catalogue

• explore the available metadata descriptions of datasets (dataset structure)
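The catalogue functions listed above amount to filtering and following links in dataset metadata. A minimal sketch, assuming each dataset is described by a record with a title, keywords, category, catalogue, related datasets and dimensions; the record layout, the catalogue names and the sample entries are hypothetical, not the OpenCube catalogue schema.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    uri: str
    title: str
    keywords: set
    category: str
    catalogue: str
    related: list = field(default_factory=list)     # pre-defined relations
    dimensions: list = field(default_factory=list)  # dataset structure


def matches(record, keyword=None, category=None, catalogue=None):
    """True if the record satisfies every given criterion (None means 'any')."""
    if keyword:
        kw = keyword.lower()
        in_keywords = kw in {k.lower() for k in record.keywords}
        if not in_keywords and kw not in record.title.lower():
            return False
    if category and record.category != category:
        return False
    if catalogue and record.catalogue != catalogue:
        return False
    return True


def search(records, **criteria):
    return [r for r in records if matches(r, **criteria)]


def related(records, uri):
    """Follow the pre-defined relations of one dataset."""
    by_uri = {r.uri: r for r in records}
    return [by_uri[u] for u in by_uri[uri].related if u in by_uri]


CATALOGUE = [
    DatasetRecord("ex:unemp", "Unemployment rate", {"labour", "unemployment"},
                  "Labour market", "A", related=["ex:pop"], dimensions=["geo", "time"]),
    DatasetRecord("ex:pop", "Population", {"population"},
                  "Population", "A", dimensions=["geo", "time", "sex"]),
    DatasetRecord("ex:gdp", "Gross domestic product", {"economy"},
                  "Economy", "B", dimensions=["geo", "time"]),
]
```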

Page 28: The OpenCube Toolkit - webinar 2

Exploring data: OpenCube Browser

The browser enables the exploration of an RDF data cube by presenting a two-dimensional slice of the cube as a table. The slice is created by setting a fixed value for each dimension that is not presented in the table.

Users can:

• summarize observations across a dimension (dimension reduction)

• change the axes of the table

• change the language

• change the fixed values
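The browser's core operations can be sketched as follows, assuming observations are dicts with one key per dimension plus a measure: two dimensions become the table axes, every other dimension must be fixed to one value, and dimension reduction sums the measure over a dimension that is dropped. The data and function names are hypothetical, and summing is only one possible aggregate.

```python
from collections import defaultdict

OBSERVATIONS = [
    {"geo": "GR", "time": "2013", "sex": "F", "value": 3},
    {"geo": "GR", "time": "2013", "sex": "M", "value": 4},
    {"geo": "GR", "time": "2014", "sex": "F", "value": 5},
    {"geo": "GR", "time": "2014", "sex": "M", "value": 6},
    {"geo": "DE", "time": "2013", "sex": "F", "value": 1},
    {"geo": "DE", "time": "2013", "sex": "M", "value": 2},
]


def reduce_dimension(observations, dimension, measure="value"):
    """Summarize observations across `dimension` by summing the measure."""
    totals = defaultdict(int)
    for obs in observations:
        key = tuple(sorted((k, v) for k, v in obs.items() if k not in (dimension, measure)))
        totals[key] += obs[measure]
    return [dict(key, **{measure: total}) for key, total in totals.items()]


def slice_table(observations, rows, cols, fixed, measure="value"):
    """Two-dimensional slice: `rows` x `cols`, all other dimensions fixed."""
    dims = {k for obs in observations for k in obs} - {measure}
    missing = dims - {rows, cols} - set(fixed)
    if missing:
        raise ValueError(f"fix a value for: {sorted(missing)}")
    table = {}
    for obs in observations:
        if all(obs.get(d) == v for d, v in fixed.items()):
            table.setdefault(obs[rows], {})[obs[cols]] = obs[measure]
    return table


women = slice_table(OBSERVATIONS, rows="geo", cols="time", fixed={"sex": "F"})
# {'GR': {'2013': 3, '2014': 5}, 'DE': {'2013': 1}}
totals = slice_table(reduce_dimension(OBSERVATIONS, "sex"), rows="geo", cols="time", fixed={})
# {'GR': {'2013': 7, '2014': 11}, 'DE': {'2013': 3}}
```

Changing the axes corresponds to swapping `rows` and `cols`; changing the fixed values means passing a different `fixed` mapping.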


See our first webinar: http://www.slideshare.net/OpenCubeProject/opencube-project-webinar-1-sept-8-2015

Page 29: The OpenCube Toolkit - webinar 2

Exploring data: OpenCube MapView

• Visualizes RDF data cubes on a map

• Allows selecting the cube, dimensions, and measures to display in an interactive way

• Supports marker, bubble, and choropleth maps
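In a choropleth map, each region is coloured according to the class its measure value falls into. A common choice is equal-interval classification, sketched below; it is a generic technique shown for illustration, not necessarily the classification MapView uses, and the region codes, values and colour palette are hypothetical.

```python
def equal_interval_classes(values, n_classes):
    """Return a function mapping a value to a class index 0..n_classes-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_classes or 1  # avoid division by zero if all values are equal

    def classify(v):
        # Clamp so that the maximum value lands in the last class.
        return min(int((v - lo) / width), n_classes - 1)

    return classify


measures = {"R1": 0, "R2": 10, "R3": 25, "R4": 40}
PALETTE = ["#fee5d9", "#fcae91", "#fb6a4a", "#cb181d"]  # light to dark

classify = equal_interval_classes(list(measures.values()), len(PALETTE))
colours = {region: PALETTE[classify(v)] for region, v in measures.items()}
```

Equal intervals are easy to read but can leave classes empty for skewed data; quantile classes are a frequent alternative.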

Page 30: The OpenCube Toolkit - webinar 2

Analyzing data with R

Enables advanced data analysis tasks using the well-established R software:

• passing input data retrieved from an RDF triple store using SPARQL

• reusing the analysis results for visualization or integration with the original data
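Input data for an R task is fetched with a SPARQL query. The helper below only builds a SELECT query returning one row per cube observation; `qb:dataSet` is a real RDF Data Cube property, while the dataset URI and the dimension/measure property URIs passed in are hypothetical, and the queries actually used by the OpenCube R integration may look different.

```python
def observation_query(dataset, columns):
    """Build a SPARQL SELECT returning one row per qb:Observation of `dataset`.

    `columns` maps a result variable name (a valid SPARQL variable name)
    to the property URI whose value should fill that column.
    """
    select = " ".join(f"?{name}" for name in columns)
    patterns = "\n".join(f"  ?obs <{prop}> ?{name} ." for name, prop in columns.items())
    return (
        "PREFIX qb: <http://purl.org/linked-data/cube#>\n"
        f"SELECT {select} WHERE {{\n"
        f"  ?obs qb:dataSet <{dataset}> .\n"
        f"{patterns}\n"
        "}"
    )


query = observation_query(
    "http://example.org/dataset/demo",
    {"area": "http://example.org/area", "value": "http://example.org/value"},
)
print(query)
```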

Page 31: The OpenCube Toolkit - webinar 2

R Analysis Tasks

• An analysis task is edited using a web UI form

• Two types of input parameters: constants, interpreted as variables of basic types in R; and SPARQL query results, interpreted as data frames in R

• The script is executed on the R server, and the results are passed back to the OpenCube Toolkit
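Interpreting SPARQL query results as a data frame essentially means turning the row-oriented SPARQL 1.1 JSON results format (`head.vars`, `results.bindings`) into named columns. The sketch below performs that conversion in Python and converts a few numeric XSD datatypes to numbers; it illustrates the idea and is not the code of the OpenCube R integration. The sample result is hypothetical.

```python
import json

# Numeric XSD datatypes converted to Python numbers (float for simplicity).
NUMERIC = {
    "http://www.w3.org/2001/XMLSchema#integer": int,
    "http://www.w3.org/2001/XMLSchema#decimal": float,
    "http://www.w3.org/2001/XMLSchema#double": float,
}


def bindings_to_columns(results_json):
    """Convert SPARQL 1.1 JSON results into {variable: [values]} columns.

    Unbound variables become None, playing the role of NA in an R data frame.
    """
    results = json.loads(results_json)
    variables = results["head"]["vars"]
    columns = {var: [] for var in variables}
    for row in results["results"]["bindings"]:
        for var in variables:
            term = row.get(var)
            if term is None:
                columns[var].append(None)
            else:
                convert = NUMERIC.get(term.get("datatype"), str)
                columns[var].append(convert(term["value"]))
    return columns


SAMPLE = """{
  "head": {"vars": ["area", "value"]},
  "results": {"bindings": [
    {"area": {"type": "uri", "value": "http://example.org/area/A"},
     "value": {"type": "literal", "value": "1.5",
               "datatype": "http://www.w3.org/2001/XMLSchema#decimal"}},
    {"area": {"type": "uri", "value": "http://example.org/area/B"}}
  ]}
}"""

frame = bindings_to_columns(SAMPLE)
# {'area': ['http://example.org/area/A', 'http://example.org/area/B'], 'value': [1.5, None]}
```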

Page 32: The OpenCube Toolkit - webinar 2

Analyzing data with R

Making use of the results: visualize, or store as linked data.

• Visualisation of analysis results: as a table, as a static chart built in R, or as an interactive stock chart

• Reuse of analysis results: preserving R output as linked data, by using the R output as a tabular data source to import data and convert it with R2RML

Page 33: The OpenCube Toolkit - webinar 2

Demos

OpenCube public demo: http://data.fluidops.net

• An instance of the developed platform hosted by fluidOps

• Contains metadata and a set of cubes from Eurostat

• Illustrates the data catalogue functionalities and data analysis using R

The Flemish Government

• An instance of the developed platform has been deployed at the premises of the Flemish government

• The Flemish government had already opened up statistics by means of linked data cubes

• 11 cubes had been transformed to linked data according to the QB vocabulary and stored in a Virtuoso RDF store

Page 34: The OpenCube Toolkit - webinar 2

Table of Contents

The OpenCube Project: Overview

The OpenCube Toolkit

Base platform

Components for processing statistical data

Conclusions

Page 35: The OpenCube Toolkit - webinar 2

Conclusions

The OpenCube project develops processes and tools for statistical data management.

The OpenCube Toolkit provides:

• a platform for building customized applications with linked data cubes

• a range of software components: tools for creating linked open statistical data, tools for expanding open statistical data, and tools for exploiting linked open statistical data

Page 36: The OpenCube Toolkit - webinar 2

More on OpenCube…

For more information: http://opencube-project.eu, http://opencube-toolkit.eu

Check out our free webinars! 1st webinar: Project overview & OLAP browser

Slides: http://www.slideshare.net/OpenCubeProject/opencube-project-webinar-1-sept-8-2015

Video: https://vimeo.com/138860345

Project coordinators: Konstantinos Tarabanis ([email protected]), Themis Tambouris ([email protected])

OpenCube consortium

Page 37: The OpenCube Toolkit - webinar 2

Acknowledgments

The work presented in the paper is partially funded by

http://opencube-project.eu

@OpenCubeProject

Page 38: The OpenCube Toolkit - webinar 2

Next webinar

PublishMyData for publishing governmental statistical data

Tuesday, September 22 at 06:00 PM CEST

http://opencube.enterthemeeting.com/m/VCAJFCJW