
LODeX: Schema Summarization and automatic SPARQL query generation for Linked Open Data sources

Jul 14, 2015

Fabio Benedetti
Transcript
Page 1: LODeX: Schema Summarization and automatic SPARQL query generation for Linked Open Data sources

DB Group @ UNIMO

LODeX: Schema Summarization and automatic SPARQL query generation for Linked Open Data sources

Fabio Benedetti

Department of Engineering “Enzo Ferrari”

University of Modena & Reggio Emilia

D-Day 2015 - Modena, Italy

Page 2: LODeX: Schema Summarization and automatic SPARQL query generation for Linked Open Data sources

[Schmachtenberg, Max, Christian Bizer, and Heiko Paulheim. "Adoption of the Linked Data Best Practices in Different Topical Domains." The Semantic Web–ISWC 2014. Springer International Publishing, 2014. 245-260]

Page 3: LODeX: Schema Summarization and automatic SPARQL query generation for Linked Open Data sources

                          2009                2014*
Domain                    Number   %          Number   %
Cross-domain              41       13.95%     41       4.04%
Geographic                31       10.54%     21       2.07%
Government                49       16.67%     183      18.05%
Life sciences             41       13.95%     83       8.19%
Media                     25       8.50%      22       2.17%
Publications              87       29.59%     96       9.47%
Social web                0        0.00%      520      51.28%
User-generated content    20       6.80%      48       4.73%
Total                     294                 1014

*Only 570 datasets belong to the LOD cloud; the remaining datasets do not contain ingoing/outgoing links to the LOD Cloud.

[Charts: distribution of datasets by domain, 2009 and 2014]

Page 4: LODeX: Schema Summarization and automatic SPARQL query generation for Linked Open Data sources

Open Access trends encourage the publication of Open Data in the form of Linked Data

But discovering LOD sources of interest is a complex task for a user

Main issues

• There is no standard for documenting a Dataset

• The structure of a Dataset can be understood only by exploring it manually

• Semantic Web technologies are extremely complex for unskilled users

Page 5: LODeX: Schema Summarization and automatic SPARQL query generation for Linked Open Data sources

• Automatically extract and summarize a schema (Schema Summary) able to describe a LOD Dataset

• Use the Schema Summary to support the user in the information extraction task

Online & Automatic extraction

• It does not require any additional information from the user

• It works with SPARQL endpoints

– We have to handle the performance issues of these Datasets

The Schema Summary has to describe a Dataset

• Ontology/Vocabulary (OWL & RDFS constraints)

• Open Data (e.g. generated from an existing RDBMS)
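One way to cope with slow or unresponsive endpoints is to give every request a timeout and treat a failure as "no answer". The sketch below is a minimal illustration using only the Python standard library; the function names and the 30-second default are assumptions, not part of LODeX. It relies on the SPARQL 1.1 Protocol convention of passing the query in the `query` parameter of a GET request and asking for `application/sparql-results+json`.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical helper names; the slides do not describe LODeX's client code.


def build_request(endpoint_url: str, query: str) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol GET request that asks for JSON results.

    Assumes endpoint_url has no query string of its own.
    """
    url = endpoint_url + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_query(endpoint_url: str, query: str, timeout_s: float = 30.0):
    """Return the list of result bindings, or None if the endpoint fails.

    Network errors and timeouts surface as OSError subclasses;
    a malformed JSON body surfaces as ValueError.
    """
    request = build_request(endpoint_url, query)
    try:
        with urllib.request.urlopen(request, timeout=timeout_s) as response:
            payload = json.load(response)
    except (OSError, ValueError):
        return None
    return payload.get("results", {}).get("bindings", [])
```

A caller can then skip, or retry later, any endpoint for which `run_query` returns `None` instead of blocking the whole extraction.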

Page 6: LODeX: Schema Summarization and automatic SPARQL query generation for Linked Open Data sources

Two main modules

• Extraction & Summarization

• Visualization & Querying

LODeX uses a NoSQL Database as back-end

Input: URLs of SPARQL endpoints

Output: Interactive Schema Summary

[Architecture diagram. Components: LOD Cloud, LODeX Indexes Extraction, Statistical Indexes, LODeX Post-processing, Schema Summary, NoSQL, Schema Summary Visualization, Query Orchestrator, Sgvizler. Data exchanged: Endpoint URLs, SPARQL Queries, Basic Query, Results]

Page 7: LODeX: Schema Summarization and automatic SPARQL query generation for Linked Open Data sources

Statistical Indexes

They consist of 9 indexes divided into three groups:

• General group

• Intensional group

• Extensional group

The index extraction (IE) process generates the SPARQL queries used to extract the different indexes.

• Iterative algorithm able to extract the Intensional knowledge

• Pattern Strategy technique

– It produces a higher number of less complex SPARQL queries

The IE process performs online index extraction while handling the performance issues of the SPARQL endpoints

[F. Benedetti, S. Bergamaschi, and L. Po, “Online index extraction from linked open data sources,” 2014, Linked Data for Information Extraction (LD4IE) Workshop held at International Semantic Web Conference]
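The idea of trading one heavy query for many lighter ones can be sketched as follows. The query shapes are assumptions chosen for illustration; the slides do not list the actual patterns used by LODeX, and `per_class_queries` is a hypothetical name.

```python
# Illustration only: these query shapes are assumptions, not LODeX's patterns.

# One complex query: counts every (class, property) pair in a single pass.
# On a large dataset a public endpoint may time out before answering it.
MONOLITHIC_QUERY = """
SELECT ?c ?p (COUNT(?s) AS ?n)
WHERE { ?s a ?c ; ?p ?o }
GROUP BY ?c ?p
"""

# A cheap first step: list the classes that have instances.
CLASS_LIST_QUERY = "SELECT DISTINCT ?c WHERE { ?s a ?c }"


def per_class_queries(class_iris):
    """Yield one smaller query per class instead of one query over all classes."""
    for iri in class_iris:
        yield (
            "SELECT ?p (COUNT(?s) AS ?n) "
            f"WHERE {{ ?s a <{iri}> ; ?p ?o }} "
            "GROUP BY ?p"
        )


# Example: two classes produce two independent, simpler queries.
for query in per_class_queries(["http://example.org/A", "http://example.org/B"]):
    print(query)
```

Each small query touches only the instances of one class, so a failure or timeout costs one partial result rather than the whole index.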

Page 8: LODeX: Schema Summarization and automatic SPARQL query generation for Linked Open Data sources

The elements composing the Schema Summary are:

• Classes

• Properties

• Attributes

An algorithm combines the information contained in the Statistical Indexes to produce and store the Schema Summary

[F. Benedetti, S. Bergamaschi, and L. Po, “A visual summary for linked open data sources,” 2014, International Semantic Web Conference (Posters & Demos)]
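A possible in-memory shape for such a summary is sketched below. It assumes that properties link two classes and that attributes are literal-valued properties attached to one class; the three input indexes and all names are hypothetical stand-ins for the Statistical Indexes, whose exact layout is not given in the slides.

```python
from dataclasses import dataclass, field


@dataclass
class ClassNode:
    iri: str
    instances: int = 0
    attributes: set = field(default_factory=set)


@dataclass
class SchemaSummary:
    classes: dict = field(default_factory=dict)   # class IRI -> ClassNode
    properties: set = field(default_factory=set)  # (subject class, property, object class)


def build_summary(class_counts, object_props, datatype_props):
    """Combine three hypothetical indexes into one summary.

    class_counts:   {class IRI: number of instances}
    object_props:   iterable of (subject class, property, object class)
    datatype_props: iterable of (class, property)
    """
    summary = SchemaSummary()
    for iri, count in class_counts.items():
        summary.classes[iri] = ClassNode(iri, count)
    for cls, prop in datatype_props:
        summary.classes.setdefault(cls, ClassNode(cls)).attributes.add(prop)
    for src, prop, dst in object_props:
        # Make sure both endpoints of the edge exist as nodes.
        for cls in (src, dst):
            summary.classes.setdefault(cls, ClassNode(cls))
        summary.properties.add((src, prop, dst))
    return summary
```

Keeping classes in a dictionary keyed by IRI makes it easy to serialize the summary as one document per dataset, which suits a document-oriented back-end.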

Page 9: LODeX: Schema Summarization and automatic SPARQL query generation for Linked Open Data sources

[Diagram labels: Schema Summary, Basic Query, SPARQL compiler, SPARQL query]

• Using the Web Application GUI, the user is guided in building a Basic Query

• A refinement panel helps the user refine the Basic Query

A SPARQL compiler automatically generates the corresponding SPARQL query

Operators supported by the compiler:

• AND

• OPTIONAL

• FILTER

• ORDER BY

• LIMIT

• OFFSET

The query is sent to the SPARQL endpoint and the results can be visualized in a tabular, map or chart view (pie, bar, etc.)
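A compiler covering the operators listed above can be quite small. The following is a minimal sketch, assuming a Basic Query is a set of required triple patterns plus optional patterns, filter expressions and solution modifiers; `BasicQuery` and `compile_query` are hypothetical names and the real LODeX representation may differ.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BasicQuery:
    select: list                                   # variable names without "?"
    patterns: list                                 # required triples, joined (AND)
    optional: list = field(default_factory=list)   # triples wrapped in OPTIONAL
    filters: list = field(default_factory=list)    # raw FILTER expressions
    order_by: Optional[str] = None
    limit: Optional[int] = None
    offset: Optional[int] = None


def compile_query(q: BasicQuery) -> str:
    """Turn a BasicQuery into SPARQL text."""
    lines = ["SELECT " + " ".join("?" + v for v in q.select), "WHERE {"]
    lines += [f"  {s} {p} {o} ." for s, p, o in q.patterns]
    lines += [f"  OPTIONAL {{ {s} {p} {o} }}" for s, p, o in q.optional]
    lines += [f"  FILTER ({expr})" for expr in q.filters]
    lines.append("}")
    if q.order_by:
        lines.append(f"ORDER BY ?{q.order_by}")
    if q.limit is not None:
        lines.append(f"LIMIT {q.limit}")
    if q.offset is not None:
        lines.append(f"OFFSET {q.offset}")
    return "\n".join(lines)


# Example: people, with their name when present, sorted by name.
example = BasicQuery(
    select=["s", "name"],
    patterns=[("?s", "a", "<http://xmlns.com/foaf/0.1/Person>")],
    optional=[("?s", "<http://xmlns.com/foaf/0.1/name>", "?name")],
    filters=["isIRI(?s)"],
    order_by="name",
    limit=10,
    offset=0,
)
print(compile_query(example))
```

The sketch does no escaping or validation of the pattern terms, which a real compiler fed from a GUI would need.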

Page 10: LODeX: Schema Summarization and automatic SPARQL query generation for Linked Open Data sources

Page 11: LODeX: Schema Summarization and automatic SPARQL query generation for Linked Open Data sources

Try LODeX demo at: http://dbgroup.unimo.it/lodex2

[F. Benedetti, S. Bergamaschi, and L. Po, “Visual Querying LOD sources with LODeX,” 2014, submitted to the Semantic Web journal]

Page 12: LODeX: Schema Summarization and automatic SPARQL query generation for Linked Open Data sources

Test Nov. 2014

Dataset URLs             559
Reachable datasets       302
SPARQL 1.1 compatible    206
Extraction completed     185

Online survey with 17 anonymous users:

• 8 Skilled users

• 9 Unskilled users

The survey is divided into two parts:

• Schema Summary browsing clarity

• Query generation

Task                       Correct Answers
Schema Summary browsing    94% (32/34)
Query generation           88% (60/68)
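The percentages reported for the survey follow directly from the counts, and the same arithmetic gives the share of endpoints surviving each step of the November 2014 test. The step-to-step rates below are derived here, not stated on the slide; `pct` is a hypothetical helper.

```python
def pct(part: int, whole: int) -> int:
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


# Survey results as reported: correct answers out of total answers.
survey = {"Schema Summary browsing": (32, 34), "Query generation": (60, 68)}
for task, (correct, total) in survey.items():
    print(f"{task}: {pct(correct, total)}%")

# Endpoint test: how many endpoints survive each successive step.
funnel = [
    ("Dataset URLs", 559),
    ("Reachable datasets", 302),
    ("SPARQL 1.1 compatible", 206),
    ("Extraction completed", 185),
]
for (_, previous), (step, count) in zip(funnel, funnel[1:]):
    print(f"{step}: {count}/{previous} = {pct(count, previous)}%")
```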

Page 13: LODeX: Schema Summarization and automatic SPARQL query generation for Linked Open Data sources

• Modify the interface of LODeX according to the results of the online survey

• Extend the VoID descriptor vocabulary in order to represent the Statistical Indexes and publish our data as LOD

– Build an observatory for the LOD cloud

• Define clustering techniques to reduce the size of the Summary for huge datasets

Page 14: LODeX: Schema Summarization and automatic SPARQL query generation for Linked Open Data sources

Accepted papers

• Beneventano, D., Bergamaschi, S., Sorrentino, S., Vincini, M., Benedetti, F. “Semantic annotation of the CEREALAB database by the AGROVOC linked dataset” (2014) Ecological Informatics journal. Article in Press.

• F. Benedetti, S. Bergamaschi, and L. Po, “Online index extraction from linked open data sources” 2014, Linked Data for Information Extraction (LD4IE) Workshop held at International Semantic Web Conference

• F. Benedetti, S. Bergamaschi, and L. Po, “A visual summary for linked open data sources” 2014, International Semantic Web Conference (Posters & Demos)

Submitted papers

• F. Benedetti, S. Bergamaschi, and L. Po, “Visual Querying LOD sources with LODeX” 2014, submitted to Semantic Web – Interoperability, Usability, Applicability, an IOS Press Journal

European projects & schools

• Web Science Summer School - Southampton University (20-26 July 2014)

• RDA Research Data Alliance - RDA Fourth Plenary Meeting, 22-24 September 2014 in Amsterdam. I won an Early Career Scientist grant and I belong to the Big Data Analytics Interest group.

• Keystone - COST Action IC1302. Autumn 2014 MC and WG Meetings “QUERYING THE SEMANTIC WEB”, 17-18 October 2014, Riva del Garda, TN.

Page 15: LODeX: Schema Summarization and automatic SPARQL query generation for Linked Open Data sources

Thanks for your attention!