WWW 2017 Tutorial: Semantic Data Management in Practice Part 8: Integrating Olaf Hartig Linköping University [email protected] @olafhartig Olivier Curé University of Paris-Est Marne la Vallée [email protected] @oliviercure

Feb 22, 2023


WWW 2017 Tutorial: Semantic Data Management in Practice

Part 8: Integrating

Olaf Hartig, Linköping University

[email protected]

@olafhartig

Olivier Curé, University of Paris-Est Marne la Vallée

[email protected]

@oliviercure


Goals

● Achieve global understanding of semantic integration:

– What are the main problems?

– Which approaches are in use?

● Understand the main features of some commercial and open source systems


Overview

● (Semantic) integration

● Two approaches

● Systems

● Demo


Data integration

● is a core component of information technology

● enables the combination of data contained in multiple data sources

● has to deal with

– discovering and representing mapping assertions between source schemata, e.g., database tables

– answering queries using multiple data sources, e.g., using SQL
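The two tasks above can be illustrated with a minimal sketch. It assumes two hypothetical source tables (`crm_customers` and `shop_clients`) with different schemata; all table and column names are invented for the example. The mapping assertions are written as a SQL view that maps each source to a common `person` schema (one possible way to represent mappings, chosen here only for illustration), and a single SQL query is then answered over both sources.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Source 1 (hypothetical): the full name is stored in one column
    CREATE TABLE crm_customers (id INTEGER, full_name TEXT, city TEXT);
    INSERT INTO crm_customers VALUES (1, 'Ada Lovelace', 'London');

    -- Source 2 (hypothetical): split name columns and different column names
    CREATE TABLE shop_clients (client_no INTEGER, given_name TEXT,
                               family_name TEXT, town TEXT);
    INSERT INTO shop_clients VALUES (7, 'Alan', 'Turing', 'London');
    INSERT INTO shop_clients VALUES (8, 'Grace', 'Hopper', 'New York');

    -- Mapping assertions: each source is mapped to the common schema
    -- person(source, name, city)
    CREATE VIEW person AS
        SELECT 'crm' AS source, full_name AS name, city
        FROM crm_customers
        UNION ALL
        SELECT 'shop' AS source, given_name || ' ' || family_name AS name,
               town AS city
        FROM shop_clients;
""")


def people_in(city):
    """Answer a query over the common schema; it draws on both sources."""
    rows = conn.execute(
        "SELECT name FROM person WHERE city = ? ORDER BY name", (city,)
    )
    return [name for (name,) in rows]


print(people_in("London"))  # ['Ada Lovelace', 'Alan Turing']
```

The query never mentions the source tables, so a new source only requires another branch in the view.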


Semantic Integration?

● Bringing together diverse, possibly heterogeneous, sources of information and interrelating them by leveraging the semantic information that is embedded inside them

● Interrelation occurs at the ontology/vocabulary level

– Recall that ontologies aim for knowledge sharing

● For example, integrate data across DBpedia, Wikidata, or any other Linked Data sources
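As a rough illustration of interrelating sources at the vocabulary level, the sketch below merges triples from two hypothetical Linked Data sources. Every `a:` and `b:` identifier is invented and does not correspond to real DBpedia or Wikidata terms; only `owl:sameAs` and `owl:equivalentProperty` are real OWL terms. As a simplification, the correspondences are treated as one-way renamings from source B to source A.

```python
# Triples from two hypothetical sources (all a:/b: names are invented).
SOURCE_A = {
    ("a:Ada_Lovelace", "a:birthPlace", "a:London"),
}
SOURCE_B = {
    ("b:Q1", "b:placeOfBirth", "b:Q100"),
    ("b:Q2", "b:placeOfBirth", "b:Q100"),
}
# Correspondences at the vocabulary level (properties) and instance level.
MAPPINGS = {
    ("b:placeOfBirth", "owl:equivalentProperty", "a:birthPlace"),
    ("b:Q1", "owl:sameAs", "a:Ada_Lovelace"),
    ("b:Q100", "owl:sameAs", "a:London"),
}


def canonical_terms(mappings):
    """Map each source-B term to its source-A counterpart."""
    return {
        s: o
        for (s, p, o) in mappings
        if p in ("owl:sameAs", "owl:equivalentProperty")
    }


def integrate(source_a, source_b, mappings):
    """Rewrite source-B triples into source-A terms and merge both sets."""
    rename = canonical_terms(mappings)
    rewritten = {tuple(rename.get(t, t) for t in triple) for triple in source_b}
    return source_a | rewritten


merged = integrate(SOURCE_A, SOURCE_B, MAPPINGS)
born_in_london = sorted(
    s for (s, p, o) in merged if p == "a:birthPlace" and o == "a:London"
)
print(born_in_london)  # ['a:Ada_Lovelace', 'b:Q2']
```

The duplicate statement about the first person collapses into one triple, while the unmapped entity `b:Q2` still becomes reachable through the source-A property.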


Why Is It Important?

● Combine data and knowledge from multiple sources to:

– answer queries using multiple sources of data, e.g., in SPARQL

– support interoperability between different systems and thus sharing knowledge

– reason over multiple data sources using knowledge sources


Mapping Ontologies is a Hard Problem

● Too many large ontologies to consider manual mappings

● Semantic integration has to deal with different levels of mismatches:

– Ontology: syntax, expressiveness

– Linguistic: terms used in ontology

– Modeling: conventions, granularity

– Domain: coverage


Overview

● (Semantic) integration

● Two approaches

● Systems

● Demo


Two Main Approaches

● A shared ontology exists and is extended to relate external ontologies via mappings

● No shared ontology is available:

– Heuristics-based or machine learning techniques are used to relate ontologies


Shared Ontology

● Several types of ontologies:

– Top-level ontologies formalize general notions (e.g., processes, events, time, space, physical objects). An example is DOLCE, which aims at capturing “ontological categories underlying natural language and human common-sense”

– Domain ontologies describe a specific domain in terms of concepts and properties

– Application ontologies specify terms for a given application. They depend on a domain ontology.


Shared Ontology (2)

● Top-level ontologies are designed to support ontology matching

– If two ontologies extend the same top-level ontology then it is easier to find correspondences between them. The top-level ontology serves as a bridge.
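The bridging role of a top-level ontology can be sketched as follows. The two small ontologies and the `top:` categories are hypothetical and are not taken from DOLCE or any other real top-level ontology. The sketch assumes that every class eventually reaches a `top:` category through its parents, and it only narrows down candidate correspondences; deciding which candidates are correct is left to a matcher.

```python
# Each ontology is given as a class -> parent map (all names are invented).
ONTO_A = {
    "a:Concert": "a:CulturalEvent",
    "a:CulturalEvent": "top:Event",
    "a:Guitar": "top:PhysicalObject",
}
ONTO_B = {
    "b:Gig": "top:Event",
    "b:Festival": "top:Event",
    "b:Instrument": "top:PhysicalObject",
}


def top_category(cls, parents):
    """Walk up the hierarchy until a class of the shared ontology is reached."""
    while not cls.startswith("top:"):
        cls = parents[cls]
    return cls


def candidate_pairs(onto_a, onto_b):
    """Keep only class pairs that fall under the same top-level category."""
    return [
        (a, b)
        for a in sorted(onto_a)
        for b in sorted(onto_b)
        if top_category(a, onto_a) == top_category(b, onto_b)
    ]


pairs = candidate_pairs(ONTO_A, ONTO_B)
print(len(pairs), "of", len(ONTO_A) * len(ONTO_B), "pairs remain")
```

Here events are never compared with physical objects, which shrinks the search space before any name or instance comparison takes place.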


Heuristics and ML approaches

● Heuristic approaches are usually based on one or a combination of structure, element, or instance analysis. An example is the PROMPT Suite (developed by the Protégé team)

● Machine learning approaches can combine different learners using a probabilistic model to discover correspondences between ontologies. Examples are GLUE and FCA-Merge
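A heuristic matcher of the kind described above might combine element analysis and instance analysis as in the following sketch. This is a generic illustration, not the algorithm used by PROMPT, GLUE, or FCA-Merge; the class names, instance sets, scoring rule, and threshold are all assumptions made for the example.

```python
from difflib import SequenceMatcher

# Each ontology maps a class name to its set of known instances (invented data).
ONTO_A = {"Author": {"ada", "alan"}, "Publication": {"p1", "p2", "p3"}}
ONTO_B = {"Writer": {"ada", "alan", "grace"}, "Publications": {"p9"}}


def name_similarity(a, b):
    """Element-level analysis: string similarity of the class names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def instance_overlap(xs, ys):
    """Instance-level analysis: Jaccard overlap of the instance sets."""
    union = xs | ys
    return len(xs & ys) / len(union) if union else 0.0


def match(onto_a, onto_b, threshold=0.6):
    """Propose correspondences whose best heuristic score reaches the threshold."""
    result = []
    for a, inst_a in onto_a.items():
        for b, inst_b in onto_b.items():
            score = max(name_similarity(a, b), instance_overlap(inst_a, inst_b))
            if score >= threshold:
                result.append((a, b, round(score, 2)))
    return result


for a, b, score in match(ONTO_A, ONTO_B):
    print(a, "<->", b, score)
```

The two heuristics complement each other: the first proposed pair is found through shared instances despite dissimilar names, the second through similar names despite disjoint instances.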


Overview

● (Semantic) integration

● Two approaches

● Systems

● Demo


Systems

● Commercial products

– Semafora systems

– TopQuadrant’s TopBraid

– Cambridge Semantics

● Free, open-source solutions

– RDF Refine

– Silk

– LIMES

– Karma


Commercial products

● Ontoprise was one of the early software suites created with Semantic Web standards in view.

– Some products, including semantic integration tools, were acquired in 2012 by Semafora systems

● Since the early 2000s, TopQuadrant has offered a semantic ecosystem around the TopBraid suite. It includes an advanced ontology editor and TopBraid Live to integrate data

● Cambridge Semantics was founded in 2007. It offers the Anzo suite, which supports integration, cleaning, and management of metadata


LIMES

● LInk discovery framework for MEtric Spaces

● Academic project with software maintenance

● Composed of several discovery approaches

– Link discovery for approximation of similarity between instances

– Machine learning (supervised and unsupervised)

● Easily configurable through files or a Graphical User Interface
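Why metric spaces help link discovery can be shown with a small sketch. It is not LIMES code and does not use the LIMES configuration format; it only illustrates the underlying property. In a metric space the triangle inequality gives |d(s, e) - d(t, e)| <= d(s, t) for any reference point e, so a pair (s, t) can be discarded without computing d(s, t) whenever that lower bound already exceeds the threshold. The points, the reference point, and the threshold below are invented.

```python
import math


def discover_links(sources, targets, threshold, exemplar):
    """Return pairs within the threshold and the number of comparisons skipped."""
    # Distances from every target to the reference point, computed once.
    target_to_exemplar = {t: math.dist(pt, exemplar) for t, pt in targets.items()}
    links, skipped = [], 0
    for s, ps in sources.items():
        source_to_exemplar = math.dist(ps, exemplar)
        for t, pt in targets.items():
            # Lower bound on d(s, t) from the triangle inequality.
            if abs(source_to_exemplar - target_to_exemplar[t]) > threshold:
                skipped += 1
                continue
            # Exact distance only for the remaining candidates.
            if math.dist(ps, pt) <= threshold:
                links.append((s, t))
    return links, skipped


SOURCES = {"s1": (0.0, 0.0), "s2": (10.0, 10.0)}
TARGETS = {"t1": (0.5, 0.0), "t2": (10.0, 10.5), "t3": (50.0, 50.0)}

links, skipped = discover_links(SOURCES, TARGETS, threshold=1.0, exemplar=(0.0, 0.0))
print(links)    # [('s1', 't1'), ('s2', 't2')]
print(skipped)  # 4
```

Only two of the six possible pairs need an exact distance computation in this toy setting; the same bound applies to any distance function that satisfies the metric axioms.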


SILK

● A Linked Data integration framework

● With commercial support by the Eccenca start-up

● Active since 2010, latest version is 2.7.1

● Main features

– Generate links between related data items

– Apply data transformations to structured data sources (e.g., generate RDF triples from CSV files)

– Link RDF triples to data sources on the Web (e.g., LOD)


KARMA

● An information integration tool

● Developed and maintained at the University of Southern California (USC)

● Karma learns to recognize mappings of data to ontologies

● Provides a Graphical User Interface to interact with data sets and ontologies


Overview

● (Semantic) integration

● Two approaches

● Systems

● Demo


Demo

● Using Karma

– Website: http://usc-isi-i2.github.io/karma/

– Download: https://github.com/usc-isi-i2/Web-Karma

– Unzip web-karma-master.zip

– Go to the Web-Karma-master folder and run mvn clean install

– To run Karma, go to the karma-web folder and run mvn -Djetty.port=8086 jetty:run

– Then open http://localhost:8086


Demo

● Scenario: integrate data obtained from a French organization (CSV file)

– Create an ontology on the fly

– Annotate data with this ontology

– Annotate other data elements with the igeo ontology

– Generate an RDF document
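Outside of Karma's graphical interface, the steps of this scenario can be approximated by a short script. The sketch below is not how Karma works internally. The CSV content, the `http://example.org/` namespace, and the class and property names are placeholders invented for the example, and the igeo ontology is not reproduced here. Columns are annotated through a column-to-property map and the result is serialized as N-Triples lines.

```python
import csv
import io

# Placeholder data standing in for the organization's CSV file.
CSV_DATA = """code,nom,population
00001,Ville A,1000
00002,Ville B,2000
"""

BASE = "http://example.org/"  # hypothetical namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Annotation step: each column is mapped to a property of an ad hoc ontology.
COLUMN_TO_PROPERTY = {
    "nom": BASE + "ontology#name",
    "population": BASE + "ontology#population",
}


def escape(value):
    """Escape backslashes and double quotes for an N-Triples literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def csv_to_ntriples(text):
    """Turn each CSV row into a typed resource with one triple per mapped column."""
    lines = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"<{BASE}commune/{row['code']}>"
        lines.append(f"{subject} <{RDF_TYPE}> <{BASE}ontology#Commune> .")
        for column, prop in COLUMN_TO_PROPERTY.items():
            lines.append(f'{subject} <{prop}> "{escape(row[column])}" .')
    return lines


print("\n".join(csv_to_ntriples(CSV_DATA)))
```

Annotating a column with a term from an existing ontology would only change the corresponding entry in `COLUMN_TO_PROPERTY`.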


Wrap-up

● Mature tools in the Semantic data management ecosystem

● High quality open source systems are available

● Large companies are already present or entering the market


Wrap-up (2)

● Still many open issues to address

– Efficient partitioning and RDF storage

– Reasoning and high performance query answering

– SPARQL query processing and analytics

– Understanding, visualizing very large graphs

– Data cleansing

– ...