INCMAP: A JOURNEY TOWARDS ONTOLOGY-BASED DATA INTEGRATION
CHRISTOPH PINKEL (MAIN AUTHOR), CARSTEN BINNIG, ERNESTO JIMENEZ-RUIZ, EVGENY KHARLAMOV, ET AL.

btw2017.informatik.uni-stuttgart.de/slidesandpapers/F3-10-15/...

Aug 20, 2018

Page 2

EXPLORING DATABASES CAN BE TEDIOUS…

Information need: Author of paper with title ‘IncMap’?

[Figure: three data sources (DBLP, CMT, EasyChair), each with its own schema (Schema 1, Schema 2, Schema 3) and each requiring its own SQL query (SQL 1, SQL 2, SQL 3)]

Page 3

PROBLEM 1: TOO MANY TABLES

Information need: Author of paper with title ‘IncMap’?

[Figure: a large number of tables, each with columns Id, Name, …]

A typical SAP schema has more than 10,000 tables

Page 4

PROBLEM 2: LIMITED EXPRESSIVENESS

Ontology

•  Person, with sub-classes Author and Reviewer
•  name (domain: Person)
•  e-mail (domain: Author)
•  area (domain: Reviewer)

Relational Schema (Options 1 to 3): three ways to store the same ontology

Tables Author, Reviewer:

  Author     aid   name       e-mail
             1     Lennon     a@b

  Reviewer   rid   name       area
             1     Harrison   Onto

Tables Person, Author, Reviewer:

  Person     pid   name
             1     Lennon
             2     Harrison

  Author     pid   e-mail
             1     a@b

  Reviewer   pid   area
             2     Onto

Table Person:

  Person     pid   name       e-mail   area   type
             1     Lennon     a@b      -      author
             2     Harrison   -        Onto   reviewer

Modeling generalization is “messy”
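
The effect of the three layouts on query writing can be sketched with SQLite. This is only an illustration built from the example rows on the slide: the table prefixes `a_`, `b_`, `c_`, the column name `email`, and the question "names of all authors" are assumptions introduced here, not part of the talk.

```python
import sqlite3

# In-memory database holding all three layouts side by side.
# The a_/b_/c_ prefixes are hypothetical and only keep the layouts apart.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE a_author(aid INTEGER, name TEXT, email TEXT);
CREATE TABLE a_reviewer(rid INTEGER, name TEXT, area TEXT);
INSERT INTO a_author VALUES (1, 'Lennon', 'a@b');
INSERT INTO a_reviewer VALUES (1, 'Harrison', 'Onto');

CREATE TABLE b_person(pid INTEGER, name TEXT);
CREATE TABLE b_author(pid INTEGER, email TEXT);
CREATE TABLE b_reviewer(pid INTEGER, area TEXT);
INSERT INTO b_person VALUES (1, 'Lennon'), (2, 'Harrison');
INSERT INTO b_author VALUES (1, 'a@b');
INSERT INTO b_reviewer VALUES (2, 'Onto');

CREATE TABLE c_person(pid INTEGER, name TEXT, email TEXT, area TEXT, type TEXT);
INSERT INTO c_person VALUES (1, 'Lennon', 'a@b', NULL, 'author'),
                            (2, 'Harrison', NULL, 'Onto', 'reviewer');
""")

# The same question ("names of all authors") needs a different query per layout.
queries = {
    "tables Author, Reviewer": "SELECT name FROM a_author",
    "tables Person, Author, Reviewer":
        "SELECT p.name FROM b_person p JOIN b_author a ON a.pid = p.pid",
    "table Person": "SELECT name FROM c_person WHERE type = 'author'",
}
answers = {layout: [row[0] for row in con.execute(sql)]
           for layout, sql in queries.items()}
```

All three queries return the same single name, but each one encodes knowledge about how the generalization was flattened into tables.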

Page 5

PROBLEM 3: TECHNICAL DESIGN

Example table names: BDC_IXN_FACT_MA, BDC_IXN_FACT_WA, BDC_ACCOUNT_DIM, BDC_DEMOGRAPHICS_DIM

Other issues:

•  De-normalization (i.e., merged tables)

•  No foreign keys!

•  Performance optimizations (horizontal, vertical fragmentation, …)

Page 6

ONTOLOGY-BASED DATA ACCESS

[Figure: a high-level query (Author of paper with title ‘IncMap’?) is posed against an ontology-based data access layer, which sits on top of SQL 1, SQL 2, SQL 3 and the sources DBLP, CMT, EasyChair]

[Figure: the ontology (Person with sub-classes Author and Reviewer; properties name, e-mail, area) next to the three relational schema options shown on Page 4]

Minimal Ontology (in OWL QL)

Page 7

ONTOLOGY-BASED DATA ACCESS

[Figure: Relational Schema (Options 1 to 3, as shown on Page 4) on one side, Ontology (Person with sub-classes Author and Reviewer; properties name, e-mail, area) on the other, with "Mapping?" between them]

IncMap: A Mapping Tool for Relational-To-Ontology Data Integration

Page 8

THE JOURNEY OF INCMAP

First version of IncMap

•  Incremental mapping

•  Leverage lexicographical and structural similarity

Christoph Pinkel, et al.: Pay as you go Matching of Relational Schemata to OWL Ontologies with IncMap. International Semantic Web Conference 2013

Page 9

THE JOURNEY OF INCMAP

First version of IncMap

•  Incremental mapping

•  Leverage lexicographical and structural similarity

Second version of IncMap

•  Consider typical design patterns

•  Leverage reasoning (open vs. closed-world)

•  Bootstrap mappings (fully automatic)

Christoph Pinkel, Carsten Binnig, Ernesto Jiménez-Ruiz, Evgeny Kharlamov, Andriy Nikolov, Andreas Schwarte, Christian Heupel, Tim Kraska: IncMap: A Journey towards Ontology-based Data Integration. BTW 2017

Page 10

STEP 1: MAPPING TO INCGRAPHS

[Figure: a relational schema R and an ontology O are each converted into a graph, IncGraph(R) and IncGraph(O)]

Relational Schema R: tables Person (ID, ...) and Paper (title, PersID (FK), ...)

Ontology O: classes Person, Author, Paper (Author subClassOf Person); object property writes (domain Author, range Paper); datatype property hasTitle

IncGraph(R): nodes Person, PersID, Paper, title, ID, varchar; edge labels ref, val, type

IncGraph(O): nodes Author, writes, Paper, hasTitle, Person, string; edge labels ref, val, type, subClassOf

Main Reason: Mitigate structural differences
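
A minimal sketch of the idea, assuming a simplified construction: foreign keys and object properties both become a node connected by `ref` edges, plain columns and datatype properties both become a node connected by a `val` edge plus a `type` edge. The `Graph` class, both function names, the `int` column type and the choice of Paper as the domain of hasTitle are assumptions made for this illustration; the actual IncGraph construction in IncMap may differ in detail.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """Tiny labeled directed graph stored as a set of (source, label, target) edges."""
    edges: set = field(default_factory=set)

    def add(self, src, label, dst):
        self.edges.add((src, label, dst))


def incgraph_from_schema(tables):
    """tables: {table: {column: (sql_type, referenced_table_or_None)}}"""
    g = Graph()
    for table, columns in tables.items():
        for column, (sql_type, fk_target) in columns.items():
            node = f"{table}.{column}"
            if fk_target is None:
                # Plain column: treated like a datatype property of the table.
                g.add(table, "val", node)
                g.add(node, "type", sql_type)
            else:
                # Foreign key: treated like an object property between two tables.
                g.add(table, "ref", node)
                g.add(node, "ref", fk_target)
    return g


def incgraph_from_ontology(subclass_of, object_props, datatype_props):
    """subclass_of: [(sub, super)]; object_props: {prop: (domain, range)};
    datatype_props: {prop: (domain, datatype)}"""
    g = Graph()
    for sub, sup in subclass_of:
        g.add(sub, "subClassOf", sup)
    for prop, (domain, range_) in object_props.items():
        g.add(domain, "ref", prop)
        g.add(prop, "ref", range_)
    for prop, (domain, datatype) in datatype_props.items():
        g.add(domain, "val", prop)
        g.add(prop, "type", datatype)
    return g


# The example from the slide (ID's type and hasTitle's domain are assumed).
graph_r = incgraph_from_schema({
    "Person": {"ID": ("int", None)},
    "Paper": {"title": ("varchar", None), "PersID": ("int", "Person")},
})
graph_o = incgraph_from_ontology(
    subclass_of=[("Author", "Person")],
    object_props={"writes": ("Author", "Paper")},
    datatype_props={"hasTitle": ("Paper", "string")},
)
```

After the conversion both sides use the same small edge vocabulary (`ref`, `val`, `type`), so a foreign key and an object property look structurally alike even though they are modeled very differently in their original formalisms.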

Page 11

STEP 2: REASONING AND PATTERNS

[Figure: IncGraph(R) is extended to IncGraph+(R) by applying patterns (Pattern: Inheritance, illustrated with the three relational schema options shown on Page 4); IncGraph(O) is extended to IncGraph+(O) by reasoning]

Page 12

REASONING: TWO OPTIONS

Option 1: Full reasoning

1.  Reasoning on the base ontology using OWL QL

2.  Add all derivable elements to IncGraph(O)

Option 2: Custom reasoning (to close “modeling gaps”)

1.  Reasoning on the IncGraph(O)

•  Generalization hierarchies

•  Additional domain and range information

•  …

2.  Add selected elements to IncGraph(O) and set weights (see next slides)
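
Option 2 can be illustrated with a small sketch for one kind of modeling gap, generalization hierarchies: a property declared on a class is also attached to its sub-classes, with a lower weight than an explicitly modeled edge. The function name and the weights 1.0 and 0.5 are assumptions for illustration only; the slides do not state which rules or weights IncMap applies.

```python
def close_modeling_gaps(subclass_of, domains, inherited_weight=0.5):
    """subclass_of: {sub_class: super_class}; domains: {property: class}.

    Returns {(class, property): weight}. Explicit domain edges get weight 1.0,
    edges inherited through the class hierarchy get `inherited_weight`
    (a hypothetical value).
    """
    def ancestors(cls):
        found = []
        # Walk up the hierarchy; stop on a cycle instead of looping forever.
        while cls in subclass_of and subclass_of[cls] not in found:
            cls = subclass_of[cls]
            found.append(cls)
        return found

    edges = {(cls, prop): 1.0 for prop, cls in domains.items()}
    classes = set(subclass_of) | set(subclass_of.values()) | set(domains.values())
    for cls in classes:
        for ancestor in ancestors(cls):
            for prop, domain in domains.items():
                if domain == ancestor:
                    # Keep an explicit edge if one exists, otherwise add a weaker one.
                    edges.setdefault((cls, prop), inherited_weight)
    return edges


# The running example: Author and Reviewer are sub-classes of Person.
weighted_edges = close_modeling_gaps(
    subclass_of={"Author": "Person", "Reviewer": "Person"},
    domains={"name": "Person", "e-mail": "Author", "area": "Reviewer"},
)
```

Here `name` becomes a candidate property of Author and Reviewer as well, which matters when the relational schema stores names directly in an author table. Nothing flows upwards: Person does not receive `e-mail` or `area`.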

Page 13

STEP 3: PAIRWISE MATCHING

[Figure: Pairwise Connectivity Graph built from the source graph (Person, PersID, Paper, title, …) and the target graph (Author, writes, Paper, hasTitle, …). Each node is a possible match between a source element and a target element; edges carry weights such as 1.0, 0.5, 0.2, 0.1.]

Possible matches (target / source):

•  Author / Person, writes / PersID, Paper / Paper

•  Author / Paper, writes / PersID, Paper / Person

•  Paper / Person, writes / PersID, Author / Paper

Pairwise Connectivity Graph
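
The construction can be sketched using the standard definition from the Similarity Flooding algorithm: two pair nodes are connected whenever both graphs contain an edge with the same label between the corresponding elements. The function name and the two tiny input graphs are assumptions for illustration, and IncMap's own variant (for example its initial weights) is not reproduced here.

```python
from itertools import product


def pairwise_connectivity_graph(source_edges, target_edges):
    """Each input is a set of (node, label, node) edges.

    Returns edges ((s, t), label, (s2, t2)) such that (s, label, s2) is a
    source edge and (t, label, t2) is a target edge.
    """
    pcg = set()
    for (s, label_s, s2), (t, label_t, t2) in product(source_edges, target_edges):
        if label_s == label_t:
            pcg.add(((s, t), label_s, (s2, t2)))
    return pcg


# Fragments of the two graphs from the slide (edge directions are assumed).
source = {("Paper", "ref", "PersID"), ("PersID", "ref", "Person")}
target = {("Author", "ref", "writes"), ("writes", "ref", "Paper")}
pcg = pairwise_connectivity_graph(source, target)
```

Every node of the result, such as `("PersID", "writes")`, is one candidate correspondence, and the edges record which candidates support each other structurally.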

Page 14

STEP 4: FIXPOINT COMPUTATION

•  Human Input (Acceptance and Rejection of Mappings)

•  Weights for Patterns (Probability of Pattern)

•  Deactivation of Edges (based on Patterns)

Pairwise Connectivity Graph

[Figure: the possible matches from Step 3 with edge weights such as 1.0, 0.5, 0.2, 0.1]

Fixpoint Computation (Ext. Similarity Flooding)

[Figure: resulting match scores between 0.3 and 1.0; labels include Sub-class]
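
A simplified sketch of a fixpoint computation in the spirit of Similarity Flooding, with human input folded in: every candidate match spreads its score evenly to its neighbours in the pairwise connectivity graph, scores are normalized, and accepted or rejected matches are pinned to 1.0 or 0.0 after each round. The function name, the even split, the pinning strategy and the example scores are assumptions; pattern weights and edge deactivation from the slide are not modeled.

```python
from collections import defaultdict


def flood(pcg_edges, initial, accepted=(), rejected=(), max_iter=100, eps=1e-6):
    """pcg_edges: set of (pair, label, pair); initial: {pair: score}."""
    neighbors = defaultdict(set)
    for src, _label, dst in pcg_edges:
        neighbors[src].add(dst)
        neighbors[dst].add(src)  # similarity flows in both directions

    nodes = set(neighbors) | set(initial) | set(accepted) | set(rejected)
    if not nodes:
        return {}

    def apply_feedback(scores):
        # Human decisions override whatever the propagation computed.
        for n in accepted:
            scores[n] = 1.0
        for n in rejected:
            scores[n] = 0.0

    sigma = {n: initial.get(n, 0.0) for n in nodes}
    apply_feedback(sigma)
    for _ in range(max_iter):
        nxt = dict(sigma)
        for n in nodes:
            if neighbors[n]:
                share = sigma[n] / len(neighbors[n])
                for m in neighbors[n]:
                    nxt[m] += share
        top = max(nxt.values()) or 1.0  # avoid dividing by zero
        nxt = {n: v / top for n, v in nxt.items()}
        apply_feedback(nxt)
        delta = max(abs(nxt[n] - sigma[n]) for n in nodes)
        sigma = nxt
        if delta < eps:  # scores stopped changing: fixpoint reached
            break
    return sigma


example_pcg = {
    (("Paper", "Paper"), "ref", ("PersID", "writes")),
    (("PersID", "writes"), "ref", ("Person", "Author")),
}
example_initial = {
    ("Paper", "Paper"): 1.0,
    ("PersID", "writes"): 0.1,
    ("Person", "Author"): 0.2,
}
scores = flood(example_pcg, example_initial, accepted=[("Paper", "Paper")])
scores_after_reject = flood(example_pcg, example_initial,
                            rejected=[("PersID", "writes")])
```

Re-running the computation after each accept or reject is what makes the process incremental: one confirmed match raises the scores of structurally connected candidates.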

Page 15

EVALUATION: RODI BENCHMARK

Target Ontologies (Schema): Conference ontology 1, Conference ontology 2, Geodata ontology, Oil & gas ontology

Source Databases (Schema + Data): CMT Variant, CMT Canon., …; Conf. Variant, Conf. Canon., …; Mond. Variant, Mond. Rel., …; Single, large real-world schema

Mapping Rules? (between each source database and its target ontology)

Variants:

1. Adjusted Naming

2. Structural Adjustments (e.g., hierarchies)

3. Removed foreign keys

4. Merging / Splitting of tables

5. Combined cases

SIGKDD, Conference, CMT

Real-World

Christoph Pinkel, Carsten Binnig, Ernesto Jiménez-Ruiz, Wolfgang May, Dominique Ritze, Martin G. Skjæveland, Alessandro Solimando, Evgeny Kharlamov: RODI: A Benchmark for Automatic Mapping Generation in Relational-to-Ontology Data Integration. ESWC 2015

https://github.com/chrpin/rodi

Page 16

EVALUATION: RODI BENCHMARK

Evaluation queries:

•  Queries simulate information need

•  Can be additional input for mapping

•  56 queries from simple to complex

Metric: per-query F-measure
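
As a rough illustration of the metric, a per-query F-measure can be computed from the set of expected results and the set of results obtained through the generated mapping. This is the textbook definition over plain result sets; the function names and example data are assumptions, and RODI's actual scoring may compare results differently.

```python
def f_measure(expected, actual):
    """Harmonic mean of precision and recall over two result sets."""
    expected, actual = set(expected), set(actual)
    if not expected and not actual:
        return 1.0  # nothing expected, nothing returned: treated as a perfect answer
    true_positives = len(expected & actual)
    precision = true_positives / len(actual) if actual else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def average_f_measure(per_query_results):
    """per_query_results: list of (expected, actual) pairs, one per query."""
    scores = [f_measure(exp, act) for exp, act in per_query_results]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical outcome for two queries.
benchmark_run = [
    ({"Lennon", "Harrison"}, {"Lennon"}),  # one of two expected results found
    ({"IncMap"}, {"IncMap"}),              # exact match
]
overall = average_f_measure(benchmark_run)
```

Scoring each query separately means a mapping that serves simple queries well but fails on complex ones is still distinguishable from one that fails everywhere.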

Page 17

EVALUATION: COMPETITORS

Relational-to-Ontology Mapping Systems

•  Ontop: http://ontop.inf.unibz.it (Free University of Bozen-Bolzano)

•  BootOX: https://www.cs.ox.ac.uk/isg/tools/BootOX/ (University of Oxford)

General Mapping Systems (Baseline)

•  COMA++: http://dbs.uni-leipzig.de/de/Research/coma.html (University of Leipzig)

Page 18

EVALUATION: RESULTS

Page 19

EVALUATION: RESULTS

Page 20

CONCLUSIONS

•  Incremental Mapping Generation for Relational-to-Ontology Mappings

•  Most benefits from domain knowledge (patterns, reasoning)

•  Integrated into real-world platform at fluidOps

•  Possible future directions: Patterns, other graph similarity metrics, …