Ontology materialization from relational database sources using D2RQ Rajashree Deka Tetherless World Constellation Rensselaer Polytechnic Institute

Dec 26, 2015

Page 1

Ontology materialization from relational database sources using D2RQ

Rajashree Deka

Tetherless World Constellation

Rensselaer Polytechnic Institute

Page 2

The majority of the data underpinning the Web is stored in relational databases (RDB).

Advantages: Secure and scalable architecture. Efficient storage. Reliability.

Disadvantages: Data are difficult to share across large organizations where different database schemata are used. Most importantly, there is no check on semantics.

RDBMS

Page 3

As the Semantic Web matures, there is a growing need for RDF applications to access the content of legacy databases.

Compared to RDB, RDF is: More expressive. More easily processed and interpreted. Easily reasoned over by software agents.

We need a way to make data in an RDBMS available as RDF.

RDBMS to RDF

Page 4

To generate Semantic Web content from an RDB, Tim Berners-Lee proposed a very direct mapping:

Each table in the RDB is an RDF class.

Each field (column) name is an RDF property.

Each record is an RDF node: an instance of the RDF class, so it can play the role of a subject or an object in an RDF statement.

Mapping data from RDBMS to RDF
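
The direct mapping above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the base URI `http://example.org/`, the `name` column, and the sample row are hypothetical, and the URI layout (`table/key` for records, `table#column` for properties) is one possible choice, not something the proposal or D2RQ prescribes.

```python
import sqlite3

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/"  # hypothetical base URI


def direct_mapping(conn, table, key):
    """Yield (subject, predicate, object) triples for every row of `table`."""
    cur = conn.execute(f"SELECT * FROM {table}")
    columns = [d[0] for d in cur.description]
    for row in cur:
        record = dict(zip(columns, row))
        # Each record becomes a node that is an instance of the table's class.
        subject = f"{BASE}{table}/{record[key]}"
        yield (subject, RDF_TYPE, f"{BASE}{table}")
        # Each column becomes a property of that node.
        for column, value in record.items():
            if value is not None:
                yield (subject, f"{BASE}{table}#{column}", value)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dataset (dataset_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO dataset VALUES (1, 'CTD casts')")
triples = list(direct_mapping(conn, "dataset", "dataset_id"))
```

Each row yields one type statement plus one statement per non-null column, which is why this style of mapping needs no configuration but also cannot target an existing ontology.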

Page 5

Semi-automatic generation of an ontology from the RDB: Read all records and export them as RDF triples. Mappings are direct; complex mappings do not usually appear. The data must be re-converted to RDF regularly. Does not allow the population of an existing ontology, which is a big limitation.

Map an existing RDB to an existing ontology: Customize the mapping according to the existing ontology. Complex mappings can be implemented.

Two Approaches

Page 6

D2RQ provides an integrated environment for accessing the content of non-RDF relational databases as virtual, read-only RDF graphs.

Using D2RQ we can:

Query a non-RDF database using SPARQL queries.

Access information in a non-RDF database using the Jena API or the Sesame API.

Access the content of the database as Linked Data over the Web.

The D2RQ platform
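
As a rough illustration of the SPARQL route, the sketch below builds a SPARQL Protocol GET request URL. The endpoint address `http://localhost:2020/sparql` is an assumption about a local D2R Server installation, and the query itself is only an example; nothing is sent over the network here.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

ENDPOINT = "http://localhost:2020/sparql"  # assumed local endpoint address

QUERY = (
    "SELECT ?s ?label WHERE { "
    "?s <http://www.w3.org/2000/01/rdf-schema#label> ?label "
    "} LIMIT 5"
)


def sparql_get_url(endpoint, query):
    """Return the URL of a SPARQL Protocol GET request for `query`."""
    return f"{endpoint}?{urlencode({'query': query})}"


url = sparql_get_url(ENDPOINT, QUERY)

# Round-trip check: decoding the URL gives back the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
```

To actually run the query, the URL could be passed to `urllib.request.urlopen`, typically with an `Accept` header such as `application/sparql-results+json`.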

Page 7

D2RQ mapping language – describes the relation between the ontology and the RDB.

D2RQ engine – uses the mappings to rewrite Jena and Sesame API calls into SQL queries.

D2R Server – provides a Linked Data view, an HTML view for debugging, and a SPARQL Protocol endpoint over the database.

The D2RQ platform

Page 8

D2R server

Page 9

The D2RQ mapping language is formally defined by http://www4.wiwiss.fu-berlin.de/bizer/d2rq/0.1/

The D2RQ namespace is defined by http://www.wiwiss.fu-berlin.de/suhl/bizer/D2RQ/0.1#

Database compatibility:

Oracle

MySQL

PostgreSQL

Microsoft SQL Server

ODBC data sources (e.g. Microsoft Access): the mapping generator and automatic detection of column types do not work.

More about D2RQ

Page 10

Two command-line tools (only on Windows and Unix systems):

Mapping generator: Analyzes the database schema. Generates a default mapping file. The resulting D2RQ map is an RDF document in N3 format. The mapping can be used as-is or customized.

Dump script: Writes the content of the RDB into a single RDF file. Supported syntaxes are "RDF/XML" (the default), "RDF/XML-ABBREV", "N3", and "N-TRIPLE".

Command line tools
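
To make the dump idea concrete, here is a minimal sketch of serializing triples in the N-TRIPLE syntax, one statement per line. It is not the D2RQ dump script: the function names, the `is_uri` flag, and the sample statements are hypothetical, and the escaping is simplified (it handles backslashes, quotes, and line breaks only).

```python
import io

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def escape(text):
    """Escape the characters that would break a quoted N-Triples literal."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def ntriple(subject, predicate, obj, is_uri=False):
    """Format one statement; the object is a URI or a plain literal."""
    o = f"<{obj}>" if is_uri else f'"{escape(str(obj))}"'
    return f"<{subject}> <{predicate}> {o} ."


def dump(triples, stream):
    """Write (subject, predicate, object, is_uri) tuples to `stream`."""
    count = 0
    for subject, predicate, obj, is_uri in triples:
        stream.write(ntriple(subject, predicate, obj, is_uri) + "\n")
        count += 1
    return count


# Hypothetical sample statements.
rows = [
    ("http://example.org/dataset/1", RDF_TYPE, "http://example.org/vocab/dataset", True),
    ("http://example.org/dataset/1", RDFS_LABEL, 'dataset "one"', False),
]
out = io.StringIO()
written = dump(rows, out)
```

Because every statement sits on its own line, a dump in this syntax can be streamed row by row without holding the whole graph in memory.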

Page 11

An ontology is mapped to a database schema using:

d2rq:ClassMap – represents a class or a group of similar classes in the ontology, and specifies how instances of the class are identified.

d2rq:PropertyBridge – a ClassMap has a set of PropertyBridges, which specify how the properties of an instance are created.

D2RQ mapping – how it works
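
The relationship between the two constructs can be sketched as a toy model. The Python classes below are hypothetical stand-ins, not the D2RQ engine; the pattern strings are the ones from the default mapping for the dataset table in this deck, and the row value 42 is invented. Real D2RQ also encodes values when building URIs, which this sketch skips.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

PLACEHOLDER = re.compile(r"@@([^@]+)@@")


def expand(pattern, row):
    """Replace each @@table.column@@ placeholder with the row's value."""
    return PLACEHOLDER.sub(lambda m: str(row[m.group(1)]), pattern)


@dataclass
class PropertyBridge:
    prop: str
    column: Optional[str] = None   # value taken directly from a column
    pattern: Optional[str] = None  # value built from a pattern

    def value(self, row):
        return row[self.column] if self.column else expand(self.pattern, row)


@dataclass
class ClassMap:
    uri_pattern: str
    rdf_class: str
    bridges: list = field(default_factory=list)

    def triples(self, row):
        # The ClassMap identifies the instance; its bridges add the properties.
        subject = expand(self.uri_pattern, row)
        yield (subject, "rdf:type", self.rdf_class)
        for bridge in self.bridges:
            yield (subject, bridge.prop, bridge.value(row))


dataset = ClassMap(
    uri_pattern="dataset/@@dataset.dataset_id@@",
    rdf_class="vocab:dataset",
    bridges=[
        PropertyBridge("rdfs:label", pattern="dataset #@@dataset.dataset_id@@"),
        PropertyBridge("vocab:dataset_dataset_id", column="dataset.dataset_id"),
    ],
)
result = list(dataset.triples({"dataset.dataset_id": 42}))
```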

Page 12

BCODMO ontology materialization from MySQL database using D2RQ

Page 13

BCODMO - D2RQ map

Page 14

# Table dataset (default mapping)
map:dataset a d2rq:ClassMap;
    d2rq:dataStorage map:database;
    d2rq:uriPattern "dataset/@@dataset.dataset_id@@";
    d2rq:class vocab:dataset;
    d2rq:classDefinitionLabel "dataset";
    .

map:dataset__label a d2rq:PropertyBridge;
    d2rq:belongsToClassMap map:dataset;
    d2rq:property rdfs:label;
    d2rq:pattern "dataset #@@dataset.dataset_id@@";
    .

map:dataset_dataset_id a d2rq:PropertyBridge;
    d2rq:belongsToClassMap map:dataset;
    d2rq:property vocab:dataset_dataset_id;
    d2rq:propertyDefinitionLabel "dataset dataset_id";
    d2rq:column "dataset.dataset_id";
    d2rq:datatype xsd:int;
    .

# Table dataset (customized mapping)
map:dataset a d2rq:ClassMap;
    d2rq:dataStorage map:database;
    d2rq:uriPattern "http://escience.rpi.edu/ontology/BCO-DMO/bcodmo/2/0/DeploymentDatasetCollection_@@dataset.dataset_id@@";
    d2rq:class bcodmo:DeploymentDatasetCollection;
    d2rq:classDefinitionLabel "DeploymentDatasetCollection";
    .

map:seeAlsoStatement a d2rq:PropertyBridge;
    d2rq:belongsToClassMap map:dataset;
    d2rq:property rdfs:seeAlso;
    d2rq:uriPattern "http://osprey.bcodmo.org/dataset.cfm?id=@@dataset.dataset_id@@&flag=view";
    .

map:hasIdentifier a d2rq:PropertyBridge;
    d2rq:property bcodmo:hasIdentifier;
    d2rq:belongsToClassMap map:dataset;
    d2rq:column "dataset.dataset_id";
    d2rq:datatype xsd:int;
    .

# Links each dataset to its parameters through the dataset_parameters table
map:dataset_dataset_id a d2rq:PropertyBridge;
    d2rq:belongsToClassMap map:dataset;
    d2rq:property bcodmo:hasParameter;
    d2rq:refersToClassMap map:parameters;
    d2rq:propertyDefinitionLabel "dataset dataset_id";
    d2rq:join "dataset.dataset_id = dataset_parameters.dataset_id";
    d2rq:join "dataset_parameters.parameters_id = parameters.parameters_id";
    .

Excerpt of the mapping file
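
The last bridge in the excerpt relates datasets to parameters through two join conditions. A minimal sketch of what such a bridge amounts to, using an in-memory SQLite database: the table and column names and the subject URI prefix come from the excerpt, while the sample rows and the `parameters/<id>` object URIs are hypothetical (the excerpt does not show the `map:parameters` ClassMap). The SQL here is hand-written for illustration and is not claimed to be what the D2RQ engine generates.

```python
import sqlite3

SUBJECT_PREFIX = (
    "http://escience.rpi.edu/ontology/BCO-DMO/bcodmo/2/0/"
    "DeploymentDatasetCollection_"
)

# The two d2rq:join conditions from the customized mapping.
JOINS = [
    "dataset.dataset_id = dataset_parameters.dataset_id",
    "dataset_parameters.parameters_id = parameters.parameters_id",
]


def has_parameter_triples(conn):
    """Yield one bcodmo:hasParameter statement per joined row."""
    sql = (
        "SELECT dataset.dataset_id, parameters.parameters_id "
        "FROM dataset, dataset_parameters, parameters "
        "WHERE " + " AND ".join(JOINS) +
        " ORDER BY dataset.dataset_id, parameters.parameters_id"
    )
    for dataset_id, parameters_id in conn.execute(sql):
        yield (
            f"{SUBJECT_PREFIX}{dataset_id}",
            "bcodmo:hasParameter",
            f"parameters/{parameters_id}",  # hypothetical URI pattern
        )


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dataset (dataset_id INTEGER PRIMARY KEY);
CREATE TABLE parameters (parameters_id INTEGER PRIMARY KEY);
CREATE TABLE dataset_parameters (dataset_id INTEGER, parameters_id INTEGER);
INSERT INTO dataset VALUES (1), (2);
INSERT INTO parameters VALUES (10), (11);
INSERT INTO dataset_parameters VALUES (1, 10), (1, 11), (2, 11);
""")
links = list(has_parameter_triples(conn))
```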

Page 15

Customization is straightforward when a class in the ontology is represented by a table in the database.

Mapping is complicated, and sometimes not possible, when a class in the ontology corresponds not to a table in the database but to a record in a database table.

Customization of mapping

Page 16

Define primary keys wherever possible and create indexes.

Indicate directions in d2rq:join conditions.

Set d2r:autoReloadMapping to false whenever it is not needed.

Use hint properties:

d2rq:valueMaxLength

d2rq:valueRegex

d2rq:valueContains

Optimizing D2R's performance
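
One way to picture why hint properties help: if the mapping asserts something about every value of a column, a lookup for a value that violates the assertion can be answered without touching the database. The class below is a hypothetical illustration of that idea, not D2RQ code, and it assumes the regular expression must match the whole value.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hints:
    value_max_length: Optional[int] = None  # cf. d2rq:valueMaxLength
    value_regex: Optional[str] = None       # cf. d2rq:valueRegex
    value_contains: Optional[str] = None    # cf. d2rq:valueContains

    def may_match(self, literal):
        """Return False when no value of the column can equal `literal`."""
        if self.value_max_length is not None and len(literal) > self.value_max_length:
            return False
        if self.value_regex is not None and not re.fullmatch(self.value_regex, literal):
            return False
        if self.value_contains is not None and self.value_contains not in literal:
            return False
        return True


numeric_id = Hints(value_max_length=10, value_regex=r"[0-9]+")
email = Hints(value_contains="@")
```

A query engine holding such hints could skip the SQL query entirely whenever `may_match` returns False for the value being searched.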

Page 17

Performs reasonably well with basic triple patterns, but performance deteriorates when SPARQL features such as OPTIONAL, FILTER and LIMIT are used.

Does not have reasoning capability. Reasoning can be added by using the D2RQ engine within Jena.

Integration of multiple databases or other data sources using D2RQ alone is not possible.

Read-only, cannot perform INSERT, DELETE or UPDATE operations.

Cannot handle complicated database structures like VIEWS.

Limitations

Page 18

Virtuoso RDF View: Uses a table-to-class and column-to-predicate approach. RDB data are represented as virtual RDF graphs. Customization of the mapping is possible.

Triplify: Maps HTTP-URI requests to relational database queries expressed in SQL. No SPARQL support.

Other tools/applications for publishing databases on Semantic Web

Page 19

R2O: XML-based declarative mapping language.

DartGrid Semantic Web toolkit: Provides a visual tool to define mappings.

RDBToOnto: User-oriented tool that creates a static mapping (RDF dump).

Asio Semantic Bridge for Relational Databases (SBRD) and Automapper: Uses a table-to-class approach.

Tools/Applications continued…

Page 20

Prof. Peter Fox, Patrick West, Eric Rozell, Ankesh Khandelwal, Evan Patton

A note of thanks to…