Executing SPARQL Queries over Mapped Document Stores with SparqlMap-M

Post on 14-Jan-2017


J. Unbehauen, M. Martin

IIS // AKSW // BIS // IfI, Leipzig University

SEMANTiCS 2016

J. Unbehauen, M. Martin (Leipzig Univ.) SPARQL over Document Stores: SparqlMap-M SEMANTiCS 2016 1 / 25

Outline

1 Motivation and Scope

2 Approach

3 Evaluation

4 Conclusions and Future Work


Scoping

[1] S. Auer, J. Lehmann, A. Ngonga Ngomo. Introduction to Linked Data and Its Lifecycle on the Web. Reasoning Web. Semantic Technologies for the Web of Data, LNCS 6848, 2011.


Motivation

NoSQL DBMS and document stores are thriving

Document stores used in Rapid Application Development Frameworks

Visit our poster "Adding Semantics to Model-Driven Software Development"

Use cases in both research and industry

Current solutions support R2RML and relational databases

SparqlMap Architecture

[Figure: SparqlMap architecture with the components Query Parsing, Mapping Binding, Query Analysis, Binding Translation, and Translation Execution]

Query:

SELECT DISTINCT ?name {
  ?person foaf:name ?name.      #(tp1)
  ?person :inDepartment ?dep.   #(tp2)
  ?dep rdfs:label 'Research'    #(tp3)
}

Result:

?name
------------
'Mary R.'
'James T.'

[2] J. Unbehauen, C. Stadler, and S. Auer. Accessing relational data on the web with SparqlMap. In JIST, 2012.

[3] J. Unbehauen, C. Stadler, and S. Auer. Optimizing SPARQL-to-SQL rewriting. In IIWAS, 2013.


SparqlMap-M Architecture

[Figure: SparqlMap-M architecture, drawn as a SparqlMap box inside a SparqlMap-M box. Components shown: Query Parsing, Mapping Binding, Query Analysis, Binding Translation, Union Decomposition, Deduplication, Selective Materialization, Translation Execution, and Materialized Execution. Inputs are the example query (tp1 to tp3) and the mapping; the output is the ?name result with 'Mary R.' and 'James T.']

Challenges addressed:

1 Data Models and Mapping

2 Query Structure

3 Querying Capabilities

4 Data Model Specific Optimization



Data Models and Mapping

Key-Value pairs

Nested documents

Schema-less


Data Models and Mapping

A relational view on documents by:

Goal: reuse existing (R2RML) concepts

Unnesting documents by joining them with parent → Flat structure

Naming attributes to reflect hierarchy → Key-Value treated as tuples

Schema imposed by mapping

# Department
{ id: 2, name: "Research",
  emp: [ { id: 1, name: "Mary R." },
         { id: 2, name: "James T." } ] },

id | name     | emp.id | emp.name
---+----------+--------+---------
2  | Research | 1      | Mary R.
2  | Research | 2      | James T.
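The unnesting step above can be sketched in a few lines of Python. This is only an illustration of the idea (join each nested element with its parent, use dotted names for the hierarchy), not SparqlMap-M's implementation; the function name `unnest` and the choice to drop parents with empty arrays are assumptions made for this sketch.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def unnest(doc: Dict[str, Any], prefix: str = "") -> List[Row]:
    """Flatten a nested document into relational-style rows.

    Scalars become columns, nested objects get dotted column names, and
    every element of an array of sub-documents is joined with its parent,
    which yields one row per element. An empty array yields no rows at
    all (inner-join behaviour), which is a simplification of this sketch.
    """
    rows: List[Row] = [{}]
    for key, value in doc.items():
        column = f"{prefix}{key}"
        if isinstance(value, dict):
            children = unnest(value, column + ".")
        elif isinstance(value, list) and all(isinstance(v, dict) for v in value):
            children = [r for v in value for r in unnest(v, column + ".")]
        else:
            children = [{column: value}]
        # Cross product of the rows built so far with the new columns.
        rows = [{**row, **child} for row in rows for child in children]
    return rows


if __name__ == "__main__":
    department = {
        "id": 2,
        "name": "Research",
        "emp": [{"id": 1, "name": "Mary R."}, {"id": 2, "name": "James T."}],
    }
    for row in unnest(department):
        print(row)
```

Applied to the Department document, this produces the two rows of the table above, with `emp.id` and `emp.name` as the hierarchy-reflecting column names.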


Query Structure

SparqlMap

Recursive translation yields nested unions

Index hits require careful query design

Complex expressions for joins

SparqlMap-M / MongoDB

No direct equivalents for joins

No complex equivalence expression


Query Structure: Union Decomposition

Nested unions:

⋈ ?dep=?dep
  σ name=Research
    trm3
  ⋈ ?person=?person
    ∪
      trm1
      trm4
    ∪
      trm2
      trm5

Pushed union:

∪
  ⋈ ?dep=?dep
    trm3
    ⋈ ?person=?person
      trm1
      trm2
  ⋈ ?dep=?dep
    trm3
    ⋈ ?person=?person
      trm4
      trm5
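Pushing unions to the top of the operator tree amounts to distributing joins over unions. The following Python sketch shows that rewriting on a toy algebra; the class names (`Source`, `JoinOp`, `UnionOp`) and `push_unions` are hypothetical and not taken from SparqlMap-M, and the selection on trm3 is left out for brevity. Plain distribution yields all four combinations of the two unions, whereas the slide shows only the (trm1, trm2) and (trm4, trm5) branches; how the remaining combinations are eliminated is not modeled here.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Source:
    """Leaf of the plan, e.g. a mapped term such as trm1."""
    name: str


@dataclass(frozen=True)
class JoinOp:
    on: str
    left: "Node"
    right: "Node"


@dataclass(frozen=True)
class UnionOp:
    branches: Tuple["Node", ...]


Node = Union[Source, JoinOp, UnionOp]


def union_free_branches(node: Node) -> List[Node]:
    """Return union-free plans whose union is equivalent to `node`."""
    if isinstance(node, Source):
        return [node]
    if isinstance(node, UnionOp):
        return [b for child in node.branches for b in union_free_branches(child)]
    # Join: combine every alternative of the left side with every
    # alternative of the right side.
    lefts = union_free_branches(node.left)
    rights = union_free_branches(node.right)
    return [JoinOp(node.on, l, r) for l, r in product(lefts, rights)]


def push_unions(node: Node) -> Node:
    """Rewrite `node` so that at most one union remains, at the root."""
    branches = union_free_branches(node)
    return branches[0] if len(branches) == 1 else UnionOp(tuple(branches))


if __name__ == "__main__":
    trm1, trm2, trm3, trm4, trm5 = (Source(f"trm{i}") for i in range(1, 6))
    nested = JoinOp(
        "?dep",
        trm3,
        JoinOp("?person", UnionOp((trm1, trm4)), UnionOp((trm2, trm5))),
    )
    for branch in push_unions(nested).branches:
        print(branch)
```

Each resulting branch is a union-free join plan, which is the shape a store without nested unions can be queried with one branch at a time.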


Selective Materialization

Delegate to abstraction layer (Apache MetaModel)

Execute unpushable SPARQL operators in memory

Π name
  ⋈ id=depid
    σ name="Research"
      department
    employee

[Figure: the operator tree is annotated with the regions Materialized Execution and Selective Materialization]
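The split between pushed-down and in-memory operators can be illustrated with a small Python sketch. It does not use Apache MetaModel (a Java library); `materialize` merely stands in for a filtered scan delegated to the store, while `hash_join` and `project` stand in for operators executed in memory. All function names and the sample rows beyond the slide's example are assumptions of this sketch.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List, Optional

Row = Dict[str, Any]


def materialize(
    alias: str,
    collection: Iterable[Row],
    predicate: Optional[Callable[[Row], bool]] = None,
) -> List[Row]:
    """Stand-in for a scan pushed to the store, with an optional filter.

    Columns are prefixed with the alias so that joined rows stay unambiguous.
    """
    return [
        {f"{alias}.{k}": v for k, v in row.items()}
        for row in collection
        if predicate is None or predicate(row)
    ]


def hash_join(left: List[Row], right: List[Row], left_key: str, right_key: str) -> List[Row]:
    """In-memory equi-join: build a hash index on `left`, probe with `right`."""
    index: Dict[Any, List[Row]] = defaultdict(list)
    for row in left:
        index[row[left_key]].append(row)
    return [{**match, **row} for row in right for match in index.get(row[right_key], [])]


def project(rows: List[Row], column: str) -> List[Any]:
    return [row[column] for row in rows]


if __name__ == "__main__":
    # Made-up sample data; only the Research department and its two
    # employees come from the slides.
    departments = [{"id": 1, "name": "Sales"}, {"id": 2, "name": "Research"}]
    employees = [
        {"id": 1, "name": "Mary R.", "depid": 2},
        {"id": 2, "name": "James T.", "depid": 2},
        {"id": 3, "name": "Ann B.", "depid": 1},
    ]
    research = materialize("department", departments, lambda r: r["name"] == "Research")
    staff = materialize("employee", employees)
    joined = hash_join(research, staff, "department.id", "employee.depid")
    print(project(joined, "employee.name"))
```

The filter on the department name is applied before anything is joined, so only the matching department rows are held in memory; the join itself is the part that cannot be pushed to the store in this setup.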


De-Duplication

Documents are nested for fast retrieval and filtering

Naive mapping introduces overhead

Declaratively label R2RML TriplesMaps as duplicated

Only use denormalized data in joins



Benchmark Setup

BSBM (Berlin SPARQL Benchmark), chosen because both an SQL and an RDF representation are available

SQL representation translated into MongoDB documents

Additionally performed denormalization


Benchmark Results

BSBM 10 million triples

PostgreSQL: fastest

MongoDB-Naive/-Dup: Dup required for performance

SparqlMap-M-Naive/-Dup/-DupAware: overhead by rewriting/materialization


Benchmark Results

BSBM Q4

Medium selectivity

Naive modes touch a lot of data

Performance gain by duplicate data (MongoDB, SparqlMap-M)


Benchmark Results

BSBM Q5

Low selectivity join

SparqlMap-M: expensive self-join in memory dominates cost

MongoDB: self-join in aggregate pipeline, slower than PostgreSQL

BSBM Q9

High selectivity join

SparqlMap-M-Dup(Aware): duplicates increase overhead; unpushable join dominates cost



Future Work

Enable Updates

Integrate Caching

Evaluate join-capable query languages

MongoDB left outer join ($lookup)

Multimodel databases: ArangoDB, OrientDB

DB virtualizations: JBoss Teiid, Apache HAWQ
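For orientation, a MongoDB aggregation pipeline using `$lookup` for the running example might look like the sketch below. The collection name `employee` and the field names `id`, `depid`, and `name` are assumptions carried over from the earlier slides, not a schema given in the talk; the function only builds the pipeline and does not connect to a database.

```python
from typing import Any, Dict, List


def employees_of_department(label: str) -> List[Dict[str, Any]]:
    """Build an aggregation pipeline, to be run on a department collection."""
    return [
        # Filter departments first so that the join touches few documents.
        {"$match": {"name": label}},
        # Left outer join: attach matching employee documents as array "emp".
        {
            "$lookup": {
                "from": "employee",
                "localField": "id",
                "foreignField": "depid",
                "as": "emp",
            }
        },
        # One output document per employee; departments without any
        # employee are dropped at this stage.
        {"$unwind": "$emp"},
        {"$project": {"_id": 0, "name": "$emp.name"}},
    ]


if __name__ == "__main__":
    for stage in employees_of_department("Research"):
        print(stage)
```

With a driver such as PyMongo, the returned list could be passed to `aggregate` on the department collection.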


Conclusion

Architecture for a SPARQL execution layer over document stores

Harness duplicates for increasing performance

Evaluated with BSBM on MongoDB
