Executing SPARQL queries over Mapped DocumentStores with SparqlMap-M
J. Unbehauen, M. Martin
IIS // AKSW // BIS // IfI, Leipzig University
SEMANTiCS 2016
J. Unbehauen, M. Martin (Leipzig Univ.) SPARQL over Document Stores: SparqlMap-M SEMANTiCS 2016 1 / 25
Outline
1 Motivation and Scope
2 Approach
3 Evaluation
4 Conclusions and Future Work
Scoping
[1] S. Auer, J. Lehmann, A. Ngonga Ngomo. Introduction to Linked Data and Its Lifecycle on the Web. In Reasoning Web. Semantic Technologies for the Web of Data, LNCS 6848, 2011.
Motivation
NoSQL DBMS and document stores are thriving
Document stores used in Rapid Application Development Frameworks
Visit our poster "Adding Semantics to Model-Driven Software Development"
Use cases in both research and industry
Current solutions support R2RML over relational databases only
SparqlMap Architecture
Figure: SparqlMap pipeline with the stages Query Parsing, Query Analysis, Mapping Binding, Binding Translation and Translation Execution, applied to the example query.
Query:
SELECT DISTINCT ?name {
  ?person foaf:name ?name .       #(tp1)
  ?person :inDepartment ?dep .    #(tp2)
  ?dep rdfs:label 'Research'      #(tp3)
}
Result:
?name
------------
'Mary R.'
'James T.'
[2] J. Unbehauen, C. Stadler, S. Auer. Accessing relational data on the web with SparqlMap. In JIST, 2012.
[3] J. Unbehauen, C. Stadler, S. Auer. Optimizing SPARQL-to-SQL rewriting. In iiWAS, 2013.
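Mapping binding attaches to each triple pattern the parts of the mapping that can produce matching triples. A minimal sketch, assuming a simplified mapping in which every term map carries exactly one constant predicate; `TermMap`, `TriplePattern`, `bind` and the toy maps `trm1` to `trm5` (including what each of them maps and its source) are hypothetical and are not SparqlMap's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TermMap:
    """Hypothetical, simplified term map: one constant predicate per map."""
    name: str
    predicate: str
    source: str  # table (SparqlMap) or collection path (SparqlMap-M)


@dataclass(frozen=True)
class TriplePattern:
    subject: str
    predicate: str
    obj: str


def bind(patterns, term_maps):
    """Return, per triple pattern, the term maps whose predicate can match it."""
    binding = {}
    for tp in patterns:
        if tp.predicate.startswith("?"):
            # A variable predicate can be produced by any term map.
            binding[tp] = list(term_maps)
        else:
            binding[tp] = [m for m in term_maps if m.predicate == tp.predicate]
    return binding


# The example query's basic graph pattern (tp1, tp2, tp3).
PATTERNS = [
    TriplePattern("?person", "foaf:name", "?name"),
    TriplePattern("?person", ":inDepartment", "?dep"),
    TriplePattern("?dep", "rdfs:label", "'Research'"),
]

# Toy mapping (assumed): employees are mapped both from their own
# collection and from the array nested in department documents.
TERM_MAPS = [
    TermMap("trm1", "foaf:name", "employee"),
    TermMap("trm2", ":inDepartment", "employee"),
    TermMap("trm3", "rdfs:label", "department"),
    TermMap("trm4", "foaf:name", "department.emp"),
    TermMap("trm5", ":inDepartment", "department.emp"),
]

if __name__ == "__main__":
    for tp, maps in bind(PATTERNS, TERM_MAPS).items():
        print(tp.predicate, [m.name for m in maps])
```

With this toy mapping, tp1 and tp2 are each bound to two term maps, which is what later produces unions in the translated query.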
SparqlMap-M Architecture
Figure: SparqlMap-M architecture. The SparqlMap pipeline (Query Parsing, Query Analysis, Mapping Binding, Binding Translation, Translation Execution) is embedded in SparqlMap-M, which adds Deduplication, Union Decomposition, Selective Materialization and Materialized Execution. Inputs are the query and the mapping; the example query and its result are the same as for SparqlMap.
1 Data Models and Mapping
2 Query Structure
3 Querying Capabilities
4 Data Model Specific Optimization
Data Models and Mapping
Key-Value pairs
Nested documents
Schema-less
Data Models and Mapping
Goal: reuse existing (R2RML) concepts
A relational view on documents by:
Unnesting documents by joining them with their parent → flat structure
Naming attributes to reflect the hierarchy → key-value pairs treated as tuples
Imposing a schema through the mapping
#Department
{ id: 2, name: "Research",
  emp: [ { id: 1, name: "Mary R." },
         { id: 2, name: "James T." } ] }

 id | name     | emp.id | emp.name
----+----------+--------+----------
  2 | Research |      1 | Mary R.
  2 | Research |      2 | James T.
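The unnesting described above can be sketched as follows, assuming documents are plain Python dicts and lists. `unnest` is a hypothetical helper, not part of SparqlMap-M; it reproduces the department table by joining each array element with its parent and naming columns by path.

```python
def unnest(doc, prefix=""):
    """Flatten a nested document into rows with path-named columns.

    Scalars become columns named by their path (e.g. "emp.name"). Each array
    of sub-documents is joined with its parent: one row per element. An empty
    array therefore yields no row for the parent (inner-join semantics).
    """
    rows = [{}]
    for key, value in doc.items():
        path = prefix + key
        if isinstance(value, dict):
            # Nested object: flatten it, keeping its path as column prefix.
            sub_rows = unnest(value, path + ".")
        elif isinstance(value, list):
            # Array: one sub-row per element (scalar elements keep the path).
            sub_rows = []
            for element in value:
                if isinstance(element, dict):
                    sub_rows.extend(unnest(element, path + "."))
                else:
                    sub_rows.append({path: element})
        else:
            sub_rows = [{path: value}]
        # Join the rows built so far with the rows of this attribute.
        rows = [{**row, **sub} for row in rows for sub in sub_rows]
    return rows


department = {
    "id": 2,
    "name": "Research",
    "emp": [{"id": 1, "name": "Mary R."}, {"id": 2, "name": "James T."}],
}

if __name__ == "__main__":
    for row in unnest(department):
        print(row)
```

Two arrays in the same document would yield their cross product, just as two consecutive joins with the parent would.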
Query Structure
SparqlMap
Recursive translation yields nested unions
Index hits require careful query design
Complex expressions for joins
SparqlMap-M / MongoDB
No direct equivalent for joins
No complex equivalence expressions
Query Structure: Union Decomposition
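Nested unions:
$\sigma_{name=\text{Research}}(trm_3) \bowtie_{?dep=?dep} \big( (trm_1 \cup trm_4) \bowtie_{?person=?person} (trm_2 \cup trm_5) \big)$
Pushed union:
$\big( \sigma_{name=\text{Research}}(trm_3) \bowtie_{?dep=?dep} (trm_1 \bowtie_{?person=?person} trm_2) \big) \cup \big( \sigma_{name=\text{Research}}(trm_3) \bowtie_{?dep=?dep} (trm_4 \bowtie_{?person=?person} trm_5) \big)$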
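The step from nested to pushed unions distributes the joins over the unions. A minimal sketch, assuming each union is a list of term-map names and that a hypothetical compatibility test prunes branches that cannot produce results; here the test is assumed to require that the two ?person term maps read from the same source (the `SOURCE` table is illustrative, not taken from SparqlMap-M).

```python
from itertools import product

# Hypothetical sources of the term maps, used to decide which combinations
# can join on ?person.
SOURCE = {
    "trm1": "employee", "trm2": "employee", "trm3": "department",
    "trm4": "department.emp", "trm5": "department.emp",
}


def push_unions(alternatives, compatible):
    """Rewrite a join of unions into a union of joins.

    alternatives: one list of term maps per triple pattern, i.e. the nested
    form (a1 UNION a2) JOIN (b1 UNION b2) JOIN ...; the join distributes over
    the unions, giving one branch per combination. Branches rejected by
    `compatible` cannot produce results and are dropped.
    """
    return [branch for branch in product(*alternatives) if compatible(branch)]


def same_person_source(branch):
    # Assumption: the ?person join (tp1, tp2) only succeeds when both term
    # maps generate subjects from the same source.
    tp1_map, tp2_map, _ = branch
    return SOURCE[tp1_map] == SOURCE[tp2_map]


if __name__ == "__main__":
    nested = [["trm1", "trm4"], ["trm2", "trm5"], ["trm3"]]
    for branch in push_unions(nested, same_person_source):
        print(" JOIN ".join(branch))
```

Without pruning, the two binary unions expand into four branches; the assumed compatibility test keeps the two branches of the pushed union.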
Selective Materialization
Delegate to abstraction layer (Apache MetaModel)
Execute unpushable SPARQL operators in memory
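Example plan: $\Pi_{name}\big(\sigma_{name=\text{Research}}(department) \bowtie_{id=depid} employee\big)$
Selective Materialization: the selection and the collection scans are delegated to the store
Materialized Execution: the join and the projection are executed in memory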
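A minimal sketch of the example plan, assuming an in-memory `STORE` dict in place of Apache MetaModel and a store that can evaluate equality selections but not joins. All names and the extra toy documents (`Sales`, `Ann S.`) are hypothetical. The selection on `department` is pushed to the store, while the join and the projection run in memory as a hash join.

```python
from collections import defaultdict

# Toy collections standing in for a document store that can filter
# but not join.
STORE = {
    "department": [{"id": 1, "name": "Sales"}, {"id": 2, "name": "Research"}],
    "employee": [
        {"id": 1, "name": "Mary R.", "depid": 2},
        {"id": 2, "name": "James T.", "depid": 2},
        {"id": 3, "name": "Ann S.", "depid": 1},
    ],
}


def materialize(collection, **equals):
    """Selective materialization: push equality filters to the store and
    fetch only matching documents, with attributes qualified by collection."""
    return [
        {f"{collection}.{key}": value for key, value in doc.items()}
        for doc in STORE[collection]
        if all(doc.get(key) == value for key, value in equals.items())
    ]


def hash_join(left, right, left_key, right_key):
    """Materialized execution: equi-join two row lists in memory."""
    index = defaultdict(list)
    for rrow in right:
        index[rrow[right_key]].append(rrow)
    return [{**lrow, **rrow}
            for lrow in left for rrow in index.get(lrow[left_key], [])]


def project(rows, *attributes):
    return [tuple(row[a] for a in attributes) for row in rows]


def research_employee_names():
    # Pi_name( sigma_{name=Research}(department) JOIN_{id=depid} employee )
    departments = materialize("department", name="Research")
    employees = materialize("employee")
    joined = hash_join(departments, employees,
                       "department.id", "employee.depid")
    return project(joined, "employee.name")


if __name__ == "__main__":
    print(research_employee_names())
```

Since the unpushable join needs all `employee` documents in memory, its cost grows with the unfiltered side, which matches the join-dominated costs reported in the evaluation.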
De-Duplication
Documents are nested for fast retrieval and filtering
Naive mapping of nested (duplicated) data introduces overhead
Declaratively label R2RML TriplesMaps as duplicates
Only use denormalized data in joins
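One way to read "only use denormalized data in joins" is sketched below: a triples map labelled as duplicate is chosen only when it covers a whole join group, and skipped otherwise. This is a minimal sketch under stated assumptions; `TriplesMap`, the `duplicate` flag, `choose_maps` and the toy maps are hypothetical, and a join group is simplified to the set of its predicates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TriplesMap:
    name: str
    predicates: frozenset
    duplicate: bool = False  # hypothetical label for denormalized data


def choose_maps(join_group, maps):
    """Choose triples maps for a group of triple patterns joined together.

    join_group: set of predicates used by the patterns in the group.
    If the group is a real join and a map labelled as duplicate covers it,
    the nested data already holds the join result, so only that map is used.
    Otherwise duplicate maps are skipped, so no triple is produced twice.
    """
    if len(join_group) > 1:
        covering = [m for m in maps
                    if m.duplicate and join_group <= m.predicates]
        if covering:
            return covering[:1]
    return [m for m in maps if not m.duplicate and m.predicates & join_group]


MAPS = [
    TriplesMap("employee", frozenset({"foaf:name", ":inDepartment"})),
    TriplesMap("department", frozenset({"rdfs:label"})),
    TriplesMap(
        "department.emp",
        frozenset({"foaf:name", ":inDepartment", "rdfs:label"}),
        duplicate=True,
    ),
]

if __name__ == "__main__":
    group = {"foaf:name", ":inDepartment", "rdfs:label"}
    print([m.name for m in choose_maps(group, MAPS)])
```

A single pattern such as `?dep rdfs:label 'Research'` is then answered from the `department` map alone, while the full three-pattern join is answered from the nested copy without an in-memory join.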
Benchmark Setup
BSBM (Berlin SPARQL Benchmark), chosen because both SQL and RDF representations are available
SQL representation translated into MongoDB documents
Additionally, a denormalized variant was created
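The slides do not spell out the translation. A minimal sketch of one possible denormalization step, assuming each SQL row has already become one document (a dict) and child rows are embedded into their parent document; `denormalize`, the `products`/`offers` rows and their column names are hypothetical and do not reproduce BSBM's schema.

```python
from collections import defaultdict


def denormalize(parents, children, parent_key, foreign_key, field):
    """Embed each child row into its parent document under `field`.

    The foreign key is dropped from the embedded copy, since the nesting
    already expresses the relationship. The input lists are left untouched,
    so the embedded children duplicate the child collection.
    """
    by_parent = defaultdict(list)
    for child in children:
        embedded = {k: v for k, v in child.items() if k != foreign_key}
        by_parent[child[foreign_key]].append(embedded)
    return [
        {**parent, field: by_parent.get(parent[parent_key], [])}
        for parent in parents
    ]


# Hypothetical rows; column names are illustrative only.
products = [{"nr": 1, "label": "product1"}, {"nr": 2, "label": "product2"}]
offers = [
    {"nr": 10, "product": 1, "price": 9.99},
    {"nr": 11, "product": 1, "price": 8.49},
]

if __name__ == "__main__":
    for doc in denormalize(products, offers, "nr", "product", "offers"):
        print(doc)
```

Keeping the original child collection next to the nested copy is what turns denormalization into duplication, so both access paths remain available.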
Benchmark Results
BSBM, 10 million triples
PostgreSQL: fastest
MongoDB-Naive / MongoDB-Dup: duplicates (Dup) required for performance
SparqlMap-M-Naive / -Dup / -DupAware: overhead from rewriting and materialization
Benchmark Results
BSBM Q4
Medium selectivity
Naive modes touch a lot of data
Performance gain from duplicate data (MongoDB, SparqlMap-M)
Benchmark Results
BSBM Q5
Low selectivity join
SparqlMap-M: expensive self-join in memory dominates cost
MongoDB: self-join in the aggregation pipeline, slower than PostgreSQL
BSBM Q9
High selectivity join
SparqlMap-M-Dup(Aware): duplicates increase overhead; the unpushable join dominates cost
Future Work
Enable Updates
Integrate Caching
Evaluate join-capable query languages:
MongoDB left outer join ($lookup)
Multi-model databases: ArangoDB, OrientDB
DB virtualizations: JBoss Teiid, Apache HAWQ
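As an illustration of the first option, the join from the selective materialization example could be pushed into MongoDB with the `$lookup` stage (available since MongoDB 3.2). A minimal sketch that only builds the aggregation pipeline as Python data, assuming collections `department` and `employee` with fields `id`, `name` and `depid`; these names are hypothetical.

```python
def research_names_pipeline(label="Research"):
    """MongoDB aggregation pipeline joining departments with employees.

    $lookup performs a left outer join; $unwind (without
    preserveNullAndEmptyArrays) drops departments without employees,
    which gives inner-join semantics.
    """
    return [
        {"$match": {"name": label}},
        {"$lookup": {
            "from": "employee",       # hypothetical collection name
            "localField": "id",
            "foreignField": "depid",
            "as": "emp",
        }},
        {"$unwind": "$emp"},
        {"$project": {"_id": 0, "name": "$emp.name"}},
    ]


if __name__ == "__main__":
    import json
    print(json.dumps(research_names_pipeline(), indent=2))
```

With PyMongo, such a pipeline would be passed to `Collection.aggregate`, e.g. `db.department.aggregate(research_names_pipeline())`, so that the join runs inside the store instead of in memory.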
Conclusion
Architecture for a SPARQL execution layer over document stores
Harness duplicates to increase performance
Evaluated with BSBM on MongoDB