1
SPARQLing Constraints for RDF
Michael Schmidt
EDBT 2008, March 28
Joint work with Prof. Georg Lausen and Michael Meier
2
SPARQLing Constraints for RDF
RDF Data Format
• Machine-readable information
• Established in the Semantic Web
SPARQL Query Language
• Declarative Language
• W3C Recommendation since Jan. 2008
Constraints
• Primary and foreign keys
• Cardinality constraints, …
Extension of RDF by constraints
• With fixed semantics
• Integration into the framework
The role of SPARQL in this context
• Extracting constraints
• Checking constraints
• Optimization of SPARQL queries under constraints
3
Why Constraints?
• Restricting the state space of the database
• Maintenance of data consistency (e.g. when data is updated)
• Semantic query optimization
• Better understanding of the data
• Here: translation of relational schemata to RDF without loss of information
4
The RDF Data Format
[Figure: RDF graph. t1 and t2 are of rdf:type Teachers. t1 has name "Joe" and faculty "CS" and knows t2; t2 has name "Fred" and age "43".]
"Triples of Knowledge"
(t1, name, "Joe"), (t1, faculty, "CS"), (t1, knows, t2)
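As a rough illustration of the triple view, the graph can be held as a plain set of (subject, predicate, object) tuples. This is only a minimal Python sketch: the set-of-tuples encoding, the use of plain strings for URIs and literals, and the helper name `objects` are assumptions made for illustration and are not part of the talk.

```python
# An RDF graph modelled as a set of (subject, predicate, object) triples.
# Plain strings stand in for URIs and literals in this sketch.
GRAPH = {
    ("t1", "rdf:type", "Teachers"),
    ("t1", "name", "Joe"),
    ("t1", "faculty", "CS"),
    ("t1", "knows", "t2"),
    ("t2", "rdf:type", "Teachers"),
    ("t2", "name", "Fred"),
    ("t2", "age", "43"),
}


def objects(graph, subject, predicate):
    """Return every o such that (subject, predicate, o) is in the graph."""
    return {o for (s, p, o) in graph if s == subject and p == predicate}
```

For example, `objects(GRAPH, "t1", "knows")` looks up the nodes that t1 knows.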
5
The RDF Data Format
[Figure: the same RDF graph as on the previous slide.]
Three elementary types
• URIs (describe physical/logical entities and properties)
• Literals (string values)
• Blank nodes (not considered)
6
A Relational Data Scheme
Teachers
  name   faculty
  Joe    CS
  Fred   CS
Students
  matric   name
  11111    John
  22222    Ed
Courses
  taught_by   name
  Joe         DB
  Fred        Web
Participants
  c_id   s_id
  Fred   11111
  Fred   22222
+ NOT NULL constraints on each column
7
A Translation into RDF
[Figure: RDF graph of the translated data. t1, t2 (rdf:type Teachers) have name Joe/Fred and faculty "CS"; s1, s2 (rdf:type Students) have matric 11111/22222 and name "John"/"Ed"; c1, c2 (rdf:type Courses) have name "DB"/"Web" and a taught_by property; p1, p2 (rdf:type Participants) have c_id and s_id properties.]
Problem: Constraints only implicitly given!
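To make the problem concrete, a naive row-by-row translation might look like the following Python sketch. The function name `rows_to_triples`, the node naming scheme (prefix plus running number) and the tuple encoding are assumptions for illustration. Note that the output contains instance triples only; nothing records that a column was a key or NOT NULL.

```python
def rows_to_triples(table, columns, rows, prefix):
    """Translate relational rows into RDF-style triples.

    Each row becomes a fresh node (prefix + running number) typed with the
    table name; each column becomes a property of that node.
    """
    triples = set()
    for i, row in enumerate(rows, start=1):
        node = f"{prefix}{i}"
        triples.add((node, "rdf:type", table))
        for column, value in zip(columns, row):
            triples.add((node, column, value))
    return triples


# The Teachers table from the relational example.
teachers = rows_to_triples(
    "Teachers", ["name", "faculty"], [("Joe", "CS"), ("Fred", "CS")], "t"
)
```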
8
Constraints for RDF
• Encoding in the schema layer
• New namespace "rdfc" provides constraint vocabulary with fixed semantics
  - rdfc:Key for primary keys
  - rdfc:FKey for foreign keys
  - rdfc:ref links foreign keys to primary keys
• Use built-in RDF container class rdf:Seq
9
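Encoding Constraints
[Figure: schema-layer encoding above the instance data. Teachers is linked via rdfc:Key to the node T_Key, which is typed rdfc:Key and rdf:Seq and has member rdf:_1 name. Courses is linked via rdfc:FKey to the node C_FKey, which is typed rdfc:FKey and rdf:Seq, has member rdf:_1 taught_by, and points to T_Key via rdfc:ref. Instance data: t1, t2 (Teachers, with name and faculty) and c1, c2 (Courses, with name and taught_by).]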
10
Types of Constraints
Let C, C1, C2 be classes and Qi, Ri properties
Primary keys, foreign keys
Key(C,[Q1,…,Qn]), FKey(C1,[Q1,…,Qn],C2,[R1,…,Rn])
Cardinality constraints
Min(C,n,R), Max(C,n,R) for n ∈ ℕ
Functionality constraints, totality constraints
Func(C,Q), Total(C,Q)
and many more in the full paper: singleton, subclass, subproperty, property domain, property range
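The slide does not spell out the semantics of these constraints. The Python sketch below assumes the usual reading: Min(C,n,R) and Max(C,n,R) bound the number of distinct R-values of every instance of C, Func(C,Q) means at most one value, and Total(C,Q) means at least one. All function names and the tuple encoding of triples are illustrative assumptions.

```python
def instances(graph, cls):
    """All subjects typed with the given class."""
    return {s for (s, p, o) in graph if p == "rdf:type" and o == cls}


def count_values(graph, subject, prop):
    """Number of distinct objects the subject has for the property."""
    return len({o for (s, p, o) in graph if s == subject and p == prop})


def holds_min(graph, cls, n, prop):
    # Assumed reading of Min(C,n,R): every instance has at least n values.
    return all(count_values(graph, x, prop) >= n for x in instances(graph, cls))


def holds_max(graph, cls, n, prop):
    # Assumed reading of Max(C,n,R): every instance has at most n values.
    return all(count_values(graph, x, prop) <= n for x in instances(graph, cls))


def holds_func(graph, cls, prop):
    # Assumed reading of Func(C,Q): at most one value per instance.
    return holds_max(graph, cls, 1, prop)


def holds_total(graph, cls, prop):
    # Assumed reading of Total(C,Q): at least one value per instance.
    return holds_min(graph, cls, 1, prop)


GRAPH = {
    ("t1", "rdf:type", "Teachers"), ("t1", "name", "Joe"), ("t1", "faculty", "CS"),
    ("t2", "rdf:type", "Teachers"), ("t2", "name", "Fred"),
}
```

In this toy graph t2 has no faculty, so a totality constraint on faculty fails while the one on name holds.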
11
Satisfiability
Given an RDF vocabulary and a set of constraints: is there a non-empty RDF graph that satisfies the constraints?
• In general undecidable
• Shown by reduction from the key implication problem in relational databases
• In the paper, we identify constraint subclasses that are always satisfiable and constraint subclasses for which satisfiability is decidable
12
The SPARQL Query Language
SELECT ?name ?faculty ?title
WHERE {
  ?teacher rdf:type Teachers.
  ?teacher name ?name.
  ?teacher faculty ?faculty.
  OPTIONAL { ?teacher title ?title. }
}
• Declarative language
• Based on graph patterns that are matched against the input graph
• Different operators to combine these patterns: AND ("."), OPTIONAL, UNION, FILTER
13
SPARQL Query Evaluation
SELECT ?name ?faculty ?title
WHERE {
  ?teacher rdf:type Teachers.
  ?teacher name ?name.
  ?teacher faculty ?faculty.
  OPTIONAL { ?teacher title ?title. }
}
[Figure: input graph with t1, t2 of rdf:type Teachers. t1 has name Joe and faculty "CS"; t2 has name Fred, faculty "CS" and title "Professor". The pattern variables ?teacher, ?name and ?faculty are matched against the graph; ?title stays unbound for t1.]
Result:
  ?name   ?faculty   ?title
  Joe     "CS"
  Fred    "CS"       "Professor"
Variables are matched against the input graph
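The matching step can be imitated with a very small evaluator. The following Python sketch is not a SPARQL engine; it only mimics AND (matching all triple patterns) and OPTIONAL (a left join) over a set of tuples. The function names `match_pattern`, `match_bgp` and `optional` are hypothetical, and the graph is a hand-written copy of the example.

```python
def match_pattern(pattern, triple, binding):
    """Extend binding so that pattern equals triple, or return None."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable: bind it, or check it against the existing binding.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def match_bgp(graph, patterns, binding=None):
    """All bindings that match every triple pattern (the AND operator)."""
    bindings = [binding or {}]
    for pattern in patterns:
        bindings = [
            extended
            for b in bindings
            for t in graph
            if (extended := match_pattern(pattern, t, b)) is not None
        ]
    return bindings


def optional(graph, bindings, patterns):
    """Left join: extend a binding if the optional part matches, else keep it."""
    out = []
    for b in bindings:
        extended = match_bgp(graph, patterns, b)
        out.extend(extended if extended else [b])
    return out


GRAPH = {
    ("t1", "rdf:type", "Teachers"), ("t1", "name", "Joe"), ("t1", "faculty", "CS"),
    ("t2", "rdf:type", "Teachers"), ("t2", "name", "Fred"), ("t2", "faculty", "CS"),
    ("t2", "title", "Professor"),
}

required = match_bgp(GRAPH, [
    ("?teacher", "rdf:type", "Teachers"),
    ("?teacher", "name", "?name"),
    ("?teacher", "faculty", "?faculty"),
])
rows = optional(GRAPH, required, [("?teacher", "title", "?title")])
# None stands for an unbound ?title.
result = {(b["?name"], b["?faculty"], b.get("?title")) for b in rows}
```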
14
Extracting Key Constraints
SELECT ?keyname ?class ?keyatt
WHERE {
  ?class rdfc:Key ?keyname.
  ?keyname rdf:type rdfc:Key.
  ?keyname ?seq ?keyatt.
  FILTER (?seq!=rdf:type)
}
Result:
  ?keyname   ?class     ?keyatt
  T_Key      Teachers   name
[Figure: schema fragment. Teachers is linked via rdfc:Key to T_Key, which is typed rdfc:Key and rdf:Seq and has member rdf:_1 name.]
Extraction of foreign keys very similar
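The same extraction can be traced by hand over a set of tuples. This Python sketch mirrors the three triple patterns and the FILTER of the query; the schema triples below are an assumed transcription of the figure (in particular the rdf:Seq typing), and `extract_keys` is a hypothetical helper name.

```python
# Assumed schema-layer triples for the Teachers key.
SCHEMA = {
    ("Teachers", "rdfc:Key", "T_Key"),
    ("T_Key", "rdf:type", "rdfc:Key"),
    ("T_Key", "rdf:type", "rdf:Seq"),
    ("T_Key", "rdf:_1", "name"),
}


def extract_keys(graph):
    """Return (keyname, class, keyatt) rows, mirroring the SELECT query."""
    rows = set()
    for (cls, pred, keyname) in graph:
        if pred != "rdfc:Key":
            continue  # pattern: ?class rdfc:Key ?keyname
        if (keyname, "rdf:type", "rdfc:Key") not in graph:
            continue  # pattern: ?keyname rdf:type rdfc:Key
        for (s, seq, keyatt) in graph:
            # pattern: ?keyname ?seq ?keyatt, FILTER (?seq != rdf:type)
            if s == keyname and seq != "rdf:type":
                rows.add((keyname, cls, keyatt))
    return rows
```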
15
Checking Constraints with SPARQL
Constraint checks are possible for many types of constraints
A SPARQL query checks a constraint C if it returns "yes" for each graph that violates C, and "no" otherwise.
Use the SPARQL "ASK" query form (returns "yes" exactly if the query has at least one result, "no" otherwise)
16
Checking Constraints with SPARQL
Checking primary key constraints
Key(C,[p1,…,pn])
ASK {
  ?x rdf:type C.
  ?y rdf:type C.
  ?x p1 ?p1; [...]; pn ?pn.
  ?y p1 ?p1; [...]; pn ?pn.
  FILTER (?x!=?y)
}
Returns "yes" exactly if the constraint is violated.
Checking foreign keys is a little more complicated, but also possible.
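The logic of the primary key check can be mirrored outside SPARQL. The Python sketch below returns True exactly when two distinct instances of the class share a value for every key property, which is the situation the ASK query detects. The helper names and the tuple encoding of the graph are assumptions for illustration.

```python
from itertools import combinations


def instances(graph, cls):
    """All subjects typed with the given class."""
    return {s for (s, p, o) in graph if p == "rdf:type" and o == cls}


def values(graph, subject, prop):
    """All objects the subject has for the property."""
    return {o for (s, p, o) in graph if s == subject and p == prop}


def key_violated(graph, cls, props):
    """True iff two distinct instances agree on every key property."""
    for x, y in combinations(sorted(instances(graph, cls)), 2):
        if all(values(graph, x, p) & values(graph, y, p) for p in props):
            return True
    return False


OK_GRAPH = {
    ("t1", "rdf:type", "Teachers"), ("t1", "name", "Joe"),
    ("t2", "rdf:type", "Teachers"), ("t2", "name", "Fred"),
}
# A third teacher reusing the name "Joe" breaks Key(Teachers,[name]).
BAD_GRAPH = OK_GRAPH | {("t3", "rdf:type", "Teachers"), ("t3", "name", "Joe")}
```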
17
Semantic Query Optimization
Idea: use constraint knowledge to find a more efficient query execution plan
Has been studied in the context of relational and datalog databases…
… and now is applicable in the context of RDF and SPARQL
18
Semantic Query Optimization
SELECT ?teachername ?coursename ?studentname
WHERE {
  ?course rdf:type Courses;
          taught_by ?teachername;
          name ?coursename.
  ?participant rdf:type Participants;
               c_id ?teachername;
               s_id ?studentmatric.
  ?teacher rdf:type Teachers;
           name ?teachername.
  OPTIONAL {
    ?student rdf:type Students;
             matric ?studentmatric;
             name ?studentname.
  }
}
19
A Solution Candidate Subgraph
[Figure: the RDF graph of the translated data (Teachers t1, t2; Students s1, s2; Courses c1, c2; Participants p1, p2) with one candidate subgraph for the query highlighted.]
20
Semantic Query Optimization
SELECT ?teachername ?coursename ?studentname
WHERE {
  ?course rdf:type Courses;
          taught_by ?teachername;
          name ?coursename.
  ?participant rdf:type Participants;
               c_id ?teachername;
               s_id ?studentmatric.
  ?teacher rdf:type Teachers;
           name ?teachername.
  OPTIONAL {
    ?student rdf:type Students;
             matric ?studentmatric;
             name ?studentname.
  }
}
Key(Students,[matric])
FKey(Participants,[s_id],Students,[matric])
Total(Students,name)
Under these constraints every participant references an existing student that has a name, so the OPTIONAL part always matches and can be turned into a mandatory pattern.
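The effect of these constraints on OPTIONAL can be illustrated with ordinary joins. In the Python sketch below, a dictionary from matric to name stands in for Key(Students,[matric]) plus totality of name (an assumed simplification), and the two join helpers are hypothetical names. As long as every s_id occurs in the dictionary, the left join and the inner join agree.

```python
# (c_id, s_id) pairs from the Participants table.
participants = [("Fred", "11111"), ("Fred", "22222")]
# matric -> name; one entry per matric models the key, a present value
# models totality of name (assumed simplification).
students = {"11111": "John", "22222": "Ed"}


def left_join(parts, studs):
    """OPTIONAL-style join: keep every participant, name may be None."""
    return [(c_id, studs.get(s_id)) for c_id, s_id in parts]


def inner_join(parts, studs):
    """Mandatory join: keep only participants with a matching student."""
    return [(c_id, studs[s_id]) for c_id, s_id in parts if s_id in studs]


# With a dangling s_id the foreign key is violated and the joins differ.
broken = participants + [("Joe", "99999")]
```

The two functions return the same rows for `participants`, but not for `broken`, which is why the rewriting is only sound under the foreign key constraint.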
21
Semantic Query Optimization
SELECT ?teachername ?coursename ?studentname
WHERE {
  ?course rdf:type Courses;
          taught_by ?teachername;
          name ?coursename.
  ?participant rdf:type Participants;
               c_id ?teachername;
               s_id ?studentmatric.
  ?teacher rdf:type Teachers;
           name ?teachername.
  ?student rdf:type Students;
           matric ?studentmatric;
           name ?studentname.
}
Key(Teachers,[name])
FKey(Courses,[taught_by],Teachers,[name])
Every taught_by value references exactly one teacher, so the Teachers pattern neither restricts nor multiplies the results and can be dropped.
22
Semantic Query Optimization
SELECT ?teachername ?coursename ?studentname
WHERE {
  ?course rdf:type Courses;
          taught_by ?teachername;
          name ?coursename.
  ?participant rdf:type Participants;
               c_id ?teachername;
               s_id ?studentmatric.
  ?student rdf:type Students;
           matric ?studentmatric;
           name ?studentname.
}
Many more optimizations possible
• Rewriting of filter expressions
• Elimination of redundant rdf:type specifications
23
Future Work
Study of other types of constraints and the interaction between constraints
Development of a schematic approach to Semantic Query Optimization
• Mapping to SQL/Datalog?
• SPARQL-specific semantic optimizations?
Efficient constraint checking algorithms
24
Thank you for your attention!
25
Additional Resources
26
Checking Constraints with SPARQL
Checking foreign key constraints
FKey(C,[p1,…,pn],D,[q1,…,qn])
ASK {
  ?x rdf:type C; p1 ?p1; [...]; pn ?pn.
  OPTIONAL { ?y rdf:type D; q1 ?p1; [...]; qn ?pn. }
  FILTER (!bound(?y))
}
• First pattern: bind objects of type C, with properties bound to ?p1, …, ?pn
• OPTIONAL: bind the (referenced) object to variable ?y, if any
• FILTER: only keep results for which no referenced object exists
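The three steps of the foreign key check can be followed in a small Python sketch over a set of tuples. It returns True exactly when some instance of C carries a value combination that no instance of D matches, which corresponds to ?y staying unbound. The helper names and the example graph are assumptions for illustration.

```python
from itertools import product


def instances(graph, cls):
    """All subjects typed with the given class."""
    return {s for (s, p, o) in graph if p == "rdf:type" and o == cls}


def values(graph, subject, prop):
    """All objects the subject has for the property."""
    return {o for (s, p, o) in graph if s == subject and p == prop}


def fkey_violated(graph, c, ps, d, qs):
    """True iff some C instance references a value tuple missing in D."""
    for x in instances(graph, c):
        # Step 1: bind ?p1..?pn, one solution per value combination.
        value_lists = [sorted(values(graph, x, p)) for p in ps]
        for combo in product(*value_lists):
            # Step 2: look for a referenced object ?y of type D.
            referenced = any(
                all((y, q, v) in graph for q, v in zip(qs, combo))
                for y in instances(graph, d)
            )
            # Step 3: a solution without ?y means the constraint is violated.
            if not referenced:
                return True
    return False


OK_GRAPH = {
    ("t1", "rdf:type", "Teachers"), ("t1", "name", "Joe"),
    ("c1", "rdf:type", "Courses"), ("c1", "taught_by", "Joe"),
}
# A course taught by an unknown teacher breaks the foreign key.
BAD_GRAPH = OK_GRAPH | {("c2", "rdf:type", "Courses"), ("c2", "taught_by", "Ann")}
```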
27
RDFS Constraints
Let Ci denote classes, Qi denote properties
Subclass Constraint
SubC(C1,C2)
Subproperty Constraint
SubP(Q1,Q2)
Property Domain/Range
PropD(Q,C), PropR(Q,C)
Restrict the state space of the database
No „axioms“ that are used for inferencing
28
Satisfiability
Given an RDF vocabulary and a set of constraints: is there a non-empty RDF graph that satisfies the constraints?
In general undecidable
Fragment consisting of:
• Primary keys + foreign keys
• Singleton
• Max-Cardinality
• Subclass + Subproperty
• Property Domain + Property Range
Result: always satisfiable
29
Satisfiability
Given an RDF vocabulary and a set of constraints: is there a non-empty RDF graph that satisfies the constraints?
In general undecidable
Fragment consisting of:
• Primary keys + foreign keys
• Singleton
• Max-Cardinality
• Subclass + Subproperty
• Property Domain + Property Range
• Min-Cardinality
Result: undecidable
30
Satisfiability
Given an RDF vocabulary and a set of constraints: is there a non-empty RDF graph that satisfies the constraints?
In general undecidable
Fragment consisting of:
• Unary primary keys
• Unary foreign keys
• Min-Cardinality + Max-Cardinality
• Subclass + Subproperty
• Property Domain + Property Range
Result: decidable in ExpTime
31
The SPARQL Query Language
Operator AND (".")
SELECT ?name ?faculty
WHERE {
  ?teacher rdf:type Teachers.
  ?teacher name ?name.
  ?teacher faculty ?faculty.
}
[Figure: input graph with t1, t2 of rdf:type Teachers; names Joe and Fred, faculty "CS" each.]
Result:
  ?name   ?faculty
  Joe     "CS"
  Fred    "CS"
32
The SPARQL Query Language
Operator UNION
SELECT ?name ?faculty
WHERE {
  { ?teacher rdf:type Teachers.
    ?teacher name ?name.
    ?teacher faculty ?faculty.
    FILTER (?name="Joe")
  }
  UNION
  { ?teacher rdf:type Teachers.
    ?teacher name ?name.
    ?teacher faculty ?faculty.
    FILTER (?name="Fred")
  }
}
[Figure: input graph with t1, t2 of rdf:type Teachers; names Joe and Fred, faculty "CS" each.]
Result:
  ?name   ?faculty
  Joe     "CS"
  Fred    "CS"
33
The SPARQL Query Language
Operator FILTER
SELECT ?name ?faculty
WHERE {
  ?teacher rdf:type Teachers.
  ?teacher name ?name.
  ?teacher faculty ?faculty.
  FILTER (?name="Joe")
}
[Figure: input graph with t1, t2 of rdf:type Teachers; names Joe and Fred, faculty "CS" each.]
Result:
  ?name   ?faculty
  Joe     "CS"
34
The SPARQL Query Language
Operator OPTIONAL
SELECT ?name ?faculty ?title
WHERE {
  ?teacher rdf:type Teachers.
  ?teacher name ?name.
  ?teacher faculty ?faculty.
  OPTIONAL { ?teacher title ?title. }
}
[Figure: input graph with t1, t2 of rdf:type Teachers; names Joe and Fred, faculty "CS" each; t2 additionally has title "Professor".]
Result:
  ?name   ?faculty   ?title
  Joe     "CS"
  Fred    "CS"       "Professor"