Managing Completeness of Web Data
Fariz Darari
PhD Supervisor: Werner Nutt
Supported by the project MAGIC, funded by the province of Bolzano
Fariz Darari (unibz) Managing Completeness of Web Data Oct 20, 2015
About Us
Research Group
Sorted by distance to Werner’s office :)
Bozen-Bolzano
Motivation
Completeness statements are already there
However . . .
Completeness statements are available, but only in natural language
Unclear what data completeness & query completeness mean
No techniques to check whether data completeness entails query completeness
Solution Ideas
Completeness statements are available, but only in natural language
Solution: RDF-ize completeness statements
Unclear what data completeness & query completeness mean
Solution: Formalize data completeness & query completeness
No techniques to check whether data completeness entails query completeness
Solution: Develop techniques to check whether data completeness entails query completeness
Solutions
Background: RDF
$G_{rd} = \{(\mathit{resDogs}, \mathit{dir}, \mathit{tarantino}), (\mathit{resDogs}, \mathit{act}, \mathit{tarantino})\}$
Background: SPARQL
SELECT
$Q^{s}_{dir} = (\{?m\}, \{(?m, \mathit{dir}, \mathit{tarantino})\})$
ASK
$Q^{a}_{dir} = (\{\}, \{(?m, \mathit{dir}, \mathit{tarantino})\})$
CONSTRUCT
$Q^{c}_{dir} = (\{(?m, \mathit{dir}, \mathit{tarantino})\}, \{(?m, \mathit{dir}, \mathit{tarantino})\})$
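The three query forms above differ only in what they do with the mappings that match the pattern. A minimal Python sketch, assuming triples are plain 3-tuples of strings and variables are strings starting with `?`; the helper names `match_bgp`, `select`, `ask`, and `construct` are illustrative and not from the talk:

```python
def is_var(term):
    """Variables are written as strings with a leading '?'."""
    return term.startswith("?")


def match_bgp(pattern, graph):
    """Return every mapping (dict) that sends all triple patterns into the graph."""
    mappings = [{}]
    for tp in pattern:
        extended = []
        for mu in mappings:
            for triple in graph:
                nu = dict(mu)
                # A variable must agree with any earlier binding; a constant must be equal.
                if all(
                    nu.setdefault(p, g) == g if is_var(p) else p == g
                    for p, g in zip(tp, triple)
                ):
                    extended.append(nu)
        mappings = extended
    return mappings


def select(W, P, graph):
    """SELECT: project each mapping onto the distinguished variables W."""
    return [tuple(mu[v] for v in W) for mu in match_bgp(P, graph)]


def ask(P, graph):
    """ASK: is there at least one mapping?"""
    return len(match_bgp(P, graph)) > 0


def construct(template, P, graph):
    """CONSTRUCT: instantiate the template with every mapping."""
    return {
        tuple(mu.get(t, t) for t in tp)
        for mu in match_bgp(P, graph)
        for tp in template
    }


G_rd = {("resDogs", "dir", "tarantino"), ("resDogs", "act", "tarantino")}
P_dir = [("?m", "dir", "tarantino")]

movies = select(["?m"], P_dir, G_rd)
exists = ask(P_dir, G_rd)
built = construct(P_dir, P_dir, G_rd)
```

Here `select` returns a list so that duplicate answers would be preserved; switching to a set would give DISTINCT behaviour.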
Story: Incomplete Data Source
An incomplete data source of Reservoir Dogs, $\mathcal{G}_{dbp} = (G^{a}_{dbp}, G^{i}_{dbp})$:
$G^{a}_{dbp} = \{(\mathit{resDogs}, \mathit{dir}, \mathit{tarantino})\}$
$G^{i}_{dbp} = \{(\mathit{resDogs}, \mathit{dir}, \mathit{tarantino}), (\mathit{resDogs}, \mathit{act}, \mathit{tarantino})\}$
Story: Completeness Statement
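$G^{a}_{dbp} = \{(\mathit{resDogs}, \mathit{dir}, \mathit{tarantino})\}$
$G^{i}_{dbp} = \{(\mathit{resDogs}, \mathit{dir}, \mathit{tarantino}), (\mathit{resDogs}, \mathit{act}, \mathit{tarantino})\}$
From $(G^{a}_{dbp}, G^{i}_{dbp})$, we can say that DBpedia is complete for movies directed by Tarantino:
$C_{dir} = \mathit{Compl}((?m, \mathit{dir}, \mathit{tarantino}) \mid \emptyset)$
However, it is not complete for actors in movies directed by Tarantino:
$C_{act} = \mathit{Compl}((?m, \mathit{act}, ?a) \mid (?m, \mathit{dir}, \mathit{tarantino}))$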
Story: Query Completeness
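$G^{a}_{dbp} = \{(\mathit{resDogs}, \mathit{dir}, \mathit{tarantino})\}$
$G^{i}_{dbp} = \{(\mathit{resDogs}, \mathit{dir}, \mathit{tarantino}), (\mathit{resDogs}, \mathit{act}, \mathit{tarantino})\}$
Consequently, when we ask for all movies directed by Tarantino over DBpedia:
$Q_{dir} = (\{?m\}, \{(?m, \mathit{dir}, \mathit{tarantino})\})$
query completeness $\mathit{Compl}(Q_{dir})$ holds.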
Story: Query Completeness
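$G^{a}_{dbp} = \{(\mathit{resDogs}, \mathit{dir}, \mathit{tarantino})\}$
$G^{i}_{dbp} = \{(\mathit{resDogs}, \mathit{dir}, \mathit{tarantino}), (\mathit{resDogs}, \mathit{act}, \mathit{tarantino})\}$
However, if we ask for all movies directed by and starring Tarantino:
$Q_{dir+act} = (\{?m\}, \{(?m, \mathit{dir}, \mathit{tarantino}), (?m, \mathit{act}, \mathit{tarantino})\})$
query completeness $\mathit{Compl}(Q_{dir+act})$ does not hold.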
Incomplete Data Source
Definition (Incomplete Data Source)
An incomplete data source is a pair of graphs
$\mathcal{G} = (G^{a}, G^{i})$, where $G^{a} \subseteq G^{i}$.
We call $G^{a}$ the available graph and $G^{i}$ the ideal graph.
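As a data structure, the definition amounts to a pair of triple sets with a subset constraint. A small sketch, assuming triples are 3-tuples of strings; the class name `IncompleteDataSource` and its field names are illustrative, not from the talk:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IncompleteDataSource:
    """A pair (available graph, ideal graph) with available ⊆ ideal."""
    available: frozenset
    ideal: frozenset

    def __post_init__(self):
        # Enforce the side condition of the definition.
        if not self.available <= self.ideal:
            raise ValueError("available graph must be a subset of the ideal graph")


dbp = IncompleteDataSource(
    available=frozenset({("resDogs", "dir", "tarantino")}),
    ideal=frozenset({
        ("resDogs", "dir", "tarantino"),
        ("resDogs", "act", "tarantino"),
    }),
)

# Triples that exist in the ideal world but are missing from the source.
missing = dbp.ideal - dbp.available
```

In practice the ideal graph is hypothetical and never materialised; it is modelled explicitly here only to make the definition concrete.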
Completeness Statement
Definition (Completeness Statement)
Let $P_1$ be a non-empty basic graph pattern (BGP) and $P_2$ a BGP.
A completeness statement is defined as
$\mathit{Compl}(P_1 \mid P_2)$
where we call $P_1$ the pattern and $P_2$ the condition of the statement.
Satisfaction of Completeness Statements
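To a statement $C = \mathit{Compl}(P_1 \mid P_2)$,
we associate the CONSTRUCT query
$Q_C = (P_1, P_1 \cup P_2)$.
Then, we say:
$C$ is satisfied by an incomplete data source $\mathcal{G} = (G^{a}, G^{i})$, written $\mathcal{G} \models C$, if
$\llbracket Q_C \rrbracket_{G^{i}} \subseteq G^{a}$.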
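The satisfaction test can be sketched directly from this definition: evaluate the CONSTRUCT query of the statement over the ideal graph and check that the result is contained in the available graph. The sketch assumes triples are 3-tuples of strings with `?`-prefixed variables; `match_bgp` and `satisfies_statement` are illustrative names, not from the talk:

```python
def is_var(term):
    return term.startswith("?")


def match_bgp(pattern, graph):
    """Return every mapping (dict) that sends all triple patterns into the graph."""
    mappings = [{}]
    for tp in pattern:
        extended = []
        for mu in mappings:
            for triple in graph:
                nu = dict(mu)
                if all(
                    nu.setdefault(p, g) == g if is_var(p) else p == g
                    for p, g in zip(tp, triple)
                ):
                    extended.append(nu)
        mappings = extended
    return mappings


def satisfies_statement(available, ideal, pattern, condition):
    """Check that the CONSTRUCT query (P1, P1 ∪ P2) over the ideal graph stays inside the available graph."""
    body = list(pattern) + list(condition)
    derived = {
        tuple(mu.get(t, t) for t in tp)
        for mu in match_bgp(body, ideal)
        for tp in pattern
    }
    return derived <= available


G_a = {("resDogs", "dir", "tarantino")}
G_i = {("resDogs", "dir", "tarantino"), ("resDogs", "act", "tarantino")}

# C_dir = Compl((?m, dir, tarantino) | ∅)
sat_dir = satisfies_statement(G_a, G_i, [("?m", "dir", "tarantino")], [])
# C_act = Compl((?m, act, ?a) | (?m, dir, tarantino))
sat_act = satisfies_statement(
    G_a, G_i, [("?m", "act", "?a")], [("?m", "dir", "tarantino")]
)
```

On the Reservoir Dogs example, the first statement is satisfied and the second is not, because the acting triple is missing from the available graph.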
Completeness Statements in RDF
$C_{act} = \mathit{Compl}((?m, \mathit{act}, ?a) \mid (?m, \mathit{dir}, \mathit{tarantino}))$
lv:dataset a void:Dataset ;
    c:hasComplStmt lv:csAct .
lv:csAct c:hasPattern [ c:subject [ c:varName "m" ] ;
                        c:predicate s:actor ;
                        c:object [ c:varName "a" ] ] ;
    c:hasCondition [ c:subject [ c:varName "m" ] ;
                     c:predicate s:director ;
                     c:object lmdb:Quentin_Tarantino ] .
Query Completeness
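Definition (Query Completeness)
Let $Q$ be a query. We write
$\mathit{Compl}(Q)$
to say that $Q$ is complete.
An incomplete data source $\mathcal{G} = (G^{a}, G^{i})$ satisfies $\mathit{Compl}(Q)$, written $\mathcal{G} \models \mathit{Compl}(Q)$, if
$\llbracket Q \rrbracket_{G^{i}} = \llbracket Q \rrbracket_{G^{a}}$.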
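Query completeness can be checked on a concrete pair of graphs by evaluating the query on both and comparing the answers. The sketch below assumes SELECT queries over basic graph patterns and compares answers as multisets; `match_bgp`, `answers`, and `is_query_complete` are illustrative names, not from the talk:

```python
from collections import Counter


def is_var(term):
    return term.startswith("?")


def match_bgp(pattern, graph):
    """Return every mapping (dict) that sends all triple patterns into the graph."""
    mappings = [{}]
    for tp in pattern:
        extended = []
        for mu in mappings:
            for triple in graph:
                nu = dict(mu)
                if all(
                    nu.setdefault(p, g) == g if is_var(p) else p == g
                    for p, g in zip(tp, triple)
                ):
                    extended.append(nu)
        mappings = extended
    return mappings


def answers(W, P, graph):
    """Evaluate the SELECT query (W, P) and count each answer tuple."""
    return Counter(tuple(mu[v] for v in W) for mu in match_bgp(P, graph))


def is_query_complete(W, P, available, ideal):
    """The query is complete if both graphs yield the same answers."""
    return answers(W, P, available) == answers(W, P, ideal)


G_a = {("resDogs", "dir", "tarantino")}
G_i = {("resDogs", "dir", "tarantino"), ("resDogs", "act", "tarantino")}

complete_dir = is_query_complete(["?m"], [("?m", "dir", "tarantino")], G_a, G_i)
complete_dir_act = is_query_complete(
    ["?m"],
    [("?m", "dir", "tarantino"), ("?m", "act", "tarantino")],
    G_a,
    G_i,
)
```

This direct check needs the ideal graph, which is not available in practice; it only illustrates what the definition says.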
Completeness Entailment
Problem Definition (Completeness Entailment)
Let $\mathcal{C}$ be a set of completeness statements and $Q$ a query.
We say that $\mathcal{C}$ entails the completeness of $Q$, written
$\mathcal{C} \models \mathit{Compl}(Q)$,
if every incomplete data source that satisfies $\mathcal{C}$ also satisfies $\mathit{Compl}(Q)$.
Intuition: Completeness Entailment
Consider the set $\mathcal{C}_{dir,act} = \{C_{dir}, C_{act}\}$ of completeness statements and the query $Q_{dir+act} = (\{?m\}, P_{dir+act})$, where
$P_{dir+act} = \{(?m, \mathit{dir}, \mathit{tarantino}), (?m, \mathit{act}, \mathit{tarantino})\}$.
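$\tilde{P}_{dir+act} = \{(m, \mathit{dir}, \mathit{tarantino}), (m, \mathit{act}, \mathit{tarantino})\}$
Therefore,
$\llbracket Q_{C_{dir}} \rrbracket_{\tilde{P}_{dir+act}} \cup \llbracket Q_{C_{act}} \rrbracket_{\tilde{P}_{dir+act}} = \{(m, \mathit{dir}, \mathit{tarantino}), (m, \mathit{act}, \mathit{tarantino})\} = \tilde{P}_{dir+act}$.
Thus, $\mathcal{C}_{dir,act} \models \mathit{Compl}(Q_{dir+act})$.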
Prototypical Graph
$\tilde{P}_{dir+act} = \{(m, \mathit{dir}, \mathit{tarantino}), (m, \mathit{act}, \mathit{tarantino})\}$
Definition (Prototypical Graph)
Let $Q = (W, P)$ be a query.
The freeze mapping $\tilde{\mathit{id}}$ is defined as a mapping from each variable $?v$ in $P$ to a new IRI $v$.
Instantiating the graph pattern $P$ with $\tilde{\mathit{id}}$ yields the graph
$\tilde{P} := \tilde{\mathit{id}}\,P$,
which we call the prototypical graph of $Q$.
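Freezing is a plain substitution of variables by constants. A minimal sketch, assuming `?`-prefixed variable names and assuming that stripping the `?` yields a term not already used in the pattern (a real implementation would mint a guaranteed-fresh IRI); the function name `freeze` is illustrative:

```python
def freeze(pattern):
    """Replace each variable ?v by the constant v, yielding a graph (set of triples)."""
    return {
        tuple(t[1:] if t.startswith("?") else t for t in tp)
        for tp in pattern
    }


P_dir_act = [("?m", "dir", "tarantino"), ("?m", "act", "tarantino")]
prototypical = freeze(P_dir_act)
```

The result is an ordinary graph, so queries can be evaluated over it like over any other data.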
Transfer Operator
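$\llbracket Q_{C_{dir}} \rrbracket_{\tilde{P}_{dir+act}} \cup \llbracket Q_{C_{act}} \rrbracket_{\tilde{P}_{dir+act}}$
Definition (Transfer Operator)
For any set $\mathcal{C}$ of completeness statements and a graph $G$, we define the transfer operator $T_{\mathcal{C}}$ that computes the union of the evaluation over $G$ of all CONSTRUCT queries of the statements in $\mathcal{C}$:
$T_{\mathcal{C}}(G) = \bigcup_{C \in \mathcal{C}} \llbracket Q_C \rrbracket_{G}$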
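The transfer operator can be sketched as a loop over the statements that collects every triple their CONSTRUCT queries produce. The sketch assumes a statement is a `(pattern, condition)` pair of triple-pattern lists; `match_bgp` and `transfer` are illustrative names, not from the talk:

```python
def is_var(term):
    return term.startswith("?")


def match_bgp(pattern, graph):
    """Return every mapping (dict) that sends all triple patterns into the graph."""
    mappings = [{}]
    for tp in pattern:
        extended = []
        for mu in mappings:
            for triple in graph:
                nu = dict(mu)
                if all(
                    nu.setdefault(p, g) == g if is_var(p) else p == g
                    for p, g in zip(tp, triple)
                ):
                    extended.append(nu)
        mappings = extended
    return mappings


def transfer(statements, graph):
    """Union of the CONSTRUCT queries (P1, P1 ∪ P2) of all statements, evaluated over graph."""
    result = set()
    for pattern, condition in statements:
        body = list(pattern) + list(condition)
        for mu in match_bgp(body, graph):
            result |= {tuple(mu.get(t, t) for t in tp) for tp in pattern}
    return result


C_dir = ([("?m", "dir", "tarantino")], [])
C_act = ([("?m", "act", "?a")], [("?m", "dir", "tarantino")])
frozen = {("m", "dir", "tarantino"), ("m", "act", "tarantino")}

both = transfer([C_dir, C_act], frozen)
only_dir = transfer([C_dir], frozen)
```

Because each CONSTRUCT template is part of its own query body, the result is always a subset of the input graph.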
Completeness Entailment Theorem
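$\tilde{P}_{dir+act} = T_{\mathcal{C}_{dir,act}}(\tilde{P}_{dir+act})$
Theorem (Completeness of Basic Queries)
Let $\mathcal{C}$ be a set of completeness statements and $Q = (W, P)$ a basic query. Then,
$\mathcal{C} \models \mathit{Compl}(Q)$ if and only if $\tilde{P} = T_{\mathcal{C}}(\tilde{P})$.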
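The theorem turns entailment into a concrete test: freeze the query pattern, apply the transfer operator, and compare. A sketch under the same assumptions as before (3-tuples of strings, `?`-prefixed variables, statements as `(pattern, condition)` pairs, freezing by stripping the `?`); the function names are illustrative, not from the talk:

```python
def is_var(term):
    return term.startswith("?")


def match_bgp(pattern, graph):
    """Return every mapping (dict) that sends all triple patterns into the graph."""
    mappings = [{}]
    for tp in pattern:
        extended = []
        for mu in mappings:
            for triple in graph:
                nu = dict(mu)
                if all(
                    nu.setdefault(p, g) == g if is_var(p) else p == g
                    for p, g in zip(tp, triple)
                ):
                    extended.append(nu)
        mappings = extended
    return mappings


def freeze(pattern):
    """Prototypical graph: replace each variable ?v by the constant v."""
    return {tuple(t[1:] if is_var(t) else t for t in tp) for tp in pattern}


def transfer(statements, graph):
    """Union of the statements' CONSTRUCT queries evaluated over graph."""
    result = set()
    for pattern, condition in statements:
        for mu in match_bgp(list(pattern) + list(condition), graph):
            result |= {tuple(mu.get(t, t) for t in tp) for tp in pattern}
    return result


def entails_basic_query(statements, P):
    """Fixpoint test of the theorem: the frozen pattern must survive the transfer operator."""
    frozen = freeze(P)
    return transfer(statements, frozen) == frozen


C_dir = ([("?m", "dir", "tarantino")], [])
C_act = ([("?m", "act", "?a")], [("?m", "dir", "tarantino")])
P_dir_act = [("?m", "dir", "tarantino"), ("?m", "act", "tarantino")]

with_both = entails_basic_query([C_dir, C_act], P_dir_act)
with_dir_only = entails_basic_query([C_dir], P_dir_act)
```

With both statements the test succeeds, matching the intuition slides; with `C_dir` alone the acting triple is not transferred and the test fails.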
Query Class: DISTINCT Queries
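Give us all Oscar-winning things:
$Q_{awd} = (W_{awd}, P_{awd})^{d} = (\{?m\}, \{(?m, \mathit{award}, \mathit{oscar}), (?m, \mathit{award}, ?aw)\})^{d}$
Complete for all Oscar-winning things:
$C_{os} = \mathit{Compl}((?m, \mathit{award}, \mathit{oscar}) \mid \emptyset)$
Does $\{C_{os}\} \models \mathit{Compl}(Q_{awd})$ hold?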
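Under DISTINCT (set) semantics the second triple pattern of the query is redundant, because ?aw can always be bound to oscar. The sketch below applies the fixpoint test for basic queries to both the full and the reduced pattern. Treating the reduced-pattern test as the criterion for DISTINCT queries is an assumption of this sketch, not something the slide states, and all function names are illustrative:

```python
def is_var(term):
    return term.startswith("?")


def match_bgp(pattern, graph):
    """Return every mapping (dict) that sends all triple patterns into the graph."""
    mappings = [{}]
    for tp in pattern:
        extended = []
        for mu in mappings:
            for triple in graph:
                nu = dict(mu)
                if all(
                    nu.setdefault(p, g) == g if is_var(p) else p == g
                    for p, g in zip(tp, triple)
                ):
                    extended.append(nu)
        mappings = extended
    return mappings


def freeze(pattern):
    """Replace each variable ?v by the constant v (assumed fresh)."""
    return {tuple(t[1:] if is_var(t) else t for t in tp) for tp in pattern}


def transfer(statements, graph):
    """Union of the statements' CONSTRUCT queries evaluated over graph."""
    result = set()
    for pattern, condition in statements:
        for mu in match_bgp(list(pattern) + list(condition), graph):
            result |= {tuple(mu.get(t, t) for t in tp) for tp in pattern}
    return result


def fixpoint_test(statements, P):
    frozen = freeze(P)
    return transfer(statements, frozen) == frozen


C_os = ([("?m", "award", "oscar")], [])
P_awd = [("?m", "award", "oscar"), ("?m", "award", "?aw")]
P_awd_reduced = [("?m", "award", "oscar")]  # redundant pattern dropped by hand

full_pattern_ok = fixpoint_test([C_os], P_awd)
reduced_pattern_ok = fixpoint_test([C_os], P_awd_reduced)
```

The test fails on the full pattern, since the frozen triple (m, award, aw) is not covered by the statement, and succeeds on the reduced one. This suggests that DISTINCT queries need the pattern to be reduced before the test is applied.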
Query Class: OPT Queries
Give us all movies, and their awards, if any:
$Q_{maw} = (\{?m, ?aw\}, ((?m, a, \mathit{Movie}) \ \mathrm{OPT} \ (?m, \mathit{award}, ?aw)))$
Complete for all movies and their awards:
$C_{aw} = \mathit{Compl}((?m, a, \mathit{Movie}), (?m, \mathit{award}, ?aw) \mid \emptyset)$
Does $\{C_{aw}\} \models \mathit{Compl}(Q_{maw})$ hold?
Query Class: Queries under RDFS Semantics
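Give us all films:
$Q_{film} = (\{?m\}, \{(?m, a, \mathit{Film})\})$
Complete for all movies:
$C_{movie} = \mathit{Compl}((?m, a, \mathit{Movie}) \mid \emptyset)$
Films are the same as movies:
$S_{fm} = \{(\mathit{Film}, \mathit{subclass}, \mathit{Movie}), (\mathit{Movie}, \mathit{subclass}, \mathit{Film})\}$
Does $\{C_{movie}\} \models \mathit{Compl}(Q_{film})$ hold with respect to $S_{fm}$?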
Federated Completeness Statements
Timestamped Completeness Statements
Conclusions
Completeness statements can now be represented in RDF
We know how completeness statements can entail query completeness in different query classes and different settings of completeness statements
Future Work
Completeness statements for queries with negation
Completeness statements as session annotations for RDF streams
Statistical completeness reasoning
Publications
Fariz Darari, Werner Nutt, Giuseppe Pirrò, Simon Razniewski: Completeness Statements about RDF Data Sources and Their Use for Query Answering. ISWC 2013.
Fariz Darari, Radityo Eko Prasojo, Werner Nutt: CORNER: A Completeness Reasoner for SPARQL Queries Over RDF Data Sources. ESWC Posters and Demos 2014.
Fariz Darari, Simon Razniewski, Werner Nutt: Bridging the Semantic Gap between RDF and SPARQL using Completeness Statements. ISWC Posters and Demos 2014.
Fariz Darari, Radityo Eko Prasojo, Werner Nutt: Expressing No-Value Information in RDF. ISWC Posters and Demos 2015.
The latest results (timestamped statements and efficient completeness reasoning with 1 million statements) have been submitted to a journal.
$\mathit{Compl}((\mathit{myDaSePresentation}, \mathit{slide}, ?s) \mid \emptyset)$
Thank You!