Web Semantics: KB vs. DB Zachary G. Ives University of Pennsylvania CIS 650 – Database & Information Systems April 13, 2005
Dec 19, 2015
Administrivia
Next readings and summaries:
  Bernstein on Model Management
  Dong and Halevy on Personal Info Management
  2-paragraph summary of the problems they focus on and their key contributions
Today’s Trivia Question
Last Time…
The Semantic Web vision and goals
Core ideas:
RDF as “semantic” format (also RDFS as the schema format)
Ontologies as the standard way of defining concepts
Description logics are the way most ontologies are defined (OWL language)
Description Logics (Borgida survey)
A class of languages based on FOL, like Datalog and Prolog
Key questions: subsumption of classes, recognition of members of classes
Prolog allows us to reason about instances:
  ParentOf(liz, andy).
  Male(andy).
  Child(_x) :- ParentOf(_z, _x).
  Son(_y) :- Male(_y), ParentOf(_w, _y).
DLs allow us to make further inferences – that andy is a Child, i.e., they realize:
  Child(x) ⇔ (∃ z) ParentOf(z, x)
  Son(y) ⇔ (∃ w) Male(y) ∧ ParentOf(w, y)
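The rule-based half of this slide can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the facts are stored as plain sets, and the variable names (`parent_of`, `male`, `child`, `son`) are chosen here, not taken from the slides. It evaluates the two rules over the given facts and then checks, for this one relational structure only, the class-level relationship that a DL reasoner would derive for every structure.

```python
# Facts from the slide, stored as sets of tuples.
parent_of = {("liz", "andy")}
male = {"andy"}

# Child(x) :- ParentOf(z, x): anyone who appears as the child in ParentOf.
child = {x for (_z, x) in parent_of}

# Son(y) :- Male(y), ParentOf(w, y): male individuals who have a parent.
son = {y for (_w, y) in parent_of if y in male}

print(sorted(child))
print(sorted(son))

# A DL reasoner reads the definitions as equivalences, so it can conclude
# that Son is subsumed by Child for every structure. Here we only check
# that the containment holds in this single instance.
print(son <= child)
```

Note the difference in scope: the rule evaluation answers questions about individuals, while subsumption is a statement about the class definitions themselves.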
Syntax and Semantics
Build variable-free composite terms from atoms using term constructors (e.g., at-most, all)
The same description in three notations:
  COURSE and at-most(10, takers) and all(takers, GRADS)
  (:and COURSE (:at-most 10 takers) (:all takers GRADS))
  COURSE ⊓ (≤ 10 takers) ⊓ (∀ takers : GRADS)
Can be expressed in FOPC:
  COURSE(a) ∧ ¬(∃ x1 … x11) (takers(a,x1) ∧ … ∧ takers(a,x11) ∧ x1 ≠ x2 ∧ … ∧ x10 ≠ x11) ∧ (∀ y) (takers(a,y) ⇒ GRADS(y))
  (the inequalities range over all pairs xi ≠ xj with i < j, i.e., no 11 distinct takers exist)
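To make the semantics of the term constructors concrete, here is a minimal Python sketch that checks whether an individual belongs to the COURSE description in one given relational structure. The helper names (`atom`, `and_`, `at_most`, `all_`, `fillers`) and the sample data are assumptions made for illustration; real DL systems reason over all structures, not just one.

```python
def fillers(roles, role, ind):
    """All y such that role(ind, y) holds in the structure."""
    return {y for (x, y) in roles.get(role, set()) if x == ind}

def atom(name):
    # Membership in a primitive class.
    return lambda ind, roles, classes: ind in classes.get(name, set())

def and_(*descs):
    # Conjunction: every component description must hold.
    return lambda ind, roles, classes: all(d(ind, roles, classes) for d in descs)

def at_most(n, role):
    # Number restriction: at most n distinct fillers for the role.
    return lambda ind, roles, classes: len(fillers(roles, role, ind)) <= n

def all_(role, desc):
    # Value restriction: every filler of the role satisfies desc.
    return lambda ind, roles, classes: all(
        desc(f, roles, classes) for f in fillers(roles, role, ind)
    )

# COURSE and at-most(10, takers) and all(takers, GRADS)
small_course = and_(atom("COURSE"), at_most(10, "takers"), all_("takers", atom("GRADS")))

# Hypothetical structure: two courses, one with a non-graduate taker.
classes = {"COURSE": {"cs431", "cs101"}, "GRADS": {"anna", "bob"}}
roles = {"takers": {("cs431", "anna"), ("cs431", "bob"),
                    ("cs101", "anna"), ("cs101", "carl")}}

print(small_course("cs431", roles, classes))
print(small_course("cs101", roles, classes))
```

Each constructor returns a predicate over (individual, roles, classes), so composite terms stay variable-free, mirroring the syntax on the slide.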
Questions for DLs
Is a description D consistent and coherent?
  Not if its extension D^I is empty for every possible relational structure I
Are D and D’ mutually disjoint?
  Yes if D^I ∩ D’^I = ∅ for every I
Are D and D’ equivalent?
  Yes if D^I = D’^I for every I
Does D subsume some other description D’?
  Yes if, for every relational structure I, D^I ⊇ D’^I
Inconsistency: and(C,D) is subsumed by NOTHING
Equivalence: D subsumes D’, and D’ subsumes D
DL Example
class STUDENT is-a PERSON with studNumber: int, key; level: {1,2,3,4}
  and(PERSON,
      all(studNumber, INTEGER), at-least(1, studNumber), at-most(1, studNumber),
      all(level, one-of(1,2,3,4)), at-least(1, level), at-most(1, level),
      at-most(1, compose(studNumber, inverse(studNumber))))
ENROLLMENT := and(
      all(st, STUDENT), at-least(1, st), at-most(1, st),
      all(crs, COURSE), at-least(1, crs), at-most(1, crs),
      all(when, DATE), at-least(1, when), at-most(1, when))
STUDENT := and(all(inverse(st), ENROLLMENT), at-least(1, inverse(st)), at-most(6, inverse(st)))
COURSE := and(all(inverse(crs), ENROLLMENT), at-least(1, inverse(crs)), at-most(300, inverse(crs)))
INSERT-IN(Cs431, COURSE).
FILL-WITH(Cs431, taughtBy, Einstein).
FILL-WITH(Cs431, takers, Anna).
More on DLs
We can have both primitive classes (equivalent to extensional relations) and virtual ones
  But we can make assertions over virtual classes that directly impact the primitive ones
  Contrast with updates to views in databases
Many different levels of expressiveness in different DLs
Comparison with Datalog:
  Both are subsets of FOL, with some limitations
  DLs allow bidirectional inference; Datalog is unidirectional
  DLs are equivalent to at most FOL with <= 3 variables; Datalog has an unbounded number of existential variables
Coming Back to the SW
Lots of work on OWL, the Web Ontology Language
Based on different levels of DLs:
  OWL Lite – classification hierarchy, simple constraints (cardinalities 0 or 1)
  OWL DL – maximum expressiveness while retaining computational completeness and decidability
  OWL Full – no computational guarantees, allows classes as instances of other classes
Goal: each community builds an ontology
But how to relate ontologies?
“equivalentClass”, “equivalentProperty”, “sameAs”
Is this enough???
The Data Management Argument
The Semantic Web is all about integration and translation
But there’s no notion of translation in the SW, except for equivalences
“Semantic normalization”???
Does DB research have something to add? If so, what needs to change?
Database Approaches to Semantic Integration
Data warehouse
  Design a single schema
  Do physical DB design
  Map data into warehouse schema
  Periodically update warehouse
Virtual data integration (EII)
  Design mediated schema
  Map sources to mediated schema
  Queries are rewritten and answered on demand from sources
[Figure: a query is posed to the data integration system over the mediated schema; demand-driven wrappers deliver data in a common format from XML sources and relational sources]
A Single Centralized Schema is a Bottleneck!
Challenging to form a single schema for all domain data
  People don’t agree on how concepts should be represented
  Data warehouse: physical design is a strong consideration
  Mediated schema very different from original users’ schemas
Mappings may be challenging to create, and do not leverage work of previous source mappings
  Each source gets mapped to mediated schema separately
Difficult to evolve this single schema as needs change
  May “break” existing queries
  Must build consensus for any schema changes
Peer Data Management: Decentralized Mediation for Ad Hoc Extensibility
[Figure: example PDMS peers: DB Projects, UPenn, UW, Stanford, IIT Mumbai]
Data integration: 1 mediated schema, m mappings to sources
Peer data management system (PDMS):
  n mediated “peer schemas,” as few as (n - 1) mappings between them – evaluated transitively
  m mappings to sources
Peer-to-Peer at both Logical and Architectural Levels
A “logical” peer-to-peer model: every participant can contribute:
  Extensional data
  Mappings between schemas
  Computation (query answering) and caching
Can we do a database (say, XML) version of the SW?
RDF vs. XML
RDF explicitly names relationships:
  (book, title, “ABC”)
  (book, writtenBy, author)
  (author, name, “John Smith”)
XML does not always:
  1. <book><title>ABC</title><writtenBy><author><name>John Smith</name></author></writtenBy></book>
  2. <book><title>ABC</title><author>John Smith</author></book>
[Figure: RDF graph with nodes book and author and edges title, writtenBy, name]
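The contrast on this slide can be checked mechanically. The sketch below, which uses Python's standard `xml.etree.ElementTree`, lists which RDF relationship names have no corresponding element name in each of the two XML encodings. The function name `tags` and the variable names are chosen here for illustration.

```python
import xml.etree.ElementTree as ET

# The three RDF statements from the slide: (subject, relationship, object).
triples = [
    ("book", "title", "ABC"),
    ("book", "writtenBy", "author"),
    ("author", "name", "John Smith"),
]

# The two XML encodings from the slide.
xml_1 = ("<book><title>ABC</title><writtenBy><author><name>John Smith</name>"
         "</author></writtenBy></book>")
xml_2 = "<book><title>ABC</title><author>John Smith</author></book>"

def tags(xml_text):
    """Return the set of element names used in an XML document."""
    return {elem.tag for elem in ET.fromstring(xml_text).iter()}

rdf_relationships = {relationship for (_s, relationship, _o) in triples}

# Relationships that are named in RDF but have no tag in each encoding.
missing_in_1 = rdf_relationships - tags(xml_1)
missing_in_2 = rdf_relationships - tags(xml_2)
print(missing_in_1)
print(missing_in_2)
```

Encoding 1 happens to label every relationship; encoding 2 leaves writtenBy and name implicit in the nesting, which is the sense in which XML "does not always" name relationships.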
RDF vs. XML 2
RDF is subject-neutral (a graph)
XML centers around a subject (a tree):
  1. <book><title>ABC</title><author>John Smith</author></book>
  2. <author><name>John Smith</name><book>ABC</book></author>
This may result in duplication of contained objects
An XML Version of the Semantic Web
Data model: XML + Schema
  Vast volumes of data already in XML (or exported as XML)
  CAVEAT: not all relationships are labeled in XML (“XML has no semantics.”)
Concepts: Views ≈ classes; schemas ≈ ontologies
  Views define membership via queries; can reason about containment
  CAVEAT: less expressive than OWL classes
Schema mappings: target schema as query over source
  Sophisticated reasoning about mappings is possible by extending existing data integration techniques
  Can use mappings in “forward” and “reverse” directions
  Allows for “chaining” of mappings to answer queries
Let’s Start with the Relational Model and then Extend
GAV: mediated relations as views over sources
Easy to rewrite queries: unfold them using view definitions
LAV: sources as views over mediated relations
More challenging to rewrite queries: answering queries using views (e.g., MiniCon [Pottinger & Levy 00])
More flexible in representing source properties
[Figure, GAV: mediated schema T1, … defined over sources S1(X), S2(Y)]
  MST1(X’) :- S1(X), …
  MST2(Y’) :- S2(Y), …
[Figure, LAV: sources S1(X), S2(Y) described over mediated schema T1, …]
  S1(X’) ⊆ MST1(X), …
  S2(Y’) ⊆ MST1(Y), …
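GAV query rewriting by unfolding can be sketched as follows. This is a minimal, single-pass illustration rather than a full rewriting engine: the data representation (atoms as `(predicate, args)` pairs), the names `VIEWS`, `unfold_atom`, and `unfold`, and the fresh-variable naming scheme are all assumptions made here. The one view definition is borrowed from rule r0 of the query-answering example later in these slides.

```python
from itertools import count

# A GAV definition: head predicate -> (head variables, body atoms).
# Atoms are (predicate, argument-tuple) pairs.
VIEWS = {
    "SameProject": (("a1", "a2", "p"),
                    [("ProjMember", ("a1", "p")), ("ProjMember", ("a2", "p"))]),
}

_fresh = count()

def unfold_atom(atom, views):
    """Replace one atom by the body of its view definition, if it has one."""
    pred, args = atom
    if pred not in views:
        return [atom]  # no definition: leave the atom unchanged
    head_vars, body = views[pred]
    subst = dict(zip(head_vars, args))
    suffix = next(_fresh)

    def rename(var):
        # Head variables take the query's arguments; any other body
        # variable is existential and gets a fresh name.
        return subst[var] if var in subst else f"{var}_{suffix}"

    return [(p, tuple(rename(v) for v in vs)) for (p, vs) in body]

def unfold(query_body, views):
    """One unfolding pass over every atom of a conjunctive query body."""
    return [a for atom in query_body for a in unfold_atom(atom, views)]

query = [("SameProject", ("x", "y", "z")), ("Author", ("x", "w"))]
rewritten = unfold(query, VIEWS)
print(rewritten)
```

LAV rewriting is not shown: it needs answering queries using views (e.g., MiniCon), which is considerably more involved than substitution.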
Answering Queries in a PDMS: Transitively Evaluating Mappings
Start with schema being queried
  Look up mappings to neighbors; expand
  Continue iteratively until queries only over sources
Mappings in a PDMS may be a combination of LAV, GAV techniques:
  General form: p1a(X, Y), p1b(Y, Z), … = p2a(Y, X), p2b(X, Z), … (see paper for examination of what is actually tractable)
  Requires unfolding and answering queries using views (AQUV)
We use a rule-goal “tree” to expand the mappings
  Extend some of the ideas of MiniCon to avoid unnecessary expansions
  Challenges to avoid redundancy – see paper for optimizations
Example of Query Answering
Mappings between peers’ schemas:
  r0: SameProject(a1,a2,p) :- ProjMember(a1,p), ProjMember(a2,p)
  r1: CoAuthor(a1,a2) ⊆ Author(a1,w), Author(a2,w)
Mappings to data sources:
  r2: S1(a,p,s) ⊆ ProjMember(a,p), Sched(a,s,end)
  r3: CoAuthor(f1,f2) = S2(f1,f2)
Query: Q(a1, a2) :- SameProject(a1,a2,p), Author(a1,w), Author(a2,w)
[Figure: peer relations Sched(f,s,e), SameProject(a1,a2,p), ProjMember(a1,p), CoAuthor(a1,a2), Author(a,w) and sources S1, S2, connected by mappings r0, r1, r2, r3]
Example Rule-Goal Tree Expansion
q: Q(a1, a2) :- SameProject(a1,a2,p), Author(a1,w), Author(a2,w)
Step 1: the root rule node q gets one goal node per subgoal:
  SameProject(a1,a2,p)   Author(a1,w)   Author(a2,w)
Step 2: expand each goal using the mappings between peers’ schemas (r0, r1, r1):
  SameProject(a1,a2,p), by r0: ProjMember(a1,p), ProjMember(a2,p)
  Author(a1,w), by r1: CoAuthor(a1,a2)
  Author(a2,w), by r1: CoAuthor(a2,a1)
Step 3: expand the new goals using the mappings to data sources (r2, r2, r3, r3):
  ProjMember(a1,p), by r2: S1(a1,p,_)
  ProjMember(a2,p), by r2: S1(a2,p,_)
  CoAuthor(a1,a2), by r3: S2(a1,a2)
  CoAuthor(a2,a1), by r3: S2(a2,a1)
Result: the reformulated query over sources only:
  Q’(a1,a2) :- S1(a1,p,_), S1(a2,p,_), S2(a1,a2) ∪ S1(a1,p,_), S1(a2,p,_), S2(a2,a1)
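To see what the reformulated query computes, the following Python sketch evaluates Q’ as a union of two conjunctive queries over small source instances. The contents of `S1` and `S2` are hypothetical sample data, not from the slides, and the helper name `same_project_pairs` is chosen here for illustration.

```python
# Hypothetical source instances: S1(author, project, schedule), S2(author, author).
S1 = {("ann", "piazza", "mon"), ("bob", "piazza", "tue"), ("cy", "tukwila", "wed")}
S2 = {("ann", "bob"), ("cy", "ann")}

def same_project_pairs(s1):
    """S1(a1,p,_), S1(a2,p,_): self-join of S1 on the project column."""
    return {(a1, a2) for (a1, p1, _) in s1 for (a2, p2, _) in s1 if p1 == p2}

pairs = same_project_pairs(S1)

# First disjunct: ..., S2(a1,a2)
branch_1 = {(a1, a2) for (a1, a2) in pairs if (a1, a2) in S2}
# Second disjunct: ..., S2(a2,a1)
branch_2 = {(a1, a2) for (a1, a2) in pairs if (a2, a1) in S2}

# Q' is the union of the two disjuncts.
answers = branch_1 | branch_2
print(sorted(answers))
```

The two disjuncts differ only in the argument order of S2, which is why the result is symmetric in a1 and a2 even though S2 itself need not be.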
Stepping up to XML (WWW03)
Goals:
  Build on XQuery and XML (extended with RDF-style identity, following lead of [Patel-Schneider & Simeon 02])
  Remain computationally inexpensive
  Capture the common mapping types
Directional mapping language based on templates:
  <output>
    {: $var IN document(“doc”)/path WHERE condition :}
    <tag>$var</tag>
  </output>
Translates between parts of data instances
Restricted subset of XQuery that’s decidable to reason about
Supports special annotations and object fusion
Can map XML-XML, XML-RDF, RDF-XML (at data level)
Mapping Example between XML Schemas
Target: pubs
  book*
    title
    author*
      name
Source: authors
  author*
    full-name
    publication*
      title
      pub-type
[Figure: graph with nodes publication and author and edges pub-type, name, writtenBy, title]
Example Piazza Mapping
<pubs>
  <book piazza:id={$t}>
    {: $a IN document(“…”)/authors/author,
       $an IN $a/full-name,
       $t IN $a/publication/title,
       $typ IN $a/publication/pub-type
       WHERE $typ = “book”
       PROPERTY $t >= ‘A’ AND $t < ‘B’ :}
    <title>{$t}</title>
    <author><name>{$an}</name></author>
  </book>
</pubs>
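The effect of this mapping on data can be approximated in plain Python with `xml.etree.ElementTree`. This is a sketch under stated assumptions, not the Piazza implementation: the `SOURCE` document is hypothetical, the title and pub-type are read from the same publication element, the PROPERTY clause is ignored, and object fusion on `piazza:id={$t}` is imitated with a dictionary keyed by title.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document following the "authors" schema.
SOURCE = """
<authors>
  <author>
    <full-name>John Smith</full-name>
    <publication><title>ABC</title><pub-type>book</pub-type></publication>
    <publication><title>Notes</title><pub-type>article</pub-type></publication>
  </author>
  <author>
    <full-name>Jane Doe</full-name>
    <publication><title>ABC</title><pub-type>book</pub-type></publication>
  </author>
</authors>
"""

def map_authors_to_pubs(source_xml):
    """Restructure author-centric data into the book-centric pubs schema."""
    pubs = ET.Element("pubs")
    books = {}  # title -> <book> element; stands in for fusion on piazza:id
    for author in ET.fromstring(source_xml).findall("author"):
        full_name = author.findtext("full-name")
        for pub in author.findall("publication"):
            if pub.findtext("pub-type") != "book":
                continue  # WHERE $typ = "book"
            title = pub.findtext("title")
            book = books.get(title)
            if book is None:
                book = ET.SubElement(pubs, "book")
                ET.SubElement(book, "title").text = title
                books[title] = book
            name = ET.SubElement(ET.SubElement(book, "author"), "name")
            name.text = full_name
    return pubs

result = map_authors_to_pubs(SOURCE)
print(ET.tostring(result, encoding="unicode"))
```

Because both source authors list the same book title, the output contains a single book element with two author children, which is the duplication-avoiding behavior that identity-based fusion is meant to provide.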
Challenges
Query reformulation for XML is significantly harder
  Hierarchy, 1:n schema constraints, ability to map from values to tags, …
  Can only do roughly the XML equivalent of conjunctive queries
See the WWW03 paper (plus later work by Yu and Popa, Deutsch et al., many others) for details
What about Values?
Thus far, we’ve focused on schema mappings
Almost as important in the real world: mappings of values to values
  Proteins to binding sites
  SSNs to customer IDs
  etc.
The Hyperion system (KAM 03) focuses on computing transitive relationships between mappings
  In many cases, we only have partial transitive mappings
  Key idea: divide all of the mappings into partitions, each of which can compute transitive closures separately
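The basic operation behind transitive value mappings is relational composition of mapping tables. The Python sketch below shows only that building block, not Hyperion's partitioning algorithm; the function name `compose` and both mapping tables are hypothetical.

```python
def compose(m1, m2):
    """Relational composition: (a, c) whenever (a, b) is in m1 and (b, c) is in m2."""
    by_key = {}
    for b, c in m2:
        by_key.setdefault(b, set()).add(c)
    return {(a, c) for (a, b) in m1 for c in by_key.get(b, ())}

# Hypothetical value-to-value mapping tables.
ssn_to_customer = {("111-11-1111", "C1"), ("222-22-2222", "C2")}
customer_to_account = {("C1", "A9")}

# Transitive mapping from SSNs to accounts.
ssn_to_account = compose(ssn_to_customer, customer_to_account)
print(ssn_to_account)
```

The second SSN has no counterpart in the account table, so it drops out of the composed mapping. That is a small instance of the "partial transitive mappings" mentioned above.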
Assessment: The Semantic Web
The KB world focuses on expressively capturing concepts
The DB world focuses on integrating and restructuring data (but views are less expressive in certain ways)
Does either of these seem likely to change the world?
What barriers need to be removed?