Web Semantics: KB vs. DB Zachary G. Ives University of Pennsylvania CIS 650 – Database & Information Systems April 13, 2005.

Web Semantics: KB vs. DB

Zachary G. Ives

University of Pennsylvania

CIS 650 – Database & Information Systems

April 13, 2005

2

Administrivia

Next readings and summaries:
  Bernstein on Model Management
  Dong and Halevy on Personal Info Management
  2 paragraph summary of the problems they focus on, key contributions

3

Today’s Trivia Question

4

Last Time…

The Semantic Web vision and goals

Core ideas:
  RDF as “semantic” format
    Also RDFS schema format
  Ontologies as the standard way of defining concepts
  Description logics are the way most ontologies are defined (OWL language)

5

Description Logics (Borgida survey)

A class of languages based on FOL, like Datalog and Prolog

Key questions: subsumption of classes, recognition of members of classes

Prolog allows us to reason about instances:
  ParentOf(liz,andy).
  Male(andy).
  Child(_x) :- ParentOf(_z, _x).
  Son(_y) :- Male(_y), ParentOf(_w, _y).

DLs allow us to make further inferences (e.g., that andy is a Child), i.e., they realize:
  Child(x) ⇔ (∃ z) ParentOf(z,x)
  Son(y) ⇔ (∃ w) Male(y) ∧ ParentOf(w,y)
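The rule-based half of this comparison can be sketched in a few lines of Python. This is only an illustration of one-directional (head from body) inference over the two facts on the slide; the `derive` function and the tuple encoding of facts are assumptions made for the example, not part of any Prolog or DL system.

```python
# Facts from the slide, stored as (predicate, args) tuples.
facts = {("ParentOf", ("liz", "andy")), ("Male", ("andy",))}

def derive(facts):
    """Apply the two rules in the forward direction only (body implies head)."""
    derived = set(facts)
    parent_pairs = [args for pred, args in facts if pred == "ParentOf"]
    males = {args[0] for pred, args in facts if pred == "Male"}
    for _parent, child in parent_pairs:
        derived.add(("Child", (child,)))      # Child(x) :- ParentOf(z, x)
        if child in males:
            derived.add(("Son", (child,)))    # Son(y) :- Male(y), ParentOf(w, y)
    return derived

result = derive(facts)
print(sorted(result))
```

The sketch never goes from head to body: given only `Child(bob)`, it would not conclude that some parent of bob exists. That reverse direction is what the biconditional reading in a DL adds.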

6

Syntax and Semantics

Build variable-free composite terms from atoms using term constructors (e.g., at-most, all)

COURSE and at-most(10, takers) and all (takers, GRADS)

(:and COURSE (:at-most 10 takers) (:all takers GRADS))

COURSE ∩ ≤10 takers ∩ ∀takers:GRADS

Can be expressed in FOPC:
  COURSE(a) ∧ ¬(∃ x1 … x11) [takers(a,x1) ∧ … ∧ takers(a,x11) ∧ (x1 … x11 pairwise distinct)] ∧ (∀ y) [takers(a,y) ⇒ GRADS(y)]
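To make the semantics of the composite term concrete, here is a minimal sketch that tests whether an individual belongs to COURSE and at-most(10, takers) and all(takers, GRADS) in one small relational structure. The individuals (`cs650`, `ann`, and so on) and the `satisfies` helper are invented for illustration.

```python
def satisfies(a, course, grads, takers, limit=10):
    """True if `a` is in (COURSE and at-most(limit, takers) and all(takers, GRADS))."""
    fillers = {y for (x, y) in takers if x == a}   # fillers of role `takers` for a
    return a in course and len(fillers) <= limit and fillers <= grads

# Hypothetical relational structure (names invented for illustration).
course = {"cs650", "cs101"}
grads = {"ann", "bob"}
takers = {("cs650", "ann"), ("cs650", "bob"), ("cs101", "ann"), ("cs101", "carl")}

print(satisfies("cs650", course, grads, takers))   # every taker is a grad
print(satisfies("cs101", course, grads, takers))   # carl is not a grad
```

Here at-most counts distinct role fillers and all requires every filler to be in the named class; an individual with no takers at all satisfies both constraints vacuously.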

7

Questions for DLs

Is a description D consistent (coherent)?
  Not if its extension D^I is empty for every possible relational structure I

Are D and D’ mutually disjoint?
  Yes if D^I ∩ D’^I = ∅ for every I

Are D and D’ equivalent?
  Yes if D^I = D’^I for every I

Does D subsume some other description D’?
  Yes if for every relational structure I, D^I ⊇ D’^I

Inconsistency: and(C,D) ≡ NOTHING
Equivalence: D subsumes D’, D’ subsumes D
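These four questions all quantify over every relational structure. As a toy illustration only, the sketch below enumerates every interpretation of two primitive classes A and B over a fixed three-element domain and checks the definitions directly. Real DL reasoners do not work by enumeration, and the fixed domain, the class names, and the helper functions are all assumptions of this example.

```python
from itertools import combinations, product

DOMAIN = (0, 1, 2)
NAMES = ("A", "B")          # primitive class names (hypothetical)

def subsets(items):
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]

def interpretations(names):
    """Every assignment of a subset of DOMAIN to each primitive class."""
    for choice in product(subsets(DOMAIN), repeat=len(names)):
        yield dict(zip(names, choice))

# A description is modeled as a function from an interpretation I to its extension D^I.
A = lambda I: I["A"]
B = lambda I: I["B"]
NOTHING = lambda I: frozenset()
def and_(c, d): return lambda I: c(I) & d(I)
def not_(c):    return lambda I: frozenset(DOMAIN) - c(I)

def subsumes(d, d2):   return all(d2(I) <= d(I) for I in interpretations(NAMES))
def disjoint(d, d2):   return all(not (d(I) & d2(I)) for I in interpretations(NAMES))
def equivalent(d, d2): return subsumes(d, d2) and subsumes(d2, d)
def coherent(d):       return any(d(I) for I in interpretations(NAMES))

print(subsumes(A, and_(A, B)), coherent(and_(A, not_(A))))
```

Equivalence is implemented exactly as on the slide, as subsumption in both directions, and an incoherent description is one that is equivalent to NOTHING.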

8

DL Example

class STUDENT is-a PERSON with
  studNumber: int, key;
  level: {1,2,3,4}

and(PERSON,
    all(studNumber, INTEGER), at-least(1,studNumber), at-most(1,studNumber),
    all(level, one-of(1,2,3,4)), at-least(1,level), at-most(1,level),
    at-most(1, compose(studNumber, inverse(studNumber))))

ENROLLMENT := and(
    all(st,STUDENT), at-least(1,st), at-most(1,st),
    all(crs,COURSE), at-least(1,crs), at-most(1,crs),
    all(when,DATE), at-least(1,when), at-most(1,when))

STUDENT := and(all(inverse(st), ENROLLMENT), at-least(1, inverse(st)), at-most(6, inverse(st)))

COURSE := and(all(inverse(crs), ENROLLMENT), at-least(1, inverse(crs)), at-most(300, inverse(crs)))

INSERT-IN(Cs431, COURSE). FILL-WITH(Cs431,taughtBy,Einstein). FILL-WITH(Cs431,takers,Anna).

9

More on DLs

We can have both primitive classes (equivalent to extensional relations) and virtual ones
  But we can make assertions over virtual classes that directly impact the primitive ones
  Contrast with updates to views in databases

Many different levels of expressiveness in different DLs

Comparison with Datalog:
  Both are subsets of FOL, with some limitations
  DLs allow bidirectional inference; Datalog is unidirectional
  DLs are equivalent to at most FOL with ≤ 3 variables; Datalog has an unbounded number of existential variables

10

Coming Back to the SW

Lots of work on OWL, the Web Ontology Language
Based on different levels of DLs:
  OWL Lite – classification hierarchy, simple constraints (cardinalities 0 or 1)
  OWL DL – maximum expressiveness while retaining computational completeness and decidability (all conclusions are computable and reasoning always terminates)
  OWL Full – no computational guarantees, allows classes as instances of other classes

Goal: each community builds an ontology
But how to relate ontologies?
  “equivalentClass”, “equivalentProperty”, “sameAs”

Is this enough???

11

The Data Management Argument

The Semantic Web is all about integration and translation

But there’s no notion of translation in the SW, except for equivalences

“Semantic normalization”???

Does DB research have something to add? If so, what needs to change?

12

Database Approaches to Semantic Integration

Data warehouse
  Design a single schema
  Do physical DB design
  Map data into warehouse schema
  Periodically update warehouse

Virtual data integration (EII)
  Design mediated schema
  Map sources to mediated schema
  Queries are rewritten and answered on demand from sources

[Figure: a query is posed to the data integration system over the mediated schema; demand-driven wrappers deliver data in a common format from XML sources and relational sources]

13

A Single Centralized Schema is a Bottleneck!

Challenging to form a single schema for all domain data
  People don’t agree on how concepts should be represented
  Data warehouse: physical design is a strong consideration
  Mediated schema very different from original users’ schemas

Mappings may be challenging to create, and do not leverage work of previous source mappings
  Each source gets mapped to mediated schema separately

Difficult to evolve this single schema as needs change
  May “break” existing queries
  Must build consensus for any schema changes

14

Peer Data Management: Decentralized Mediation for Ad Hoc Extensibility

[Figure: example peers: DB Projects, UPenn, UW, Stanford, IIT Mumbai]

Data integration:
  1 mediated schema, m mappings to sources

Peer data management system (PDMS):
  n mediated “peer schemas,” as few as (n - 1) mappings between them, evaluated transitively
  m mappings to sources
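The "as few as (n - 1) mappings, evaluated transitively" claim is a connectivity argument: if the mappings link the peers into one connected graph, every peer schema can be reached from every other by chaining. A minimal sketch follows, assuming each mapping can be followed in either direction; the chain topology is hypothetical, since the text names the peers but not which pairs are mapped.

```python
from collections import deque

def reachable(start, mappings):
    """Peers whose schemas can be reached from `start` by chaining mappings."""
    neighbors = {}
    for a, b in mappings:                 # assumption: a mapping is usable both ways
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    seen, queue = {start}, deque([start])
    while queue:
        peer = queue.popleft()
        for nxt in neighbors.get(peer, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

peers = ["DB Projects", "UPenn", "UW", "Stanford", "IIT Mumbai"]
# Hypothetical chain of n - 1 = 4 mappings between the 5 peers.
mappings = [("DB Projects", "UPenn"), ("UPenn", "UW"),
            ("UW", "Stanford"), ("Stanford", "IIT Mumbai")]

print(reachable("IIT Mumbai", mappings) == set(peers))
```

Removing any one mapping from this chain disconnects the graph, which is why n - 1 is the minimum for n peers.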

15

Peer-to-Peer at both Logical and Architectural Levels

A “logical” peer-to-peer model:
Every participant can contribute:
  Extensional data
  Mappings between schemas
  Computation (query answering) and caching

Can we do a database (say, XML) version of the SW?

16

RDF vs. XML

RDF explicitly names relationships:
  (book, title, “ABC”)
  (book, writtenBy, author)
  (author, name, “John Smith”)

XML does not always:

1. <book>
     <title>ABC</title>
     <writtenBy>
       <author><name>John Smith</name></author>
     </writtenBy>
   </book>

2. <book>
     <title>ABC</title>
     <author>John Smith</author>
   </book>

[Figure: RDF graph with nodes book and author; edges title (from book), writtenBy (from book to author), and name (from author)]
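The difference between the two XML encodings can be seen by listing the containment edges of each tree. The sketch below uses Python's standard `xml.etree.ElementTree`; the `edges` helper is an illustrative name, not part of any RDF or XML toolkit.

```python
import xml.etree.ElementTree as ET

nested = ("<book><title>ABC</title><writtenBy><author><name>John Smith</name>"
          "</author></writtenBy></book>")
flat = "<book><title>ABC</title><author>John Smith</author></book>"

def edges(xml_text):
    """List the (parent tag, child tag) containment edges of an XML tree."""
    root = ET.fromstring(xml_text)
    return [(parent.tag, child.tag) for parent in root.iter() for child in parent]

print(edges(nested))
print(edges(flat))
```

In the nested form the relationship name `writtenBy` appears as its own element between `book` and `author`. In the flat form only the containment of `author` inside `book` remains, so the relationship has to be inferred by the reader.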

17

RDF vs. XML 2

RDF is subject-neutral (a graph)
XML centers around a subject (a tree):

1. <book>
     <title>ABC</title>
     <author>John Smith</author>
   </book>

2. <author>
     <name>John Smith</name>
     <book>ABC</book>
   </author>

This may result in duplication of contained objects

18

An XML Version of the Semantic Web

Data model: XML + Schema
  Vast volumes of data already in XML (or exported as XML)
  CAVEAT: not all relationships are labeled in XML (“XML has no semantics.”)

Concepts: Views ≈ classes; schemas ≈ ontologies
  Views define membership via queries; can reason about containment
  CAVEAT: less expressive than OWL classes

Schema mappings: target schema as query over source
  Sophisticated reasoning about mappings is possible by extending existing data integration techniques
  Can use mappings in “forward” and “reverse” directions
  Allows for “chaining” of mappings to answer queries

19

Let’s Start with the Relational Model and then Extend

GAV: mediated relations as views over sources
  Easy to rewrite queries: unfold them using view definitions

  Med. Schema T1, …
    …
    MST1(X’) :- S1(X), …
    MST2(Y’) :- S2(Y), …
  Sources: S1(X), S2(Y)

LAV: sources as views over mediated relations
  More challenging to rewrite queries: answering queries using views (e.g., MiniCon [Pottinger & Levy 00])
  More flexible in representing source properties

  Med. Schema T1, …
    …
    S1(X’) ⊆ MST1(X), …
    S2(Y’) ⊆ MST1(Y), …
  Sources: S1(X), S2(Y)
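GAV rewriting by unfolding can be sketched as a substitution: each mediated-relation atom in the query is replaced by the body of its view definition, with the view's head variables bound to the atom's arguments. The view definitions, relation names, and the `unfold` helper below are hypothetical, and the sketch handles only a single level of unfolding with one definition per relation.

```python
from itertools import count

fresh = count()   # used to rename view-local variables apart

def unfold(query_body, views):
    """Replace each mediated-relation atom by the body of its GAV view definition."""
    result = []
    for pred, args in query_body:
        if pred not in views:
            result.append((pred, args))          # already a source relation
            continue
        head_vars, body = views[pred]
        subst = dict(zip(head_vars, args))       # bind head variables to query arguments
        suffix = f"_{next(fresh)}"               # rename body-only variables
        for body_pred, body_args in body:
            result.append((body_pred, tuple(subst.get(v, v + suffix) for v in body_args)))
    return result

# Hypothetical GAV definitions:
#   MST1(x, y) :- S1(x, z), S2(z, y)
#   MST2(x)    :- S3(x)
views = {
    "MST1": (("x", "y"), [("S1", ("x", "z")), ("S2", ("z", "y"))]),
    "MST2": (("x",), [("S3", ("x",))]),
}

# Query: Q(a) :- MST1(a, b), MST2(b)
rewritten = unfold([("MST1", ("a", "b")), ("MST2", ("b",))], views)
print(rewritten)
```

LAV has no such direct substitution, because the sources rather than the mediated relations appear on the left-hand side of the definitions. That is why it needs answering-queries-using-views algorithms such as MiniCon.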

20

Answering Queries in a PDMS: Transitively Evaluating Mappings

Start with schema being queried
  Look up mappings to neighbors; expand
  Continue iteratively until queries only over sources

Mappings in a PDMS may be a combination of LAV, GAV techniques:
  General form: p1a(X, Y), p1b(Y,Z), … = p2a(Y, X), p2b(X, Z), …
  (see paper for examination of what is actually tractable)
  Requires unfolding and answering queries using views (AQUV)

We use a rule-goal “tree” to expand the mappings
  Extend some of the ideas of MiniCon to avoid unnecessary expansions
  Challenges to avoid redundancy (see paper for optimizations)

21

Example of Query Answering

Mappings between peers’ schemas:
  r0: SameProject(a1,a2,p) :- ProjMember(a1,p), ProjMember(a2,p)
  r1: CoAuthor(a1,a2) ⊆ Author(a1,w), Author(a2,w)

Mappings to data sources:
  r2: S1(a,p,s) ⊆ ProjMember(a,p), Sched(f,s,end)
  r3: CoAuthor(f1,f2) :- S2(f1,f2)

Query:
  Q(a1, a2) :- SameProject(a1,a2,p), Author(a1,w), Author(a2,w)

[Figure: peer relations SameProject(a1,a2,p), ProjMember(a1,p), Sched(f,s,e), CoAuthor(a1,a2), Author(a,w) and sources S1, S2, connected by mappings r0, r1, r2, r3]

22

Example Rule-Goal Tree Expansion

q: Q(a1, a2) :- SameProject(a1,a2,p), Author(a1,w), Author(a2,w)

[Figure: the rule-goal tree for q, built up over several slides]

Root (rule node): q

Goal nodes of q:
  SameProject(a1,a2,p)   Author(a1,w)   Author(a2,w)

Expand using mappings between peers’ schemas:
  r0: SameProject(a1,a2,p) :- ProjMember(a1,p), ProjMember(a2,p)
  r1: CoAuthor(a1,a2) ⊆ Author(a1,w), Author(a2,w)
  Rule nodes: r0, r1, r1
  Goal nodes: ProjMember(a1,p)   ProjMember(a2,p)   CoAuthor(a1,a2)   CoAuthor(a2,a1)

Expand using mappings to data sources:
  r2: S1(a,p,s) ⊆ ProjMember(a,p), Sched(a,s,end)
  r3: CoAuthor(f1,f2) = S2(f1,f2)
  Rule nodes: r2, r2, r3, r3
  Leaves: S1(a1,p,_)   S1(a2,p,_)   S2(a1,a2)   S2(a2,a1)

Reformulated query:
  Q’(a1,a2) :- S1(a1,p,_), S1(a2,p,_), S2(a1,a2) ∪ S1(a1,p,_), S1(a2,p,_), S2(a2,a1)
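The reformulated query Q’ refers only to the sources S1 and S2, so it can be evaluated directly. A minimal sketch over made-up source instances (the tuples and the `q_prime` helper are invented for illustration) treats Q’ as the union of its two conjunctive branches:

```python
# Hypothetical source instances (invented for illustration).
S1 = {("ann", "piazza", "mon"), ("bob", "piazza", "tue"), ("cat", "orchestra", "wed")}
S2 = {("ann", "bob"), ("cat", "ann")}          # co-author pairs

def q_prime(S1, S2):
    """Evaluate Q' as a union of two conjunctive queries over the sources."""
    # S1(a1,p,_), S1(a2,p,_): pairs that share a project; the third column is ignored.
    same_project = {(a1, a2) for (a1, p1, _) in S1 for (a2, p2, _) in S1 if p1 == p2}
    first = {(a1, a2) for (a1, a2) in same_project if (a1, a2) in S2}    # ..., S2(a1,a2)
    second = {(a1, a2) for (a1, a2) in same_project if (a2, a1) in S2}   # ..., S2(a2,a1)
    return first | second

print(sorted(q_prime(S1, S2)))
```

The second branch is what makes the answer symmetric: a co-authorship recorded in S2 in only one direction still yields both orderings of the pair.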

30

Stepping up to XML (WWW03)

Goals:
  Build on XQuery and XML (extended with RDF-style identity, following lead of [Patel-Schneider & Simeon 02])
  Remain computationally inexpensive
  Capture the common mapping types

Directional mapping language based on templates:
  <output>
    {: $var IN document("doc")/path WHERE condition :}
    <tag>$var</tag>
  </output>

  Translates between parts of data instances
  Restricted subset of XQuery that’s decidable to reason about
  Supports special annotations and object fusion

Can map XML-XML, XML-RDF, RDF-XML (at data level)

31

Mapping Example between XML Schemas

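Target:
  pubs
    book*
      title
      author*
        name

Source:
  authors
    author*
      full-name
      publication*
        title
        pub-type

[Figure: graph view with nodes publication and author; edges title and pub-type (from publication), writtenBy (from publication to author), and name (from author)]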

32

Example Piazza Mapping

<pubs>
  <book piazza:id={$t}>
    {: $a IN document("…")/authors/author,
       $an IN $a/full-name,
       $t IN $a/publication/title,
       $typ IN $a/publication/pub-type
       WHERE $typ = "book"
       PROPERTY $t >= 'A' AND $t < 'B' :}
    <title>{$t}</title>
    <author><name>{$an}</name></author>
  </book>
</pubs>
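To see what data-level translation such a mapping describes, the sketch below restructures a small author-centric document into a book-centric one with Python's standard `xml.etree.ElementTree`. It is not the Piazza engine. The source document is invented, the PROPERTY annotation is ignored, the sketch assumes `$t` and `$typ` are meant to come from the same publication, and it approximates `piazza:id` fusion by keeping one `<book>` per title under a plain `id` attribute.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document following the "authors" schema.
source = ET.fromstring("""
<authors>
  <author>
    <full-name>John Smith</full-name>
    <publication><title>ABC</title><pub-type>book</pub-type></publication>
    <publication><title>XYZ</title><pub-type>article</pub-type></publication>
  </author>
  <author>
    <full-name>Jane Doe</full-name>
    <publication><title>ABC</title><pub-type>book</pub-type></publication>
  </author>
</authors>
""")

def map_to_pubs(authors):
    pubs = ET.Element("pubs")
    books = {}                                    # title -> <book>, approximating id fusion
    for a in authors.findall("author"):           # $a IN .../authors/author
        an = a.findtext("full-name")              # $an IN $a/full-name
        for pub in a.findall("publication"):
            # Assumption: title and pub-type are read from the same publication.
            t, typ = pub.findtext("title"), pub.findtext("pub-type")
            if typ != "book":                     # WHERE $typ = "book"
                continue
            if t not in books:
                books[t] = ET.SubElement(pubs, "book", {"id": t})
                ET.SubElement(books[t], "title").text = t
            name = ET.SubElement(ET.SubElement(books[t], "author"), "name")
            name.text = an
    return pubs

pubs = map_to_pubs(source)
print(ET.tostring(pubs, encoding="unicode"))
```

Keying books by title means the two authors of "ABC" end up under a single `<book>` element rather than two duplicates.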

33

Challenges

Query reformulation for XML is significantly harder
  Hierarchy, 1:n schema constraints, ability to map from values to tags, …
  Can only do roughly the XML equivalent of conjunctive queries

See the WWW03 paper (plus later work by Yu and Popa, Deutsch et al., many others) for details

34

What about Values?

Thus far, we’ve focused on schema mappings

Almost as important in the real world: mappings of values to values
  Proteins to binding sites
  SSNs to customer IDs
  etc.

The Hyperion system (KAM 03) focuses on computing transitive relationships between mappings
  In many cases, we only have partial transitive mappings
  Key idea: divide all of the mappings into partitions, each of which can compute transitive closures separately
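A value-to-value mapping can be pictured as a table of pairs, and a transitive mapping as the composition of two such tables. The sketch below is a plain relational join, not Hyperion's algorithm, and the identifiers are invented. It shows why the composed mapping is often only partial.

```python
def compose(m1, m2):
    """Compose two value-mapping tables; values with no match in m2 are dropped."""
    return {(a, c) for (a, b) in m1 for (b2, c) in m2 if b == b2}

# Hypothetical mapping tables (identifiers invented for illustration).
ssn_to_account = {("111-11-1111", "acct7"), ("222-22-2222", "acct9")}
account_to_customer = {("acct7", "cust42")}     # acct9 has no known customer ID

ssn_to_customer = compose(ssn_to_account, account_to_customer)
print(ssn_to_customer)
```

The second SSN has no image in the composed table because the intermediate table does not cover `acct9`.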

35

Assessment: The Semantic Web

The KB world focuses on expressively capturing concepts

The DB world focuses on integrating and restructuring data (but views are less expressive in certain ways)

Does either of these seem likely to change the world?

What barriers need to be removed?
