Berendt: Advanced databases, 1st semester 2010/2011, http://www.cs.kuleuven.be/~berendt/teaching/

Advanced databases –

Data and inference (II): Deduction and inference on Semantic Web / Linked Data

Bettina Berendt

Katholieke Universiteit Leuven, Department of Computer Science

http://www.cs.kuleuven.be/~berendt/teaching/

Last update: 10 November 2010 (2)



Agenda

Motivation: Questions to be asked of (e.g.) FOAF

Deductive DBs; specifically: Recursion

Basics of semantic reasoners

A short introduction to Description Logics (DL)

Examples of semantic reasoners



Recall: deductive and inductive inference

All swans are white

Or

What can one infer from a social network – examples using FOAF



Note: This is moving up a layer in The Semantic Web layer cake

(mostly description logics)

(is then done with logic)


Deductive (1): Relational-database-like

(not really called inference in the Semantic-Web sense because it is not the result of applying computerised logic, but a “deductive inference involving a human”)

Who are Paula’s friends?

What are the topics of all the documents Paula is interested in?

- compiled (by a human) into queries on a (fictitious) relational schema:

SELECT p2 FROM knows
WHERE p1 LIKE 'Paula';

SELECT hastopic.resource FROM interest, hastopic
WHERE interest.agent LIKE 'Paula'
AND interest.document = hastopic.document;
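The two queries above can be tried out on a toy database. The following is a minimal sketch using Python's built-in sqlite3 module; the table contents are invented for illustration, and the column names simply follow the fictitious schema of the slide.

```python
import sqlite3

# Hypothetical toy data for the fictitious schema (knows, interest, hastopic).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE knows    (p1 TEXT, p2 TEXT);
    CREATE TABLE interest (agent TEXT, document TEXT);
    CREATE TABLE hastopic (document TEXT, resource TEXT);
    INSERT INTO knows    VALUES ('Paula', 'Ann'), ('Paula', 'Bob'), ('Ann', 'Carl');
    INSERT INTO interest VALUES ('Paula', 'doc1'), ('Bob', 'doc2');
    INSERT INTO hastopic VALUES ('doc1', 'databases'), ('doc1', 'logic'),
                                ('doc2', 'horses');
""")

# Who are Paula's friends?
friends = [row[0] for row in conn.execute(
    "SELECT p2 FROM knows WHERE p1 = ? ORDER BY p2", ("Paula",))]

# What are the topics of all the documents Paula is interested in?
topics = [row[0] for row in conn.execute(
    """SELECT hastopic.resource
       FROM interest, hastopic
       WHERE interest.agent = ?
         AND interest.document = hastopic.document
       ORDER BY hastopic.resource""", ("Paula",))]

print(friends)
print(topics)
```

Note that only direct friends are returned: Carl, whom Paula knows via Ann, is not in the answer.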


Deductive (2): Recursion on relational data (“deductive databases”)

Return all people that Paula knows directly or indirectly!

- compiled (by a human) into a recursive query on a (fictitious) relational schema:

Union of

p2 s.t. knows(“Paula”, p2)

p3 s.t. knows(p2, p3)

– and this join performed repeatedly …

(SQL syntax to follow)
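The repeated join can be sketched in plain Python as a loop that runs until no new person is found. This is only an illustration of the idea; the function name knows_transitively and the sample data are assumptions, not part of the lecture.

```python
def knows_transitively(knows, start):
    """Return everyone reachable from `start` via one or more knows-edges."""
    reached = set()
    frontier = {start}
    while frontier:
        # One join step: people known by anyone in the current frontier.
        step = {p2 for (p1, p2) in knows if p1 in frontier}
        # Only people not seen before are expanded in the next round.
        frontier = step - reached
        reached |= frontier
    return reached

# Hypothetical knows relation containing a cycle (Paula -> Ann -> Bob -> Paula).
knows = {("Paula", "Ann"), ("Ann", "Bob"), ("Bob", "Paula"), ("Dora", "Eve")}
result = knows_transitively(knows, "Paula")
print(sorted(result))
```

Because already-reached people are never expanded again, the loop terminates even when the knows relation contains cycles.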



Deductive (3): Logical inference (using RDF, RDFS, OWL)

Return all information on Paula!

Paula may be described by different foaf:Person instances in different Semantic Web documents (or Linked Data repositories)

1. Merge all instances that refer to the same URI

2. Merge all instances with matching inverse functional properties (“same email address ⇒ same person”)

3. Assert owl:sameAs for these instances

4. Output information (interests, friends, …) of each connected component as a merged personal profile
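The four steps above amount to computing connected components. A minimal sketch with a union-find structure follows; the URIs, mailbox hashes and the record layout (URI, mailbox hash, interests) are made up for illustration, and only one inverse functional property is considered.

```python
from collections import defaultdict

# Hypothetical person descriptions: (URI, mailbox hash or None, interests).
descriptions = [
    ("http://example.org/a#paula", "hash-1", {"databases"}),
    ("http://example.org/b#me",    "hash-1", {"horses"}),
    ("http://example.org/a#paula", None,     {"logic"}),
    ("http://example.org/c#bob",   "hash-2", {"chess"}),
]

parent = {}

def find(x):
    """Return the representative of the component containing x."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x

def union(x, y):
    parent[find(x)] = find(y)

# Step 1: descriptions with the same URI share one node automatically.
# Step 2: link each URI to a node standing for its mailbox hash, so that
#         equal hashes end up in the same component.
for uri, mbox, _ in descriptions:
    find(uri)
    if mbox is not None:
        union(uri, ("mbox", mbox))

# Step 3: URIs in one component are the ones to be declared owl:sameAs.
# Step 4: merge the information of each connected component.
profiles = defaultdict(lambda: {"uris": set(), "interests": set()})
for uri, _, interests in descriptions:
    root = find(uri)
    profiles[root]["uris"].add(uri)
    profiles[root]["interests"] |= interests

merged = sorted((sorted(p["uris"]), sorted(p["interests"]))
                for p in profiles.values())
print(merged)
```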



Inductive: performing knowledge discovery (1)

What types of “interest groups” are there among FOAF users?

1. Formulate a vector space in which each possible rdf:Resource describing a topic is one dimension

2. Each person is described by a 0/1 vector in this space

3. Use a distance measure on these vectors (e.g., cosine distance)

4. Cluster people into groups of similar “interest profiles”, using clustering techniques as discussed in the previous 2 lectures.
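Steps 1 to 4 can be sketched as follows. The topics and people are invented, and the grouping function is a deliberately naive threshold-based single-link procedure that stands in for the clustering techniques of the earlier lectures; it is not one of them.

```python
import math

topics = ["databases", "logic", "horses", "riding"]   # one dimension per topic
people = {                                            # hypothetical interests
    "Paula": {"databases", "logic"},
    "Ann":   {"databases", "logic", "horses"},
    "Bob":   {"horses", "riding"},
    "Carl":  {"riding"},
}

def vector(interests):
    """0/1 vector of a person in the topic space."""
    return [1 if t in interests else 0 for t in topics]

def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm      # assumes neither vector is all zeros

def cluster(people, threshold):
    """Naive single-link grouping: join every group with a close member."""
    groups = []
    for name in sorted(people):
        v = vector(people[name])
        home = [g for g in groups
                if any(cosine_distance(v, vector(people[m])) <= threshold
                       for m in g)]
        merged = {name}.union(*home)
        groups = [g for g in groups if g not in home] + [merged]
    return sorted(sorted(g) for g in groups)

groups = cluster(people, threshold=0.5)
print(groups)
```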



Inductive: performing knowledge discovery (2)

Which advertising should be shown to Paula?

1. Assume you data-mined a dataset of friend relations and purchases and found, with high support and confidence, the association rule that people bought what their friends bought.

2. Make the inductive inference that this rule holds in general.

3. Find Paula’s friends: everyone she knows (cf. Deductive (1)), maybe everybody she knows under different names (cf. Deductive (3))

4. Find everything these people bought (via identity merging of profiles and some e-Commerce site’s client database), what they like (Facebook-style), etc.

5. Show Paula these products
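Steps 3 to 5 can be sketched as below, assuming the association rule from step 1 has already been accepted. The data and the ranking (most frequently bought among friends first) are illustrative assumptions.

```python
from collections import Counter

# Hypothetical data: who knows whom, and who bought what.
knows = {"Paula": {"Ann", "Bob"}, "Ann": {"Carl"}}
purchases = {
    "Paula": {"saddle"},
    "Ann":   {"saddle", "helmet"},
    "Bob":   {"helmet", "boots"},
    "Carl":  {"whip"},
}

def recommend(person):
    """Rank products bought by the person's friends but not by the person."""
    own = purchases.get(person, set())
    counts = Counter()
    for friend in knows.get(person, set()):
        for product in purchases.get(friend, set()) - own:
            counts[product] += 1
    # Most frequently bought first; ties broken alphabetically.
    return sorted(counts, key=lambda p: (-counts[p], p))

ads = recommend("Paula")
print(ads)
```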



What about the horses?


Agenda: Deductive DBs; specifically: Recursion








Deductive databases in a Computer Science context

Deductive databases have grown out of the desire to combine logic programming with relational databases to construct systems that support a powerful formalism and are still fast and able to deal with very large datasets.

Deductive databases are more expressive than relational databases but less expressive than logic programming systems.

Deductive databases have not found widespread adoption outside academia, but some of their concepts are used in today’s relational databases to support the advanced features of more recent SQL standards (≥ SQL:1999).



Datalog

Datalog is a query and rule language for deductive databases that is syntactically a subset of Prolog.

Roots in 1970s; the term Datalog was coined in the mid 1980s by a group of researchers interested in database theory.

Query evaluation is sound and complete and can be done efficiently even for large databases.

Query evaluation is usually done using bottom-up strategies.

In contrast to Prolog, Datalog

disallows complex terms as arguments of predicates, e.g. P(1, 2) is admissible but not P(f1(1), 2),

imposes certain stratification restrictions on the use of negation and recursion, and

only allows range-restricted variables, i.e. each variable in the conclusion of a rule must also appear in a non-negated clause in the premise of this rule.
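The range-restriction condition can be checked mechanically. The sketch below tests exactly the condition stated above; the rule representation (a list of head variables plus a list of body literals, each flagged as negated or not) and the unsafe example rule are assumptions made for illustration.

```python
def is_range_restricted(head_vars, body):
    """body: list of (negated, variables) pairs, one per body literal."""
    positive_vars = set()
    for negated, variables in body:
        if not negated:
            positive_vars.update(variables)
    # Every head variable must occur in some non-negated body literal.
    return set(head_vars) <= positive_vars

# Comp(Part, Subpt) :- Assembly(Part, Part2, Qty), Comp(Part2, Subpt).
ok = is_range_restricted(
    ["Part", "Subpt"],
    [(False, ["Part", "Part2", "Qty"]), (False, ["Part2", "Subpt"])])

# Hypothetical unsafe rule: Single(X) :- not Married(X).
bad = is_range_restricted(["X"], [(True, ["X"])])

print(ok, bad)
```

The second rule is rejected because X occurs only in a negated literal, so the set of answers would depend on the (unbounded) domain of possible values.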



Deductive database languages / Datalog: Motivation

SQL-92 cannot express some queries:

Are we running low on any parts needed to build a ZX600 sports car?

What is the total component and assembly cost to build a ZX600 at today's part prices?

Can we extend the query language to cover such queries?

Yes, by adding recursion.



Datalog

SQL queries can be read as follows: “If some tuples exist in the From tables that satisfy the Where conditions, then the Select tuple is in the answer.”

Datalog is a query language that has the same if-then flavor:

New: The answer table can appear in the From clause, i.e., be defined recursively.

Prolog style syntax is commonly used.



Example

Find the components of a trike.

We can write a relational algebra query to compute the answer on the given instance of Assembly.

But there is no R.A. (or SQL-92) query that computes the answer on all Assembly instances.

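Assembly instance:

part    subpart   number
trike   wheel     3
trike   frame     1
frame   seat      1
frame   pedal     1
wheel   spoke     2
wheel   tire      1
tire    rim       1
tire    tube      1

[Figure: the same instance drawn as a tree. trike has wheel (3) and frame (1); wheel has spoke (2) and tire (1); frame has seat (1) and pedal (1); tire has rim (1) and tube (1).]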



The Problem with Relational Algebra and SQL-92

Intuitively, we must join Assembly with itself to deduce that trike contains spoke and tire.

Takes us one level down Assembly hierarchy.

To find components that are one level deeper (e.g., rim), need another join.

To find all components, need as many joins as there are levels in the given instance!

For any relational algebra expression, we can create an Assembly instance for which some answers are not computed by including more levels than the number of joins in the expression!



A Datalog Query that Does the Job

Comp(Part, Subpt) :- Assembly(Part, Subpt, Qty).

Comp(Part, Subpt) :- Assembly(Part, Part2, Qty), Comp(Part2, Subpt).

In each rule, the part to the left of :- is the head of the rule, :- stands for the implication, and the part to the right is the body of the rule.

Can read the second rule as follows: “For all values of Part, Part2, Subpt and Qty, if there is a tuple (Part, Part2, Qty) in Assembly and a tuple (Part2, Subpt) in Comp, then there must be a tuple (Part, Subpt) in Comp.”



Using a Rule to Deduce New Tuples

Each rule is a template: by assigning constants to the variables in such a way that each body “literal” is a tuple in the corresponding relation, we identify a tuple that must be in the head relation.

By setting Part=trike, Subpt=wheel, Qty=3 in the first rule, we can deduce that the tuple <trike,wheel> is in the relation Comp.

This is called an inference using the rule.

Given a set of tuples, we apply the rule by making all possible inferences with these tuples in the body.
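Applying a rule by making all possible inferences can be sketched for the first rule as a set comprehension: every way of binding the variables to an Assembly tuple yields one Comp tuple. The function name apply_rule_1 is an assumption; the data is the Assembly instance from the example.

```python
assembly = {
    ("trike", "wheel", 3), ("trike", "frame", 1),
    ("frame", "seat", 1),  ("frame", "pedal", 1),
    ("wheel", "spoke", 2), ("wheel", "tire", 1),
    ("tire", "rim", 1),    ("tire", "tube", 1),
}

def apply_rule_1(assembly):
    """Comp(Part, Subpt) :- Assembly(Part, Subpt, Qty)."""
    # Each binding of (Part, Subpt, Qty) to an Assembly tuple is one inference.
    return {(part, subpt) for (part, subpt, qty) in assembly}

comp = apply_rule_1(assembly)
print(sorted(comp))
```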



Example

For any instance of Assembly, we can compute all Comp tuples by repeatedly applying the two rules. (Actually, we can apply Rule 1 just once, then apply Rule 2 repeatedly.)

Comp tuples got by applying Rule 2 once:

trike spoke
trike tire
trike seat
trike pedal
wheel rim
wheel tube

Comp tuples got by applying Rule 2 twice:

trike spoke
trike tire
trike seat
trike pedal
wheel rim
wheel tube
trike rim
trike tube
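The repeated application of the rules can be sketched as a naive bottom-up evaluation that stops when a round adds nothing new. This is an illustration only (real Datalog engines use more refined strategies); quantities are left out because neither rule uses them, and the function name evaluate is an assumption.

```python
# Assembly instance from the example, without the quantity column.
assembly = {
    ("trike", "wheel"), ("trike", "frame"),
    ("frame", "seat"),  ("frame", "pedal"),
    ("wheel", "spoke"), ("wheel", "tire"),
    ("tire", "rim"),    ("tire", "tube"),
}

def evaluate(assembly):
    """Return all Comp tuples and the tuples newly derived in each round."""
    comp = set(assembly)          # Rule 1, applied once
    rounds = []
    while True:
        # Rule 2: Comp(Part, Subpt) :- Assembly(Part, Part2), Comp(Part2, Subpt).
        derived = {(part, subpt)
                   for (part, part2) in assembly
                   for (p2, subpt) in comp if p2 == part2}
        new = derived - comp
        if not new:               # fixpoint reached
            return comp, rounds
        rounds.append(new)
        comp |= new

comp, rounds = evaluate(assembly)
for i, new in enumerate(rounds, start=1):
    print("round", i, sorted(new))
```

On this instance the first round yields the six tuples of the first list above, the second round adds trike rim and trike tube, and a third round finds nothing new.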



Datalog vs. SQL:1999 (SQL3) notation

A collection of Datalog rules can be rewritten in SQL syntax, if recursion is allowed (this is the case in SQL:1999).

WITH RECURSIVE Comp(Part, Subpt) AS (
  (SELECT A1.Part, A1.Subpt FROM Assembly A1)
  UNION
  (SELECT A2.Part, C1.Subpt FROM Assembly A2, Comp C1 WHERE A2.Subpt = C1.Part)
)
SELECT * FROM Comp

(as of 2010: available in Oracle, SQL Server, PostgreSQL, DB2; not in MySQL)
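The recursive query can be tried out with Python's built-in sqlite3 module. SQLite is an assumption here (it is not one of the systems named above), and the query is written without the inner parentheses around the two SELECT blocks, which is the form SQLite accepts. The final SELECT is restricted to the components of a trike.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Assembly (Part TEXT, Subpt TEXT, Qty INTEGER)")
conn.executemany("INSERT INTO Assembly VALUES (?, ?, ?)", [
    ("trike", "wheel", 3), ("trike", "frame", 1),
    ("frame", "seat", 1),  ("frame", "pedal", 1),
    ("wheel", "spoke", 2), ("wheel", "tire", 1),
    ("tire", "rim", 1),    ("tire", "tube", 1),
])

query = """
WITH RECURSIVE Comp(Part, Subpt) AS (
    SELECT A1.Part, A1.Subpt FROM Assembly A1
    UNION
    SELECT A2.Part, C1.Subpt FROM Assembly A2, Comp C1
    WHERE A2.Subpt = C1.Part
)
SELECT Subpt FROM Comp WHERE Part = ? ORDER BY Subpt
"""

trike_parts = [row[0] for row in conn.execute(query, ("trike",))]
print(trike_parts)
```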


Agenda: Basics of semantic reasoners








What is a semantic reasoner?

Inference engine: a finite state machine with a cycle consisting of 3 action states:

Match rules

Select rules

Execute rules

(Semantic) reasoner: an inference engine with a richer set of mechanisms to work with

A (theorem) prover = logical inference engine

Can draw conclusions from logical axioms and statements

Checks the consistency of ontologies

Evaluates logical derivation rules, thereby creating information that was not explicit in the knowledge base
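The match, select, execute cycle of an inference engine can be sketched as a small forward-chaining loop. The rule format (name, premises, conclusion), the facts, and the selection strategy (first matching rule in list order) are all assumptions chosen to keep the example short.

```python
# Each rule: (name, set of premise facts, concluded fact). All hypothetical.
rules = [
    ("r1", {"Anna is a mother"}, "Anna is a parent"),
    ("r2", {"Anna is a parent"}, "Anna is a person"),
    ("r3", {"Anna is a father"}, "Anna is a man"),
]

def run(facts, rules):
    """Fire rules until no rule can add a new fact."""
    facts = set(facts)
    fired = []
    while True:
        # Match: rules whose premises all hold and whose conclusion is new.
        matched = [r for r in rules if r[1] <= facts and r[2] not in facts]
        if not matched:
            return facts, fired
        # Select: trivial conflict resolution, take the first matching rule.
        name, _, conclusion = matched[0]
        # Execute: add the conclusion to the set of known facts.
        facts.add(conclusion)
        fired.append(name)

facts, fired = run({"Anna is a mother"}, rules)
print(fired, sorted(facts))
```

The facts added by r1 and r2 were not explicit in the initial fact set; r3 never fires because its premise is never established.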



What can a reasoner on the Semantic Web draw on?


Agenda: A short introduction to Description Logics (DL)








Why DL?

OWL comes in three flavours

OWL Lite

OWL DL

OWL Full

So the “OWL we think of normally” is based on description logics, which are therefore an important basis for inference on the Semantic Web

Note: Things are somewhat different for Linked Data (inasmuch as it consists of RDF/RDFS statements) – see below.



Basics of reasoners

DLs are (mostly decidable) fragments of first-order predicate logic

DL uses concepts, roles and individuals to describe domains

Syntax:

Concepts / classes

Roles

Constructors / operators: to define new concepts or roles from existing ones



Some notation



Example

Concepts / classes:

Person, Child

Roles / properties:

hasChild(Person,Child)

Constructors / operators:

Parent: a person who has a child

Parent ≡ Person ⊓ ∃hasChild.Child



Knowledge base

Knowledge base (KB) = T-Box + A-Box ( + R-Box )

Tbox: general statements and axioms about concepts and roles

Abox: statements about individual instances of concepts

A reasoner can draw logical conclusions from a knowledge base or check it for consistency.


Two examples and the “layers” they use



Example of an inference

Tbox:

A mother is a parent. Mother ⊑ Parent

Abox:

Anna is a mother. Anna : Mother

Conclusion:

Anna is a parent. Anna : Parent
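The inference above (Anna is a mother, every mother is a parent, hence Anna is a parent) can be sketched for the simplest possible case: a Tbox that contains only subsumptions between named concepts. The data structures and the function name entailed_concepts are assumptions; a real DL reasoner handles far richer axioms.

```python
# Tbox: subsumptions between named concepts, as (sub, super) pairs.
tbox = {("Mother", "Parent")}
# Abox: individual -> concepts asserted for it.
abox = {"Anna": {"Mother"}}

def entailed_concepts(individual):
    """Close the asserted concepts of an individual under the Tbox."""
    concepts = set(abox.get(individual, set()))
    changed = True
    while changed:
        changed = False
        for sub, sup in tbox:
            if sub in concepts and sup not in concepts:
                concepts.add(sup)
                changed = True
    return concepts

anna = entailed_concepts("Anna")
print(sorted(anna))
```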


Example of an inconsistency that can be detected by a reasoner

Tbox:

A mother is a parent. Mother ⊑ Parent

A mother is not a father. Mother ⊑ ¬Father

Abox:

Jo is a mother. Jo : Mother

Jo is a father. Jo : Father
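Detecting this kind of inconsistency can be sketched for the restricted case of named concepts only: close each individual's concepts under the subsumptions, then look for a pair of concepts declared disjoint. The representation and the extra individual Anna are assumptions for illustration.

```python
subclass_of = {("Mother", "Parent")}   # Mother is subsumed by Parent
disjoint = {("Mother", "Father")}      # Mother is subsumed by not-Father
abox = {"Jo": {"Mother", "Father"}, "Anna": {"Mother"}}

def concepts_of(individual):
    """Asserted concepts closed under the subsumptions."""
    concepts = set(abox[individual])
    changed = True
    while changed:
        changed = False
        for sub, sup in subclass_of:
            if sub in concepts and sup not in concepts:
                concepts.add(sup)
                changed = True
    return concepts

def clashes(individual):
    """Pairs of disjoint concepts that both hold for the individual."""
    concepts = concepts_of(individual)
    return [(a, b) for (a, b) in sorted(disjoint)
            if a in concepts and b in concepts]

inconsistent = {name: clashes(name) for name in abox if clashes(name)}
print(inconsistent)
```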



How does it work? Using Standard DL Techniques

Key reasoning tasks reducible to KB (un)satisfiability

State of the art DL systems typically use (highly optimised) tableaux algorithms to decide satisfiability (consistency) of KB

Tableaux algorithms work by trying to construct a concrete example (model) consistent with KB axioms:

Start from ground facts (ABox axioms)

Explicate structure implied by complex concepts and TBox axioms

Syntactic decomposition using tableaux expansion rules

Infer constraints on (elements of) model



Tableaux Reasoning (1)

E.g., KB:

{HappyParent ≡ Person ⊓ ∀hasChild.(Doctor ⊔ ∃hasChild.Doctor),

John:HappyParent, John hasChild Mary, Mary:¬Doctor,

Wendy hasChild Mary, Wendy marriedTo John}

Basic steps:

Model the Abox statements

Apply the Tbox axioms

(Find unsatisfiability)


KB:

{HappyParent ≡ Person ⊓ ∀hasChild.(Doctor ⊔ ∃hasChild.Doctor),

John:HappyParent, John hasChild Mary, Mary:¬Doctor,

Wendy hasChild Mary, Wendy marriedTo John}

Labels obtained by expanding HappyParent: Person, ∀hasChild.(Doctor ⊔ ∃hasChild.Doctor)


Tableau rules correspond to constructors in logic (⊓, ∃ etc.)

E.g., John:(Person ⊓ Doctor) ⟶ John:Person and John:Doctor

Stop when no more rules are applicable or a clash occurs

A clash is an obvious contradiction, e.g., A(x), ¬A(x)

Some rules are nondeterministic (e.g., ⊔, ≤)

In practice, this means search

Cycle check (blocking) is often needed to ensure termination

E.g., KB:

{Person ⊑ ∃hasParent.Person,

John:Person}
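The rules above can be sketched as a tiny satisfiability checker for single concepts. This is a strong simplification and not how Pellet or any other system is implemented: it assumes concepts in negation normal form, no Tbox axioms, no named individuals, and therefore needs no blocking (so the Person ⊑ ∃hasParent.Person example is outside its scope). The tuple encoding and helper names are assumptions.

```python
def atom(a):     return ("atom", a)
def neg(a):      return ("not", a)        # negation of a named concept only
def and_(c, d):  return ("and", c, d)     # C ⊓ D
def or_(c, d):   return ("or", c, d)      # C ⊔ D
def some(r, c):  return ("some", r, c)    # ∃r.C
def all_(r, c):  return ("all", r, c)     # ∀r.C

def satisfiable(label):
    """label: set of concepts that one individual must satisfy."""
    label = set(label)
    # Clash: A and ¬A for the same individual.
    for c in label:
        if c[0] == "atom" and ("not", c[1]) in label:
            return False
    # ⊓-rule (deterministic): add both conjuncts.
    for c in label:
        if c[0] == "and" and not {c[1], c[2]} <= label:
            return satisfiable(label | {c[1], c[2]})
    # ⊔-rule (nondeterministic): try one disjunct, then the other (search).
    for c in label:
        if c[0] == "or" and c[1] not in label and c[2] not in label:
            return satisfiable(label | {c[1]}) or satisfiable(label | {c[2]})
    # ∃-rule combined with ∀-rule: every ∃r.C needs an r-successor that
    # satisfies C and every D for which ∀r.D is in the label.
    for c in label:
        if c[0] == "some":
            successor = {c[2]} | {d[2] for d in label
                                  if d[0] == "all" and d[1] == c[1]}
            if not satisfiable(successor):
                return False
    return True

doctor = atom("Doctor")
happy = and_(atom("Person"),
             all_("hasChild", or_(doctor, some("hasChild", doctor))))

# A happy parent with some child who is not a doctor: satisfiable, because
# that child can in turn have a child who is a doctor.
print(satisfiable({happy, some("hasChild", neg("Doctor"))}))
# An obvious contradiction.
print(satisfiable({and_(atom("A"), neg("A"))}))
```

The first call has to backtrack: choosing Doctor for the child clashes with ¬Doctor, so the search falls back to the second disjunct.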




In general, (representation of) model consists of:

Named individuals forming arbitrary directed graph

Trees of anonymous individuals rooted in named individuals



Decision Procedures

Algorithms are decision procedures, i.e., KB is satisfiable iff rules can be applied such that fully expanded clash free graph is constructed:

Sound

Given a fully expanded and clash-free graph, we can trivially construct a model

Complete

Given a model, we can use it to guide application of non-deterministic rules in such a way as to construct a clash-free graph

Terminating

Bounds on number of named individuals, out-degree of trees (rule applications per node), and depth of trees (blocking)

Crucially depends on (some form of) tree model property


Agenda: Examples of semantic reasoners








Pellet

Pellet is an OWL DL reasoner based on the tableaux algorithms developed for expressive description logics (it now also supports OWL 2).

It supports the full expressivity of OWL DL, including reasoning about nominals (enumerated classes). Therefore, the OWL constructs owl:oneOf and owl:hasValue can be used freely.

Pellet ensures soundness and completeness by incorporating the recently developed decision procedure for SHOIQ (the expressivity of OWL-DL plus qualified cardinality restrictions in DL terminology).

Implemented in Java, commercial and free/open licences

APIs for OWL and Jena

Protégé interface

http://clarkparsia.com/pellet



Pellet architecture

1. OWL ontology is converted, via RDF triples, into KB statements

2. Axioms about classes are stored in the Tbox, statements about individuals in the Abox

3. OWL Full documents: Pellet tries to convert them to OWL DL

4. The Tableaux Reasoner uses standard Tableaux reasoning and additional optimisations



Pellet Command-Line Interface


A note: OWL 2 vs. OWL 1 (from the W3C recommendation: http://www.w3.org/TR/owl2-overview/)

OWL 2 has a very similar overall structure to OWL 1. Looking at Figure 1, almost all the building blocks of OWL 2 were present in OWL 1, albeit possibly under different names.

The central role of RDF/XML, the role of other syntaxes, and the relationships between the Direct and RDF-Based semantics (i.e., the correspondence theorem) have not changed. More importantly, backwards compatibility with OWL 1 is, to all intents and purposes, complete: all OWL 1 Ontologies remain valid OWL 2 Ontologies, with identical inferences in all practical cases (see Section 4.2 of OWL 2 New Features and Rationale [OWL 2 New Features and Rationale]).

OWL 2 adds new functionality with respect to OWL 1. Some of the new features are syntactic sugar (e.g., disjoint union of classes) while others offer new expressivity, including:

keys; property chains; richer datatypes, data ranges; qualified cardinality restrictions; asymmetric, reflexive, and disjoint properties; and enhanced annotation capabilities

OWL 2 also defines three new profiles [OWL 2 Profiles] and a new syntax [OWL 2 Manchester Syntax]. In addition, some of the restrictions applicable to OWL DL have been relaxed; as a result, the set of RDF Graphs that can be handled by Description Logics reasoners is slightly larger in OWL 2.



KAON2

Infrastructure for OWL-DL and other ontologies

Different approach: reduce a KB to a disjunctive Datalog program

Implemented in Java, commercial and free/open licences

Sound and complete reasoning

Optimized for large Aboxes; less comprehensive Tbox services

Features:

An API for programmatic management of OWL-DL, SWRL, and F-Logic ontologies,

A stand-alone server providing access to ontologies in a distributed manner using RMI,

An inference engine for answering conjunctive queries (expressed using SPARQL syntax),

A DIG interface, allowing access from tools such as Protégé,

A module for extracting ontology instances from relational databases.

http://kaon2.semanticweb.org/



Many other reasoners (1)

Pellet Hoolet



Many other reasoners (2)

Reasoner



Many other reasoners (2 – contd.)

Reasoner



Semantic Web Client Library

represents the complete Semantic Web as a single RDF graph.

The library enables applications to query this global graph using SPARQL- and find(SPO) queries.

To answer queries, the library dynamically retrieves information from the Semantic Web by dereferencing HTTP URIs, by following rdfs:seeAlso links, and by querying the Sindice search engine.

The library is written in Java and is based on the Jena framework.

http://www4.wiwiss.fu-berlin.de/bizer/ng4j/semwebclient/



Semantic Web Client Library command-line interface

Ex.: find names and homepages of Richard Cyganiak’s friends and friends-of-friends

PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT DISTINCT ?friendsname ?friendshomepage ?foafsname ?foafshomepage

WHERE {

{ <http://richard.cyganiak.de/foaf.rdf#cygri> foaf:knows ?friend .

?friend foaf:mbox_sha1sum ?mbox .

?friendsURI foaf:mbox_sha1sum ?mbox .

?friendsURI foaf:name ?friendsname .

?friendsURI foaf:homepage ?friendshomepage . }

OPTIONAL { ?friendsURI foaf:knows ?foaf .

?foaf foaf:name ?foafsname .

?foaf foaf:homepage ?foafshomepage .

}

}


Semantic Web Client Library: Example of dynamic graph-building (1)


Semantic Web Client Library: Example of dynamic graph-building (1)


FOAF: online visualisation (see also http://crschmidt.net/semweb/foafnaut/)



Agenda






More on KDD, Web and text mining



Discussion item: Horses

Please answer the following question:

Which parts of the “horse-matching problem” could be addressed in a Semantic Web of Horses, and which couldn’t?

Hint: Think of names, UELNs, fathers+mothers



Your feedback please!

Please give short feedback (anonymous, to be collected 17 Nov):

1. Does the application domain “scientific publication databases and their analysis” make sense to YOU? If yes, for which use cases?

2. Was Wolfgang Glänzel’s talk sufficiently prepared by the lecture on 2 Nov? If not, what were you lacking for understanding?

3. What was your take-home message of each of the two talks?

4. Would you like to work in an IBM Extreme Blue project next year?

5. Any other comments on the invited lectures? (combination? Understandability? …)

Thanks for your help!



References / background reading

Ding, L., Kolari, P., Ding, Z., Avancha, S., Finin, T., & Joshi, A. (2005). Using Ontologies in the Semantic Web: A Survey. Technical Report TR CS-05-07, Dept. of Computer Science and Technical Engineering, University of Maryland, Baltimore, MD. http://ebiquity.umbc.edu/_file_directory_/papers/209.pdf

Singh, S., & Karwayun, R. (2010). A comparative study of inference engines. In: 2010 Seventh International Conference on Information Technology: New Generations (pp. 53-57). Los Alamitos, CA: IEEE Computer Society. http://doi.ieeecomputersociety.org/10.1109/ITNG.2010.198

Augustin, A., Kranz, M., & Schäfermeier, R. (2007). Seminar Moderne Webtechnologien – Semantic Web: Reasoners und Frameworks. http://www.ag-nbi.de/lehre/07/S_MWT/Material/05_ReasonersFrameworks.pdf

Some slides in the SW part have been taken or adapted from these or from Horrocks, I. (2006?). OWL: A Description Logic Based Ontology Language. http://www.cs.man.ac.uk/~horrocks/Slides/cisa06.ppt (see in detail the Powerpoint comment field)

The deductive databases part was taken (with minor modifications) from:

http://en.wikipedia.org/wiki/Datalog

Ramakrishnan, R. & Gehrke, J. (2002?). Database Management Systems, 3rd Edition 2002. Instructor Slides. Ch. 25 - Deductive Databases. http://pages.cs.wisc.edu/~dbbook/openAccess/thirdEdition/slides/slides3ed-english/Ch25_DedDB-95.pdf
