Berendt: Advanced databases, 1st semester 2010/2011, http://www.cs.kuleuven.be/~berendt/teaching/
Advanced databases – Data and inference (II): Deduction and inference on Semantic Web / Linked Data
Bettina Berendt
Katholieke Universiteit Leuven, Department of Computer Science
http://www.cs.kuleuven.be/~berendt/teaching/
Last update: 10 November 2010 (2)
Deductive (1): Relational-database-like (not really called inference in the Semantic-Web sense because it is not the result of applying computerised logic, but a “deductive inference involving a human”)
Who are Paula's friends?
What are the topics of all the documents Paula is interested in?
- compiled (by a human) into queries on a (fictitious) relational schema:
1. Assume you data-mined a dataset of friend relations and purchases and found the association rule, with high support and confidence, that people bought what their friends bought.
2. Make the inductive inference that this rule holds in general.
3. Find Paula's friends: everyone she knows (cf. Deductive (1)), maybe everybody she knows under different names (cf. Deductive (3))
4. Find everything these people bought (via identity merging of profiles and some e-Commerce site's client database), what they like (Facebook-style), etc.
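Steps 3 and 4, followed by applying the induced rule to Paula, can be sketched roughly as follows. All data and names here (`knows`, `same_as`, `bought`, `friends_of`, `predicted_purchases`) are made up for illustration and are not part of the original slides; the sketch only assumes that friend relations, identity links, and purchases are available as simple lookups.

```python
# Hypothetical toy data; none of these names or values come from a real dataset.
knows = {"paula": {"quentin", "rita"}}        # who knows whom
same_as = {"rita": {"rita_shop_account"}}     # identity links between profiles
bought = {
    "quentin": {"book"},
    "rita_shop_account": {"bike"},
    "paula": {"book"},
}


def friends_of(person):
    """Step 3: everyone the person knows, plus their other known identities."""
    direct = knows.get(person, set())
    aliases = set()
    for friend in direct:
        aliases |= same_as.get(friend, set())
    return direct | aliases


def predicted_purchases(person):
    """Step 4 plus the induced rule: items friends bought that the person has not."""
    items = set()
    for friend in friends_of(person):
        items |= bought.get(friend, set())
    return items - bought.get(person, set())


print(sorted(predicted_purchases("paula")))
```

The prediction is only as reliable as the inductive step 2: the rule was observed on mined data, so the result is a plausible guess about Paula, not a logical consequence.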
Deductive databases have grown out of the desire to combine logic programming with relational databases to construct systems that support a powerful formalism and are still fast and able to deal with very large datasets.
Deductive databases are more expressive than relational databases but less expressive than logic programming systems.
Deductive databases have not found widespread adoption outside academia, but some of their concepts are used in today's relational databases to support the advanced features of more recent SQL standards (≥ SQL:1999).
Datalog: a query and rule language for deductive databases that syntactically is a subset of Prolog.
Roots in the 1970s; the term Datalog was coined in the mid-1980s by a group of researchers interested in database theory.
Query evaluation is sound and complete and can be done efficiently even for large databases.
Query evaluation is usually done using bottom-up strategies.
In contrast to Prolog, Datalog
disallows complex terms as arguments of predicates, e.g. P(1, 2) is admissible but not P(f1(1), 2),
imposes certain stratification restrictions on the use of negation and recursion, and
only allows range-restricted variables, i.e. each variable in the conclusion of a rule must also appear in a non-negated clause in the premise of this rule.
SQL queries can be read as follows: “If some tuples exist in the From tables that satisfy the Where conditions, then the Select tuple is in the answer.”
Datalog is a query language that has the same if-then flavor:
New: The answer table can appear in the From clause, i.e., be defined recursively.
Intuitively, we must join Assembly with itself to deduce that trike contains spoke and tire.
Takes us one level down Assembly hierarchy.
To find components that are one level deeper (e.g., rim), need another join.
To find all components, need as many joins as there are levels in the given instance!
For any relational algebra expression, we can create an Assembly instance for which some answers are not computed by including more levels than the number of joins in the expression!
Can read the second rule as follows: “For all values of Part, Subpt and Qty, if there is a tuple (Part, Part2, Qty) in Assembly and a tuple (Part2, Subpt) in Comp, then there must be a tuple (Part, Subpt) in Comp.”
Each rule is a template: by assigning constants to the variables in such a way that each body “literal” is a tuple in the corresponding relation, we identify a tuple that must be in the head relation.
By setting Part=trike, Subpt=wheel, Qty=3 in the first rule, we can deduce that the tuple <trike,wheel> is in the relation Comp.
This is called an inference using the rule.
Given a set of tuples, we apply the rule by making all possible inferences with these tuples in the body.
For any instance of Assembly, we can compute all Comp tuples by repeatedly applying the two rules. (Actually, we can apply Rule 1 just once, then apply Rule 2 repeatedly.)
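A minimal sketch of this bottom-up procedure is given below. The two rules are written out in the comments as they follow from the description above (Rule 1 copies Assembly pairs into Comp, Rule 2 joins Assembly with Comp); the small Assembly instance and the function name `compute_comp` are assumptions made for illustration. This is the naive fixpoint strategy, not an optimised evaluator.

```python
# Illustrative Assembly instance of (Part, Subpt, Qty) tuples; values are assumptions.
assembly = {
    ("trike", "wheel", 3),
    ("wheel", "spoke", 2),
    ("wheel", "tire", 1),
    ("tire", "rim", 1),
}


def compute_comp(assembly):
    # Rule 1: Comp(Part, Subpt) :- Assembly(Part, Subpt, Qty).  Applied once.
    comp = {(part, subpt) for (part, subpt, _qty) in assembly}
    # Rule 2: Comp(Part, Subpt) :- Assembly(Part, Part2, Qty), Comp(Part2, Subpt).
    # Applied repeatedly until no new tuple can be inferred (a fixpoint).
    while True:
        inferred = {
            (part, subpt)
            for (part, part2, _qty) in assembly
            for (p2, subpt) in comp
            if part2 == p2
        }
        if inferred <= comp:
            return comp
        comp |= inferred


comp = compute_comp(assembly)
print(sorted(comp))
```

On this instance, Rule 2 has to be applied twice before (trike, rim) appears, which mirrors the point that the number of joins needed depends on the depth of the hierarchy.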
A collection of Datalog rules can be rewritten in SQL syntax, if recursion is allowed (this is the case in SQL:1999).
WITH RECURSIVE Comp(Part, Subpt) AS (
  (SELECT A1.Part, A1.Subpt FROM Assembly A1)
  UNION
  (SELECT A2.Part, C1.Subpt FROM Assembly A2, Comp C1 WHERE A2.Subpt = C1.Part)
)
SELECT * FROM Comp
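The recursive query can be tried out with SQLite through Python's standard `sqlite3` module, as sketched below. The Assembly rows are illustrative assumptions. SQLite does not accept parentheses around the individual SELECTs of the UNION, so they are left out here, and a final SELECT with an ORDER BY is used to return the rows of Comp in a stable order.

```python
import sqlite3

# In-memory database with an illustrative Assembly instance (values are assumptions).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Assembly (Part TEXT, Subpt TEXT, Qty INTEGER)")
conn.executemany(
    "INSERT INTO Assembly VALUES (?, ?, ?)",
    [
        ("trike", "wheel", 3),
        ("wheel", "spoke", 2),
        ("wheel", "tire", 1),
        ("tire", "rim", 1),
    ],
)

# The first SELECT plays the role of Rule 1, the second SELECT that of Rule 2.
rows = conn.execute(
    """
    WITH RECURSIVE Comp(Part, Subpt) AS (
        SELECT A1.Part, A1.Subpt FROM Assembly A1
        UNION
        SELECT A2.Part, C1.Subpt
        FROM Assembly A2, Comp C1
        WHERE A2.Subpt = C1.Part
    )
    SELECT Part, Subpt FROM Comp ORDER BY Part, Subpt
    """
).fetchall()

for part, subpt in rows:
    print(part, subpt)
```

UNION (rather than UNION ALL) removes duplicate tuples, which matches the set semantics of the Datalog rules.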
Pellet is an OWL DL reasoner based on the tableaux algorithms developed for expressive description logics. (Pellet now supports OWL 2.)
It supports the full expressivity of OWL DL, including reasoning about nominals (enumerated classes). Therefore, the OWL constructs owl:oneOf and owl:hasValue can be used freely.
Pellet ensures soundness and completeness by incorporating the recently developed decision procedure for SHOIQ (the expressivity of OWL-DL plus qualified cardinality restrictions in DL terminology).
Implemented in Java, commercial and free/open licences
A note: OWL 2 vs. OWL 1 (from the W3C recommendation: http://www.w3.org/TR/owl2-overview/)
OWL 2 has a very similar overall structure to OWL 1. Looking at Figure 1, almost all the building blocks of OWL 2 were present in OWL 1, albeit possibly under different names.
The central role of RDF/XML, the role of other syntaxes, and the relationships between the Direct and RDF-Based semantics (i.e., the correspondence theorem) have not changed. More importantly, backwards compatibility with OWL 1 is, to all intents and purposes, complete: all OWL 1 Ontologies remain valid OWL 2 Ontologies, with identical inferences in all practical cases (see Section 4.2 of OWL 2 New Features and Rationale [OWL 2 New Features and Rationale]).
OWL 2 adds new functionality with respect to OWL 1. Some of the new features are syntactic sugar (e.g., disjoint union of classes) while others offer new expressivity, including:
keys; property chains; richer datatypes, data ranges; qualified cardinality restrictions; asymmetric, reflexive, and disjoint properties; and enhanced annotation capabilities
OWL 2 also defines three new profiles [OWL 2 Profiles] and a new syntax [OWL 2 Manchester Syntax]. In addition, some of the restrictions applicable to OWL DL have been relaxed; as a result, the set of RDF Graphs that can be handled by Description Logics reasoners is slightly larger in OWL 2.
KAON2
Infrastructure for OWL-DL and other ontologies
Different approach: reduce a KB to a disjunctive Datalog program
Implemented in Java, commercial and free/open licences
Sound and complete reasoning
Optimized for large ABoxes; less comprehensive TBox services
Features:
An API for programmatic management of OWL-DL, SWRL, and F-Logic ontologies,
A stand-alone server providing access to ontologies in a distributed manner using RMI,
An inference engine for answering conjunctive queries (expressed using SPARQL syntax),
A DIG interface, allowing access from tools such as Protégé,
A module for extracting ontology instances from relational databases.
The Semantic Web Client Library represents the complete Semantic Web as a single RDF graph.
The library enables applications to query this global graph using SPARQL- and find(SPO) queries.
To answer queries, the library dynamically retrieves information from the Semantic Web by dereferencing HTTP URIs, by following rdfs:seeAlso links, and by querying the Sindice search engine.
The library is written in Java and is based on the Jena framework.
Ding, L., Kolari, P., Ding, Z., Avancha, S., Finin, T., & Joshi, A. (2005). Using Ontologies in the Semantic Web: A Survey. Technical Report TR CS-05-07. Dept. of Computer Science and Technical Engineering, University of Maryland, Baltimore, MD. http://ebiquity.umbc.edu/_file_directory_/papers/209.pdf
Singh, S., & Karwayun, R. (2010). A comparative study of inference engines. In: 2010 Seventh International Conference on Information Technology: New Generations (pp. 53-57). Los Alamitos, CA: IEEE Computer Society. http://doi.ieeecomputersociety.org/10.1109/ITNG.2010.198
Augustin, A., Kranz, M., & Schäfermeier, R. (2007). Seminar Moderne Webtechnologien – Semantic Web: Reasoners und Frameworks. http://www.ag-nbi.de/lehre/07/S_MWT/Material/05_ReasonersFrameworks.pdf
Some slides in the SW part have been taken or adapted from these or from Horrocks, I. (2006?). OWL: A Description Logic Based Ontology Language. http://www.cs.man.ac.uk/~horrocks/Slides/cisa06.ppt (see in detail the Powerpoint comment field)
The deductive databases part was taken (with minor modifications) from http://en.wikipedia.org/wiki/Datalog and from Ramakrishnan, R. & Gehrke, J. (2002?). Database Management Systems, 3rd