Semantic Analysis in Language Technology
http://stp.lingfil.uu.se/~santinim/sais/2016/sais_2016.htm
Ontologies and the Semantic Web
Marina Santini
[email protected]fil.uu.se
Department of Linguistics and Philology
Uppsala University, Uppsala, Sweden
Spring 2016
Acknowledgements
• Most slides are based on Horrocks (2008).
The Semantic Web & Ontologies
Outline
• The Semantic Web
• Ontologies
Chronology
http://en.wikipedia.org/wiki/History_of_the_World_Wide_Web
• On August 6, 1991, Berners-Lee posted a short summary of the World Wide Web project on the alt.hypertext newsgroup, inviting collaborators. This date also marked the debut of the Web as a publicly available service on the Internet, although new users could only access it after August 23.
• Beginning in 2002, new ideas for sharing and exchanging content ad hoc, such as Weblogs and RSS, rapidly gained acceptance on the Web. This new model for information exchange, primarily featuring user-generated and user-edited websites, was dubbed Web 2.0.
• Popularized by Berners-Lee's book Weaving the Web (2000) and a Scientific American article by Berners-Lee, James Hendler, and Ora Lassila, the term Semantic Web describes an evolution of the existing Web in which the network of hyperlinked human-readable web pages is extended by machine-readable metadata about documents and how they are related to each other, enabling automated agents to access the Web more intelligently and perform tasks on behalf of users.
• In 2006, Berners-Lee and colleagues stated that the idea "remains largely unrealized".
Web 1.0
• Web 1.0 is a retronym referring to an early stage of the World Wide Web's evolution.
• Some design elements of a Web 1.0 site include:
– Personal web pages were common, consisting mainly of static pages
– Static pages instead of dynamic HTML
– The use of HTML 3.2-era elements such as frames and tables to position and align elements on a page (CSS is used for this now, and frames are deprecated)
– GIF buttons...
Web 2.0
• Web 2.0 describes World Wide Web sites that use technology beyond the static pages of earlier Web sites.
• The key features of Web 2.0 include:
– Tagging - allows users to collectively classify and find information
– Rich User Experience - dynamic content; responsive to user input
– User Participation - information flows two ways between site owner and site user by means of evaluation, review, and commenting.
– Site users add content for others to see
– Mass Participation - universal web access leads to differentiation of concerns from the traditional internet user base.
– etc.
Web 3.0
• "Web 3.0, a phrase coined by John Markoff of the New York Times in 2006, refers to a supposed third generation of Internet-based services that collectively comprise what might be called 'the intelligent Web' (such as those using semantic web, microformats, natural language search, data-mining, machine learning, recommendation agents, and artificial intelligence technologies) which emphasize machine-facilitated understanding of information in order to provide a more productive and intuitive user experience."
• Web 3.0 will be more connected, open, and intelligent, with semantic Web technologies, distributed databases, natural language processing, machine learning, machine reasoning, and autonomous agents.
– http://lifeboat.com/ex/web.3.0
This has yet to happen.
• "The Web was designed as an information space, with the goal that it should be useful not only for human-human communication, but also that machines would be able to participate and help.
• One of the major obstacles to this has been the fact that most information on the Web is designed for human consumption, and even if it was derived from a database with well defined meanings (in at least some terms) for its columns, that the structure of the data is not evident to a robot browsing the Web.
• Leaving aside the artificial intelligence problem of training machines to behave like people, the Semantic Web approach instead develops languages for expressing information in a machine processable form."
– Tim Berners-Lee, The Semantic Web Roadmap, 1998
– http://www.w3.org/DesignIssues/Semantic.html
The web: present and future
Today…
• The web is relatively simple:
– Hypertexts and hypermedia
– Access is engineered via a combination of keyword-based search and link navigation.
This simplicity has been one of the great strengths of the web, and has been an important factor in its popularity and in enabling users to create their own content.
Shortcomings
Examples:
• Finding information about people with very common names can be a frustrating experience.
• Answering more complex queries, along with more general information retrieval, integration, sharing and processing, can be difficult…
We have seen that…
Some solutions
• Software glue: Mashups
– location information from one source might be combined with map information from another source in order to show the location of and provide directions to points of interest such as hotels and restaurants.
• Tagging via social networks (Web 2.0)
– harness the power of user communities in order to share and annotate information.
– Examples include image and video sharing sites such as Flickr and YouTube, and auction sites such as eBay.
– In these applications, annotations usually take the form of simple tags, such as "beach", "birthday", "family" and "friends". The meaning of tags is, however, typically not well defined, and may be impenetrable even to human users: typical examples (from Flickr) include "sasquatchmusicfestival", "celebritylookalikes", and "twab08".
The ”travel agent”
• The classic example of a semantic web application is an automated travel agent that, given various constraints and preferences, would offer the user suitable travel or vacation suggestions.
• A key feature of such a "software agent" is that it would not simply exploit a predetermined set of information sources, but would search the web for relevant information in much the same way that a human user might do when planning a vacation.
The goal
• The goal of the Semantic Web is to allow web information and services to be more effectively exploited by humans and automated tools.
Semantic Web
• The focus of the semantic web is to share data instead of documents.
• In other words, it is a project that should provide a common framework that allows data to be shared and reused across application, enterprise, and community boundaries.
• It is a collaborative effort led by the World Wide Web Consortium (W3C).
Semantic Web & Ontologies
• How are we going to represent meaning and knowledge on the web?
• A key idea behind the semantic web is to address this problem by giving web content machine-accessible semantics via annotation.
• Knowledge is represented in the form of rich conceptual schemas called ontologies.
• Ontologies are the backbone of the Semantic Web.
• Ontologies are rich conceptual schemas that give formally defined meanings to the terms used in annotations, transforming them into semantic annotations.
• They provide the knowledge that is required for semantic applications of all kinds.
Main Difficulty
• Current web content is intended for humans (HTML markup with layout, images and other presentational features).
• Humans understand this content, but machines can't.
Basically...
• Ontologies provide a shared understanding of a domain.
• They provide background knowledge to automate certain tasks.
• By the process of annotation, knowledge can be linked to ontologies.
– Example: "Angelina Jolie" (text) linked to the concept Actress
– In our ontology we also know that an actress is always female and a person.
• Ontologies allow the creation of annotations → machine-readable and machine-understandable content.
• If machines can understand content, they can also perform more meaningful and intelligent queries.
– Distinguishing Jaguar the animal from Jaguar the car.
– Combining information that is distributed on the Web.
Old and New Issues
Old ones:
• Knowledge representation
• Reasoning
• Harnessing the idiosyncrasies of natural languages
• …
New ones:
• Integrating different ontologies may prove to be at least as hard as integrating the resources that they describe
• Creation of suitable annotations
• …
Regardless of these issues…
• … considerable progress has been made in the development of the infrastructure needed to support the semantic web.
• In particular, there has been impressive progress in the development of languages and tools for content annotation and for the design and deployment of ontologies.
Semantic Annotation
• To facilitate the process of semantic annotation, RDF and OWL have been developed as standard formats for the sharing and integration of data and knowledge.
• RDF and OWL are standards:
– RDF (Resource Description Framework)
– OWL (Web Ontology Language)
Ontologies (Metaphysics)
• Ontology, in its original philosophical sense, is a fundamental branch of metaphysics focusing on the study of existence.
• Its objective is to determine what entities and types of entities actually exist, and thus to study the structure of the world.
• The study of ontology can be traced back to the work of Plato and Aristotle, and includes the development of hierarchical categorisations of different kinds of entities and the features that distinguish them.
Tree of Porphyry
[Figure: Tree of Porphyry, 3rd century AD]
• The Porphyrian tree, Tree of Porphyry or Arbor Porphyriana is a classic device for illustrating what is also called a "scale of being". It was suggested by the 3rd century AD Greek neoplatonist philosopher and logician Porphyry.
Ontology (Computer Science, AI, LT, IR…)
• Engineering artefact, usually a model of some aspect of the world.
• It introduces vocabulary describing various aspects of the domain being modelled, and provides an explicit specification of the intended meaning of the vocabulary.
• This specification often includes classification-based information, not unlike that in Porphyry's tree.
What is an ontology (i)?
"An ontology is a formal, explicit specification of a shared conceptualization"
Studer, Benjamins, Fensel. Knowledge Engineering: Principles and Methods. Data and Knowledge Engineering 25 (1998) 161-197.
"An ontology is an explicit specification of a conceptualization"
Gruber, T. A Translation Approach to Portable Ontology Specifications. Knowledge Acquisition, Vol. 5, 1993, 199-220.
• Abstract model and simplified view of some phenomenon in the world that we want to represent
• Machine-readable
• Concepts, properties, relations, functions, constraints, axioms are explicitly defined
• Consensual knowledge
What is an ontology (ii)?
• An ontology is a hierarchically structured set of terms for describing a domain that can be used as a skeletal foundation for a knowledge base
B. Swartout; R. Patil; K. Knight; T. Russ. Toward Distributed Use of Large-Scale Ontologies. Ontological Engineering. AAAI-97 Spring Symposium Series. 1997. 138-148
• An ontology defines the basic terms and relations comprising the vocabulary of a topic area, as well as the rules for combining terms and relations to define extensions to the vocabulary
• An ontology provides the means for describing explicitly the conceptualization behind the knowledge represented in a knowledge base
A. Bernaras; I. Laresgoiti; J. Correra. Building and Reusing Ontologies for Electrical Network Applications. ECAI96. 12th European Conference on Artificial Intelligence. Ed. John Wiley & Sons, Ltd. 298-302
Examples
• Top level ontology: Standard Upper Ontology
– In information science, an upper ontology (also known as a top-level ontology or foundation ontology) is an ontology which describes very general concepts that are the same across all knowledge domains.
CHEMICALS, UMLS
• Research ontology: KA2 (Knowledge Acquisition Community Ontology)
Resource Description Framework (i)
• A language that has been developed in order to provide an extensible mechanism for describing web resources and relationships between them.
• A key feature of RDF is the use of Internationalized Resource Identifiers (IRIs), a generalisation of Uniform Resource Locators (URLs), to refer to resources.
• RDF is a very simple language: its underlying data structure is a labelled directed graph, and its only syntactic construct is the triple.
• A triple consists of three components, referred to as the subject, predicate and object.
Note: a directed graph is a set of nodes connected by edges, where the edges have a direction associated with them.
/ˈaɪˌɑːˌraɪ/
RDF (ii)
• More formally, a triple represents a single edge (labelled with the predicate) connecting two nodes (labelled with the subject and object); it describes a binary relationship between the subject and object via the predicate.
• The predicate of a triple is always an IRI, and an IRI that is used in the predicate position of a triple is called a property.
• A set of triples is called an RDF graph.
• In order to facilitate the sharing and exchanging of graphs on the web, an XML serialisation has also been defined.
"Harry Potter has a pet called Hedwig…"
[Figure: the statement shown as RDF/XML and as an RDF graph]
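The triple and graph structure described above can be sketched in plain Python. This is only an illustration under stated assumptions: the `ex:` names (`ex:HarryPotter`, `ex:hasPet`, `ex:Hedwig`, `ex:Owl`) and the helper `edges_from` are invented for this example and do not come from the slides or from any real vocabulary; `rdf:type` is written as a prefixed name rather than a full IRI.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One edge of an RDF graph: subject --predicate--> object."""
    subject: str
    predicate: str
    object: str


# An RDF graph is simply a set of triples (all names are illustrative).
graph = {
    Triple("ex:HarryPotter", "ex:hasPet", "ex:Hedwig"),
    Triple("ex:Hedwig", "rdf:type", "ex:Owl"),
}


def edges_from(graph, node):
    """Return the (predicate, object) pairs of all edges leaving `node`."""
    return sorted((t.predicate, t.object) for t in graph if t.subject == node)


harry_edges = edges_from(graph, "ex:HarryPotter")
```

Each tuple is one labelled, directed edge, so adding a fact to the graph amounts to adding one more triple to the set.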
Lect 09: Relation Extraction: DBpedia, a relation database that draws from Wikipedia
• Resource Description Framework (RDF) triples
subject | predicate | object
Golden Gate Park | location | San Francisco
dbpedia:Golden_Gate_Park | dbpedia-owl:location | dbpedia:San_Francisco
• DBpedia: The DBpedia project uses the Resource Description Framework (RDF) to represent the extracted information and consists of 3 billion RDF triples, 580 million extracted from the English edition of Wikipedia and 2.46 billion from other language editions (Wikipedia, March 2016).
… but … not enough…
• Capabilities of RDF as an ontology language are limited
– No cardinality constraints
– Not possible to describe conjunction of classes
– …
RDF is a very simple language
Note: the cardinality of a set is a measure of the "number of elements of the set". For example, the set A = {2, 4, 6} contains 3 elements, and therefore A has a cardinality of 3.
Need for a more expressive ontology language: OWL (Web Ontology Language)
• Since the architecture of the web depends on agreed standards, the World Wide Web Consortium (W3C) set up a standardisation working group to develop a standard for a web ontology language.
• The result of this activity was the OWL ontology language standard.
• The integration of OWL with RDF has the advantage of making OWL ontologies directly accessible to web based applications.
Back Story: http://ileriseviye.wordpress.com/2011/11/01/why-web-ontology-language-is-abbreviated-as-owl-and-not-wol/
Description Logics (DLs)
• A key feature of OWL is its basis in Description Logics, a family of logic-based knowledge representation formalisms that have a formal semantics based on first-order logic (FOL).
Description Logics
• We can use DLs to model an application domain. The focus is then on:
– Representation of knowledge about categories
– The set of categories in an application domain is called a terminology
– The terminology is arranged in a hierarchical organization called an ontology, which captures superset and subset relations among categories/concepts.
– In order to specify a hierarchical structure, we can use subsumption relations between the appropriate concepts in a terminology
– Subsumption is a form of inference. It determines whether a superset/subset relation (based on the facts asserted in a terminology) exists between two concepts.
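As a rough illustration of subsumption as inference, the sketch below derives superset/subset relations that were never stated directly by following asserted subclass links. It assumes a toy terminology (`Owl`, `Bird`, `Animal` are made-up class names) and only computes the transitive closure of asserted links; a real DL reasoner does considerably more than this.

```python
# Asserted (direct) subclass axioms: concept -> set of direct superconcepts.
# The class names are illustrative only.
asserted = {
    "Owl": {"Bird"},
    "Bird": {"Animal"},
    "Animal": set(),
}


def superclasses(concept, asserted):
    """All concepts that subsume `concept`, directly or indirectly."""
    found = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        for parent in asserted.get(current, set()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def subsumes(general, specific, asserted):
    """True if every instance of `specific` must also be an instance of `general`."""
    return general == specific or general in superclasses(specific, asserted)
```

Here `subsumes("Animal", "Owl", asserted)` holds even though no axiom relates `Owl` and `Animal` directly, which is the sense in which the relation is inferred rather than asserted.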
In short, DLs are…
• … formalisms based on an object-oriented modelling, in which the domain is described in terms of individuals (instances), concepts (classes), and roles (properties/predicates):
– individuals, e.g., "Hedwig", are the basic elements of the domain;
– concepts, e.g., "Owl", describe sets of individuals having similar characteristics;
– roles, e.g., "hasPet", describe relationships between pairs of individuals, such as "HarryPotter hasPet Hedwig".
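One way to picture these three notions is with ordinary sets: a concept is a set of individuals and a role is a set of pairs. The sketch below uses the slide's own examples; the `Wizard` concept and the helper `role_fillers` are assumptions added for illustration.

```python
# Individuals: the basic elements of the domain.
individuals = {"HarryPotter", "Hedwig"}

# Concepts: named sets of individuals ("Wizard" is an assumed extra concept).
concepts = {
    "Owl": {"Hedwig"},
    "Wizard": {"HarryPotter"},
}

# Roles: named sets of (individual, individual) pairs.
roles = {
    "hasPet": {("HarryPotter", "Hedwig")},
}


def role_fillers(role, subject):
    """Individuals related to `subject` through `role`."""
    return {obj for (subj, obj) in roles[role] if subj == subject}


pets = role_fillers("hasPet", "HarryPotter")
```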
Axioms
• An OWL ontology consists of a set of axioms
• Example:
– given the axiom C equivalentClass D, an individual is an instance of C if and only if it is an instance of D.
– i.e. combining axioms with class descriptions allows for easy extension of the vocabulary by introducing new names as abbreviations for descriptions.
See the following axiom:
    Class: HogwartsStudent
        EquivalentTo: Student and attendsSchool value Hogwarts
It introduces the class name HogwartsStudent, and asserts that its instances are just those Students who attend Hogwarts.
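The "if and only if" reading of the HogwartsStudent axiom can be sketched as a set computed from a description. The individuals and the school `Durmstrang` are invented for the example, and the code treats the listed facts as complete, which is a simplification: OWL itself does not assume that unlisted facts are false.

```python
# Asserted facts (illustrative individuals; only Hogwarts comes from the slide).
students = {"Harry", "Hermione", "Viktor"}
attends_school = {
    ("Harry", "Hogwarts"),
    ("Hermione", "Hogwarts"),
    ("Viktor", "Durmstrang"),
}


def hogwarts_students():
    """Instances of the description: Student and attendsSchool value Hogwarts."""
    return {s for s in students if (s, "Hogwarts") in attends_school}


def is_hogwarts_student(individual):
    """Membership is fully determined by the description (if and only if)."""
    return individual in hogwarts_students()
```

The new class name adds no independent information; it is an abbreviation whose instances are determined entirely by the description on the right-hand side.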
TBox & ABox
• Axioms describe constraints on the structure of the domain:
– in DLs such a set of axioms is called a TBox (Terminology Box).
• OWL also allows for axioms asserting facts about some concrete situation, similar to data in a database setting:
– in DLs such a set of axioms is called an ABox (Assertion Box).
Decidability (i)
• Description Logics are fully-fledged logics and so have a formal semantics.
• DLs can be seen as decidable subsets of FOL with:
– individuals being equivalent to constants,
– concepts to unary predicates,
– roles to binary predicates.
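As an illustration of this correspondence, the HogwartsStudent axiom from the Axioms slide can be written as a first-order formula. This is a sketch in which the predicate and constant names simply mirror the OWL names: the concepts become unary predicates, the role `attendsSchool` becomes a binary predicate, and the individual `Hogwarts` becomes a constant.

```latex
\forall x \, \bigl( \mathit{HogwartsStudent}(x) \leftrightarrow
  \mathit{Student}(x) \land \mathit{attendsSchool}(x, \mathit{Hogwarts}) \bigr)
```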
FOL … undecidable (sometimes)
• Church and Turing showed in 1936 that first-order logic is in general undecidable.
• That means there is no algorithm that can determine, for every statement in this logic, whether or not it is valid.
• Ex: can't solve the Halting Problem
Halting Problem
• In 1936 Alan Turing proved that it's not possible to decide whether an arbitrary program will eventually halt, or run forever.
• The official definition of the problem is to write a program (actually, a Turing Machine*) that accepts as parameters a program and its parameters. That program must decide, in finite time, whether the given program will ever halt when run with those parameters.
• The halting problem is a cornerstone problem in computer science. It is used mainly as a way to prove a given task is impossible, by showing that solving that task will allow one to solve the halting problem.
*A Turing machine is a hypothetical device that manipulates symbols according to a table of rules. Despite its simplicity, a Turing machine can be adapted to simulate the logic of any computer algorithm.
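The idea behind Turing's proof can be sketched in a few lines: given any function that claims to decide halting, one can build a program that does the opposite of whatever is predicted for it. The names `make_contrary` and `always_says_forever` are hypothetical, and the example only runs the safe case (a decider that answers "runs forever"); a decider that answered "halts" would be refuted in the same way, but running that case would loop forever.

```python
def make_contrary(claimed_halts):
    """Build a program that does the opposite of what the decider predicts."""
    def contrary():
        if claimed_halts(contrary):
            while True:      # the decider said "halts", so loop forever
                pass
        return "halted"      # the decider said "runs forever", so halt
    return contrary


def always_says_forever(program):
    """A (wrong) candidate decider that claims no program ever halts."""
    return False


contrary = make_contrary(always_says_forever)
result = contrary()  # halts, so the decider's answer for `contrary` was wrong
```

Whatever the candidate decider answers about `contrary`, the program behaves the other way, so no candidate can be right about every program.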
Decidability (ii)
• DLs give a precise and unambiguous meaning to descriptions of the domain.
• This also allows for the development of reasoning algorithms that can provide correct answers to arbitrarily complex queries about the domain.
Reasoning: OWL vs Databases
OWL axioms behave like inference rules rather than database constraints.
    Class: Phoenix
        SubClassOf: isPetOf only Wizard
    Individual: Fawkes
        Types: Phoenix
        Facts: isPetOf Dumbledore
• Fawkes is said to be a Phoenix and to be the pet of Dumbledore, and it is also stated that only a Wizard can have a pet Phoenix.
• In OWL, this leads to the implication that Dumbledore is a Wizard. That is, if we were to query the ontology for instances of Wizard, then Dumbledore would be part of the answer.
• In a database setting the schema could include a similar statement about the Phoenix class, but in this case it would be interpreted as a constraint on the data: adding the fact that Fawkes isPetOf Dumbledore without Dumbledore being already known to be a Wizard would lead to an invalid database state, and such an update would therefore be rejected by a database management system as a constraint violation.
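The contrast can be made concrete with two toy update functions that receive the same new fact. Both function names and the set-based data layout are assumptions made for this sketch; neither reflects how an actual OWL reasoner or database management system is implemented.

```python
def owl_style_update(wizards, pet_facts, phoenixes, new_fact):
    """Inference: accept the fact and derive that the owner is a Wizard."""
    pet, owner = new_fact
    pet_facts = pet_facts | {new_fact}
    if pet in phoenixes:
        wizards = wizards | {owner}
    return wizards, pet_facts


def database_style_update(wizards, pet_facts, phoenixes, new_fact):
    """Constraint: reject the fact unless the owner is already a Wizard."""
    pet, owner = new_fact
    if pet in phoenixes and owner not in wizards:
        raise ValueError(f"constraint violation: {owner} is not a known Wizard")
    return wizards, pet_facts | {new_fact}


phoenixes = {"Fawkes"}
fact = ("Fawkes", "Dumbledore")  # Fawkes isPetOf Dumbledore

# OWL-style: Dumbledore becomes a Wizard as a consequence of the axiom.
inferred_wizards, _ = owl_style_update(set(), set(), phoenixes, fact)

# Database-style: the same update is rejected.
try:
    database_style_update(set(), set(), phoenixes, fact)
    rejected = False
except ValueError:
    rejected = True
```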
Ontology Development Tools
• State of the art ontology development tools, such as SWOOP, Protégé, and TopBraid Composer, use DL reasoners to provide feedback to the user about the logical implications of their design:
– i.e. warnings about inconsistencies and synonyms.
Domain-specific ontologies
• The availability of tools has contributed to the increasingly widespread use of OWL, and it has become the de facto standard for ontology development in fields as diverse as
– Biology
– Medicine
– Geography
– Geology
– Agriculture
– Defence
– etc.
Complex Queries
• The use of DL reasoners allows OWL ontology applications to answer complex queries and to provide guarantees about the correctness of the result.
• Reliability and correctness are clearly important features of any information system.
• They are particularly important if ontology based systems are to be used in safety-critical applications such as medicine, where incorrect reasoning could adversely impact patient care.
Standard Query Language
• It has long been recognised that the semantic web, and semantic web knowledge representation languages such as RDF and OWL, would also benefit from the availability of a standardised query language such as SQL.
• A W3C standardisation working group was set up, and has completed its work on the SPARQL query language standard.
SPARQL Protocol and RDF Query Language …
• … is an RDF query language, i.e. a query language that can retrieve and manipulate data stored in RDF format (i.e. triples).
• SPARQL allows for a query to consist of triple patterns, conjunctions, disjunctions, and optional patterns.
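To give a feel for what a conjunction of triple patterns means, here is a small pure-Python matcher. It is not SPARQL and does not use any SPARQL library; the functions `match_pattern` and `query`, the convention that variables start with `?`, and the sample triples are all assumptions made for this sketch.

```python
def match_pattern(pattern, triple, bindings):
    """Try to extend `bindings` so that `pattern` matches `triple`."""
    extended = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):                 # a variable
            if extended.setdefault(term, value) != value:
                return None                      # conflicts with an earlier binding
        elif term != value:                      # a constant that does not match
            return None
    return extended


def query(graph, patterns):
    """Conjunction of triple patterns: every pattern must match."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for bindings in solutions:
            for triple in graph:
                extended = match_pattern(pattern, triple, bindings)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


graph = [
    ("HarryPotter", "hasPet", "Hedwig"),
    ("Hedwig", "type", "Owl"),
    ("Dumbledore", "hasPet", "Fawkes"),
    ("Fawkes", "type", "Phoenix"),
]

# Informally: find every ?owner with a ?pet that is an Owl.
solutions = query(graph, [("?owner", "hasPet", "?pet"), ("?pet", "type", "Owl")])
owl_owners = [s["?owner"] for s in solutions]
```

The shared variable `?pet` is what joins the two patterns: a solution must bind it to the same individual in both.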
Tags & Ontologies
• Tagging facilities within Web 2.0 applications have shown how it might be possible for user communities to collaboratively annotate web content, and create simple forms of ontology via the development of hierarchically organised sets of tags, often called folksonomies…
Challenges
• Currently hard to combine:
– Increased expressive power (by using more sophisticated logics) with scalability (large ontologies)
• Ontology learning (ontology extraction, ontology generation, or ontology acquisition) is the automatic or semi-automatic creation of ontologies, including extracting the corresponding domain's terms and the relationships between those concepts from a corpus of natural language text, and encoding them with an ontology language for easy retrieval.
• As building ontologies manually is extremely labor-intensive and time consuming, there is great motivation to automate the process.
• Typically, the process starts by extracting terms and concepts or noun phrases from plain text using linguistic processors such as part-of-speech tagging and phrase chunking. Then statistical techniques, often based on Machine Learning, are used to extract relations.
– http://en.wikipedia.org/wiki/Ontology_learning
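A much simpler stand-in for the pipeline described above is a single hand-written lexical pattern. The sketch below is not the statistical or machine-learning approach mentioned in the text: it uses one regular expression for phrases of the form "X such as A, B and C" to suggest is-a pairs. The function name `extract_is_a` and the example sentence are invented for illustration.

```python
import re

# One hand-written pattern: "<class> such as <a>, <b> and <c>".
SUCH_AS = re.compile(r"(\w+) such as ([\w ,]+)")


def extract_is_a(text):
    """Return (instance, class) pairs suggested by 'such as' phrases."""
    pairs = []
    for match in SUCH_AS.finditer(text):
        concept = match.group(1)
        # Split the enumeration on commas and on the word "and".
        for item in re.split(r",\s*|\s+and\s+", match.group(2)):
            if item.strip():
                pairs.append((item.strip(), concept))
    return pairs


pairs = extract_is_a("The park has birds such as owls, eagles and hawks.")
```

A pattern like this is brittle and only finds relations that happen to be phrased this way, which is part of why real systems combine linguistic preprocessing with statistical methods.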
In summary…
Why build an ontology?
• To share common understanding of the structure of information among people or software agents
• To enable reuse of domain knowledge
• To make domain assumptions explicit
• To analyze domain knowledge
How to build an ontology
Generally speaking (and roughly said), when designing an ontology, four main components are used:
1. Classes
2. Relations
3. Axioms
4. Instances
Classes
• Concepts of the domain or tasks, which are usually organized in taxonomies
Ex: in a university ontology, student and professor are two classes
Relations
A type of interaction between concepts of the domain.
Ex: subclass-of or is-a are relations
Axioms
Assertions that are always true for the domain of interest.
Ex: if a student attends both "Math" and "Basic text processing" courses, then he or she must be a 1st year student.
Instances
Represent specific elements.
Ex: a student called Peter is an instance of the Student class
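The four components can be put together in a small sketch of the university example. The `Ontology` dataclass, the `attends` relation, and the classes `Person` and `FirstYearStudent` are assumptions introduced for illustration; the slides only name the classes student and professor, the instance Peter, and the two courses.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    classes: set = field(default_factory=set)
    subclass_of: set = field(default_factory=set)    # relation: (child, parent)
    instance_of: dict = field(default_factory=dict)  # instance -> set of classes
    attends: set = field(default_factory=set)        # relation: (student, course)


def apply_first_year_axiom(onto):
    """Axiom: attending both courses implies being a first-year student."""
    required = {"Math", "Basic text processing"}
    for student in list(onto.instance_of):
        courses = {c for (s, c) in onto.attends if s == student}
        if required <= courses:
            onto.instance_of[student].add("FirstYearStudent")


uni = Ontology()
# 1. Classes
uni.classes |= {"Person", "Student", "Professor", "FirstYearStudent"}
# 2. Relations
uni.subclass_of |= {("Student", "Person"), ("Professor", "Person"),
                    ("FirstYearStudent", "Student")}
# 4. Instances
uni.instance_of["Peter"] = {"Student"}
uni.attends |= {("Peter", "Math"), ("Peter", "Basic text processing")}
# 3. Axioms
apply_first_year_axiom(uni)
```

After the axiom is applied, Peter is classified as a first-year student even though that was never stated directly.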
Important!
• There is no single correct class hierarchy for any given domain.
• The hierarchy depends on the possible uses of the ontology.
• The level of detail depends on the applications and purposes.