Improving Automatic Semantic Tag Recommendation through Fuzzy Ontologies
Panos Alexopoulos, Manolis Wallace
7th International Workshop on Semantic and Social Media Adaptation and Personalization
Luxembourg, December 3-4, 2012
Agenda
●Introduction
● Problem Definition and Paper Focus
●Proposed Framework
● Approach Overview and Rationale
● Tagging Evidence Model
● Tagging Process
●Framework Evaluation
● Evaluation Process
● Evaluation Results
●Conclusions and Future Work
Problem Definition
Introduction
●Semantic tagging involves identifying the entities that reflect what a text is actually about and assigning them to it as tags.
●One important challenge in this task is correctly distinguishing the entities that play a central role in the document’s meaning from those that are merely complementary to it.
●For example, consider the following text:
● “Annie Hall is a much better movie than Deconstructing Harry, mainly because Alvy Singer is such a well formed character and Diane Keaton gives the performance of her life”.
● The text mentions two films, yet the only one it actually talks about is “Annie Hall”, so only that film is an appropriate tag.
Semantic Tagging
Problem Definition
Introduction
●A second challenge is the inference of appropriate tags even when these are not explicitly mentioned within the text.
●For example:
● “In June 1863, Colonel James Montgomery commanded a brigade in operations along the coast resembling his earlier Jayhawk raids. The most famous of his controversial operations was the Raid at Combahee Ferry in which 800 slaves were liberated with the help of Harriet Tubman”.
● The text describes a historical battle which took place in Beaufort County, South Carolina.
● This means that this location is an important geographical tag for the text, yet it is not explicitly mentioned within it.
Semantic Tagging
Paper Focus
Introduction
● In previous work we proposed a framework for semantic tagging through the exploitation of domain ontologies.
●The ontologies describe the domain(s) of the texts to be tagged and their entities serve as a source of possible tags for them.
●The key idea is that a given ontological entity is more likely to represent the text’s meaning (and thus be an accurate tag) when the text contains many entities that are ontologically related to it.
● In this paper we revise and extend the above framework so that it can also exploit fuzzy ontological information.
●Our assumption is that the fuzziness that may characterize some of the ontology’s relations can increase the evidential power of its entities and, consequently, the effectiveness of the tag recommendation process.
Approach Overview and Rationale
Proposed Framework
●Our existing framework targets the task of semantic tagging based on the intuition that a given ontological entity is more likely to represent the meaning of the text when the text contains many entities that are ontologically related to it.
●E.g. in the film review example, the entities “Alvy Singer” and “Diane Keaton” indicate that the text is about the film “Annie Hall”.
●That is because Alvy Singer is a character in this film and Diane Keaton an actor in it.
●All these entities and the relations between them are derived from one or more domain ontologies.
●The extension we propose to this framework is to consider, where possible, fuzzy relations between entities rather than crisp ones.
●Fuzzy relations allow the assignment of truth degrees to vague ontological relations in an effort to quantify their vagueness.
●E.g. instead of “Annie Hall is a comedy” we may say that “Annie Hall is a comedy to a degree of 0.7”.
●Similarly, we may say that “Woody Allen is an expert director at human relations to a degree of 0.8”.
●Thus, by using a fuzzy ontology one can represent semantic information that is useful for the tag recommendation task at a finer level of granularity than with a crisp ontology.
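As a rough illustration of what a fuzzy relation adds over a crisp one, the two statements above can be written as triples that carry a truth degree. This is only a sketch: the class name `FuzzyAssertion`, the relation names `hasGenre` and `isExpertDirectorAt`, and the helper `alpha_cut` are assumptions made for the example and do not come from the slides; only the degrees 0.7 and 0.8 do.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FuzzyAssertion:
    """A relation between two entities that holds to a degree in [0, 1]."""
    subject: str
    relation: str   # hypothetical relation name, chosen for illustration
    obj: str
    degree: float

    def __post_init__(self):
        if not 0.0 <= self.degree <= 1.0:
            raise ValueError("degree must lie in [0, 1]")


def alpha_cut(assertions, threshold):
    """Crisp view of the fuzzy facts: keep triples whose degree >= threshold."""
    return [(a.subject, a.relation, a.obj)
            for a in assertions if a.degree >= threshold]


assertions = [
    FuzzyAssertion("Annie Hall", "hasGenre", "Comedy", 0.7),
    FuzzyAssertion("Woody Allen", "isExpertDirectorAt", "Human Relations", 0.8),
]

# A crisp ontology has to pick one cut-off and then forgets the degrees.
crisp_facts = alpha_cut(assertions, 0.75)
```

Whatever threshold is chosen, the crisp view either keeps or drops each fact outright, whereas the fuzzy assertions retain how strongly each fact holds.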
●For example, in the film domain, instead of having just the relation hasPlayedInFilm it is more useful to have the fuzzy relation wasAnImportantActorInFilm.
● E.g. “Robert Duvall was an important actor in Apocalypse Now to a degree of 0.6”.
●To see why this is useful, consider the text “Robert Duvall’s brilliant performance in the film showed that his choice by Francis Ford Coppola was wise”.
● If Duvall and Coppola have collaborated on more than one film, but Duvall had a major role in only one of them (as captured by the fuzzy degree of his relation to the film), then this film is more likely to be the subject of the text.
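The disambiguation argument above can be sketched in a few lines. Everything here is illustrative: the second film is a placeholder, its degree of 0.1 is invented, and `rank_films` is a hypothetical helper rather than part of the framework. Only the degree 0.6 for “Apocalypse Now” is taken from the slides.

```python
# (actor, film) -> degree of wasAnImportantActorInFilm
important_actor = {
    ("Robert Duvall", "Apocalypse Now"): 0.6,       # degree quoted in the slides
    ("Robert Duvall", "Hypothetical Film X"): 0.1,  # assumed placeholder value
}

# (director, film) pairs; a crisp relation used here only to filter candidates
directed = {
    ("Francis Ford Coppola", "Apocalypse Now"),
    ("Francis Ford Coppola", "Hypothetical Film X"),
}


def rank_films(actor, director, fuzzy=True):
    """Rank the films shared by actor and director.

    With fuzzy=False every shared film scores 1.0, as in a crisp ontology.
    """
    scores = {}
    for (a, film), degree in important_actor.items():
        if a == actor and (director, film) in directed:
            scores[film] = degree if fuzzy else 1.0
    # Highest score first; film title breaks ties deterministically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


fuzzy_ranking = rank_films("Robert Duvall", "Francis Ford Coppola")
crisp_ranking = rank_films("Robert Duvall", "Francis Ford Coppola", fuzzy=False)
```

In the crisp ranking both films tie at 1.0, so the text cannot be disambiguated; in the fuzzy ranking the film in which the actor had the more important role comes first.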
Framework Components
Proposed Framework
●Our proposed framework assumes the availability of a fuzzy ontology for the domain of the texts to be tagged and defines two components:
● A Tag Fuzzy Ontological Evidence Model that contains entities that may serve as tag-related evidence for the application scenario and domain at hand.
●Each entity is assigned evidential power degrees which denote its usefulness as evidence for the tag recommendation task.
● A Tag Recommendation Process that uses the evidence model to determine, for a given text, the ontological entities that potentially represent its content.
●A confidence score for each entity is used to denote the most probable tags.
Tagging Evidence Model
Proposed Framework
●The model defines, for each ontology entity, which other entities should be used as evidence for it, and to what extent, when determining the texts’ tags.
●It consists of entity pairs in which one entity provides quantified evidence for another.
Evidence Model Construction
Proposed Framework
●Construction of the evidence model depends on the characteristics of the domain and the texts.
●The first step of the construction is manual and involves:
● The identification of the concepts whose instances are expected to be used as tags (e.g. military conflicts, films etc.)
● The determination, for each of these concepts, of the related concepts whose instances may serve as tag evidence:
●For example, in texts that review films, some concepts whose instances may act as tag evidence are related directors, actors and characters.
● The identification, for each pair of evidence and target concept, of the fuzzy relation paths that link them.
●The result of this first step is a tag evidence concept mapping like the following:
●This mapping is typically small so its manual construction is not difficult.
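The mapping table from the original slide is not preserved in this transcript. As a hedged sketch of what such a mapping could look like in code, the example below combines the concept and relation names that the slides list for the film domain; the `EvidenceMapping` structure and the pairing of each relation with a concept are assumptions, not the authors' actual table.

```python
from typing import NamedTuple, Tuple


class EvidenceMapping(NamedTuple):
    """One row: instances of evidence_concept give evidence for target_concept."""
    target_concept: str
    evidence_concept: str
    relation_path: Tuple[str, ...]  # fuzzy relations linking evidence to target


# Assumed rows, built from names mentioned elsewhere in the slides.
film_mappings = [
    EvidenceMapping("Film", "Actor", ("wasAnImportantActorInFilm",)),
    EvidenceMapping("Film", "Director", ("isFamousForDirectingFilm",)),
    EvidenceMapping("Film", "Character", ("wasCharacterInFilm",)),
]


def evidence_concepts_for(target, mappings):
    """Return the concepts whose instances act as evidence for the target."""
    return sorted({m.evidence_concept for m in mappings
                   if m.target_concept == target})


film_evidence = evidence_concepts_for("Film", film_mappings)
```

A relation path is stored as a tuple so that evidence reached through more than one relation can be expressed in the same structure.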
●Based on such mappings, the second step of the construction is automatic and involves the generation of the tag-evidence entity pairs along with a tag evidential strength.
●This strength is:
● Proportional to the fuzzy degree of the relation linking the evidence entity with the tag.
● Inversely proportional to the evidence entity’s own ambiguity as well as to the number and fuzzy degrees of the other tags it provides evidence for.
●For example, “Woody Allen” provides evidence for the film “Annie Hall” with a strength of only 0.02 because he has directed many other films, while the character “Alvy Singer” has an evidential strength of 1 because it appears only in this film.
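The slides state the proportionalities but not the exact formula. One simple function that satisfies them, offered purely as an assumption, divides the fuzzy degree of the evidence-tag relation by the entity's ambiguity times the sum of the degrees of all the tags it gives evidence for. The function name, the `ambiguity` parameter (read here as the number of ontology entities the term could refer to) and the figure of 50 films are illustrative; 50 is chosen only because it reproduces the 0.02 quoted above.

```python
def evidential_strength(evidence, tag, relation_degrees, ambiguity=1):
    """Assumed formula: degree / (ambiguity * sum of degrees over all tags).

    relation_degrees maps (evidence_entity, tag_entity) -> fuzzy degree.
    """
    degree = relation_degrees.get((evidence, tag), 0.0)
    total = sum(d for (e, _), d in relation_degrees.items() if e == evidence)
    if total == 0:
        return 0.0
    return degree / (ambiguity * total)


# Toy data: one director linked to 50 films, one character linked to one film.
relation_degrees = {("Woody Allen", "Annie Hall"): 1.0,
                    ("Alvy Singer", "Annie Hall"): 1.0}
for i in range(49):
    relation_degrees[("Woody Allen", f"Other Film {i}")] = 1.0  # placeholders

allen_strength = evidential_strength("Woody Allen", "Annie Hall", relation_degrees)
singer_strength = evidential_strength("Alvy Singer", "Annie Hall", relation_degrees)
```

Under these assumptions the director's evidence is diluted across all of his films, while the character, being tied to a single film, keeps full strength.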
Tag Recommendation Process
Proposed Framework
●Step 1: We extract from the text the terms that possibly refer to tag entities.
●Step 2: We extract from the text the terms that possibly refer to evidential entities.
●Step 3: We consider as candidate tag entities not only those found within the text but practically all those that are related to instances of the evidential concepts in the ontology.
● E.g. If we find the term “Woody Allen” then all his films are candidate tag entities.
●Step 4: Using the evidence model of the previous slide we compute for each candidate tag entity the confidence that it actually represents the text’s meaning.
●Note: The evidence model is assumed to have been calculated offline and stored in an index, so as to make the above process more efficient.
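The four steps can be sketched end to end as follows. This is a minimal illustration under stated assumptions: term extraction is reduced to case-insensitive substring matching, the confidence of a candidate is taken to be the sum of the strengths of the evidence found for it (the slides do not give the aggregation rule), and the strength of 0.5 for “Diane Keaton” is invented. The names `recommend_tags` and `evidence_index` are hypothetical.

```python
def recommend_tags(text, tag_entities, evidence_index):
    """Rank candidate tags for a text.

    evidence_index maps an evidence entity to {tag_entity: strength};
    it stands in for the evidence model computed offline.
    """
    lowered = text.lower()
    # Step 1: tag entities that are explicitly mentioned in the text.
    mentioned = {t for t in tag_entities if t.lower() in lowered}
    # Step 2: evidential entities mentioned in the text.
    found = [e for e in evidence_index if e.lower() in lowered]
    # Step 3: candidates are the mentioned tags plus every tag that any
    # found evidence entity points to.
    candidates = set(mentioned)
    for e in found:
        candidates.update(evidence_index[e])
    # Step 4: confidence as the sum of evidential strengths (assumption).
    scores = {c: 0.0 for c in candidates}
    for e in found:
        for tag, strength in evidence_index[e].items():
            scores[tag] += strength
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


evidence_index = {
    "Alvy Singer": {"Annie Hall": 1.0},   # strength quoted in the slides
    "Diane Keaton": {"Annie Hall": 0.5},  # assumed value
}
review = ("Annie Hall is a much better movie than Deconstructing Harry, "
          "mainly because Alvy Singer is such a well formed character and "
          "Diane Keaton gives the performance of her life")
ranking = recommend_tags(review, ["Annie Hall", "Deconstructing Harry"],
                         evidence_index)
```

In this toy run both films are candidates because both are mentioned, but only “Annie Hall” accumulates evidence, so it is ranked first.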
Evaluation Process
Framework Evaluation
●Two tagging scenarios:
● Film reviews.
● Texts describing military conflicts.
●We built fuzzy ontologies for both domains by manually fuzzifying a small portion of DBpedia and Freebase semantic data.
●Effectiveness was measured by determining the number of correctly tagged texts, namely texts whose highest ranked tags were the correct ones.
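The effectiveness measure described above amounts to accuracy at rank 1. A minimal sketch, assuming each text has exactly one correct tag and that the system returns a ranked list of tags per text (the function name and the toy data are illustrative, not the evaluation data from the paper):

```python
def top1_accuracy(ranked_tags_per_text, gold_tags):
    """Fraction of texts whose highest ranked tag equals the correct tag."""
    if len(ranked_tags_per_text) != len(gold_tags):
        raise ValueError("need one ranking and one gold tag per text")
    if not gold_tags:
        raise ValueError("no texts to evaluate")
    correct = sum(1 for ranked, gold in zip(ranked_tags_per_text, gold_tags)
                  if ranked and ranked[0] == gold)
    return correct / len(gold_tags)


# Toy rankings for three texts; the second one is tagged incorrectly.
rankings = [["Film A", "Film B"], ["Film C", "Film B"], ["Film D"]]
gold = ["Film A", "Film B", "Film D"]
accuracy = top1_accuracy(rankings, gold)
```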
Film Reviews Scenario
●100 texts describing 20 distinct films that were similar to each other in genre, actors and directors, and therefore harder to tell apart in a given review.
●Fuzzy Film Ontology:
● Concepts: Film, Actor, Director, Character
● Relations: wasAnImportantActorInFilm, isFamousForDirectingFilm, wasCharacterInFilm
Military Conflicts Scenario
●100 texts describing 100 military conflicts that were similar to each other in terms of participants and places.
●Fuzzy Conflict Ontology:
● Concepts: Location, Military Conflict, Military Person
● Relations: tookPlaceNearLocation, wasAnImportantPartOfConflict, playedMajorRoleInConflict, isNearToLocation
Evaluation Results
Framework Evaluation
●We measured tagging effectiveness by determining the number of correctly tagged texts, namely texts whose highest ranked tags were the correct ones.
●For comparison purposes, we performed the same process using a crisp version of the ontologies.
●Results:
Key Points
Conclusions and Future Work
●We proposed a novel framework that exploits fuzzy semantic information for automatically generating semantic tags for text documents.
●This task poses two challenges:
● Distinguishing correctly between the entities that play a central role in the document’s meaning and those that are just complementary to it.
● Inferring appropriate tags even when these are not explicitly mentioned within the text.
●Our approach is based on the customized use of fuzzy domain-specific ontological relations for extracting and evaluating tag “evidence” from within the text.
●The added value that the exploitation of fuzziness brings to the tag recommendation task was verified through experiments in different domains.
Framework Extensions
Conclusions and Future Work
●One important obstacle for the wider applicability of our approach is the bottleneck of acquiring (through development or reuse) the required fuzzy ontological information for the domain at hand.
●For that reason, our future work will focus on determining automated methods for fuzzifying crisp ontological facts:
● Data mining
● Social network analysis
● Crowdsourcing
Contact iSOCO
Thank you!
Questions?
Dr. Panos Alexopoulos, Senior Researcher
palexopoulos@isoco.com