The Use of Machine-Generated Ontologies in Dynamic Information Seeking

CoopIS’2001, Trento, Italy

Giovanni Modica, Avigdor Gal

Hasan M. Jamil

Motivating example

Preliminaries

Definition: An ontology is an explicit representation of a conceptualization (Gruber, 1993).

Conjecture I: Applications in a given domain base their information exchange on some (shared) underlying ontology.

Observation: Applications in a given domain use different ontology representations.

Conjecture II: Given an application A that utilizes an ontology representation $O_A$, and an ontology O, there exists an invertible mapping $f_A$ such that

$f_A(O_A) = O$

Problem description

Given two applications A and B, such that A utilizes an ontology representation $O_A$ and B utilizes an ontology representation $O_B$, introduce a mapping $f_{BA}$ such that

$f_{BA}(O_B) = O_A$

In a perfect world:
– O is known.
– $f_A$ is known.
– $f_B$ is known.

$O_A = f_A^{-1}(f_B(O_B))$
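The perfect-world composition can be sketched in a few lines of Python. This is only an illustration: the term names and the dictionary encoding of the mappings are hypothetical, chosen to resemble car-rental forms, and are not taken from the talk.

```python
# Hypothetical vocabularies: O is the shared ontology, O_A and O_B are
# application-specific representations of it.
f_A = {"Pick-up Date": "pickup_date", "Return Date": "return_date"}  # O_A -> O
f_B = {"Pickup Day": "pickup_date", "Dropoff Day": "return_date"}    # O_B -> O


def invert(mapping):
    """Invert a one-to-one mapping; fail if it is not invertible."""
    inverse = {value: key for key, value in mapping.items()}
    if len(inverse) != len(mapping):
        raise ValueError("mapping is not invertible")
    return inverse


def compose_b_to_a(f_a, f_b):
    """Build f_BA = f_A^{-1} o f_B, taking terms of O_B to terms of O_A."""
    f_a_inv = invert(f_a)
    return {term_b: f_a_inv[shared] for term_b, shared in f_b.items()}


f_BA = compose_b_to_a(f_A, f_B)
print(f_BA)
```

The sketch works only because both mappings and the shared ontology are fully known.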

Alas:
– O is unknown. At best, an approximation of O exists, in the form of a standard.
– $f_A$ and $f_B$ are unknown: lack of documentation, the mental state of a designer, etc.

Proposed solution

Given two applications A and B, such that A utilizes an ontology representation $O_A$ and B utilizes an ontology representation $O_B$, introduce a mapping $f_{BA}$ such that

$f_{BA} : O_B \times O_A \rightarrow [0, 1]$

$f_{BA}$ depends on the ontology representation.

A matching is associated with a “degree of confidence” in the matching:
– 0 identifies non-matching terms.
– 1 identifies a crisp matching.

Ontology representation

Dynamic information seeking:
– HTML forms
  • Labels
  • Input fields
  • Scripts
– Assumptions:
  • Labels represent terms in an ontology (e.g., Pick-up Date).
  • Input fields provide constraints on the value domains (e.g., {Day, 1,…31}).
  • Scripts, among other things, suggest a precedence relationship (e.g., Pick-up Locations is required before selecting a Car Type).
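A minimal sketch of how labels and value domains might be pulled out of an HTML form, using only Python's standard `html.parser`. The class name `FormTermExtractor` and the sample form are hypothetical, and this is not the tool described in the talk; scripts and precedence are not handled.

```python
from html.parser import HTMLParser


class FormTermExtractor(HTMLParser):
    """Collect <label> text and the value domain of each <select>."""

    def __init__(self):
        super().__init__()
        self.labels = []             # candidate ontology terms
        self.domains = {}            # field name -> list of allowed values
        self._in_label = False
        self._current_select = None
        self._in_option = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "label":
            self._in_label = True
        elif tag == "select":
            self._current_select = attrs.get("name", "")
            self.domains[self._current_select] = []
        elif tag == "option" and self._current_select is not None:
            self._in_option = True

    def handle_endtag(self, tag):
        if tag == "label":
            self._in_label = False
        elif tag == "select":
            self._current_select = None
            self._in_option = False
        elif tag == "option":
            self._in_option = False

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._in_label:
            self.labels.append(text)
        elif self._in_option and self._current_select is not None:
            self.domains[self._current_select].append(text)


# Hypothetical form fragment, for illustration only.
sample = """
<form>
  <label for="day">Pick-up Date</label>
  <select name="day">
    <option>1</option><option>2</option><option>31</option>
  </select>
</form>
"""
extractor = FormTermExtractor()
extractor.feed(sample)
print(extractor.labels)
print(extractor.domains)
```

Real forms often lack explicit `<label>` elements, which is presumably why a separate labeling phase is needed.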

Ontology representation

Conceptual modeling approach

Based on Bunge:
– Terms (things)
– Values
– Composition
– Precedence
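One way to picture the four constructs is as a small data structure. The sketch below is an assumption about how they could be encoded, not the representation used by the authors; the `Term` class and its field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """A term (thing) with its value domain, sub-terms and precedence links."""
    name: str
    values: list = field(default_factory=list)    # value domain
    parts: list = field(default_factory=list)     # composition: sub-terms
    precedes: list = field(default_factory=list)  # names of terms that come after


# Examples echoing the slides: Day ranges over 1..31, and the pick-up
# location must be supplied before a car type can be selected.
day = Term("Day", values=list(range(1, 32)))
pickup_date = Term("Pick-up Date", parts=[day])
pickup_location = Term("Pick-up Location", precedes=["Car Type"])

print(pickup_date.parts[0].values[-1])
print(pickup_location.precedes)
```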

Ontology extraction and matching

[Figure: extraction and matching pipeline, starting from a URL (e.g. http://www.avis.com). Phases: 1 Parsing, 2 Labeling, 3 Ontology Creation, 4 Merging. Components shown: HTML Parsing, DOM Tree, HTML Elements, FORM Elements, Label Identification, Form Rendering, Target/Candidate Ontology, Target Ontology, Candidate Ontology, Matching Algorithms, Thesaurus, Refined Ontology, KB Submission.]

Phase 1: Parsing

Phase 2: Labeling

Merging

Heuristics for the ontology merging (Frakes and Baeza-Yates, 1992):
– Textual matching: Date ↔ date, Pickup ↔ pickup
– Ignorable characters removal: *Country ↔ country
– De-hyphenation: Pick-up ↔ Pickup, Pickup ↔ Pick up
– Stop terms removal: Date of Return ↔ Return Date
  Stop terms: a, to, do, does, the, in, or, and, this, those, that, … etc.
– Substring matching: Pickup Location Code ↔ Pick-up location (66%)
– Content matching: Dropoff Day (1,..,31) ↔ Return Day (1,..,31) (100%); Dropoff ↔ Return
– Thesaurus matching: Dropoff Location ↔ Return Location (100%)
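The label-based heuristics can be approximated with a short normalization pipeline. The sketch below makes several assumptions: "of" is treated as a stop term (the example Date of Return requires it, but the slide's list is elided), the substring score is taken to be the share of common words (which reproduces the 66% example), and content matching on value domains is left out. The function names are hypothetical.

```python
import re

# Stop terms from the slide; "of" is an assumption needed for the
# "Date of Return" example and is not in the listed terms.
STOP_TERMS = {"a", "to", "do", "does", "the", "in", "or", "and",
              "this", "those", "that", "of"}


def normalize(label):
    """Apply the textual, de-hyphenation, ignorable-character and stop-term steps."""
    text = label.lower()                       # textual matching: ignore case
    text = text.replace("-", "")               # de-hyphenation: Pick-up -> pickup
    text = re.sub(r"[^a-z0-9 ]", " ", text)    # ignorable characters: *Country -> country
    return [word for word in text.split() if word not in STOP_TERMS]


def confidence(label_a, label_b, thesaurus=None):
    """Return a degree of confidence in [0, 1] that two labels match."""
    thesaurus = thesaurus or {}
    words_a = [thesaurus.get(w, w) for w in normalize(label_a)]
    words_b = [thesaurus.get(w, w) for w in normalize(label_b)]
    if not words_a or not words_b:
        return 0.0
    set_a, set_b = set(words_a), set(words_b)
    # Same words in any order, or same text once spaces are ignored.
    if set_a == set_b or "".join(words_a) == "".join(words_b):
        return 1.0
    # Substring-style score: share of common words (assumed scoring rule).
    return len(set_a & set_b) / max(len(set_a), len(set_b))


print(confidence("Pickup", "Pick up"))
print(confidence("Date of Return", "Return Date"))
print(confidence("Pickup Location Code", "Pick-up location"))
print(confidence("Dropoff Location", "Return Location", {"dropoff": "return"}))
```

Scores strictly between 0 and 1 correspond to the partial matches on the slide; a threshold would be needed to turn them into accept/reject decisions.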

Phase 4: Merging

Preliminary Results

Two metrics are used for performance analysis (Frakes and Baeza-Yates, 1992):
– Recall (completeness)
– Precision (soundness)

Parameters:
– $t_r$: number of terms retrieved
– $t_m$: number of terms matched
– $t_e$: number of terms effectively matched

Recall: $R = t_m / t_r$
Precision: $P = t_e / t_m$

Preliminary Results

Example:
– # of terms in Ontology1: 20
– # of matches identified: 15; Recall: 75% (15/20)
– # of effective matches: 10; Precision: 66% (10/15)

A third metric is used to compare the recall and precision. For a precision value P, a recall value R and an importance measure b, the combined metric E is calculated as (Frakes and Baeza-Yates, 1992):

$E = 1 - \frac{(1 + b^2) P R}{b^2 P + R}$
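The three metrics are simple to compute. The sketch below uses the slide's example counts (20 terms retrieved, 15 matched, 10 effectively matched) and assumes the E measure in its usual form, E = 1 - (1 + b^2)PR / (b^2 P + R), with b = 0.5 as in the charts; the function names are hypothetical.

```python
def recall(t_m, t_r):
    """Recall: matched terms over retrieved terms."""
    return t_m / t_r


def precision(t_e, t_m):
    """Precision: effectively matched terms over matched terms."""
    return t_e / t_m


def e_measure(p, r, b=0.5):
    """Combined E metric; lower values mean better combined performance."""
    denominator = b ** 2 * p + r
    if denominator == 0:
        return 1.0  # no precision and no recall: worst possible score
    return 1 - ((1 + b ** 2) * p * r) / denominator


r = recall(t_m=15, t_r=20)       # 0.75
p = precision(t_e=10, t_m=15)    # 10/15
print(r, p, e_measure(p, r))
```

Because E is one minus a weighted combination of precision and recall, a perfect matcher scores 0 and a useless one scores 1.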

Preliminary Results

[Figure: Precision vs. Recall (Avis & Hertz). Recall and Precision plotted for each strategy: Textual, Ign. Chars., De-hyph., StopTerms, Substring, Substring Names, Content, Thesaurus.]

Strategy          Recall   Precision   E
Textual           0.3      0.33        0.673913
Ign. Chars.       0.3      0.33        0.673913
De-hyph.          0.6      0.33        0.634146
StopTerms         0.6      0.33        0.634146
Substring         0.75     0.40        0.558824
Substring Names   0.75     0.67        0.318182
Content           0.65     0.92        0.148472
Thesaurus         0.65     0.92        0.148472

Preliminary Results

[Figure: E Metric for Hertz vs. Alamo (b=0.5), plotted for each strategy: Textual, Ign. Chars., De-hyph., StopTerms, Substring, Substring Names, Content, Thesaurus.]

Preliminary Results

[Figure: Learning from Thesaurus. E (b=0.5) for No Thesaurus vs. Improved Thesaurus; values shown: 0.389534884 and 0.479166667.]

Summary and Future Work

We have introduced:
– Automatic ontology creation
– Automatic matching process
– Preliminary results

Future work oriented towards:
– Incorporation of query facilities into the tool
– Automatic navigation of web sites for ontology extraction
– Dynamic translation between queries against the target ontology to queries against the multiple candidate ontologies
