The Use of Machine-Generated Ontologies in Dynamic Information Seeking

Post on 07-Jan-2016



CoopIS’2001, Trento, Italy


Giovanni Modica, Avigdor Gal, Hasan M. Jamil

Motivating example


Preliminaries

Definition: An ontology is an explicit representation of a conceptualization (Gruber 1993).

Conjecture I: Applications in a given domain base their information exchange on some (shared) underlying ontology.

Observation: Applications in a given domain use different ontology representations.

Conjecture II: Given an application A that utilizes an ontology representation $O_A$, and an ontology O, there exists an invertible mapping $f_A$ such that

$f_A(O_A) = O$

Problem description

Given two applications A and B, such that A utilizes an ontology representation $O_A$ and B utilizes an ontology representation $O_B$, introduce a mapping $f_{BA}$ such that

$f_{BA}(O_B) = O_A$

In a perfect world:
– O is known.
– $f_A$ is known.
– $f_B$ is known.

$O_A = f_A^{-1}(f_B(O_B))$

Alas:
– O is unknown. At best, an approximation of O exists, in the form of a standard.
– $f_A$ and $f_B$ are unknown: lack of documentation, the mental state of a designer, etc.
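
The perfect-world case can be illustrated with a toy Python sketch. The term names and both mappings below are hypothetical, invented only for illustration; they are not data from the talk. Each mapping is modeled as a dictionary from an application's terms to terms of the shared ontology O, and invertibility is what allows a term of B to be carried over to A.

```python
# Toy illustration of O_A = f_A^{-1}(f_B(O_B)).
# All term names are hypothetical examples, not data from the talk.

# f_A and f_B map each application's terms onto the shared ontology O.
f_A = {"Pick-up Date": "pickup_date", "Return Date": "return_date"}
f_B = {"Pickup Day": "pickup_date", "Dropoff Day": "return_date"}


def invert(mapping):
    """Return the inverse of a one-to-one mapping."""
    inverse = {value: key for key, value in mapping.items()}
    if len(inverse) != len(mapping):
        raise ValueError("mapping is not invertible")
    return inverse


def translate_b_to_a(term_b):
    """Compute f_A^{-1}(f_B(term_b)) for a single term of O_B."""
    return invert(f_A)[f_B[term_b]]


O_B = list(f_B)
O_A = [translate_b_to_a(term) for term in O_B]
print(O_A)  # ['Pick-up Date', 'Return Date']
```

In practice neither dictionary is available, which is why the talk turns to approximate matching.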


Proposed solution

Given two applications A and B, such that A utilizes an ontology representation $O_A$ and B utilizes an ontology representation $O_B$, introduce a mapping $f_{BA}$ such that

$f_{BA} : O_B \times O_A \to [0, 1]$

$f_{BA}$ depends on the ontology representation.

A matching is associated with a “degree of confidence” in the matching:
– 0 identifies non-matching terms.
– 1 identifies a crisp matching.

Ontology representation

Dynamic information seeking:
– HTML forms
  • Labels
  • Input fields
  • Scripts
– Assumptions:
  • Labels represent terms in an ontology (e.g., Pick-up Date).
  • Input fields provide constraints on the value domains (e.g., {Day, 1, …, 31}).
  • Scripts, among other things, suggest a precedence relationship (e.g., Pick-up Locations is required before selecting a Car Type).
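
As a rough sketch of how labels and value domains might be read off a form, the Python example below uses the standard-library `html.parser` module. The form fragment, the class name `FormTermExtractor`, and the reliance on the `for` attribute to tie a label to its field are all assumptions made for illustration; the talk does not describe its label identification rules at this level of detail.

```python
from html.parser import HTMLParser

# Hypothetical form fragment; real rental-car forms are more complex.
SAMPLE_FORM = """
<form>
  <label for="pickup_day">Pick-up Date</label>
  <select id="pickup_day" name="pickup_day">
    <option>1</option><option>2</option><option>31</option>
  </select>
  <label for="car_type">Car Type</label>
  <input id="car_type" name="car_type" type="text">
</form>
"""


class FormTermExtractor(HTMLParser):
    """Collect label text (terms) and option values (value domains)."""

    def __init__(self):
        super().__init__()
        self.labels = {}    # field id -> label text
        self.domains = {}   # field id -> list of allowed values
        self._label_for = None
        self._select_id = None
        self._in_option = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "label":
            self._label_for = attrs.get("for")
        elif tag == "select":
            self._select_id = attrs.get("id")
            self.domains[self._select_id] = []
        elif tag == "option":
            self._in_option = True

    def handle_endtag(self, tag):
        if tag == "label":
            self._label_for = None
        elif tag == "select":
            self._select_id = None
        elif tag == "option":
            self._in_option = False

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._label_for is not None:
            self.labels[self._label_for] = text
        elif self._in_option and self._select_id is not None:
            self.domains[self._select_id].append(text)


extractor = FormTermExtractor()
extractor.feed(SAMPLE_FORM)
print(extractor.labels)   # {'pickup_day': 'Pick-up Date', 'car_type': 'Car Type'}
print(extractor.domains)  # {'pickup_day': ['1', '2', '31']}
```

Precedence relationships suggested by scripts are not covered by this sketch.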


Ontology representation

Conceptual modeling approach
Based on Bunge:
– Terms (things)
– Values
– Composition
– Precedence
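
One possible way to hold these four constructs in memory is sketched below in Python. The `Term` class and its field names are hypothetical and are not the authors' data model; the example terms reuse the labels mentioned in the talk (Pick-up Date, Day, Car Type).

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """A term (thing) with a value domain, sub-terms, and precedence links."""
    name: str
    values: list = field(default_factory=list)    # value domain
    parts: list = field(default_factory=list)     # composition: sub-terms
    precedes: list = field(default_factory=list)  # names of terms filled in later


day = Term("Day", values=list(range(1, 32)))
pickup_date = Term("Pick-up Date", parts=[day])
pickup_location = Term("Pick-up Location", precedes=["Car Type"])
car_type = Term("Car Type")

print(len(pickup_date.parts[0].values))  # 31
print(pickup_location.precedes)          # ['Car Type']
```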


Ontology extraction and matching

[Figure: architecture diagram. Starting from a URL (e.g. http://www.avis.com), the process runs through four phases: Phase 1, Parsing; Phase 2, Labeling; Phase 3, Ontology; Phase 4, Merging. Components shown in the diagram: HTML parsing, DOM tree, HTML elements, FORM elements, label identification, rules, form rendering, ontology creation, KB, KB submission, target/candidate ontology, target ontology, candidate ontology, matching algorithms, thesaurus, refined ontology.]

Phase 1: Parsing


Phase 2: Labeling


Merging

Heuristics for the ontology merging (Frakes and Baeza-Yates, 1992):
– Textual matching: Date ↔ date; Pickup ↔ pickup
– Ignorable characters removal: *Country ↔ country
– De-hyphenation: Pick-up ↔ Pickup; Pickup ↔ Pick up
– Stop terms removal: Date of Return ↔ Return Date
  Stop terms: a, to, do, does, the, in, or, and, this, those, that, … etc.
– Substring matching: Pickup Location Code ↔ Pick-up location (66%)
– Content matching: Dropoff Day (1,..,31) ↔ Return Day (1,..,31) (100%)
  Dropoff ↔ Return
– Thesaurus matching: Dropoff Location ↔ Return Location (100%)
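
The heuristics above can be approximated with a short Python sketch. Several parts are assumptions rather than the authors' algorithm: the scoring rule (shared words divided by the word count of the longer label) is chosen because it gives 2/3 for the substring example; "of" is added to the stop terms on the guess that the "…" covers it; and the one-entry thesaurus is hypothetical. Content matching on value domains is left out.

```python
import re

# Stop terms from the slide, plus "of" (assumed to be covered by the "…").
STOP_TERMS = {"a", "to", "do", "does", "the", "in", "or", "and",
              "this", "those", "that", "of"}

# Hypothetical thesaurus: each word maps to a canonical synonym.
THESAURUS = {"dropoff": "return"}


def normalize(label):
    """Lowercase, de-hyphenate, drop ignorable characters and stop terms."""
    text = label.lower()                    # textual matching: ignore case
    text = text.replace("-", "")            # de-hyphenation: Pick-up -> pickup
    text = re.sub(r"[^a-z0-9 ]", "", text)  # ignorable characters: *Country -> country
    return [word for word in text.split() if word not in STOP_TERMS]


def confidence(label_a, label_b):
    """Return a degree of confidence in [0, 1] that two labels match."""
    words_a, words_b = normalize(label_a), normalize(label_b)
    if not words_a or not words_b:
        return 0.0
    # "Pick up" vs "Pickup": compare with the spaces removed.
    if "".join(words_a) == "".join(words_b):
        return 1.0
    # Thesaurus matching: replace each word by its canonical synonym.
    set_a = {THESAURUS.get(word, word) for word in words_a}
    set_b = {THESAURUS.get(word, word) for word in words_b}
    # Word-overlap score: shared words over the size of the longer label.
    return len(set_a & set_b) / max(len(set_a), len(set_b))


print(confidence("Date of Return", "Return Date"))                          # 1.0
print(round(confidence("Pickup Location Code", "Pick-up location"), 2))    # 0.67
print(confidence("Dropoff Location", "Return Location"))                    # 1.0
```

Comparing word sets rather than word sequences is what lets "Date of Return" and "Return Date" match once the stop term is removed.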


Phase 4: Merging


Preliminary Results

Two metrics are used for performance analysis (Frakes and Baeza-Yates, 1992):
– Recall (completeness)
– Precision (soundness)

Parameters:
– $t_r$: number of terms retrieved
– $t_m$: number of terms matched
– $t_e$: number of terms effectively matched

Recall: $R = t_m / t_r$
Precision: $P = t_e / t_m$
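
Reading recall as matched terms over retrieved terms and precision as effectively matched terms over matched terms, the two ratios can be computed as in the Python sketch below. The function names and the counts are hypothetical, chosen only to give round numbers.

```python
def recall(t_m, t_r):
    """R = t_m / t_r: matched terms over retrieved terms."""
    return t_m / t_r


def precision(t_e, t_m):
    """P = t_e / t_m: effectively matched terms over matched terms."""
    return t_e / t_m


# Hypothetical counts, chosen only for illustration.
t_r, t_m, t_e = 40, 30, 24
print(recall(t_m, t_r))     # 0.75
print(precision(t_e, t_m))  # 0.8
```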


Preliminary Results

Example:
– # of terms in Ontology1: 20
– # of matches identified: 15, so Recall: 75% (15/20)
– # of effective matches: 10, so Precision: 66% (10/15)

A third metric is used to combine recall and precision into a single value. For a precision value P, a recall value R and an importance measure b, the combined metric E is calculated as (Frakes and Baeza-Yates, 1992):

$E = 1 - \frac{(1 + b^2) P R}{b^2 P + R}$

Lower values of E indicate better matching; E = 0 when both P and R equal 1.
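
As a check on the combined metric, the Python sketch below assumes the standard E-measure form, E = 1 - (1 + b^2) P R / (b^2 P + R), and b = 0.5 as given in the chart titles of this talk. The function name `e_measure` is hypothetical. With the counts from the example above (R = 15/20, P = 10/15), it gives the E value listed for the "Substring Names" strategy in the results table.

```python
def e_measure(precision, recall, b=0.5):
    """E = 1 - (1 + b^2) * P * R / (b^2 * P + R); lower is better."""
    if precision == 0 and recall == 0:
        return 1.0  # no correct matches at all
    b2 = b * b
    return 1.0 - (1.0 + b2) * precision * recall / (b2 * precision + recall)


# Example from the talk: 20 terms, 15 matches identified, 10 effective matches.
R = 15 / 20
P = 10 / 15
print(round(e_measure(P, R), 6))  # 0.318182
```

The table's two-decimal precision values appear to be rounded: using P = 1/3 with R = 0.3 gives 0.673913, and P = 12/13 with R = 0.65 gives 0.148472, matching the listed E values.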


Preliminary Results

[Figure: Precision vs. Recall (Avis & Hertz), plotted per matching strategy on a 0 to 1 scale.]

Strategy          Recall   Precision   E
Textual           0.3      0.33        0.673913
Ign. Chars.       0.3      0.33        0.673913
De-hyph.          0.6      0.33        0.634146
StopTerms         0.6      0.33        0.634146
Substring         0.75     0.40        0.558824
Substring Names   0.75     0.67        0.318182
Content           0.65     0.92        0.148472
Thesaurus         0.65     0.92        0.148472

Preliminary Results

[Figure: E Metric for Hertz vs. Alamo (b=0.5), plotted on a 0 to 1 scale per matching strategy (Textual, Ign. Chars., De-hyph., StopTerms, Substring, Substring Names, Content, Thesaurus). Series: Hertz, Alamo.]

Preliminary Results

[Figure: Learning from Thesaurus. Bar chart of E (b=0.5) on a 0 to 1 scale for two configurations, No Thesaurus and Improved Thesaurus. Data labels shown: 0.389534884 and 0.479166667.]

Summary and Future Work

We have introduced:
– Automatic ontology creation
– Automatic matching process
– Preliminary results

Future work oriented towards:
– Incorporation of query facilities into the tool
– Automatic navigation of web sites for ontology extraction
– Dynamic translation of queries against the target ontology into queries against the multiple candidate ontologies
