Kow Weng Onn, Michelle Lim Sien Niu, Dickson Lukose (MIMOS) Gudrun Johannsen, Johannes Keizer (UN FAO) Framework for Matching and Linking Large Ontologies
Page 1: Framework for Matching and Linking Large Ontologies

Kow Weng Onn, Michelle Lim Sien Niu, Dickson Lukose (MIMOS)

Gudrun Johannsen, Johannes Keizer (UN FAO)

Framework for Matching and Linking Large Ontologies

Page 2: Framework for Matching and Linking Large Ontologies

Outline

• Introduction

• Objective

• Previous Work

• Proposed Framework

• Initial Experimental Results

• Future Work

• Q&A


Page 3: Framework for Matching and Linking Large Ontologies

Linked Open Data (2011)


Page 4: Framework for Matching and Linking Large Ontologies

AGROVOC Thesaurus

• Multilingual agricultural thesaurus

• More than 40,000 concepts in up to 22 languages

• Standard for document indexing

• Information exchange and retrieval

Page 5: Framework for Matching and Linking Large Ontologies

Agriculture Linked Open Data (as of June 2012)

Vocabulary                  Domain                 Language                             Out-links (from AGROVOC)
EuroVoc                     General EU             EN, ES, DE, FR, etc. (24 languages)  1,297
GEMET                       Environment            EN, ES, DE, FR, etc. (29 languages)  1,191
LCSH                        General                EN                                   1,093
NALT                        Agriculture            EN, ES                               13,390
STW                         Economy                EN, DE                               1,136
TheSoz                      Social Science         EN, DE                               846
RAMEAU                      General                FR                                   686
DBpedia                     General                EN, ES, DE, FR, etc. (97 languages)  993
DDC                         General                EN, ES, DE, FR, etc. (12 languages)  409
Geopolitical Ontology       Geopolitical           AR, ZH, FR, EN, ES, RU, IT           253
SWD                         General                DE                                   5,965
GeoNames                    Geographical Database  67 languages                         212
ASFA Thesaurus              Aquatic Sciences       EN, FR, ES                           1,812
FAO Biotechnology Glossary  Biotechnology          AR, ZH, EN, FR, RU, ES, PL, SR, VI   791
Total                                                                                   30,074

Page 6: Framework for Matching and Linking Large Ontologies

Why link AGROVOC Concepts?

• Allows access to document repositories and other agricultural data

• Achieves interoperability of data in the agricultural domain

• Allows linkage between the same concepts in different languages and different data sets

• Supports knowledge-harvesting tools

Page 7: Framework for Matching and Linking Large Ontologies

Current Approach

Morshed, A., Caracciolo, C., Johannsen, G., Keizer, J. (2011): Thesaurus alignment for Linked Data publishing. International Conference on Dublin Core and Metadata Applications 2011

Page 8: Framework for Matching and Linking Large Ontologies

Limitations of current approach

• Target ontology needs to be downloaded into a triple store; it may not be the latest version

• Full comparison is time-consuming and not scalable

• Manual evaluation by domain experts is required, without tool support

• Multilingual terms are not exploited

Page 9: Framework for Matching and Linking Large Ontologies

Semantic Mediation Tool


Page 10: Framework for Matching and Linking Large Ontologies

Semantic Mediation Tool GUI


Page 11: Framework for Matching and Linking Large Ontologies

Proposed Framework


Page 12: Framework for Matching and Linking Large Ontologies

Proposed Framework

• Index and match strategy (80-20 hypothesis)

• Source and Target accessed through endpoints

• Automatic discovery of alignments

• Visualization and navigation tools to aid decision-making

• Use multiple languages if available
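The index-and-match idea can be illustrated with a toy inverted index. This is only a sketch: the experiments reported here use Lucene, whereas the code below is a plain-Python stand-in, and the function names and sample labels are hypothetical. The point it shows is that each source label is compared only against the target concepts that share a token with it, rather than against every target concept.

```python
from collections import defaultdict


def tokenize(label):
    """Lower-case a label and split it on whitespace."""
    return label.lower().split()


def build_index(target_labels):
    """Map each token to the set of target concept ids whose label contains it."""
    index = defaultdict(set)
    for concept_id, label in target_labels.items():
        for token in tokenize(label):
            index[token].add(concept_id)
    return index


def candidates(source_label, index):
    """Return target concept ids sharing at least one token with the source label."""
    found = set()
    for token in tokenize(source_label):
        found |= index.get(token, set())
    return found


# Hypothetical target labels, for illustration only.
targets = {
    "t1": "Labour market",
    "t2": "Rice production",
    "t3": "Market economy",
}
index = build_index(targets)
print(sorted(candidates("market", index)))
```

A real index would also rank the candidates by score, which is where the thresholds used in the experiments come in.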

Page 13: Framework for Matching and Linking Large Ontologies

Experimental Setup

• Source and target thesauri: AGROVOC and the STW Thesaurus for Economics

• English preferred labels used

• Lucene version 3.5 used for indexing

• Thresholds used to limit the results returned

• Precision and recall calculated against the 1,136 existing AGROVOC-to-STW links
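As a worked illustration of the evaluation, the sketch below computes precision and recall from counts, treating the 1,136 existing links as the reference set. The function name is hypothetical; the sample counts (1,587 mappings found, 1,005 correct) are the plain-label figures reported in the results for primary threshold 4.0.

```python
def precision_recall(found, correct, reference_total):
    """Precision = correct / found; recall = correct / reference_total."""
    precision = correct / found if found else 0.0
    recall = correct / reference_total if reference_total else 0.0
    return precision, recall


# Plain labels, primary threshold 4.0, secondary threshold 6.0.
p, r = precision_recall(found=1587, correct=1005, reference_total=1136)
print(round(p, 3), round(r, 3))  # 0.633 0.885
```

Note that a mapping that is valid but absent from the existing links counts as incorrect under this measure, so the reported precision is a lower bound on the true precision.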

Page 14: Framework for Matching and Linking Large Ontologies

Initial Experiments

• Three separate experiments

1. Only index and match

2. Stemming before matching

3. String distance (Jaro-Winkler algorithm) added to reduce misalignments

[Figure: match-score scale with the cut-offs "Threshold 1" and "Threshold 2" and the labels "Rejected Matches", "Use Jaro-Winkler Algorithm" and "Accepted Matches"]
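A minimal sketch of the third experiment follows. The Jaro-Winkler similarity is the standard algorithm; how it is combined with the two thresholds is an assumption read off the figure labels: scores below the primary threshold are rejected, scores at or above the secondary threshold are accepted, and scores in between are decided by string similarity. The cut-off `jw_min` and the function `classify` are hypothetical and do not come from the slides.

```python
def jaro(s1, s2):
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1, s2, p=0.1):
    """Jaro similarity boosted for a common prefix of up to 4 characters."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def classify(index_score, source_label, target_label,
             primary=4.0, secondary=6.0, jw_min=0.9):
    """Assumed two-threshold rule; jw_min is a hypothetical cut-off."""
    if index_score < primary:
        return "rejected"
    if index_score >= secondary:
        return "accepted"
    sim = jaro_winkler(source_label.lower(), target_label.lower())
    return "accepted" if sim >= jw_min else "rejected"


print(classify(5.0, "labour market", "labor market"))
print(classify(5.0, "rice", "price index"))
```

Under this reading, the string-distance step only affects candidates in the uncertain band between the two thresholds, which matches its stated purpose of reducing misalignments.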

Page 15: Framework for Matching and Linking Large Ontologies

Proposed Framework


Page 16: Framework for Matching and Linking Large Ontologies

Example Experimental Output


Page 17: Framework for Matching and Linking Large Ontologies

Results

                         Mappings found  Correct mappings  Precision  Recall
Plain labels             1587            1005              0.633      0.885
With stemming included   1624            1025              0.631      0.902
With string distance     1389            1062              0.765      0.935

Results with primary threshold = 4.0, secondary threshold = 6.0

Page 18: Framework for Matching and Linking Large Ontologies

Results

                         Mappings found  Correct mappings  Precision  Recall
Plain labels             1420            1006              0.708      0.886
With stemming included   1445            1021              0.707      0.899
With string distance     1293            1037              0.802      0.913

Results with primary threshold = 5.0, secondary threshold = 6.0

Page 19: Framework for Matching and Linking Large Ontologies

Results

                         Mappings found  Correct mappings  Precision  Recall
Plain labels             986             847               0.859      0.746
With stemming included   1005            857               0.852      0.754
With string distance     996             855               0.858      0.753

Results with primary threshold = 6.0, secondary threshold = 7.0

Page 20: Framework for Matching and Linking Large Ontologies

Discussion

• Framework can work with good speed

– Indexing and matching take less than 200 seconds in our experiments

• A low threshold gives high recall but precision suffers, and vice versa

• As we’re proposing a semi-automatic system, a higher recall is preferred

• Hard to set a threshold for the Lucene score, as it gives higher weight to less frequent words
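The last point can be illustrated with inverse document frequency (IDF), the term-weighting component of classic TF-IDF scoring. The sketch below assumes the formulation idf = 1 + ln(N / (df + 1)), where N is the number of indexed labels and df the number of labels containing the term; the sample labels and the function name are hypothetical. A rare word receives a larger weight than a common one, so two equally good exact matches can end up with different raw scores, which makes a single fixed threshold hard to choose.

```python
import math
from collections import Counter


def idf_weights(labels):
    """Return an IDF weight per token: 1 + ln(N / (df + 1))."""
    n = len(labels)
    # Document frequency: number of labels in which each token occurs.
    df = Counter(tok for label in labels for tok in set(label.lower().split()))
    return {tok: 1.0 + math.log(n / (count + 1)) for tok, count in df.items()}


# Hypothetical label collection, for illustration only.
labels = ["rice production", "rice trade", "rice prices", "sorghum"]
weights = idf_weights(labels)
print(round(weights["rice"], 3), round(weights["sorghum"], 3))  # 1.0 1.693
```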

Page 21: Framework for Matching and Linking Large Ontologies

Future Work

• Make use of the multilingual aspect of AGROVOC

• One-to-many and many-to-many matching and linking by having a large index

• Modify the Semantic Mediation Tool to serve as a front end to the framework

• Experiment with unlinked ontologies (open question: how to evaluate the results?)
