Terminology for information retrieval: effectiveness of cross-concordances
Philipp Mayr & Vivien Petras
GESIS Social Science Information Centre, Bonn, Germany
Knowledge organization on the Web, ISKO-IWA meeting
Naples, Italy, 5 September 2008



German Social Science Infrastructure Services

www.gesis.org

• Digital Library

• Data archive

• Consulting

• Surveys & studies

• Society observation


German Social Science Infrastructure Services

Document types:

• Bibliographic
• Full texts
• Project data
• Institutions
• Web pages
• Statistical data
• Surveys
• People

Disciplines:

• Sociology
• Political Science
• Education
• Psychology
• Economics
• Business Administration


Heterogeneous collections

• Many databases:
– document types / formats
– vocabularies

• Controlled vocabularies:
– internal consistency ⇧
– intersystem compatibility ⇩ (semantic heterogeneity)

• Solution: translate between vocabularies (cross-walks, terminology mapping)


KoMoHe Project

• September 2004 – 2007
• Goals:

– Models for searching heterogeneous collections

– Development, organization & management of cross-walks between controlled vocabularies


Terminology Mapping Initiatives

• OCLC Terminology Services
– DDC, LCC, LCSH, MeSH

• MACS (Multilingual Access to Subjects)
– LCSH – Rameau – SWD

• CARMEN
– SWD, TheSoz, STW, …

• Criss-Cross
– SWD – DDC


Cross-concordances

= manually created, directed relations between controlled terms of two knowledge organization systems (KOS)

[Figure: example term pairs between KOS 1 and KOS 2 (Computer, Database; Information System), and a coverage diagram labelled KOS 1 100%, KOS 2 50% mapped, KOS 3 40%, KOS 4 60% mapped]


Relations

Examples read: KOS 1 term, relation, KOS 2 term.

• Equivalence (=): Library = Bibliothèque

• Narrower Term (>): Library > Special library

• Broader Term (<): Thesaurus < KOS

• Related Term (^): Hacker ^ Computers + Security

• Null (0), no mapping: Virus 0
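One possible way to hold such directed relations in code is sketched below. The class and function names (`Relation`, `Mapping`, `lookup`) are hypothetical and not part of the KoMoHe project; reading the slide's examples as "KOS 1 term, relation symbol, KOS 2 term" is likewise an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    """Relation types from the slide, keyed by their symbols."""
    EQUIVALENCE = "="
    NARROWER = ">"
    BROADER = "<"
    RELATED = "^"
    NULL = "0"


@dataclass(frozen=True)
class Mapping:
    """One directed relation from a KOS 1 term to KOS 2 term(s)."""
    source_term: str
    relation: Relation
    # Empty for a null mapping; several terms form a combination.
    target_terms: tuple = ()


# Example rows as read from the slide (assumed reading order).
MAPPINGS = [
    Mapping("Library", Relation.EQUIVALENCE, ("Bibliothèque",)),
    Mapping("Library", Relation.NARROWER, ("Special library",)),
    Mapping("Thesaurus", Relation.BROADER, ("KOS",)),
    Mapping("Hacker", Relation.RELATED, ("Computers", "Security")),
    Mapping("Virus", Relation.NULL),
]


def lookup(term, relation):
    """Return the target-term tuples mapped from `term` via `relation`."""
    return [m.target_terms for m in MAPPINGS
            if m.source_term == term and m.relation is relation]
```

Because the relations are directed, a lookup from KOS 2 back to KOS 1 would need its own set of mappings rather than a reversal of this list.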


Cross-concordances

• 25 vocabularies in 60 cross-concordances
– Thesauri (16)
– Descriptor lists (4)
– Classifications (3)
– Subject heading lists (2)

• 380,000 mapped terms
• 465,000 relations
• 205,000 equivalence relations
• 13 German, 8 English, 1 Russian, 3 multilingual


Disciplines

• Social sciences (10)
• Political science (3)
• Universal (3)
• Economics (2)
• Sports science (2)
• Gerontology (1)
• Psychology (1)
• Pedagogics (1)
• Medicine (1)
• Agricultural science (1)
• Information science (1)


Differences

• Vocabulary type:
– Thesaurus – Thesaurus
– Classification – Thesaurus
– (Classification – Classification)
– (Thesaurus – Descriptor list)

• Change of discipline

• Change of language

• Size

• Combination / compounds


Thesaurus – Thesaurus (share of relation types)

• Identical equivalence: 21%
• Equivalence (synonym): 24%
• Broader Term: 20%
• Narrower Term: 9%
• Related Term: 14%
• Null mapping: 12%


Classification – Thesaurus (JEL – STW)

• Narrower Term: 75%
• Broader Term: 11%
• Related Term: 8%
• Equivalence: 6%


Information Retrieval Tests

GOAL: Facilitate search across different databases

Navigate without semantic borders!

• Translate search terms into other terminologies

• Increase diversity of documents

• Improve search experience without effort for the searcher


Information Retrieval Tests

1. Do mappings improve subject search?

CT (start vocabulary) → TT → Destination database

Example: Family relations → Family AND social relations

2. Do mappings improve free-text search?

FT (start vocabulary) → FT + TT → Destination database

Example: Family relations → Family relations OR (Family AND social relations)
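The two query transformations can be sketched as follows. This is a minimal illustration, not the project's implementation: the `EQUIVALENCES` table and both function names are hypothetical, and joining a multi-term mapping with AND is inferred from the slide's single example.

```python
# Hypothetical equivalence mappings: start-vocabulary term -> target terms.
# A mapping to several terms is treated as a combination joined with AND.
EQUIVALENCES = {
    "Family relations": ["Family", "social relations"],
}


def translate_controlled(term):
    """Test 1 (CT -> TT): replace a controlled term by its mapped term(s)."""
    targets = EQUIVALENCES.get(term)
    if not targets:
        return term  # no mapping known: keep the original term
    return " AND ".join(targets)


def expand_free_text(query):
    """Test 2 (FT -> FT + TT): OR the translated terms onto the query."""
    translated = translate_controlled(query)
    if translated == query:
        return query  # nothing to add
    if " AND " in translated:
        translated = f"({translated})"
    return f"{query} OR {translated}"


print(translate_controlled("Family relations"))
# Family AND social relations
print(expand_free_text("Family relations"))
# Family relations OR (Family AND social relations)
```

In the first test the translated term replaces the original; in the second it is added as an alternative, so the free-text query can only retrieve more documents, never fewer.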


Information Retrieval Tests

• Thesaurus mapping only

• Only equivalence relations

• Real queries (~6 per tested cross-concordance)

• Databases: 80,000 – 16 million documents

• Test 1 (CT → TT): 13 cross-concordances

• Test 2 (FT → FT+TT): 8 cross-concordances


Information Retrieval Tests - Results

• CT → TT (improvements in %)

                     Precision (= accuracy)   Recall (= hit rate)
Interdisciplinary    +68%                     +136%
Intradisciplinary    +34%                     +39%

• FT → FT+TT (improvements in %)

                     Precision (= accuracy)   Recall (= hit rate)
Interdisciplinary    -24%                     +24%
Intradisciplinary    -12%                     +20%
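For readers unfamiliar with the two measures, the sketch below shows the standard set-based definitions of precision and recall and a relative improvement over a baseline. The slide does not say how its percentages were computed or averaged over queries, so the relative-change formula is an assumption, and the document IDs are invented purely for illustration.

```python
def precision(retrieved, relevant):
    """Share of retrieved documents that are relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0


def recall(retrieved, relevant):
    """Share of relevant documents that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0


def improvement_pct(baseline, with_mapping):
    """Relative change in percent (assumed formula, not from the slide)."""
    return (with_mapping - baseline) / baseline * 100.0


# Invented example: four relevant documents, two result sets.
relevant = {1, 2, 3, 4}
baseline_hits = {1, 5}              # search without mapping
mapped_hits = {1, 2, 3, 5, 6, 7}    # search with translated terms

recall_gain = improvement_pct(recall(baseline_hits, relevant),
                              recall(mapped_hits, relevant))
precision_gain = improvement_pct(precision(baseline_hits, relevant),
                                 precision(mapped_hits, relevant))
```

In this invented case recall triples while precision is unchanged, which shows how the two measures can move independently.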


Sowiport Portal: www.sowiport.de


Sowiport Thesaurus


Sowiport Search


Conclusion

• Cross-concordances improve subject search with controlled terms & free-text search

• Only 24% of relations utilized (equivalence)

• Potential:
– Other relations
– Natural language query terms → CT translation

• More mappings exist that have not yet been evaluated

• Sowiport: http://www.sowiport.de


Publications

Mayr, Philipp; Petras, Vivien (2008): Cross-concordances: terminology mapping and its effectiveness for information retrieval. In: 74th IFLA World Library and Information Congress. Québec, Canada. http://www.ifla.org/IV/ifla74/papers/129-Mayr_Petras-en.pdf

Mayr, Philipp; Mutschke, Peter; Petras, Vivien (2008): Reducing semantic complexity in distributed Digital Libraries: treatment of term vagueness and document re-ranking. In: Library Review 57 (2008) 3, pp. 213-224. http://arxiv.org/abs/0712.2449

Mayr, Philipp; Petras, Vivien (2008, to appear): Building a terminology network for search: the KoMoHe project. In: International Conference on Dublin Core and Metadata Applications.


„Databases without semantic borders“

KoMoHe Project

http://www.gesis.org/en/research/information_technology/komohe.htm

E-mail: [email protected] .petras@gesis.org