Improving library services with semantic web technology in the realm of repositories

The ZBW is a member of the Leibniz Association.

Improving Library Services with Semantic Web Technology: in the Realm of Repository Systems

Dr. Timo Borst

Head of IT Development
German National Library for Economics / Leibniz Information Centre for Economics
Kiel/Hamburg, Germany

ICDK 2011, 14th–16th February, Gurgaon, India

Overview

1. Current situation: Distributed (meta-)data management in library applications

2. Popular approaches towards aggregation and homogeneity of metadata

3. Our approach: Integration and aggregation of authority values with Semantic Web technology

   a) General idea
   b) Use case: Indexing
   c) Use case: Retrieving

4. “Lightweight” integration into existing repository systems and service providers

5. Conclusion

Current situation

• The rise of repository systems for academic publishing…

• …has led to a landscape of distributed systems, each of them holding its own metadata…

• …which is harvested and aggregated by service providers

Popular approaches towards aggregation and homogeneity of metadata

• Normalization in advance (before harvesting) requires

  • a mandatory metadata scheme to be applied by the local repositories
  • a set of controlled vocabularies (e.g. for publication types)
  • an automatic validation of the harvested metadata

• Normalization afterwards (after harvesting) requires

  • the definition of a minimum set of metadata fields
  • the definition of a basic intermediate metadata scheme for normalizing the heterogeneous metadata records
  • optionally, data cleansing strategies such as name disambiguation and automatic indexing on the basis of thesauri

Both approaches are problematic and leave ambiguities unresolved at the aggregation level!
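To make the last point concrete, here is a minimal Python sketch of normalization after harvesting. The field mapping, the three mandatory fields and the publication-type vocabulary are all hypothetical. Both example records pass validation, yet the author still appears as two different strings, which is exactly the kind of ambiguity that survives at the aggregation level.

```python
from dataclasses import dataclass

# Hypothetical controlled vocabulary for publication types.
PUBLICATION_TYPES = {"article", "working paper", "book"}

# Hypothetical mapping from repository-specific field names to a minimal
# intermediate scheme; each real repository would need its own entries.
FIELD_MAP = {
    "dc.title": "title", "title": "title",
    "dc.contributor.author": "creator", "creator": "creator",
    "dc.type": "pub_type", "type": "pub_type",
}
REQUIRED = ("title", "creator", "pub_type")


@dataclass
class NormalizedRecord:
    title: str
    creator: str
    pub_type: str


def normalize(raw: dict) -> NormalizedRecord:
    """Map a harvested record onto the intermediate scheme, then validate it."""
    mapped = {FIELD_MAP[k]: v.strip() for k, v in raw.items() if k in FIELD_MAP}
    missing = [f for f in REQUIRED if f not in mapped]
    if missing:
        raise ValueError(f"missing mandatory fields: {missing}")
    pub_type = mapped["pub_type"].lower()
    if pub_type not in PUBLICATION_TYPES:
        raise ValueError(f"type not in controlled vocabulary: {pub_type!r}")
    return NormalizedRecord(mapped["title"], mapped["creator"], pub_type)


a = normalize({"dc.title": "Financial crisis", "dc.contributor.author": "Borst, Timo",
               "dc.type": "Article"})
b = normalize({"title": "Financial crisis", "creator": "T. Borst", "type": "article"})
# Both records are valid, yet the author is still represented by two different strings.
print(a.creator == b.creator)  # False
```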

Current situation

• …sounds easy and straightforward, but implies severe problems, especially with regard to the ambiguity of
  • author names
  • subject headings

Current situation

“The major difficulty we have found is with DSpace’s handling of metadata. While we feel that the number of fields in Dublin Core is adequate for most if not all uses (DCMI Usage Board 2006), we are troubled by the lack of authority control when completing its fields. Without some control over uniform titles, authors and subjects accessing the items in the future will [be] very problematic.”

S. Chabot (http://subjectobject.net/2006/11/09/the-dspace-digital-repository-a-project-analysis/)

“Neither the standards nor the software underlying institutional repositories anticipated performing naming authority control on widely disparate metadata from highly unreliable sources.”

D. Salo (http://minds.wisconsin.edu/handle/1793/31735)

Our approach: Integration of authority values with Semantic Web technology

• General idea: “Provide a framework for integrating authority data, which is both normative and flexible enough to tolerate local idiosyncrasies on a string level.”

• Approach: Concept modelling based on Semantic Web / SKOS standards
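A SKOS concept groups one URI with any number of preferred and alternative labels and with broader, narrower and related links. The sketch below models this with a plain Python dataclass rather than an RDF library; the class and function names are illustrative, the English preferred label for the STW descriptor is assumed, and the German alternative label is hypothetical. It shows how different label strings resolve to the same concept URI, which is what lets such a framework tolerate idiosyncrasies on the string level.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Concept:
    """Illustrative stand-in for a skos:Concept (not an RDF library)."""
    uri: str
    pref_labels: dict[str, str]                                     # skos:prefLabel per language
    alt_labels: dict[str, list[str]] = field(default_factory=dict)  # skos:altLabel per language
    broader: list[str] = field(default_factory=list)                # skos:broader (URIs)
    narrower: list[str] = field(default_factory=list)               # skos:narrower (URIs)
    related: list[str] = field(default_factory=list)                # skos:related (URIs)

    def all_labels(self) -> set[str]:
        labels = set(self.pref_labels.values())
        for variants in self.alt_labels.values():
            labels.update(variants)
        return labels


def build_label_index(concepts: list[Concept]) -> dict[str, str]:
    """Map every lower-cased label to the URI of its concept."""
    return {label.lower(): c.uri for c in concepts for label in c.all_labels()}


crisis = Concept(
    uri="http://zbw.eu/stw/descriptor/19664-4",
    pref_labels={"de": "Finanzkrise", "en": "Financial crisis"},  # English label assumed
    alt_labels={"de": ["Finanzmarktkrise"]},                      # hypothetical variant
)
index = build_label_index([crisis])
print(index["financial crisis"] == index["finanzmarktkrise"])  # True
```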

Our approach: Integration of authority values with Semantic Web technology – Web service

Example queries (for concepts):

• http://zbw.eu/beta/stw-ws/suggest?query=finanzkr
  …delivers all terms beginning with “finanzkr”

• http://zbw.eu/beta/stw-ws/stw-ws-wrapper.php?service=labels&concept=http://zbw.eu/stw/descriptor/19664-4&lang=en
  …delivers all English synonyms of the German “Finanzkrise”
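A client only needs to assemble these query URLs. The following Python sketch does this with the standard library; the helper names are hypothetical, the endpoints are copied from the example queries (a 2011 beta service whose current availability is not assumed), and because the response format is not described here, `fetch` simply returns the raw body, assuming UTF-8. Note that `urlencode` percent-encodes the concept URI, which a standard query-string parser decodes back to the same value.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoints as given in the example queries (a 2011 beta service).
SUGGEST_URL = "http://zbw.eu/beta/stw-ws/suggest"
WRAPPER_URL = "http://zbw.eu/beta/stw-ws/stw-ws-wrapper.php"


def suggest_url(prefix: str) -> str:
    """Query for all terms beginning with `prefix`."""
    return f"{SUGGEST_URL}?{urlencode({'query': prefix})}"


def labels_url(concept_uri: str, lang: str) -> str:
    """Query for all labels of a concept in one language."""
    params = {"service": "labels", "concept": concept_uri, "lang": lang}
    return f"{WRAPPER_URL}?{urlencode(params)}"


def fetch(url: str, timeout: float = 10.0) -> str:
    """Return the raw response body (format unspecified; UTF-8 assumed)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


print(suggest_url("finanzkr"))
# http://zbw.eu/beta/stw-ws/suggest?query=finanzkr
url = labels_url("http://zbw.eu/stw/descriptor/19664-4", "en")
print(url)
# http://zbw.eu/beta/stw-ws/stw-ws-wrapper.php?service=labels&concept=http%3A%2F%2Fzbw.eu%2Fstw%2Fdescriptor%2F19664-4&lang=en
```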

Use case: (Self-)Indexing

• One of the most prominent use cases, especially for librarians, but also for scientists and active users not familiar with subject-specific vocabularies

• Main goals:
  • Support the process of indexing in order to achieve a classification of documents that is both coherent and flexible, in the sense that it permits local idiosyncrasies related to authority terms
  • Align different vocabularies, so that indexing in one vocabulary is automatically linked to another vocabulary

• Implementation: Extension of the submission interface of our repository by integrating the terminology web service as an autosuggest function
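An autosuggest field of this kind essentially performs a prefix search over vocabulary labels and keeps the concept URI together with the chosen label. A minimal in-memory Python sketch, assuming the labels have already been retrieved from the terminology web service; the class name and all labels and URIs except the STW descriptor are hypothetical:

```python
from bisect import bisect_left


class AutoSuggest:
    """Prefix search over (label, concept URI) pairs for an autosuggest field."""

    def __init__(self, pairs):
        # Sort by lower-cased label so all completions of a prefix are adjacent.
        self._entries = sorted((label.lower(), label, uri) for label, uri in pairs)
        self._keys = [key for key, _, _ in self._entries]

    def suggest(self, prefix: str, limit: int = 10) -> list:
        """Return up to `limit` (label, URI) pairs whose label starts with `prefix`."""
        p = prefix.lower()
        results = []
        for key, label, uri in self._entries[bisect_left(self._keys, p):]:
            if len(results) >= limit or not key.startswith(p):
                break
            results.append((label, uri))
        return results


vocab = AutoSuggest([
    ("Finanzkrise", "http://zbw.eu/stw/descriptor/19664-4"),
    ("Finanzkrisenmanagement", "http://example.org/concept/1"),  # hypothetical
    ("Finanzmarkt", "http://example.org/concept/2"),             # hypothetical
])
print(vocab.suggest("finanzkr"))
# [('Finanzkrise', 'http://zbw.eu/stw/descriptor/19664-4'), ('Finanzkrisenmanagement', 'http://example.org/concept/1')]
```

Storing the returned URI next to the label is what allows the local label to vary while the record still points to the shared authority concept.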

Use case: (Self-)Indexing

Submission form https://econstor.eu

Use case: Retrieving

• Considered the most important use case

• Often leads to the classic dilemma of precision vs. recall

• Main goal:
  • Support the retrieval process, so that users find the relevant set of documents

• Implementation: Automatic expansion of the original query with synonyms, narrower and related terms
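Query expansion of this kind can be sketched in Python as follows. The thesaurus fragment is hypothetical and stands in for data from the terminology web service; the function ORs the original query with the terms of the selected relations. Restricting `include` (for example to synonyms only) is one way to trade recall against precision.

```python
# Hypothetical thesaurus fragment standing in for data from the terminology web service.
THESAURUS = {
    "financial crisis": {
        "synonyms": ["Finanzkrise"],
        "narrower": ["banking crisis"],  # hypothetical narrower term
        "related": ["contagion"],        # hypothetical related term
    },
}


def expand_query(query, thesaurus, include=("synonyms", "narrower", "related")):
    """OR the original query with the terms of the selected thesaurus relations."""
    terms, seen = [query], {query.lower()}
    entry = thesaurus.get(query.lower(), {})
    for relation in include:
        for term in entry.get(relation, []):
            if term.lower() not in seen:  # skip duplicates, ignoring case
                seen.add(term.lower())
                terms.append(term)
    return " OR ".join(f'"{t}"' for t in terms)


print(expand_query("financial crisis", THESAURUS))
# "financial crisis" OR "Finanzkrise" OR "banking crisis" OR "contagion"
```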

Expanded search for “financial crisis”, http://econstor.eu

Use case 2: Search

“Lightweight” integration into existing repository systems and service providers

Benefits

• “Lightweight” extension of legacy systems

• Strategy of “least intrusion”: no update or migration needed

• No changes to the core system; only some changes to the data model may be required:
  • an additional column for storing the URI of the authority key
  • export or harvesting of the authority as a resource must be possible (→ OAI-ORE)

• Other types of library applications suitable for these adaptations:
  • catalogues
  • portals (e.g. to generate publication lists for an identified author or on thematic issues)
  • any collaborative system with an annotation function
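The data model change listed above can be as small as one nullable column. A sketch with Python's built-in `sqlite3` module; the table and column names are hypothetical and do not correspond to any particular repository system:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical legacy table: one row per metadata value.
conn.execute("CREATE TABLE item_metadata (item_id INTEGER, element TEXT, text_value TEXT)")
conn.executemany(
    "INSERT INTO item_metadata VALUES (?, ?, ?)",
    [(1, "subject", "Finanzkrise"), (2, "subject", "Geldpolitik")],
)

# The only schema change: an additional, nullable column for the authority URI.
conn.execute("ALTER TABLE item_metadata ADD COLUMN authority_uri TEXT")

# Values matched against the authority service get their URI;
# all other rows stay untouched and keep working as before.
conn.execute(
    "UPDATE item_metadata SET authority_uri = ? WHERE element = 'subject' AND text_value = ?",
    ("http://zbw.eu/stw/descriptor/19664-4", "Finanzkrise"),
)
rows = conn.execute(
    "SELECT text_value, authority_uri FROM item_metadata ORDER BY rowid"
).fetchall()
print(rows)
# [('Finanzkrise', 'http://zbw.eu/stw/descriptor/19664-4'), ('Geldpolitik', None)]
```

Existing rows remain valid with an empty authority field, which is what keeps the change non-intrusive.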

Summary and conclusion

• Library applications each create and manage their own idiosyncratic data sets.

• This makes it harder to maintain, exchange, aggregate and homogenize the (meta)data for advanced services.

• Upstream web services, as part of an overarching authority data infrastructure, can help homogenize metadata at an early stage (while still permitting localization).

• If such web services emerge and are widely used, there is an opportunity for further interlinking of locally maintained metadata, while at the same time improving data-based services.

• The option of “lightweight integration” offers operators of library applications a way to integrate these web services into their applications with minimal effort.

Dr. Timo Borst
Deutsche Zentralbibliothek für Wirtschaftswissenschaften / Leibniz-Informationszentrum Wirtschaft (ZBW)

Thank you!

Use case 3: Capturing authors

• The standard case in catalogues, but so far the exception in other data entry systems
• User groups: librarians + scientists (?) + library users (?)
• Process: entering author names
• Goal: to improve the process of capturing authors with the help of authority data provided by web services
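One building block for capturing authors against authority data is matching the name as entered to an authority record despite differences in name order, case and diacritics. The Python sketch below uses a hypothetical local dictionary in place of an authority web service and a deliberately simple normalization; initials such as “T. Borst” are not resolved.

```python
import unicodedata
from typing import Optional


def name_key(name: str) -> str:
    """Reduce 'Surname, Forename' and 'Forename Surname' to one comparison key."""
    # Drop diacritics and case, so 'Müller' and 'Muller' compare equal.
    decomposed = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in decomposed if not unicodedata.combining(ch)).lower().strip()
    if "," in text:
        surname, forename = (part.strip() for part in text.split(",", 1))
    else:
        parts = text.split()
        if not parts:
            return ""
        surname, forename = parts[-1], " ".join(parts[:-1])
    return f"{surname}|{forename}"


# Hypothetical authority records (key -> authority URI) standing in for a web service.
AUTHORITY = {name_key("Borst, Timo"): "http://example.org/person/borst-timo"}


def match_author(entered: str) -> Optional[str]:
    """Return the authority URI for an entered name, or None if nothing matches."""
    return AUTHORITY.get(name_key(entered))


print(match_author("Timo Borst"))  # http://example.org/person/borst-timo
print(match_author("T. Borst"))    # None
```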

• Entry form at http://87.106.250.18/beta/econstor/

Previous approaches to aggregation and homogenization

• Metadata search by aggregators
  • Parallel querying of remote, distributed systems
  • Returning and preparing the search result as a combined hit list
• Harvesting
  • Regular collection of metadata from remote, distributed systems
  • Homogenization ex ante or ex post
• Federated search
• …
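Harvesting in repository networks is commonly done with the OAI-PMH protocol, which this slide does not name explicitly. The sketch below builds `ListRecords` requests and follows resumption tokens across pages; the base URL and the `fetch` callback are hypothetical, and the canned responses stand in for a live repository.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, token=None, metadata_prefix="oai_dc", from_date=None):
    """Build an OAI-PMH ListRecords request URL."""
    if token:
        # resumptionToken is an exclusive argument: only the verb may accompany it.
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            params["from"] = from_date  # selective (incremental) harvesting
    return f"{base_url}?{urlencode(params)}"


def harvest(base_url, fetch, from_date=None):
    """Yield <record> elements, following resumption tokens until the list is complete."""
    token = None
    while True:
        root = ET.fromstring(fetch(list_records_url(base_url, token, from_date=from_date)))
        yield from root.iterfind(".//oai:record", OAI_NS)
        token_el = root.find(".//oai:resumptionToken", OAI_NS)
        token = token_el.text.strip() if token_el is not None and token_el.text else None
        if not token:  # an empty resumptionToken marks the last page
            break


# Canned responses standing in for a live repository (hypothetical base URL).
BASE = "https://repo.example.org/oai"
PAGES = {
    f"{BASE}?verb=ListRecords&metadataPrefix=oai_dc":
        '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>'
        "<record/><resumptionToken>page2</resumptionToken></ListRecords></OAI-PMH>",
    f"{BASE}?verb=ListRecords&resumptionToken=page2":
        '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>'
        "<record/><resumptionToken/></ListRecords></OAI-PMH>",
}
records = list(harvest(BASE, fetch=lambda url: PAGES[url]))
print(len(records))  # 2
```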

References

• [1] http://wiki.dspace.org/index.php/Authority_Control_of_Metadata_Values
• [2] http://minds.wisconsin.edu/handle/1793/31735
• [3] http://dsug09.ub.gu.se/index.php/dsug/dsug09/paper/view/22/3
• [4] http://subjectobject.net/2006/11/09/the-dspace-digital-repository-a-project-analysis/
• [5] http://code.google.com/p/dspace-agrisap/wiki/ThesaurusAddOn
• [6] http://edoc.hu-berlin.de/conferences/dc-2008/subirats-imma-199/PDF/subirats.pdf
• [7] http://www.jisc.ac.uk/media/documents/programmes/sharedservices/names-phase-one-final-report,.pdf
• [8] http://idea.library.drexel.edu/bitstream/1860/3173/1/20070051011.pdf
• [9] http://ptsefton.com/blog/2006/06/06/the_affiliation_issue_in_institutional_repository_software/
• [10] http://library.ust.hk/info/nac/nac-technical.html
• [11] http://www.seco.tkk.fi/publications/2009/kurki-hyvonen-onki-people-2009.pdf
• [12] http://journals.sfu.ca/archivar/index.php/archivaria/article/download/11883/12836
• [13] http://www.dini.de/fileadmin/workshops/oa-netzwerk-juni2009/vernetzungstage_2009_malitz.pdf
