Improving Library Services with Semantic Web Technology in the Realm of Repository Systems
Dr. Timo Borst, Head of IT Development
German National Library for Economics / Leibniz-Information Centre Economics, Kiel/Hamburg, Germany
ICDK 2011, 14th – 16th February, Gurgaon/India
The ZBW is a member of the Leibniz Association (Leibniz-Gemeinschaft)
Slide 2
Overview
1. Current situation: Distributed (meta-)data management in library applications
2. Popular approaches towards aggregation and homogeneity of metadata
3. Our approach: Integration and aggregation of authority values with Semantic Web technology
   a) General idea
   b) Use case: Indexing
   c) Use case: Retrieving
4. “Lightweight” integration into existing repository systems and service providers
5. Conclusion
Slide 3
Current situation
• The rise of repository systems for academic publishing…
• …has led to a landscape of distributed systems, each of them holding its own metadata…
• …which is harvested and aggregated by service providers
Slide 4
Popular approaches towards aggregation and homogeneity of metadata
• Normalization in advance (before harvesting) requires
  • a mandatory metadata scheme to be applied by the local repositories
  • a set of controlled vocabularies (e.g. for publication types)
  • an automatic validation of the harvested metadata
• Normalization afterwards (after harvesting) requires
  • the definition of a minimum set of metadata fields
  • the definition of a basic intermediate metadata scheme for normalizing the heterogeneous metadata records
  • optionally, data cleansing strategies such as name disambiguation and automatic indexing on the basis of thesauri
Both approaches are problematic and reveal ambiguities at the aggregation level!
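To make the idea of an intermediate metadata scheme more concrete, here is a minimal Python sketch of normalizing one harvested record. The field names, the vocabulary entries and the `normalize` function are assumptions made for illustration; they are not taken from the slides or from any particular repository software.

```python
from dataclasses import dataclass

# Hypothetical controlled vocabulary for publication types (illustrative only).
PUBLICATION_TYPES = {
    "article": "article",
    "journal article": "article",
    "zeitschriftenaufsatz": "article",
    "working paper": "workingPaper",
    "arbeitspapier": "workingPaper",
}

# Hypothetical minimum set of fields every normalized record must carry.
REQUIRED_FIELDS = ("title", "creator", "type")


@dataclass
class NormalizedRecord:
    """A record in the assumed intermediate scheme."""
    title: str
    creator: str
    type: str


def normalize(raw: dict) -> NormalizedRecord:
    """Map one harvested record onto the intermediate scheme, or raise ValueError."""
    # Validation step: every required field must be present and non-empty.
    missing = [f for f in REQUIRED_FIELDS if not str(raw.get(f, "")).strip()]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    # Controlled vocabulary step: local type strings map to one shared value.
    type_key = raw["type"].strip().lower()
    if type_key not in PUBLICATION_TYPES:
        raise ValueError(f"unknown publication type: {raw['type']!r}")
    return NormalizedRecord(
        title=raw["title"].strip(),
        creator=raw["creator"].strip(),
        type=PUBLICATION_TYPES[type_key],
    )
```

Note that the creator string passes through unchanged: the sketch validates and maps types, but it cannot tell whether two different name strings refer to the same person.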
Slide 5
Current situation
• …sounds easy and straightforward, but implies severe problems, esp. with regard to the ambiguity of
  • author names
  • subject headings
Slide 6
Current situation
“The major difficulty we have found is with DSpace’s handling of metadata. While we feel that the number of fields in Dublin Core is adequate for most if not all uses (DCMI Usage Board 2006), we are troubled by the lack of authority control when completing its fields. Without some control over uniform titles, authors and subjects, accessing the items in the future will be very problematic.”
S. Chabot (http://subjectobject.net/2006/11/09/the-dspace-digital-repository-a-project-analysis/)
“Neither the standards nor the software underlying institutional repositories anticipated performing naming authority control on widely disparate metadata from highly unreliable sources.”
D. Salo (http://minds.wisconsin.edu/handle/1793/31735)
Slide 7
Our approach: Integration of authority values with Semantic Web technology
• General idea: “Provide a framework for integrating authority data, which is both normative and flexible enough to tolerate local idiosyncrasies on a string level.”
• Approach: Concept modelling based on Semantic Web / SKOS standards
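One way to read “normative and flexible” is that the concept URI is the normative part, while any number of label strings may point at it. The following Python sketch models this in the spirit of SKOS preferred and alternative labels. The class, the helper functions and the label values are illustrative assumptions and are not copied from the STW thesaurus; only the concept URI appears in the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A simplified SKOS-like concept: one URI, many labels."""
    uri: str
    pref_label: dict                                 # language code -> preferred label
    alt_labels: dict = field(default_factory=dict)   # language code -> list of variants

    def all_labels(self):
        labels = list(self.pref_label.values())
        for variants in self.alt_labels.values():
            labels.extend(variants)
        return labels


def build_index(concepts):
    """Map every known label (case-insensitively) to its concept URI."""
    index = {}
    for concept in concepts:
        for label in concept.all_labels():
            index[label.casefold()] = concept.uri
    return index


def resolve(index, local_string):
    """Return the concept URI for a locally entered string, or None if unknown."""
    return index.get(local_string.strip().casefold())


# Illustrative labels only; the URI is the one shown in the example queries.
crisis = Concept(
    uri="http://zbw.eu/stw/descriptor/19664-4",
    pref_label={"de": "Finanzkrise", "en": "Financial crisis"},
    alt_labels={"en": ["Financial market crisis"]},
)
index = build_index([crisis])
```

With this index, `resolve(index, "  FINANZKRISE ")` and `resolve(index, "financial market crisis")` both yield the same URI, so differing local spellings end up linked to one authority key.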
Slide 8
Our approach: Integration of authority values with Semantic Web technology
Slide 9
Our approach: Integration of authority values with Semantic Web technology – Web service
Example queries (for concepts):
http://zbw.eu/beta/stw-ws/suggest?query=finanzkr
… delivers all terms beginning with “finanzkr”
http://zbw.eu/beta/stw-ws/stw-ws-wrapper.php?service=labels&concept=http://zbw.eu/stw/descriptor/19664-4&lang=en
… delivers all English synonyms of the German “Finanzkrise”
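As a small sketch, the two example queries can be assembled programmatically. The base URL and parameter names are those shown on the slide; the function names are invented here. The slides do not describe the response format, and the beta endpoint may no longer be reachable, so the sketch only builds the request URLs and does not fetch anything.

```python
from urllib.parse import urlencode

# Base URL as shown in the example queries above.
BASE = "http://zbw.eu/beta/stw-ws"


def suggest_url(prefix: str) -> str:
    """URL asking for all terms that begin with the given prefix."""
    return f"{BASE}/suggest?{urlencode({'query': prefix})}"


def labels_url(concept_uri: str, lang: str) -> str:
    """URL asking for the labels of one concept in one language."""
    params = {"service": "labels", "concept": concept_uri, "lang": lang}
    return f"{BASE}/stw-ws-wrapper.php?{urlencode(params)}"


print(suggest_url("finanzkr"))
print(labels_url("http://zbw.eu/stw/descriptor/19664-4", "en"))
```

`urlencode` percent-encodes the concept URI inside the query string, so the second URL looks slightly different from the literal form on the slide while carrying the same parameter values.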
Slide 10
Use case: (Self-)Indexing
• One of the most prominent use cases, especially for librarians, but also for scientists and active users not familiar with subject-specific vocabularies
• Main goals:
  • Support the process of indexing in order to achieve a classification of documents which is both coherent and flexible, in the sense that it permits local idiosyncrasies related to authority terms
  • Align different vocabularies, in the sense that indexing in one vocabulary is automatically linked to another vocabulary
• Implementation: Extension of the submission interface of our repository by integrating the terminology web service as an autosuggest function
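The autosuggest behaviour can be illustrated with a local stand-in for the terminology web service. This is a sketch under assumptions: the `Autosuggest` class is hypothetical, the vocabulary is a tiny in-memory sample, and the `example.org` URIs are placeholders. What it is meant to show is that each suggestion carries a concept URI, which the submission form can store alongside the chosen label.

```python
import bisect


class Autosuggest:
    """Prefix lookup over a sorted list of labels (local stand-in for the service)."""

    def __init__(self, label_to_uri):
        # Sort by case-folded label so that all labels sharing a prefix are adjacent.
        self._entries = sorted(
            (label.casefold(), label, uri) for label, uri in label_to_uri.items()
        )
        self._keys = [entry[0] for entry in self._entries]

    def suggest(self, prefix, limit=10):
        """Return up to `limit` (label, uri) pairs whose label starts with `prefix`."""
        folded = prefix.casefold()
        start = bisect.bisect_left(self._keys, folded)
        results = []
        for key, label, uri in self._entries[start:]:
            if not key.startswith(folded) or len(results) >= limit:
                break
            results.append((label, uri))
        return results


# Sample vocabulary; only the first URI appears in the slides.
vocabulary = Autosuggest({
    "Finanzkrise": "http://zbw.eu/stw/descriptor/19664-4",
    "Finanzkredit": "http://example.org/concept/2",
    "Finanzpolitik": "http://example.org/concept/3",
})
suggestions = vocabulary.suggest("finanzkr")
```

Here `suggestions` contains the entries for “Finanzkredit” and “Finanzkrise” but not “Finanzpolitik”.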
Slide 11
Use case: (Self-)Indexing
Submission form https://econstor.eu
Slide 12
Use case: Retrieving
• Arguably the most important use case
• Often leads to the classic dilemma of precision versus recall
• Main goal:
  • Support the process of retrieving, so users can find the relevant set of documents
• Implementation: Automatic expansion of the original query with synonyms, narrower and related terms
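Query expansion of this kind can be sketched in a few lines of Python. The thesaurus entries below are invented for illustration and are not taken from the STW; the function names and the boolean `OR` syntax are likewise assumptions about how a search backend might be addressed.

```python
# Illustrative thesaurus excerpt; terms are assumptions, not STW data.
THESAURUS = {
    "financial crisis": {
        "synonyms": ["financial market crisis"],
        "narrower": ["banking crisis"],
        "related": ["economic crisis"],
    },
}


def expand_query(term, thesaurus, relations=("synonyms", "narrower", "related")):
    """Return the original term followed by its expansions, without duplicates."""
    entry = thesaurus.get(term.casefold(), {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    return terms


def to_boolean_query(terms):
    """Join the terms into a simple OR query with quoted phrases."""
    return " OR ".join(f'"{t}"' for t in terms)


expanded = expand_query("financial crisis", THESAURUS)
query_string = to_boolean_query(expanded)
```

Restricting `relations` to `("synonyms",)` is one way to lean towards precision, while including narrower and related terms leans towards recall.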
Slide 13
Use case: Retrieving
Expanded search for “financial crisis” http://econstor.eu
Slide 16
Use case 2: Search
Slide 18
“Lightweight” integration into existing repository systems and service providers
Slide 19
“Lightweight” integration into existing repository systems and service providers
Benefits
• “Lightweight” extension of legacy systems
• Strategy of “least intrusion”: No update or migration needed
• No changes to the core system; only some changes to the data model may be required:
  • Additional column for storing the URI of the authority key
  • Export or harvesting of the authority as a resource must be possible (-> OAI-ORE)
• Other types of library applications suitable for these adaptations:
  • catalogues
  • portals (e.g. to generate publication lists for an identified author or on thematic issues)
  • any collaborative system with an annotation system
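The data model change mentioned above, an additional column for the authority URI, might look roughly like the following sketch. It uses an in-memory SQLite database; the table and column names are hypothetical and do not reflect the schema of DSpace or any other repository system.

```python
import sqlite3


def create_legacy_table(conn):
    """Stand-in for an existing table that stores subject strings per item."""
    conn.execute("CREATE TABLE item_subject (item_id INTEGER, subject_text TEXT)")
    conn.execute("INSERT INTO item_subject VALUES (1, 'Finanzkrise')")


def add_authority_column(conn):
    """The only schema change: one nullable column, existing rows stay valid."""
    conn.execute("ALTER TABLE item_subject ADD COLUMN authority_uri TEXT")


def link_subject(conn, item_id, subject_text, uri):
    """Attach the authority URI to an already stored subject string."""
    conn.execute(
        "UPDATE item_subject SET authority_uri = ? "
        "WHERE item_id = ? AND subject_text = ?",
        (uri, item_id, subject_text),
    )


conn = sqlite3.connect(":memory:")
create_legacy_table(conn)
add_authority_column(conn)
link_subject(conn, 1, "Finanzkrise", "http://zbw.eu/stw/descriptor/19664-4")
rows = conn.execute(
    "SELECT item_id, subject_text, authority_uri FROM item_subject"
).fetchall()
```

The locally entered string stays in place, and the URI is stored next to it, so existing code that reads only the text column is unaffected.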
Slide 20
Summary and conclusion
• Library applications each create and manage their own idiosyncratic data holdings.
• This complicates the maintenance, exchange, aggregation and homogenization of (meta-)data for extended services.
• Upstream web services, as part of an overarching authority data infrastructure, can contribute to homogenizing metadata at an early stage (while still allowing localization).
• If such web services emerge and are used widely, there is a chance of further interlinking locally maintained metadata while at the same time improving data-based services.
• The option of “lightweight integration” is an offer to operators of library applications to integrate these web services into their applications with as little effort as possible.
Slide 21
Dr. Timo Borst
German National Library for Economics / Leibniz-Information Centre Economics (ZBW)
Use case 3: Recording authors
• The normal case in catalogues, but so far the exception in other data entry systems
• User groups: librarians + researchers (?) + library users (?)
• Process: entering author names
• Goal: to improve the process of recording authors with the help of authority data provided by web services
Slide 23
Use case 3: Recording authors
• Entry form at http://87.106.250.18/beta/econstor/
Slide 24
Previous approaches to aggregation & homogenization
• Metadata search via aggregators
  • Parallel querying of remote, distributed systems
  • Return and presentation of the search result as a combined hit list
• Harvesting
  • Regular collection of remote, distributed metadata
  • Homogenization ex ante or ex post
• Federated search
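The pattern of querying several remote systems in parallel and returning one combined hit list can be sketched as follows. The two source functions are stubs that stand in for remote repositories, and deduplicating by a shared `id` field is an assumption; in practice such a shared identifier is often exactly what is missing.

```python
from concurrent.futures import ThreadPoolExecutor


def federated_search(query, sources):
    """Query all sources in parallel and merge their hits into one list."""
    with ThreadPoolExecutor(max_workers=len(sources) or 1) as pool:
        # pool.map keeps the order of `sources`, so the merge is deterministic.
        result_lists = list(pool.map(lambda source: source(query), sources))
    merged, seen = [], set()
    for hits in result_lists:
        for hit in hits:
            if hit["id"] not in seen:   # keep the first occurrence of each id
                seen.add(hit["id"])
                merged.append(hit)
    return merged


# Stub sources standing in for remote systems (hypothetical data).
def repository_a(query):
    return [{"id": "a1", "title": f"{query} in Europe"},
            {"id": "x9", "title": f"{query} and banks"}]


def repository_b(query):
    return [{"id": "x9", "title": f"{query} and banks"},
            {"id": "b2", "title": f"History of {query}"}]


hits = federated_search("financial crisis", [repository_a, repository_b])
```

The merged list contains the records `a1`, `x9` and `b2`, with the duplicate `x9` kept only once.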