Lider Roadmapping Workshop. Deutsche Nationalbibliothek – Software-supported Bibliographic Recording and Linked Data. Mark Zöpfgen, Leipzig, 02.09.2014

Mark Zöpfgen: Software-Supported Bibliographic Recording and Linked Data

Jul 07, 2015



mbruemmer

Mark Zöpfgen (German National Library, DNB) presented the library's activities in content extraction and the semantic web. The DNB maintains the German National Bibliography, which records German and German-language print and electronic publications since 1913, and produces the Integrated Authority File (GND, “Gemeinsame Normdatei”) in collaboration with other institutions. Its work on content extraction and the semantic web spans several projects. Among other things, these build an ontology for generating the data and provide multilingual access to subjects, so that the library's data can be used internationally. Manual effort also goes into high-quality translations of the subject headings of the bibliographic records into English and French. An Open Linked Data Service already publishes the data, which can be downloaded in RDF format under the Creative Commons Zero license. The main goals of the German National Library are:

- continuously improving the poor formal quality of the otherwise highly reliable bibliographic data.
- building an integrated portal that combines a search engine with linked data.
- integrating German bibliographic data into The European Library and finding standards for providing it as linked data.
- increasing the precision of multilingual term mappings, given that one-to-one matches are rare.
- motivating external parties to work with the RDF data, and improving search options.
Transcript
Page 1: Mark Zöpfgen: Software-Supported Bibliographic Recording and Linked Data


Lider Roadmapping Workshop

Deutsche Nationalbibliothek – Software-supported Bibliographic Recording and Linked Data

Mark Zöpfgen

Leipzig, 02.09.2014

Page 2: Mark Zöpfgen: Software-Supported Bibliographic Recording and Linked Data

Overview

- DNB – German National Library

- Activities in Content Extraction and Semantic Web

- MACS

- PETRUS

- Open Linked Data

- Motivation/Challenges

Lider-Roadmapping-Workshop | Leipzig | 02.09.2014

Page 3: Mark Zöpfgen: Software-Supported Bibliographic Recording and Linked Data

DNB – the German National Library

- central archival library and national bibliographic center for the Federal Republic of Germany

- collects, permanently archives, comprehensively documents and bibliographically records all German and German-language publications since 1913, foreign publications about Germany, and translations of German works (German National Bibliography)

- produces, in collaboration with other institutions, the Integrated Authority File (GND, “Gemeinsame Normdatei”)

- makes them available to the public

- develops and maintains bibliographic rules and standards for Germany

- plays a significant role in the development of international library standards.

Inventory: ~ 27.8 million bibliographic units; ~ 719,000 online publications (mainly PDF and EPUB)


Page 4: Mark Zöpfgen: Software-Supported Bibliographic Recording and Linked Data


Location in Leipzig

Location in Frankfurt

Page 5: Mark Zöpfgen: Software-Supported Bibliographic Recording and Linked Data

Activities in the Context of Content Extraction and Semantic Web:

Contentus

CrissCross

Culturegraph

Linked Data Service

MACS

Unidissen

VIAF

PETRUS

For more information see:

http://www.dnb.de/DE/Wir/Projekte/projekte_node.html


Page 6: Mark Zöpfgen: Software-Supported Bibliographic Recording and Linked Data

MACS – Multilingual Access to Subjects I/IV

- Creation of a multilingual retrieval vocabulary for research in bibliographic databases.

- Links between the subject headings of LCSH (Library of Congress Subject Headings), RAMEAU (Répertoire d'autorité-matière encyclopédique et alphabétique unifié) and GND (Gemeinsame Normdatei)

- In cooperation with SNB (Swiss National Library)

- Currently ~ 63,000 links, which have been imported into the GND records

Use Cases

- Make DNB data internationally available (search via LCSH/RAMEAU subject headings)

- Search in the Library of Congress / Bibliothèque nationale de France with GND subject headings

- Possibility to adopt subject headings from existing bibliographic records (e.g. in the case of translations)

- Link: http://www.dnb.de/DE/Wir/Kooperation/MACS/macs.html
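
To illustrate how links between the three vocabularies might support cross-language search, here is a minimal Python sketch. The `MacsLink` class, the `translate` function and the sample headings are assumptions made for illustration; they do not reflect the actual MACS data model. Each link groups headings per vocabulary as a set, so a heading can correspond to several headings in another vocabulary.

```python
from dataclasses import dataclass, field


@dataclass
class MacsLink:
    """One hypothetical link: equivalent headings grouped by vocabulary."""
    headings: dict[str, set[str]] = field(default_factory=dict)


def translate(links, source_vocab, heading, target_vocab):
    """Return all target-vocabulary headings linked to the given heading."""
    result = set()
    for link in links:
        if heading in link.headings.get(source_vocab, set()):
            result |= link.headings.get(target_vocab, set())
    return sorted(result)


# Illustrative sample data only (not real MACS links).
LINKS = [
    MacsLink({"GND": {"Bibliothek"}, "LCSH": {"Libraries"},
              "RAMEAU": {"Bibliothèques"}}),
    MacsLink({"GND": {"Schlagwort"}, "LCSH": {"Subject headings"},
              "RAMEAU": {"Vedettes-matière"}}),
]

print(translate(LINKS, "LCSH", "Libraries", "GND"))
```

A search entered with an LCSH heading could then be expanded to the linked GND headings before querying the catalogue.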


Page 7: Mark Zöpfgen: Software-Supported Bibliographic Recording and Linked Data


MACS – Multilingual Access to Subjects II/IV

– Maintenance: The links are created and updated using the LMI (Link Management Interface). The LMI provides a web interface; the data is stored in a central database.

– Data Export / Import: The links are exported via an OAI interface. The import into the CBS (Central Bibliographic Database) is currently done by a manually initiated script.

– Planned:

Integration into the search portal of TEL (The European Library)

Provision via the linked data service (currently not integrated)

Regular updates between LMI and CBS
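
Assuming the OAI interface follows the OAI-PMH protocol (the slide does not say so explicitly), a harvest request and its response could be handled roughly as follows. The base URL, the `metadataPrefix` value and the sample response are placeholders, not the actual MACS endpoint or its data.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix, from_date=None):
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date  # selective harvesting by datestamp
    return f"{base_url}?{urlencode(params)}"


def record_identifiers(response_xml):
    """Extract the record identifiers from a ListRecords response."""
    root = ET.fromstring(response_xml)
    return [el.text
            for el in root.iterfind(".//oai:header/oai:identifier", OAI_NS)]


# Shortened, made-up response used here instead of a live request.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example:link-1</identifier></header></record>
    <record><header><identifier>oai:example:link-2</identifier></header></record>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("https://example.org/oai", "oai_dc", "2014-01-01"))
print(record_identifiers(SAMPLE))
```

Passing a `from` date restricts the harvest to records changed since the last run, which is one way the planned regular update between LMI and CBS could avoid re-importing every link.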

Page 8: Mark Zöpfgen: Software-Supported Bibliographic Recording and Linked Data


MACS – Multilingual Access to Subjects III/IV

Page 9: Mark Zöpfgen: Software-Supported Bibliographic Recording and Linked Data


MACS – Multilingual Access to Subjects IV/IV


Page 10: Mark Zöpfgen: Software-Supported Bibliographic Recording and Linked Data

Petrus – Software-supported Bibliographical Recording I/V

Why software-supported?

Growing number of online publications (see graphic below).

The German National Library is looking to reduce its traditional indexing operations in areas which are no longer feasible due to the continually growing number of publications, or are no longer necessary because of technological developments.


[Chart: number of online publications per year. 2007: 13,525; 2008: 17,651; 2009: 29,823; 2010: 112,766]

Page 11: Mark Zöpfgen: Software-Supported Bibliographic Recording and Linked Data


Petrus – Software-supported Bibliographical Recording II/V

Classification

Based upon the DNB “Sachgruppen” (subject groups), roughly the first two levels of the DDC.

Statistical procedure; the training corpus comprises ~ 300,000 objects with known classes (full text and tables of contents). The objects are limited to 40,000 characters.

After stemming, the data model is generated. An SVM (support vector machine) is used as the classifier. After the model has been created, a 3-fold cross-validation is run to verify its quality.

The model can be transferred to an “endpoint”, which is a stand-alone application. The endpoint communicates via a web service interface.

In use since January 2012; currently ~ 400 objects/day
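
The pipeline described above (truncation, stemming, SVM training, 3-fold cross-validation) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Petrus implementation: the suffix-stripping `stem` function is a crude stand-in for a real stemmer, the linear SVM is trained with Pegasos-style sub-gradient steps (the slides do not name a training method), the example is binary rather than multi-class, and the six sample texts are made up.

```python
import random
from collections import Counter

MAX_CHARS = 40_000  # objects are limited to 40,000 characters


def stem(word):
    """Crude suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def features(text):
    """Truncate, tokenize, stem, and count terms (bag of words)."""
    return Counter(stem(t) for t in text[:MAX_CHARS].lower().split())


def train_svm(samples, labels, epochs=50, lam=0.01, seed=0):
    """Linear SVM trained with Pegasos-style sub-gradient steps (hinge loss)."""
    rng = random.Random(seed)
    weights, step = Counter(), 0
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            step += 1
            eta = 1.0 / (lam * step)
            x, y = samples[i], labels[i]
            margin = y * sum(weights[f] * v for f, v in x.items())
            for f in weights:
                weights[f] *= 1 - eta * lam  # regularization shrink
            if margin < 1:  # sample violates the margin: step towards it
                for f, v in x.items():
                    weights[f] += eta * y * v
    return weights


def predict(weights, x):
    """Return +1 or -1 depending on the sign of the decision function."""
    return 1 if sum(weights[f] * v for f, v in x.items()) >= 0 else -1


def cross_validate(samples, labels, folds=3):
    """k-fold cross-validation; returns the mean accuracy over the folds."""
    scores = []
    for k in range(folds):
        test_idx = set(range(k, len(samples), folds))
        train = [i for i in range(len(samples)) if i not in test_idx]
        w = train_svm([samples[i] for i in train], [labels[i] for i in train])
        hits = sum(predict(w, samples[i]) == labels[i] for i in test_idx)
        scores.append(hits / len(test_idx))
    return sum(scores) / folds


# Made-up training texts: +1 = law, -1 = medicine.
DOCS = [
    ("court ruling on contracts and statutes", 1),
    ("patients treated in hospitals", -1),
    ("statutes and court decisions", 1),
    ("hospital treating patients with vaccines", -1),
    ("contracts ruling by the court", 1),
    ("vaccines for patients", -1),
]
X = [features(text) for text, _ in DOCS]
y = [label for _, label in DOCS]
print(cross_validate(X, y))
```

A production setup would use one classifier per subject group (for example one-vs-rest) and an established SVM library; the sketch only shows how the steps fit together.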

Page 12: Mark Zöpfgen: Software-Supported Bibliographic Recording and Linked Data


Petrus – Software-supported Bibliographical Recording III/V

Keywording

Linguistic text analysis: language recognition, identification of sentences, words, phrases etc.

Term matching with a dictionary based on the Integrated Authority File (72,000 subject headings), followed by disambiguation

Term ranking (dependent on position and frequency)

The keywording process can eventually be transferred to an “endpoint” (analogous to the classification model)

~ 80 objects/day
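
The matching and ranking steps could look roughly like the following sketch. The dictionary entries, the scoring formula (one point per hit plus a bonus of up to one point for early positions) and the function name `rank_keywords` are assumptions for illustration; the slides only state that ranking depends on position and frequency. Language recognition and disambiguation are left out.

```python
import re
from collections import defaultdict

# Tiny stand-in for the GND-based dictionary: surface form -> subject heading.
DICTIONARY = {
    "library": "Bibliothek",
    "libraries": "Bibliothek",
    "linked data": "Linked Data",
    "classification": "Klassifikation",
}


def rank_keywords(text, dictionary, top_n=3):
    """Match dictionary terms in the text; rank by frequency and position."""
    lowered = text.lower()
    length = max(len(lowered), 1)
    scores = defaultdict(float)
    for term, heading in dictionary.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, lowered):
            # Each hit counts once; earlier hits earn a bonus of up to 1.
            scores[heading] += 1.0 + (1.0 - match.start() / length)
    return sorted(scores, key=lambda h: (-scores[h], h))[:top_n]


TEXT = "Linked data in the library: linked data and classification."
print(rank_keywords(TEXT, DICTIONARY))
```

Mapping several surface forms to one heading, as with "library" and "libraries" here, is one simple way to let inflected forms count towards the same subject heading.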

Page 13: Mark Zöpfgen: Software-Supported Bibliographic Recording and Linked Data


Petrus – Software-supported Bibliographical Recording IV/V

(1) List of publications to be processed

(2) Metadata to be imported from the bibliographic database

(3) (Full-text) objects to be imported from the repository

(4) Transfer via a web service interface

(5) Return of results

(6) Storage of the results in the bibliographic database
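
The six workflow steps on this slide can be read as a simple batch loop. The sketch below is only an outline under assumptions: `run_batch`, the `Result` class and the dictionary-based stand-ins for the bibliographic database, the repository and the web service are hypothetical and replace the real systems.

```python
from dataclasses import dataclass


@dataclass
class Result:
    record_id: str
    subject_group: str


def run_batch(record_ids, load_metadata, load_fulltext, classify, store):
    """Run workflow steps (1) to (6) for a list of publications."""
    results = []
    for record_id in record_ids:              # (1) list of publications
        metadata = load_metadata(record_id)   # (2) from the bibliographic database
        fulltext = load_fulltext(record_id)   # (3) from the repository
        group = classify(metadata, fulltext)  # (4) + (5) web service call and reply
        result = Result(record_id, group)
        store(result)                         # (6) write back to the database
        results.append(result)
    return results


# Stand-ins for the real systems (hypothetical data).
METADATA = {"rec-1": {"title": "Example title"}}
FULLTEXT = {"rec-1": "Example full text"}
stored = []

results = run_batch(
    ["rec-1"],
    load_metadata=METADATA.get,
    load_fulltext=FULLTEXT.get,
    classify=lambda metadata, fulltext: "000",  # dummy service response
    store=stored.append,
)
print(results)
```

Passing the four systems in as functions keeps the loop independent of how the database, repository and endpoint are actually reached.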

Page 14: Mark Zöpfgen: Software-Supported Bibliographic Recording and Linked Data

Petrus – Software-supported Bibliographical Recording V/V


Appearance in the bibliographic record

Return of the classification software

Page 15: Mark Zöpfgen: Software-Supported Bibliographic Recording and Linked Data


Open Linked Data I/III

- DNB provides high-quality, mainly intellectually created data.

- Authority file (GND) and National Bibliography are available in RDF format

- Data is published under the Creative Commons Zero license

- Currently, the data can be accessed via the portal (for single records) or downloaded:

http://datendienst.dnb.de/cgi-bin/mabit.pl?userID=opendata&pass=opendata&cmd=login

- Target groups are research facilities and non-profit organisations as well as commercial service suppliers (e.g. search engines, knowledge management systems)

Page 16: Mark Zöpfgen: Software-Supported Bibliographic Recording and Linked Data

– Bibliographic data is highly reliable but has poor formal quality (free-text fields), so conversion requires considerable effort

– The data was converted using Metafacture, which was developed by culturegraph.org (www.culturegraph.org)


Open Linked Data II/III

Page 17: Mark Zöpfgen: Software-Supported Bibliographic Recording and Linked Data


Open Linked Data III/III

Bibliographic record of “Winnetou” leads to Karl May (detail)

… leads to place of birth

Coordinates
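
The chain shown on this slide (record, author, place of birth, coordinates) amounts to following links between resources. Here is a minimal in-memory sketch; the `ex:` identifiers and property names are hypothetical and are not the URIs or vocabularies used by the DNB service, and the coordinate value is approximate and included only for illustration.

```python
# Hypothetical identifiers and property names, chosen for readability.
TRIPLES = [
    ("ex:record/winnetou", "ex:creator", "ex:person/karl-may"),
    ("ex:person/karl-may", "ex:placeOfBirth", "ex:place/hohenstein-ernstthal"),
    ("ex:place/hohenstein-ernstthal", "ex:coordinates",
     "50.80 N, 12.72 E (approx.)"),
]


def follow(triples, start, path):
    """Follow a chain of properties from a start node.

    Returns the value at the end of the path, or None if a link is missing.
    """
    node = start
    for prop in path:
        node = next((o for s, p, o in triples if s == node and p == prop), None)
        if node is None:
            return None
    return node


print(follow(TRIPLES, "ex:record/winnetou",
             ["ex:creator", "ex:placeOfBirth", "ex:coordinates"]))
```

With real RDF data the same traversal would typically be done with an RDF library or a SPARQL query rather than a list of tuples.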

Page 18: Mark Zöpfgen: Software-Supported Bibliographic Recording and Linked Data

Motivation of the DNB

- Motivate external parties to work with RDF data, e.g. by linking it with other ontologies.

- Improve search: access by themes, browsing, revealing relations between cultural entities.

Technical Challenges

- Improve accessibility (e.g. through services such as MACS)

- Search: integrate the portal (knowledge representation, user interaction) with the search engine and linked data


Page 19: Mark Zöpfgen: Software-Supported Bibliographic Recording and Linked Data

Questions?


Mark Zöpfgen

Deutsche Nationalbibliothek

Informationstechnik

Adickesallee 1

D-60322 Frankfurt am Main

Telefon: +49-69-1525-1705

mailto: [email protected]

http://www.d-nb.de