Lider Roadmapping Workshop
Deutsche Nationalbibliothek – Software-supported Bibliographic Recording and Linked Data
Mark Zöpfgen
Leipzig, 02.09.2014
Jul 07, 2015
Overview
- DNB – German National Library
- Activities in Content Extraction and Semantic Web
- MACS
- PETRUS
- Open Linked Data
- Motivation/Challenges
Lider-Roadmapping-Workshop | Leipzig | 02.09.2014
DNB – the German National Library
- central archival library and national bibliographic center for the Federal Republic of Germany
- collects, permanently archives, comprehensively documents and bibliographically records all German and German-language publications since 1913, foreign publications about Germany, and translations of German works (German National Bibliography)
- produces, in collaboration with other institutions, the Integrated Authority File (GND, “Gemeinsame Normdatei”)
- makes these publications available to the public
- develops and maintains bibliographic rules and standards for Germany
- plays a significant role in the development of international library standards
Inventory: ~ 27.8 million bibliographic units; ~ 719,000 online publications (mainly PDF and EPUB)
Location in Leipzig
Location in Frankfurt
Activities in the Context of Content Extraction and Semantic Web:
Contentus
CrissCross
Culturegraph
Linked Data Service
MACS
Unidissen
VIAF
PETRUS
…
For more information see:
http://www.dnb.de/DE/Wir/Projekte/projekte_node.html
MACS – Multilingual Access to Subjects I/IV
- Creation of a multilingual retrieval vocabulary for searching bibliographic databases.
- Links between subject headings of LCSH (Library of Congress Subject Headings), RAMEAU (Répertoire d'autorité-matière encyclopédique et alphabétique unifié) and GND (Gemeinsame Normdatei)
- In cooperation with the SNB (Swiss National Library)
- Currently ~ 63,000 links, which have been imported into the GND records
Use cases
- Make DNB data internationally available (search via LCSH/RAMEAU subject headings)
- Search in the Library of Congress / Bibliothèque nationale de France with GND subject headings
- Adopt subject headings from existing bibliographic records (e.g. for translations)
- Link: http://www.dnb.de/DE/Wir/Kooperation/MACS/macs.html
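The use cases above amount to looking up a heading in one vocabulary and returning its linked equivalents in another. A minimal sketch of that lookup follows. The link entries are illustrative sample data, not taken from the actual MACS links, and the one-to-one grouping is a simplifying assumption (real links may be one-to-many).

```python
# Illustrative sample links only; NOT taken from the real MACS data.
SAMPLE_LINKS = [
    {"GND": "Bibliothek", "LCSH": "Libraries", "RAMEAU": "Bibliothèques"},
    {"GND": "Übersetzung", "LCSH": "Translating and interpreting", "RAMEAU": "Traduction"},
]


def build_index(links):
    """Map every (vocabulary, heading) pair to the link group it belongs to."""
    index = {}
    for group in links:
        for vocabulary, heading in group.items():
            index[(vocabulary, heading.casefold())] = group
    return index


def translate(index, vocabulary, heading, target):
    """Return the linked heading in the target vocabulary, or None if unlinked."""
    group = index.get((vocabulary, heading.casefold()))
    return group.get(target) if group else None


index = build_index(SAMPLE_LINKS)
print(translate(index, "LCSH", "Libraries", "GND"))
```

A search front end could call `translate` to turn an LCSH or RAMEAU query term into the GND heading before searching the catalogue.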
MACS – Multilingual Access to Subjects II/IV
– Maintenance: The links are created and updated using the LMI (Link Management Interface). The LMI provides a web interface; the data is stored in a central database.
– Data export/import: The links are exported via an OAI interface. The import into the CBS (Central Bibliographic Database) is currently done by a manually initiated script.
– Planned:
Integration into the search portal of TEL (The European Library)
Provision via the linked data service (currently not integrated)
Regular updates between LMI and CBS
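Assuming the OAI interface follows OAI-PMH 2.0, a harvesting script could look roughly like the sketch below. The base URL, the metadata prefix and the sample response are placeholders; the slides do not specify the actual endpoint or record format.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token:
        # In OAI-PMH, resumptionToken is an exclusive argument.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None) from a response."""
    root = ET.fromstring(xml_text)
    identifiers = [
        element.text
        for element in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)
    ]
    token_element = root.find(".//oai:resumptionToken", OAI_NS)
    has_token = token_element is not None and token_element.text
    return identifiers, (token_element.text if has_token else None)


# Hand-written sample response (placeholder identifiers, no real data).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:link-1</identifier></header></record>
    <record><header><identifier>oai:example.org:link-2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

identifiers, token = parse_list_records(SAMPLE_RESPONSE)
next_url = list_records_url("https://example.org/oai", "oai_dc", token)
print(identifiers, next_url)
```

A regular update between LMI and CBS could repeat the request with each returned resumption token until none is left.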
MACS – Multilingual Access to Subjects III/IV
MACS – Multilingual Access to Subjects IV/IV
Petrus – Software-supported Bibliographical Recording I/V
Why software-supported?
Growing number of online publications (see graphic below).
The German National Library aims to reduce traditional indexing in areas where it is no longer feasible because of the continually growing number of publications, or no longer necessary because of technological developments.
[Chart: online publications by year. 2007: 13,525; 2008: 17,651; 2009: 29,823; 2010: 112,766]
Petrus – Software-supported Bibliographical Recording II/V
Classification
Based upon the DNB “Sachgruppen” (subject groups), roughly the first two levels of the DDC
Statistical procedure; training corpus of ~ 300,000 objects with known classes (full text and tables of contents). Each object is limited to 40,000 characters.
After stemming, the data model is generated. An SVM (support vector machine) is used as the classifier. After the model has been created, a 3-fold cross-validation is run to verify its quality.
The model can be transferred to an “endpoint”, which is a stand-alone application. The endpoint communicates via a web service interface.
In use since January 2012; currently ~ 400 objects/day
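The steps named above (truncation to 40,000 characters, stemming, SVM training, 3-fold cross-validation) can be sketched in plain Python as follows. This is a toy illustration under stated assumptions, not the DNB implementation: the stemmer is a crude suffix stripper, the SVM is a binary linear one trained by subgradient descent on the hinge loss, and the six sample texts and the two class labels are invented for the example.

```python
import re
from collections import Counter

MAX_CHARS = 40_000  # from the slide: each object is limited to 40,000 characters


def stem(word):
    """Crude suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def features(text):
    """Truncate, tokenise into letter runs, stem, and count terms."""
    tokens = re.findall(r"[^\W\d_]+", text[:MAX_CHARS].lower())
    return Counter(stem(token) for token in tokens)


def score(weights, x):
    """Linear decision value w . x for a sparse term-count vector."""
    return sum(weights.get(term, 0.0) * count for term, count in x.items())


def train_linear_svm(samples, targets, epochs=10, eta=0.1, lam=0.01):
    """Binary linear SVM (targets +1/-1): subgradient descent on the hinge loss."""
    weights = {}
    for _ in range(epochs):
        for x, y in zip(samples, targets):
            violated = y * score(weights, x) < 1
            for term in weights:
                weights[term] *= 1 - eta * lam  # L2 regularisation shrinkage
            if violated:
                for term, count in x.items():
                    weights[term] = weights.get(term, 0.0) + eta * y * count
    return weights


def cross_validate(docs, labels, positive, folds=3):
    """k-fold cross-validation; returns the mean accuracy over the folds."""
    xs = [features(doc) for doc in docs]
    ys = [1 if label == positive else -1 for label in labels]
    accuracies = []
    for k in range(folds):
        test = set(range(k, len(xs), folds))
        train = [i for i in range(len(xs)) if i not in test]
        weights = train_linear_svm([xs[i] for i in train], [ys[i] for i in train])
        hits = sum((1 if score(weights, xs[i]) > 0 else -1) == ys[i] for i in test)
        accuracies.append(hits / len(test))
    return sum(accuracies) / folds


# Invented toy corpus; "020" and "510" are used here as example class labels.
DOCS = [
    "library catalogues indexing", "library cataloguing records",
    "library archives holdings", "algebra theorem proofs",
    "algebra equations geometry", "algebra integrals theorem",
]
LABELS = ["020", "020", "020", "510", "510", "510"]
print(cross_validate(DOCS, LABELS, positive="020"))
```

A production system would need a multi-class classifier over all subject groups and a proper stemmer; the sketch only shows how the pieces fit together.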
Petrus – Software-supported Bibliographical Recording III/V
Keywording
Linguistic text analysis: language recognition; identification of sentences, words, phrases etc.
Term matching against a dictionary based on the Integrated Authority File (72,000 subject headings); disambiguation
Term ranking (dependent on position and frequency)
The keywording process can, where required, be transferred to an “endpoint” (as with the classification model)
~ 80 objects/day
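Term matching and ranking by position and frequency could be sketched as below. The scoring formula (number of occurrences plus a bonus for an early first occurrence) is an assumption made for illustration; the slides do not state the actual weighting, and disambiguation is left out.

```python
import re


def rank_keywords(text, dictionary, top_n=5):
    """Match dictionary headings in the text and rank them.

    Assumed score: number of occurrences plus a bonus between 0 and 1
    that is higher the earlier the heading first appears.
    """
    lowered = text.lower()
    length = max(len(lowered), 1)
    scored = []
    for heading in dictionary:
        pattern = r"\b" + re.escape(heading.lower()) + r"\b"
        positions = [match.start() for match in re.finditer(pattern, lowered)]
        if not positions:
            continue
        earliness = 1.0 - positions[0] / length
        scored.append((len(positions) + earliness, heading))
    # Highest score first; ties are broken alphabetically.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [heading for _, heading in scored[:top_n]]


# Invented sample text and a tiny stand-in for the authority-file dictionary.
TEXT = ("Linked data in libraries. Libraries publish linked data as RDF. "
        "Cataloguing rules matter.")
DICTIONARY = ["Linked data", "Libraries", "RDF", "Archives"]
print(rank_keywords(TEXT, DICTIONARY))
```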
Petrus – Software-supported Bibliographical Recording IV/V
(1) List of publications to be processed
(2) Metadata imported from the bibliographic database
(3) (Full-text) objects imported from the repository
(4) Transfer via a web service interface
(5) Return of results
(6) Storage of the results in the bibliographic database
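The six workflow steps can be read as a simple batch loop. The sketch below shows that control flow only; all function names are hypothetical, and the database, repository and endpoint are replaced by in-memory stand-ins so that it runs without any real system.

```python
def run_batch(identifiers, load_metadata, load_full_text, call_endpoint, store_result):
    """Run steps (1) to (6) for every publication; return how many were processed."""
    processed = 0
    for identifier in identifiers:                   # (1) list of publications
        metadata = load_metadata(identifier)         # (2) from the bibliographic database
        full_text = load_full_text(identifier)       # (3) from the repository
        result = call_endpoint(metadata, full_text)  # (4) transfer and (5) returned result
        store_result(identifier, result)             # (6) store in the bibliographic database
        processed += 1
    return processed


# In-memory stand-ins (hypothetical data, no real records).
database = {"pub-1": {"title": "Example A"}, "pub-2": {"title": "Example B"}}
repository = {"pub-1": "full text of A", "pub-2": "full text of B"}
results = {}


def fake_endpoint(metadata, full_text):
    """Placeholder for the web service call; returns a dummy result."""
    return {"title": metadata["title"], "characters": len(full_text)}


count = run_batch(
    ["pub-1", "pub-2"],
    load_metadata=database.__getitem__,
    load_full_text=repository.__getitem__,
    call_endpoint=fake_endpoint,
    store_result=results.__setitem__,
)
print(count, results)
```

Passing the four operations in as callables keeps the loop independent of how the database, repository and endpoint are actually reached.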
Petrus – Software-supported Bibliographical Recording V/V
Appearance in the bibliographic record
Result returned by the classification software
Open Linked Data I/III
- DNB provides high-quality, mainly intellectually (i.e. manually) created data.
- The authority file (GND) and the National Bibliography are available in RDF format
- Data is published under the Creative Commons Zero (CC0) license
- Currently, the data can be accessed via the portal (for single records) or downloaded:
http://datendienst.dnb.de/cgi-bin/mabit.pl?userID=opendata&pass=opendata&cmd=login
- Target groups are research facilities and non-profit organisations as well as commercial service providers (e.g. search engines, knowledge management systems)
– Bibliographic data is highly reliable but of poor formal quality (free-text fields), so conversion requires considerable effort
– The data was converted using Metafacture, which was developed by culturegraph.org (www.culturegraph.org)
Open Linked Data II/III
Open Linked Data III/III
Bibliographic record of “Winnetou” leads to Karl May (detail)
… leads to place of birth
Coordinates
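The chain shown on these slides (record, author, place of birth, coordinates) is a path through linked triples. A minimal sketch of following such a path is given below. The `ex:` identifiers and property names are hypothetical placeholders, not the URIs or vocabularies used in the DNB data, and the coordinates are approximate illustrative values.

```python
# Hypothetical identifiers and properties; coordinates are approximate.
TRIPLES = [
    ("ex:record/winnetou", "ex:creator", "ex:person/karl-may"),
    ("ex:person/karl-may", "ex:placeOfBirth", "ex:place/ernstthal"),
    ("ex:place/ernstthal", "ex:coordinates", "50.8 12.7"),
]


def objects(triples, subject, predicate):
    """Return all objects of triples with the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def follow(triples, start, path):
    """Follow a chain of predicates from start; return every node reached."""
    frontier = [start]
    for predicate in path:
        frontier = [o for node in frontier for o in objects(triples, node, predicate)]
    return frontier


print(follow(TRIPLES, "ex:record/winnetou",
             ["ex:creator", "ex:placeOfBirth", "ex:coordinates"]))
```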
Motivation of the DNB
- Motivate external parties to work with RDF data, e.g. by linking it with other ontologies.
- Improve search: access by theme, browsing, revealing relations between cultural entities.
Technical Challenges
- Improve accessibility (e.g. through services such as MACS)
- Search: integrate the portal (knowledge representation, user interaction) with the search engine and linked data
Questions?
Mark Zöpfgen
Deutsche Nationalbibliothek
Informationstechnik
Adickesallee 1
D-60322 Frankfurt am Main
Phone: +49-69-1525-1705
E-mail: [email protected]
www.d-nb.de