Jun 14, 2015
Linked Data and cultural heritage data: an overview of the approaches from Europeana and The European
Library
Nuno Freire, Chief data officer, The European Library
Pacific Neighbourhood Consortium 2014 Annual Conference, Taipei, October 2014
Outline
Introduction and Context
• The European Library
• Europeana
• The data model for metadata exchange in the Europeana network
Linked Data at The European Library
• Managing and linking person names
• Managing and linking place names
• Managing and linking concepts
Introduction and context
www.theeuropeanlibrary.org
What is The European Library?
Project started 1996, full operational service from 2005
European hub of metadata, collections and increasing amount of full text
Membership of national and research libraries of 47 Council of Europe states
Non-profit, owned and managed by member libraries
What does The European Library offer?
Experienced European project partner
Large-scale aggregation
Infrastructure
Data and digital content of Europe’s libraries
Data distribution
Data enrichment
Linked open data
Open data distribution
EUROPEANA - Europe’s cultural heritage portal
32.6m records from 2,300 European galleries, museums, archives and libraries
Books, newspapers, journals, letters, diaries, archival papers
Paintings, maps, drawings, photographs
Music, spoken word, radio broadcasts
Film, newsreels, television
Curated exhibitions
31 languages
The European Library as libraries aggregator to Europeana
Domain Aggregators
• Audiovisual collections, e.g. EUScreen, European Film Gateway
• Archives, e.g. APEX
• Thematic collections, e.g. Judaica Europeana, Europeana Fashion
• Libraries, e.g. The European Library
National initiatives
• National Aggregators, e.g. Culture Grid, Culture.fr
• Regional Aggregators, e.g. Musées Lausannois
Metadata in the Europeana Context
Provides a portal for users to access that data
• Metadata, previews and links to source
Makes the metadata freely available for anyone to re-use
• Under the Creative Commons Zero (CC0) public domain dedication
Makes metadata available via an API
Makes metadata available as Linked Open Data
• http://data.europeana.eu/
Europeana Data Model: a Collaborative Effort
Cross-community development
Involving library, archive and museum experts
Ca. 60 participants
http://pro.europeana.eu/edm-documentation
Europeana Data Model: general principles
• A cross domain approach
• Supporting the common semantics of cultural domains
• Addressing the requirements of the Europeana portal
• Adheres to the modeling principles of the Web of Data
• Available as an OWL ontology and XML schema
• Allows finer-grained models of the different domains to be at least partly interoperable at the semantic level
• Allows metadata to retain their original expressivity and richness
Linked Data at The European Library
Managing and linking person names
Which data from VIAF is used at The European Library
Name variants: Various forms of the name of the person or organization. May include the complete name, abbreviated names, acronyms, etc.
Date of birth/death: The dates of birth and death of the person.
Nationalities: The nationalities of a person or organization.
How data from VIAF is used in The European Library
Name variants
• For matching of names across records and data sources
• Improves the identification of all publications of a work, the identification of publications in books-in-print databases, and the identification of the contributor in the rights-holders databases.
Date of birth/death
• Used for determining the public domain status.
• Used for matching confirmation and disambiguation of homonyms across data sources
Nationalities
• Used, in some countries, for determining the public domain status of the work.
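A minimal sketch of how a date of death could feed a public domain check. The slides do not state the rule applied; the life-plus-70-years term used here is an assumption for illustration (terms differ by country, which is why nationality matters), and `is_public_domain` is a hypothetical helper.

```python
from typing import Optional


def is_public_domain(death_year: Optional[int],
                     current_year: int,
                     term_years: int = 70) -> Optional[bool]:
    """Return True/False when the death year is known, None when undecidable.

    Assumption: protection runs until the end of the calendar year
    death_year + term_years; the real term depends on the jurisdiction.
    """
    if death_year is None:
        # No date of death available (e.g. missing in the authority data).
        return None
    return current_year > death_year + term_years
```

A missing date of death yields `None` rather than a guess, so such works can be routed to manual review.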
The matching process
VIAF data used for matching, disambiguation, and match probability
Matching work contributors with VIAF
Names are matched by similarity
Confirmation of the correctness of a name match is taken from other matching data
• The dates of birth and death
• The title of the work is compared against the list of titles available in VIAF
• All the contributors of the work are matched against the list of known co-authors in VIAF
• The publisher(s) of the work are matched against the list of known publishers in VIAF
A match is only chosen if enough supporting evidence is found
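The matching steps above can be sketched as follows. This is an illustration under stated assumptions, not the production matcher: the record layout (plain dicts), the thresholds `min_similarity` and `min_evidence`, and the use of Python's `difflib` for name similarity are all hypothetical choices.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two name strings (case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def count_evidence(record: dict, candidate: dict) -> int:
    """Count the independent pieces of data that support a name match."""
    evidence = 0
    for field in ("birth", "death"):
        if record.get(field) and record.get(field) == candidate.get(field):
            evidence += 1
    title = record.get("title")
    if title and title.lower() in {t.lower() for t in candidate.get("titles", [])}:
        evidence += 1
    if set(record.get("co_contributors", [])) & set(candidate.get("co_authors", [])):
        evidence += 1
    if set(record.get("publishers", [])) & set(candidate.get("publishers", [])):
        evidence += 1
    return evidence


def choose_match(record: dict, candidates: list,
                 min_similarity: float = 0.85, min_evidence: int = 2):
    """Return the best supported candidate, or None if evidence is too weak."""
    best, best_key = None, None
    for cand in candidates:
        sim = max((name_similarity(record["name"], v)
                   for v in cand.get("name_variants", [])), default=0.0)
        if sim < min_similarity:
            continue  # the name itself is not similar enough
        ev = count_evidence(record, cand)
        if ev < min_evidence:
            continue  # a similar name alone is not accepted as a match
        key = (ev, sim)
        if best_key is None or key > best_key:
            best, best_key = cand, key
    return best
```

Two homonyms with identical name forms are separated only by the supporting data, which is the reason a name match without evidence is rejected.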
Contributor names in statements of responsibility
“French Canadian freely arranged by Katherine K. Davis”.
“ed. by Peter Noever ; with a forew. by Frank O. Gehry; and contrib. by Coop Himmelblau.”
“W. Lange, A.C. Zeven and N.G. Hogenboom, editors”
“by Pamela and Neal Priestland”
“Vicente Aleixandre ; estudio previo, selección y notas de Leopoldo de Luis”
The approach
Approach the problem as a Named Entity Recognition task in text that may not be grammatically correct and thus lacks lexical evidence
Some requirements from the ARROW context
• Easily applicable to several languages
• The outcomes of the recognition task must be explainable
Design decisions
• Exploring the structured data within national bibliographies
• By analysis of the frequency of word occurrences in names of persons, and in other textual data
• Using word occurrence frequency makes it possible to
  • bypass the need for building training sets
  • provide simpler explanations of the name recognition results
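As an illustration of the word occurrence idea, the sketch below derives a per-word score from structured data: how often a word appears in person names relative to its occurrences in other text. The scoring formula and the function names are assumptions made for this example; the published method may weight frequencies differently.

```python
import re
from collections import Counter


def tokenize(text: str) -> list:
    """Lower-cased word tokens; \\w is Unicode-aware, so no language setup is needed."""
    return re.findall(r"\w+", text.lower())


def build_name_word_scores(person_names: list, other_texts: list) -> dict:
    """Map each word to the share of its occurrences that fall inside person names."""
    name_counts = Counter(w for name in person_names for w in tokenize(name))
    other_counts = Counter(w for text in other_texts for w in tokenize(text))
    scores = {}
    for word in set(name_counts) | set(other_counts):
        total = name_counts[word] + other_counts[word]
        scores[word] = name_counts[word] / total
    return scores


scores = build_name_word_scores(
    ["Katherine K. Davis", "Peter Noever", "Frank O. Gehry"],
    ["edited by the author", "with a foreword by Peter"],
)
```

Because each score is a plain ratio of counts, a recognition decision can be explained by pointing at the counts behind it, and no hand-labelled training set is involved.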
The process – bibliographic record processing
The named entity recognition is performed for a record as follows:
• Statement of responsibility is tokenized
• The person names are recognized by comparing the tokens with the dictionaries
• The recognized names are compared against the names of the contributors present in the structured fields of the record.
• If no similar name exists in the record, the contributor is added to the record in a structured data field
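The four steps above can be sketched as follows. The dictionary is represented here as a hypothetical word-to-score mapping, the `0.5` threshold and the `difflib`-based similarity test are assumptions, and the record is a plain dict with illustrative field names.

```python
import re
from difflib import SequenceMatcher


def recognize_names(statement: str, name_word_scores: dict,
                    threshold: float = 0.5) -> list:
    """Group consecutive tokens whose dictionary score marks them as name words."""
    names, current = [], []
    for token in re.findall(r"\w+", statement):
        if name_word_scores.get(token.lower(), 0.0) >= threshold:
            current.append(token)
        else:
            if current:
                names.append(" ".join(current))
            current = []
    if current:
        names.append(" ".join(current))
    return names


def is_similar(a: str, b: str, min_ratio: float = 0.8) -> bool:
    """Order-insensitive comparison, so 'Noever, Peter' matches 'Peter Noever'."""
    def norm(s: str) -> str:
        return " ".join(sorted(re.findall(r"\w+", s.lower())))
    return SequenceMatcher(None, norm(a), norm(b)).ratio() >= min_ratio


def enrich_record(record: dict, name_word_scores: dict) -> dict:
    """Add recognized names that are missing from the structured contributors."""
    contributors = list(record.get("contributors", []))
    statement = record["statement_of_responsibility"]
    for name in recognize_names(statement, name_word_scores):
        if not any(is_similar(name, existing) for existing in contributors):
            contributors.append(name)
    return {**record, "contributors": contributors}
```

Applied to "ed. by Peter Noever ; with a forew. by Frank O. Gehry; and contrib. by Coop Himmelblau." with a record that already lists "Noever, Peter", only the two missing contributors would be added.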
Evaluation data set (size of bibliographies and evaluation samples)
National Bibliography | Total records | Main language | Evaluation sample: statements of responsibility | Evaluation sample: referred persons
British Library | 13.4 million | English | 205 | 328
German National Library | 9.4 million | German | 200 | 378
National Library of the Netherlands | 3.2 million | Dutch | 200 | 335
National Library of Greece | 0.4 million | Greek | 297 | 379
Central Institute for the Union Catalogue of Italian Libraries | 12.4 million | Italian | 224 | 297
Royal Library of Belgium | 1 million | French and Dutch | 203 | 387
Total | | | 1329 | 2104
Evaluation results
Dataset | Exact match precision | Exact match recall | Partial match precision | Partial match recall
British Library | 0.981 | 0.979 | 0.991 | 0.991
German National Library | 0.975 | 0.934 | 0.992 | 0.992
National Library of the Netherlands | 0.973 | 0.875 | 0.977 | 0.979
National Library of Greece | 0.656 | 0.414 | 0.758 | 0.868
Central Institute for the Union Catalogue of Italian Libraries | 0.97 | 0.896 | 0.971 | 0.973
Royal Library of Belgium | 0.981 | 0.959 | 0.981 | 0.982
Overall | 0.948 | 0.837 | 0.958 | 0.963
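For readers unfamiliar with the two metrics, the sketch below shows one way precision and recall can be computed under exact and partial matching. The slides do not define "partial match"; treating it as "the recognized and the reference name share at least one word" is an assumption made here, and all function names are hypothetical.

```python
def exact(predicted: str, reference: str) -> bool:
    """The recognized name equals the reference name (case-insensitive)."""
    return predicted.lower() == reference.lower()


def partial(predicted: str, reference: str) -> bool:
    """Assumed definition: the two names share at least one word."""
    return bool(set(predicted.lower().split()) & set(reference.lower().split()))


def precision_recall(predicted: list, reference: list, match) -> tuple:
    """Precision: share of recognized names that are correct.
    Recall: share of reference names that were recognized."""
    hits_pred = sum(1 for p in predicted if any(match(p, r) for r in reference))
    hits_ref = sum(1 for r in reference if any(match(p, r) for p in predicted))
    precision = hits_pred / len(predicted) if predicted else 0.0
    recall = hits_ref / len(reference) if reference else 0.0
    return precision, recall
```

Under this reading, a recognizer that finds only part of a name lowers the exact match scores but not the partial match scores, which is consistent with the partial figures in the table being at least as high as the exact ones.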
Linked Data at The European Library
Managing and linking place names
The approach for place name linking
• We process the complete metadata elements
• The alignment is performed with Geonames
  • Using the RDF dump of Geonames
• A generic approach not using any language specific information
  • The words themselves are not used as evidence
  • We use only characteristics of the words (capitalization, size, etc)
  • Wordnets, part-of-speech analysis, morphological analysis, etc., are not used.
  • … in order to allow the use of this approach in a language independent manner
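A small sketch of what "characteristics of the words" can look like in practice. The feature set and the rule of grouping consecutive capitalized words into candidate spans are assumptions for illustration; only surface properties are inspected and no vocabulary or grammar of any language is consulted.

```python
import re


def word_features(word: str) -> dict:
    """Surface characteristics only; the word itself is never looked up."""
    return {
        "capitalized": word[:1].isupper(),
        "all_caps": len(word) > 1 and word.isupper(),
        "length": len(word),
        "has_digit": any(ch.isdigit() for ch in word),
    }


def candidate_spans(text: str) -> list:
    """Group consecutive capitalized words into candidate place names."""
    spans, current = [], []
    for word in re.findall(r"\w+", text):
        if word_features(word)["capitalized"]:
            current.append(word)
        else:
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans
```

For "History of New York and Paris" this yields "History", "New York" and "Paris"; the false candidate "History" is left for the resolution step to reject.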
Resolution of the place names
• This task aims to find a single entity in the geographic ontology for aligning with the place name
• The first step of this task is to find all possible candidates for the resolution in the geographic ontology
• Uses a heuristic based predictive model:
  • Assigns a probability for each resolution candidate as match or non_match
• An alignment is established if a minimum probability threshold for the class match is achieved.
Which information supports the place name resolution
Feature | Description
Number of words | The number of words in the place name.
Name match | If the recognized place name matched: the main name of the place, an alternate name, etc.
Exact name match | If the recognized place name matched exactly the place name.
Relative population | Relative population of the candidate in comparison with other candidates.
Geographic feature type | The type of geographic feature: continent, country, city, etc.
Related places found | The number of other place names found in the administrative hierarchy.
Relative related places | The relative number of administrative divisions found in the subject heading
In source country | If it is located in one of the source countries of the subject heading system.
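To make the resolution step concrete, the sketch below scores each candidate from a few of the listed features and accepts the best one only above a probability threshold. The weights, the bias, the logistic form and the `0.7` threshold are invented for illustration and are not the model used at The European Library; candidates are hypothetical dicts rather than Geonames records.

```python
import math

# Hypothetical weights; a real model would estimate these from labelled data.
WEIGHTS = {"exact_name_match": 2.0, "relative_population": 3.0,
           "related_places_found": 1.5, "in_source_country": 1.0}
BIAS = -3.0


def match_probability(features: dict) -> float:
    """Probability of the class 'match' from a weighted sum of features."""
    z = BIAS + sum(w * float(features.get(k, 0)) for k, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def resolve(place_name: str, candidates: list, related_names=(),
            source_countries=(), threshold: float = 0.7) -> tuple:
    """Return (candidate id or None, best probability)."""
    total_pop = sum(c["population"] for c in candidates)
    best_id, best_p = None, 0.0
    for c in candidates:
        features = {
            "exact_name_match": c["name"].lower() == place_name.lower(),
            "relative_population": c["population"] / total_pop if total_pop else 0.0,
            "related_places_found": len(set(c["hierarchy"]) & set(related_names)),
            "in_source_country": c["country"] in source_countries,
        }
        p = match_probability(features)
        if p > best_p:
            best_id, best_p = c["id"], p
    # No alignment is established below the minimum probability.
    return (best_id, best_p) if best_p >= threshold else (None, best_p)
```

With two equally populated homonymous candidates and no related places in the record, neither reaches the threshold and the place name is left unaligned.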
Linked Data at The European Library
Managing and linking concepts
Linking Subject Indexing and Classification Data
The context
• The centralization of bibliographic metadata enables resource access under a unified knowledge organization system
The challenges
• Diversity of languages
• Diversity of knowledge organization systems in use across European libraries
• Heterogeneous levels of detail in subject information
Current status at The European Library
• Use of alignments between ontologies:
  • Alignments were created manually or semi-automatically
  • Alignments in use include: CERIF, MACS (LCSH, RAMEAU, SWD), UDC and DDC
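A minimal sketch of how alignments can give unified subject access across systems and languages. The alignment group, the record layout and the function names are illustrative assumptions; real alignments such as MACS link identified headings rather than bare labels.

```python
# One hypothetical alignment group: the same concept in three subject heading systems.
ALIGNMENTS = [
    {("LCSH", "Astronomy"), ("RAMEAU", "Astronomie"), ("SWD", "Astronomie")},
]


def build_index(alignments: list) -> dict:
    """Map every concept to the full group of concepts aligned with it."""
    index = {}
    for group in alignments:
        for concept in group:
            index[concept] = frozenset(group)
    return index


def expand(concept: tuple, index: dict) -> frozenset:
    """Equivalent concepts across systems; an unaligned concept maps to itself."""
    return index.get(concept, frozenset({concept}))


def search(records: list, concept: tuple, index: dict) -> list:
    """Ids of records indexed with the concept or any concept aligned with it."""
    wanted = expand(concept, index)
    return [r["id"] for r in records if wanted & set(r["subjects"])]
```

A query for the LCSH heading then also retrieves records indexed only with the RAMEAU or SWD heading, regardless of the cataloguing language.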
References
Further details may be consulted in the following publications:
• Freire, N., 2014, 'Word Occurrence Based Extraction of Work Contributors from Statements of Responsibility', International Journal on Digital Libraries, Volume 14, Issue 3, pp. 141-148. DOI: 10.1007/s00799-014-0113-3.
• Charles, V., Freire, N., Antoine, I., 2014, 'Links, languages and semantics: linked data approaches in The European Library and Europeana', in 'Linked Data in Libraries: Let's make it happen!', IFLA 2014 Satellite Meeting on Linked Data in Libraries.
• Freire, N., Muhr, M., 2013, 'Use of Authorities Open Data in the ARROW Rights Infrastructure', in proceedings of the DC-2013 Linking to the Future Conference.
• Freire, N., 2013, 'Visualization and navigation of knowledge in pan-European resources: the case of The European Library', in proceedings of the International UDC Seminar on Classification & Visualization: Interfaces to Knowledge.
• Freire, N., et al., 2012, 'Author Consolidation across European National Bibliographies and Academic Digital Repositories', 11th International Conference on Current Research Information Systems.
• Freire, N., Borbinha, J., Calado, P., 2011, 'A Language Independent Approach for Aligning Subject Heading Systems with Geographic Ontologies', International Conference on Dublin Core and Metadata Applications 2011.
• Freire, N., Borbinha, J., Calado, P., Martins, B., 2011, 'A Metadata Geoparsing System for Place Name Recognition and Resolution in Metadata Records', ACM/IEEE Joint Conference on Digital Libraries.