Europeana: Update on Metadata Mapping and Normalisation, Content Ingestion and Aggregation Activities Robina Clayphan Interoperability Manager, EDLF ECDL Workshop – Harvesting Metadata: Practices and Challenges September 30 2009
Page 1

Europeana: Update on Metadata Mapping and Normalisation, Content Ingestion and Aggregation Activities

Robina Clayphan

Interoperability Manager, EDLF

ECDL Workshop – Harvesting Metadata: Practices and Challenges

September 30 2009

Page 2

Introduction

• A look at the metadata schema we use and the elements that must be in a standard form

• The whole ingestion process

• Summary of the aspects of and approach to aggregation

Page 3

Europeana

Europeana brings together and makes available digital content from:

• Four cultural heritage sectors
  • Museums, archives, libraries, audio-visual archives

• Twenty-nine countries
  • EU plus Norway and Switzerland

• Twenty-six languages

• Four types of material
  • Image, sound, video, text

… need for a metadata lingua franca …

Page 4

ESE V3.2

Europeana Semantic Elements (ESE) V3.2 developed for the prototype

• A Dublin Core-based application profile
  • Cross-domain schema for heterogeneous data
  • Not intended to capture the full semantics of providers’ data

• 37 Dublin Core terms, used principally to describe the objects

• 12 Europeana-coined terms, used to support portal functionality
  • Needed so that the portal has consistent data to work with

Page 5

The Dublin Core elements

Each element is listed with its refinements, if any, in parentheses.

• Title (Alternative)
• Creator
• Subject
• Description (TableOfContents)
• Publisher
• Contributor
• Date (Created; Issued)
• Type
• Format (Extent; Medium)
• Identifier
• Source
• Language
• Relation (isVersionOf; hasVersion; isReplacedBy; replaces; isRequiredBy; requires; isPartOf; hasPart; isReferencedBy; references; isFormatOf; hasFormat; conformsTo)
• Coverage (Spatial; Temporal)
• Rights
• Provenance
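As a cross-check on the figure of 37 Dublin Core terms, the elements and refinements on this slide can be held in a simple mapping and counted. This is only an illustrative sketch: the dictionary layout and the function name `count_terms` are assumptions made for this example and are not part of the ESE specification.

```python
# Elements from the slide, each mapped to its refinements (empty list if none).
# The structure is an illustration, not an official ESE data model.
DC_TERMS = {
    "Title": ["Alternative"],
    "Creator": [],
    "Subject": [],
    "Description": ["TableOfContents"],
    "Publisher": [],
    "Contributor": [],
    "Date": ["Created", "Issued"],
    "Type": [],
    "Format": ["Extent", "Medium"],
    "Identifier": [],
    "Source": [],
    "Language": [],
    "Relation": [
        "isVersionOf", "hasVersion", "isReplacedBy", "replaces",
        "isRequiredBy", "requires", "isPartOf", "hasPart",
        "isReferencedBy", "references", "isFormatOf", "hasFormat",
        "conformsTo",
    ],
    "Coverage": ["Spatial", "Temporal"],
    "Rights": [],
    "Provenance": [],
}


def count_terms(terms):
    """Count every element plus every refinement listed under it."""
    return len(terms) + sum(len(refinements) for refinements in terms.values())


print(count_terms(DC_TERMS))
```

Counting elements and refinements together is what reconciles this list with the "37 Dublin Core terms" stated for ESE V3.2.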

Page 6

Europeana elements

Element | Who is responsible | Function
europeana:isShownAt or europeana:isShownBy | Provider must provide at least one of these elements, both if applicable. URL. | Links to object
europeana:object | Provider, if appropriate to the data. URL. | Source of thumbnail
europeana:provider | Provider must provide this element. Controlled list. | Facet
europeana:type | Provider must provide this element. Controlled list. | Facet
europeana:unstored | Provider, only if appropriate to your data. Text string. | Container element
europeana:country | Europeana | Facet
europeana:hasObject | Europeana | System use
europeana:language | Europeana | Facet
europeana:uri | Europeana | System identifier
europeana:usertag | Europeana | User-provided tags (future)
europeana:year | Europeana | Facet, timeline

Europeana is responsible for providing all of the last six elements (europeana:country to europeana:year).
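The provider obligations in the table above can be sketched as a small pre-submission check. This is a minimal illustration and not the Content Checker itself. The namespace URI bound to the `europeana:` prefix, the `record` wrapper element and the function name `missing_mandatory` are assumptions made for this example and should be checked against the ESE V3.2 XML schema.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the europeana: prefix; verify against the ESE schema.
ESE_NS = "http://www.europeana.eu/schemas/ese/"


def missing_mandatory(record):
    """Return a list of problems with the provider-supplied Europeana elements."""

    def has(name):
        # True if the record has a direct child europeana:<name> with non-empty text.
        element = record.find(f"{{{ESE_NS}}}{name}")
        return element is not None and (element.text or "").strip() != ""

    problems = []
    for name in ("provider", "type"):
        if not has(name):
            problems.append(f"europeana:{name} is missing")
    if not (has("isShownAt") or has("isShownBy")):
        problems.append("need europeana:isShownAt or europeana:isShownBy")
    return problems


# Hypothetical record that omits europeana:type.
SAMPLE = """<record xmlns:europeana="http://www.europeana.eu/schemas/ese/">
  <europeana:provider>Example Library</europeana:provider>
  <europeana:isShownAt>http://example.org/item/1</europeana:isShownAt>
</record>"""

problems = missing_mandatory(ET.fromstring(SAMPLE))
print(problems)
```

A check like this only covers presence of the elements. Validation against the XML schema, and against the controlled lists for provider and type, would still be needed.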

Page 10

Normalised elements

• Language
  • ISO 639-1 standard two-character code

• Country
  • ISO 3166 standard

• Year
  • Four-digit year from the Gregorian calendar (YYYY)
  • Generated where possible from the date supplied in <dc:date>

• Provider
  • Controlled list of names, in the language of the provider

• Type
  • Controlled list (in English) of four types: Text, Image, Sound, Video
  • Mapped by the provider from the diverse types used in the source data
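The normalisation rules above might be approximated as follows. This is a rough sketch under stated assumptions: the function names, the regular expression and the entries in `TYPE_MAP` are all hypothetical, and the real rules are those in the Normalisation Guidelines. The language check tests only the shape of the code (two lower-case letters), not whether it is an assigned ISO 639-1 code.

```python
import re

# A four-digit run that is not part of a longer run of digits.
YEAR_RE = re.compile(r"(?<!\d)(\d{4})(?!\d)")

# Hypothetical examples of provider type strings mapped to the four controlled values.
TYPE_MAP = {
    "photograph": "Image",
    "painting": "Image",
    "manuscript": "Text",
    "newspaper": "Text",
    "oral history recording": "Sound",
    "film": "Video",
}


def year_from_date(date_text):
    """Return the first standalone four-digit year in a dc:date value, or None."""
    match = YEAR_RE.search(date_text)
    return match.group(1) if match else None


def normalise_type(source_type):
    """Map a provider type string to Text, Image, Sound or Video, or None if unknown."""
    return TYPE_MAP.get(source_type.strip().lower())


def looks_like_iso639_1(code):
    """Shape check only: exactly two lower-case ASCII letters."""
    return re.fullmatch(r"[a-z]{2}", code) is not None


print(year_from_date("1914-07-28"), normalise_type(" Photograph "), looks_like_iso639_1("nl"))
```

Returning None where no year or type can be derived reflects the "where possible" wording on the slide: such values are left for manual review rather than guessed.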

Page 11

Mapping and Normalisation

Three key reference documents for providers:

•ESE Specification V3.2

•Normalisation Guidelines V1.2

•ESE V3.2 XML schema + explanatory text

All available from the “Provide Content” section of the Europeana Group pages:

http://group.europeana.eu/web/guest/provide_content

Page 13

Content Ingestion

… starting right from the beginning

Page 14

Global Europeana ingestion workflow

Page 15

Activity diagram: Steps I5 to I8

Page 16

Content Ingestion

• Europeana has provided a Content Checker tool, which has two parts:

• The Content Ingestor
  • Allows uploading of a data set
  • Validation against the ESE V3.2 XML schema
  • Importing the data into the database
  • Indexing of data
  • Caching of thumbnails

• The Test Portal
  • Separate from the operational portal
  • Allows the provider to search for uploaded data
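The order of the Content Ingestor steps can be pictured as a simple staged pipeline. The sketch below is purely illustrative: the stage handlers are hypothetical stand-ins, and stopping at the first failing stage is an assumption of this example rather than documented behaviour of the tool.

```python
# Stage names follow the Content Ingestor steps listed on the slide.
STAGES = ("upload", "validate", "import", "index", "cache")


def run_ingestion(dataset, handlers):
    """Run the stages in order.

    Returns (completed_stages, failed_stage); failed_stage is None if every
    stage succeeded. Stopping at the first failure is an assumption.
    """
    completed = []
    for stage in STAGES:
        if not handlers[stage](dataset):
            return completed, stage
        completed.append(stage)
    return completed, None


def make_handlers():
    """Hypothetical handlers: every stage succeeds except validation of invalid data."""
    handlers = {stage: (lambda dataset: True) for stage in STAGES}
    handlers["validate"] = lambda dataset: dataset.get("valid", False)
    return handlers


completed, failed = run_ingestion({"id": "null05", "valid": False}, make_handlers())
print(completed, failed)
```

In this toy run the data set fails schema validation, so nothing is imported, indexed or cached.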

Page 17

Content Ingestor

Select “new data set” - the ingestor automatically creates a new ID – “null05” in this example

Page 18

Content Ingestor - upload

Page 19

Content Ingestor - validate

Page 20

Import

Page 21

Index

Page 22

Cache

Page 23

Test Portal - search

Page 24

Aggregation and the Content Strategy

Moving on to various aspects of aggregation in Europeana: the need for it and the approach to it.

Page 25

Aggregation - terminology

• A Content Provider
  • an organisation that provides metadata that enables access to its digital objects

• An Aggregator
  • collects metadata from a group of content providers
  • transmits them to Europeana
  • helps content providers with guidance on conformance with Europeana norms
  • transforms metadata if necessary
  • supports the content providers with administration, operations and training

Page 26

Roles and benefits

• Content providers
  • Know their content and data best, so fewer mapping errors
  • Can look at the results before they are ingested into the operational system

• Aggregators
  • Know the needs of the providers (domain, level)
  • Play a bridging role between providers and Europeana: single point of contact, conduit for information in both directions

• Europeana
  • Supporting role for consultation, co-ordination, standardisation
  • Management of the 10 million objects
  • Offers the cross-domain and multi-lingual service

Page 27

Organisational Model

[Diagram: organisational model showing Europeana, three Aggregator nodes and many Institute nodes]

Page 28

Types of aggregator

Matrix of aggregators:

• cross-domain, single-domain, thematic

• level of operation: regional, national, European, global

Geographic coverage columns: Regional, National, European, Worldwide

Examples by domain type:

• Cross-domain (horizontal): Thuis in Brabant; CulturaItalia; Europeana

• Single-domain (vertical): MovE (museums in East Flanders); Direcção-Geral de Arquivos (Portuguese archives); Dismarc (music); TEL (books); EFG (movies); World Digital Library; WorldCat

• Thematic, cross-domain: Judaica; ArXiv.org

• Thematic, single-domain: Great War Archive

Page 29

Why aggregation?

• November 2008 – 5 million items in Europeana

• July 2009 - content from over 1000 providers

• July 2010 – target of 10 million items

• Many individual organisations asking to contribute

• Currently there are six projects that aggregate content for Europeana (amongst other objectives)

• another three projects starting later this year

• Europeana Group site at: http://group.europeana.eu/web/guest/home

Page 31

Why aggregation?

• Labour-intensive administration and ingestion processes
  • Not due to the amount of data, but to the number of organisations

• Aggregation provides economies of scale allowing Europeana Office to remain relatively small

Promoting aggregation and providing services and expertise to aggregators will be key to Europeana’s Content Strategy

• Europeana is a small organisation!

Page 32

Aggregation activities

• Aggregators survey
  • Establish shared issues and need for support

• Formation of Aggregators group
  • Council of Content Providers and Aggregators is now part of Europeana Governance structure

• Training for aggregators
  • Generic and bespoke training days as the need arises

• Identifying potential aggregators

• “EuropeanaLabs” for Aggregators
  • Test environment for content delivery and/or software development

Page 33

Aggregation activities

• Handbook for aggregators. Content to be decided as part of the survey, but likely to cover:
  • Europeana source code, APIs, content checker etc.
  • Technical documentation for participating in Europeana
  • Templates and documentation for budget planning, fundraising, revenue generation, sustainability
  • Templates and documentation for administrative and organisational aspects of running an aggregator
  • Templates and documentation on IPR and European Licensing framework
  • Documentation for establishing political and network support
  • Templates and documentation for dissemination activities
  • Wiki for aggregator issues

Page 34

Thank you!

[email protected]


Page 36

isShownBy1

Page 37

isShownAt2