Page 1: ENP Belgrade Workshop Project Overview

Europeana Newspapers Project

Workshop on Refinement and Quality Assessment

University Library "Svetozar Marković"

Belgrade, June 13th 2013

Hans-Jörg Lieder / Ulrike Kölsch

Project Coordinator

Berlin State Library, Germany

Belgrade/June 13th 2013/University Library


This project is partially funded under the ICT Policy Support Programme (ICT PSP) as part of the Competitiveness and Innovation Framework Programme by the European Community http://ec.europa.eu/ict_psp

Content

Project Profile

• Consortium & Stakeholders

• Aims and Objectives

• Adding value

• Where do we go from here?


Consortium & Stakeholders

• 18 partners from 12 countries within the consortium

- National and university libraries

- Universities

- SMEs

• External partners and stakeholders

- Involvement of libraries outside the project consortium via associated and network partnerships

• Framework

- Funded as a Best Practice Network in the ICT PSP programme of the European Commission

- Project duration: February 2012 – January 2015


Consortium Partners

01. State Library Berlin, Germany
02. National Library of the Netherlands
03. National Library of Estonia
04. National Library of Austria
05. National Library of Finland
06. State and University Library Hamburg, Germany
07. National Library of France
08. National Library of Poland
09. University of Salford
10. CCS Content Conversion Specialists GmbH
11. Stichting LIBER, Netherlands
12. National Library of Latvia
13. National Library of Turkey
14. University Library of Belgrade
15. University of Innsbruck
16. State Library Dr. Friedrich Tessmann, Italy
17. The British Library, UK
18. Europeana Foundation, Netherlands


Europeana Newspapers Consortium

[Map of consortium partners, labelled: NLF, SBB, ONB, NLP, BnF, NLE, SUB HH, USAL, NLL, LIBER, KB, EF, CCS, NLT, UB, UIBK, LFT, BL]


Associated Partners

1. National Library of Czech Republic
2. National Library of Wales
3. National and University Library Ljubljana, Slovenia
4. National Library of Portugal
5. National and University Library of Iceland
6. National Library of Spain
7. National and University Library Zagreb, Croatia
8. National Library of Belgium
9. St. Cyril and Methodius National Library, Bulgaria
10. National Library of Luxembourg
11. Lucian Blaga Central University Library, Romania

Since April 2013 the project has had eleven associated partners and has started intensive networking with further libraries.


Europeana Newspapers: Aims and Objectives

• Refinement methods for OCR, OLR (article segmentation), Named Entity Recognition (NER) and class recognition

Creation of 18 million pages of digitised newspapers

- 10 million refined pages: OCR (UIBK, Austria)

- 2 million refined pages: OCR/OLR (article segmentation) (CCS, Germany)

Delivery of 8 million pages already available locally

• Quality evaluation and prediction tools

• Aggregation and refinement of newspapers for The European Library and Europeana

• Metadata: best practice recommendations for

- Creation of OCR-ready images

- Full texts and associated metadata

- NER

• Dissemination: Further libraries are encouraged and supported in contributing newspapers content to Europeana


Value: Europeana Newspapers spreads best practice

Europeana Newspapers supports the creation of a larger window into European culture by:

• Developing best practice for the digitisation of newspapers

• Sharing best practice and experiences through workshops with project partners, associated partners, and networking partners

• Publishing best practice on our website

• National Information days


Added Value: Aggregation

Activities focused on three key messages:

1. The project and its outcomes (e.g. online access to a collection of high-quality digitised newspapers);

2. The technological challenges (e.g. techniques for refining content and the development of a standardised metadata model);

3. The content-related issues (e.g. improving the extent of newspaper digitisation, the changing nature of historical research).

The European Library

• A single library domain aggregator

• Content from major European libraries

• Dedicated newspaper content browser

• Full-text search capabilities

• Portal for researchers


Added Value: Scenarios

• Keyword and Phrase Search

• Image Browsing

• Access via content structure (OLR and NER results)

• Geo-location based service

• Text mining

• Crowdsourced correction and enrichment

• Access through mobile apps

• ...


Where are we now?

• OCR processing completed for almost four million newspaper pages

• Available specification of use scenarios

• Available initial versions of evaluation tools

• Europeana Newspapers survey report

• Development of three tools to support highly standardised data creation, data controlling and data delivery within the project

• Metadata recommendations ready to be published in October 2013

• Specifications for content browser

• CCS has started work (OLR)

• Dissemination and Information

- Established associated and networking partnerships


Where do we go from here?


More newspaper content

• Most libraries have digitised less than 10% of their physical newspaper collection

More recent content

• 20th century content is unavailable or only available under licence at national level: need to work with publishers and rights holders

Exploit richness of European digitised newspaper collections

• OCR is not applied across the board, and often only selectively

Improved accessibility

• Richness of content has a knock-on effect on accessibility (e.g. full-text search)


Why newspapers? …and how, anyway?

"Die Zeitungen sind die Sekundenzeiger der Geschichte."

(Newspapers are the second hands of history.)

(This hand, however, is not only of inferior metal to the other hands, it also seldom works properly.)

Arthur Schopenhauer

Relevant to all customers/citizens

Relevant to regional and European policies incl. Europeana

Newspaper holdings in public institutions are…

• … sometimes: solid and complete, beautifully bound; excellent microfilm copies

• … frequently: frail and crumbly, missing editions, incomplete supplements, poorly bound; poor microfilm copies, legal uncertainties with contemporary material


Thank you for your attention!

Contact:

[email protected]

[email protected]

For more information, please see www.europeana-newspapers.eu or follow our project news via Twitter (@eurnews) and Facebook (https://www.facebook.com/EuropeanaNewspapers)