Europeana Newspapers Project
Workshop on Refinement and Quality Assessment
University Library "Svetozar Marković"
Belgrade, June 13th 2013
Hans-Jörg Lieder/ Ulrike Kölsch
Project Coordinator
Berlin State Library, Germany
Belgrade/June 13th 2013/University Library
This project is partially funded under the ICT Policy Support Programme (ICT PSP) as part of the Competitiveness and Innovation Framework Programme by the European Community http://ec.europa.eu/ict_psp
Content
Project Profile
• Consortium & Stakeholders
• Aims and Objectives
• Adding value
• Where do we go from here?
Consortium & Stakeholders
• 18 partners from 12 countries within the consortium
  - National and University libraries
  - Universities
  - SME
• External partners and stakeholders
  - Involvement of libraries outside the project consortium via associated and network partnerships
• Framework
  - Funded as a Best Practice Network in the ICT PSP program of the European Commission
  - Project duration: February 2012 – January 2015
Consortium Partners
01. State Library Berlin, Germany
02. National Library of the Netherlands
03. National Library of Estonia
04. National Library of Austria
05. National Library of Finland
06. State and University Library Hamburg, Germany
07. National Library of France
08. National Library of Poland
09. University of Salford
10. CCS Content Conversion Specialists GmbH
11. Stichting LIBER, Netherlands
12. National Library of Latvia
13. National Library of Turkey
14. University Library of Belgrade
15. University of Innsbruck
16. State Library Dr. Friedrich Tessmann, Italy
17. The British Library, UK
18. Europeana Foundation, Netherlands
Europeana Newspapers Consortium
[Figure labels: NLF, SBB, ONB, NLP, BnF, NLE, SUB HH, USAL, NLL, LIBER, KB, EF, CCS, NLT, UB, UIBK, LFT, BL]
Associated Partners
1. National Library of Czech Republic
2. National Library of Wales
3. National and University Library Ljubljana, Slovenia
4. National Library of Portugal
5. National and University Library of Iceland
6. National Library of Spain
7. National and University Library Zagreb, Croatia
8. National Library of Belgium
9. St. Cyril and Methodius National Library, Bulgaria
10. National Library of Luxembourg
11. Lucian Blaga Central University Library, Romania
Since April 2013 the project has had eleven associated partners and has started intensive networking with further libraries.
Europeana Newspapers: Aims and Objectives
• Refinement methods for OCR, OLR (article segmentation), Named Entity Recognition (NER) and class recognition
Creation of 18 million pages of digitised newspapers
  - 10 million refined pages: OCR (UIBK, Austria)
  - 2 million refined pages: OCR/OLR (article segmentation) (CCS, Germany)
Delivery of 8 million pages already available locally
• Quality evaluation and prediction tools
• Aggregation and refinement of newspapers for The European Library and Europeana
• Metadata: best practice recommendation for
  - Creation of OCR-ready images
  - Full-texts and associated metadata
  - NER
• Dissemination: Further libraries are encouraged and supported in contributing newspapers content to Europeana
Value: Europeana Newspapers spreads best practice
Europeana Newspapers supports the creation of a larger window into European culture by:
• Developing best practice for the digitisation of newspapers
• Sharing best practice and experiences through workshops with project partners, associated partners, and networking partners
• Publishing best practice on our website
• National Information days
Added Value: Aggregation
Activities focused on three key messages:
1. The project and its outcomes (e.g. online access to a collection of high-quality digitised newspapers);
2. The technological challenges (e.g. techniques for refining content and the development of a standardised metadata model);
3. The content-related issues (e.g. improving the extent of newspaper digitisation, the changing nature of historical research).
The European Library
• A single library domain aggregator
• Content from major European libraries
• Dedicated newspaper content browser
• Full-text search capabilities
• Portal for researchers
Added Value: Scenarios
• Keyword and Phrase Search
• Image Browsing
• Access via content structure (OLR and NER results)
• Geo-location based service
• Text mining
• Crowd sourced correction and enrichment
• Access through mobile apps
• ...
Where are we now?
• OCR processing completed for almost four million newspaper pages
• Specification of use scenarios available
• Initial versions of evaluation tools available
• Europeana Newspapers survey report
• Development of three tools to support highly standardised data creation, data controlling and data delivery within the project
• Metadata recommendations ready to be published in October 2013
• Specifications for content browser
• CCS has started work (OLR)
• Dissemination and Information
- Established associated and networking partnerships
Where do we go from here?
More newspaper content
• Most libraries have digitised less than 10% of their physical newspaper collection
More recent content
• 20th century content unavailable or only available under licence at national level: need to work with publishers and rights holders
Exploit richness of European digitised newspaper collections
• OCR not applied across the board, and often applied only selectively
Improved accessibility
• Richness of content has a knock-on effect on accessibility (e.g. full-text search)
Why newspapers? …and how, anyway?
"Die Zeitungen sind die Sekundenzeiger der Geschichte." (Newspapers are the second hands of history.)
(This hand, however, is not only of inferior metal to the other hands, it also seldom works properly.)
Arthur Schopenhauer
Relevant to all customers/citizens
Relevant to regional and European policies incl. Europeana
Newspaper holdings in public institutions are…