Antoine Isaac
Information and networking days
H2020 / Connecting Europe Facility, Jan 15-16, 2014
Jan 19, 2015
Europe’s platform to access cultural heritage
Currently 30M objects
Built on descriptive metadata from a broad, heterogeneous network
Audiovisual collections
National Aggregators
Regional Aggregators
Archives
Thematic collections
Libraries
Musées Lausannois
Culture.fr
The European Library
APEX
European Film Gateway
Europeana Fashion
2,300 galleries, museums, archives and libraries
Accessing items from 36 countries
top 16
Portal interface in 31 languages
Metadata in 33 languages
Serving Europe’s citizens
5M visits on Europeana.eu
7M Facebook impressions
API use…
Content (digital objects on the site of the provider)
Metadata (descriptive object information)
Public Domain
Creative Commons Licenses
Rights reserved
Orphan work
Facilitating re-use on the legal side
CC
Facilitating re-use on the language side?
Our network needs automatic translation tools to address information needs all over Europe
Gathering/linking existing multilingual data
Related projects applying NLP tools
E.g., the PATHS project has developed techniques to enrich English and Spanish collections
1) Identification of key entities
2) Detection of (typed) similarities between objects, using metadata
3) “Background links” to external resources such as Wikipedia
4) Classification of objects against a hierarchy of topics
Applying these techniques to other languages would require work
1) requires language-specific tools (PoS tagging, lemmatization)
2) is straightforward to apply to new languages
3) requires language-specific tools
4) depends on (3) and on translation of some topics
http://www.paths-project.eu/eng/Resources/Semantic-Enrichment-of-Cultural-Heritage-content-in-PATHS
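Illustration of technique (2): a minimal sketch, assuming plain token overlap (Jaccard) per metadata field as the similarity measure. This is not the method used by PATHS; the field names and the two records are hypothetical. It shows why this step carries over to new languages: no PoS tagger or lemmatizer is involved.

```python
import re

def tokens(text):
    """Lowercase word tokens of a metadata value."""
    return set(re.findall(r"\w+", text.lower()))

def jaccard(a, b):
    """Jaccard overlap of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def typed_similarity(obj1, obj2, fields=("title", "creator", "subject")):
    """One score per metadata field, so each similarity keeps a type."""
    return {
        f: jaccard(tokens(obj1.get(f, "")), tokens(obj2.get(f, "")))
        for f in fields
    }

# Hypothetical records, invented for illustration only.
a = {"title": "View of Delft", "creator": "Johannes Vermeer",
     "subject": "cityscape"}
b = {"title": "The Little Street", "creator": "Johannes Vermeer",
     "subject": "cityscape Delft"}

scores = typed_similarity(a, b)
print(scores)
```

Here the two records share a creator and part of a subject but no title words, so the link between them would be typed as "same creator" rather than "similar title".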
Language challenges for Digital Libraries
Typical queries are very short
Average < 2 terms
Identification of query language is not easy, even manually
39% of queries may belong to several languages
Plenty of named entities
60% of queries are for persons & places
These issues are not limited to queries: they also apply to the descriptive metadata
Studies by Humboldt University on Europeana and The European Library
http://www.clef-initiative.eu/documents/71612/86374/CLEF2010wn-LogCLEF-StillerEt2010.pdf
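Illustration of the query-language problem: a minimal sketch, assuming a naive word-list lookup. The word lists are toy data invented for this example, and the approach is not the one used in the studies cited above. It shows how a one-term query, especially a named entity, can be valid in several languages at once.

```python
# Toy word lists, invented for illustration; real lexicons are far larger.
LEXICONS = {
    "en": {"art", "paris", "radio", "museum", "map"},
    "fr": {"art", "paris", "radio", "carte"},
    "de": {"paris", "radio", "museum", "karte"},
}

def candidate_languages(query):
    """Languages whose word list contains every term of the query."""
    terms = query.lower().split()
    return sorted(
        lang
        for lang, words in LEXICONS.items()
        if terms and all(t in words for t in terms)
    )

print(candidate_languages("Paris"))   # ['de', 'en', 'fr']
print(candidate_languages("art"))     # ['en', 'fr']
print(candidate_languages("carte"))   # ['fr']
```

With fewer than two terms per query on average, there is rarely a second word to break the tie.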
Language processing issues at the scale of Europe
Europeana’s vision and mission
We believe in making cultural heritage openly accessible in a digital way, to promote the exchange of ideas and information
We want to be a catalyst for change in the world of cultural heritage