May 09, 2015
● 2005: a not-for-profit organisation founded by S.M.A.K., Argos, M HKA and MDD and focussed on the preservation of audiovisual arts
● 2011: ‘Centre of Expertise in Digital Heritage’: development and dissemination of knowledge on and expertise in the digitisation of cultural heritage
● Focus on all aspects of the digitisation process: creation, cataloguing, storage, sharing, exchange and reuse of digital heritage content
● Flemish projects: CEST, Scoremodel Digital Sustainability, TRACKS, VIAA, opencultuurdata.be, Persistent Identification
● European projects: ATHENA, ATHENAplus, PREFORMA, Europeana Space, DCA, Linked Heritage
● Erfgoedstats | SODA | SIP creator
● packed.be | scart.be | projectcest.be | scoremodel.org | projecttracks.org
PERSISTENT IDENTIFICATION: MAKING DIGITAL HUMANITIES RESEARCH EASIER
Alina Saenko | PACKED vzw 13.06.2014
CONTENTS
§ The problem: How to lead digital humanities researchers to rich and trustworthy museum data?
§ The solution: Persistent identification of data as a technical requirement, making DH research possible
§ The project: Museum sector
THE PROBLEM
RESEARCHERS LOOK FOR
- trustworthy data from a source they can refer to, but…
RESEARCHERS LOOK FOR
- rich machine-readable data, which they can use for digital analysis and research, but…
MUSEUMS STRUGGLE
- how to keep online available data up-to-date?
- how to keep online available data complete/comprehensive?
- how to publish the rich data in a machine-readable format?
- how to get online available data as high as possible in the ranking of search engines?
IS THERE A SOLUTION?
Complex... But one important technical requirement:
§ Museums should gain control over the web address which points to the information about a work of art in their collection.
§ Museums should aim to keep this web address as persistent as possible in order to become a trustworthy source of information and to increase the internet traffic to the web address.
§ Museums should consider themselves online publishers of their own collection data.
How does that work in practice?
THE SOLUTION: PERSISTENT URIS
WHAT IS A URI?
1. make things and data accessible via the web
http://www.smak.be/collectie_kunstenaar.php?la=en&id=&i=0&t=&tid=&y=&l=b&kunstenaar_id=1608&kunstwerk_id=1490
http://www.smak.be/collectie_afbeeldingen/billing.jpg
WHAT IS A URI?
2. uniquely identify things on the web
Johanna Billing:
http://viaf.org/viaf/120189360
WHAT IS A URI?
3. link different things on the web to each other
PERSISTENT URIS ENSURE:
§ stability and accessibility of published data over time
§ high ranking in search engines because they are human-readable and do not change
§ enrichment of data via external authorities because they are deliberately shared with others
HOW TO IMPLEMENT PERSISTENT URIS FOR WORKS OF ART?
- Use a good syntax
- Distinguish between work, data and representation
- Reuse existing persistent URIs
- Act as publisher
GOOD SYNTAX:
http://[domain]/[type object]/[type document]/[identification number]
- http://[domain]/: mandatory
- [type object]/: optional
- [type document]/: optional
- [identification number]: mandatory
GOOD SYNTAX:
http://smak.be/collection/work/id/A256
NOT A GOOD SYNTAX:
http://www.smak.be/collectie_kunstenaar.php?la=en&id=&i=0&t=&tid=&y=&l=b&kunstenaar_id=1608&kunstwerk_id=1490
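The difference between the good and the bad syntax can be checked mechanically. The following is a minimal sketch, assuming the slide's pattern can be approximated by a regular expression. The function name `follows_good_syntax` and the allowed character classes are illustrative assumptions, not part of the project.

```python
import re

# Assumed reading of the slide's pattern:
#   http://[domain]/[type object]/[type document]/[identification number]
# The segments between the domain and the identifier are treated as optional.
GOOD_SYNTAX = re.compile(
    r"^http://"              # scheme
    r"[A-Za-z0-9.-]+"        # domain (mandatory)
    r"(?:/[A-Za-z0-9_-]+)*"  # optional segments (type object, type document)
    r"/[A-Za-z0-9_-]+$"      # identification number (mandatory)
)


def follows_good_syntax(uri: str) -> bool:
    """Return True if the URI matches the assumed pattern.

    Query strings ("?a=b") and file extensions (".php", ".jpg") are
    rejected because "?" and "." are not allowed in path segments.
    """
    return GOOD_SYNTAX.match(uri) is not None


print(follows_good_syntax("http://smak.be/collection/work/id/A256"))
print(follows_good_syntax(
    "http://www.smak.be/collectie_kunstenaar.php?la=en&id=&i=0&t=&tid=&y=&l=b"
    "&kunstenaar_id=1608&kunstwerk_id=1490"
))
```

The pattern is deliberately strict: a URI that carries technical details such as a script name or query parameters is likely to change when the website is rebuilt.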
Distinguish between art work, data and representation
[Diagram: Work in collection A, with Data A, Data B, Data C, … and Representation A]
ART WORK: http://smak.be/collection/work/id/A256
DATA: http://smak.be/collection/work/data/A256
REPRESENTATION: http://smak.be/collection/work/representation/A256
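The URIs for the work, its data and its representation differ only in the document-type segment (id, data, representation). As a rough sketch, all three can be derived from one identification number. The helper `uris_for` and the class `WorkUris` are hypothetical names; only the URI layout comes from the slides.

```python
from dataclasses import dataclass

BASE = "http://smak.be/collection"  # domain and collection path as shown on the slides


@dataclass(frozen=True)
class WorkUris:
    work: str            # identifies the art work itself
    data: str            # identifies the data about the work
    representation: str  # identifies a representation of the work


def uris_for(identifier: str, object_type: str = "work") -> WorkUris:
    """Build the three URIs for one identification number (hypothetical helper)."""
    prefix = f"{BASE}/{object_type}"
    return WorkUris(
        work=f"{prefix}/id/{identifier}",
        data=f"{prefix}/data/{identifier}",
        representation=f"{prefix}/representation/{identifier}",
    )


uris = uris_for("A256")
print(uris.work)
print(uris.data)
print(uris.representation)
```

Keeping the identification number identical across the three forms means that anyone who knows one URI can infer the other two.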
WORK, DATA & REPRESENTATION
Work: http://smak.be/collection/work/id/A256
Data A: http://smak.be/collection/work/data/A256
Data B: http://museuminzicht.be/collection/work/data/A256
Data C: http://museuminzicht.be/collection/work/data/A256
…
Representation A: http://smak.be/collection/work/representation/A256
Reuse existing persistent URIs
Use external authorities for contextual data (institute, artist, date, object name), because:
§ things that are not managed by the museum probably already have an identifier somewhere else
§ they may contain contextual information that the museum doesn’t have yet
§ they are often available as structured data in an open format
§ they support the Semantic Web idea
ARTIST – JOHANNA BILLING
RKDartists: http://explore.rkd.nl/explore/artists/26958
Freebase: https://www.freebase.com/m/0fn5hr
VIAF: http://viaf.org/viaf/120189360
INSTITUTION - SMAK
Wikidata: http://www.wikidata.org/wiki/Q1540707
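One possible way to store such links next to a museum record is sketched below. The URIs are the ones listed above; the dictionary layout, the `same_as` key and the helper `authority_uri` are assumptions made for illustration, not the project's data model.

```python
from typing import Optional

# URIs copied from the slides; structure and key names are illustrative only.
ARTIST_JOHANNA_BILLING = {
    "label": "Johanna Billing",
    "same_as": {
        "RKDartists": "http://explore.rkd.nl/explore/artists/26958",
        "Freebase": "https://www.freebase.com/m/0fn5hr",
        "VIAF": "http://viaf.org/viaf/120189360",
    },
}

INSTITUTION_SMAK = {
    "label": "SMAK",
    "same_as": {
        "Wikidata": "http://www.wikidata.org/wiki/Q1540707",
    },
}


def authority_uri(entity: dict, authority: str) -> Optional[str]:
    """Return the external URI for the given authority, or None if not linked."""
    return entity["same_as"].get(authority)


print(authority_uri(ARTIST_JOHANNA_BILLING, "VIAF"))
print(authority_uri(INSTITUTION_SMAK, "VIAF"))
```

Storing the external URI rather than only a name string is what lets other data sets recognise that they describe the same artist or institution.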
Act as publisher
§ Persistence can only be ensured by the publisher of the data. Museums should therefore take up this responsibility in order to make digital humanities research based on their data possible.
THE PROJECT
PERSISTENT IDENTIFICATION PROJECT
§ 10 fine arts and contemporary art museums -> 10 data sets
§ Bottom-up approach
§ Deals with a lot of technical and practical questions
§ Works with data exports from our partners
§ Adds new persistent URIs next to existing data
§ Identifies and links 7 entities in 10 data sets: art work, data, representation, institute, date, artist, object name
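The step "adds new persistent URIs next to existing data" could look roughly like the sketch below. It is written in Python purely for illustration and is not the project's tooling. The export content, the column names (`inventory_number`, `title`, `work_uri`, ...) and the function names are made up; only the identifier A256 and the URI layout come from the slides.

```python
import csv
import io

BASE = "http://smak.be/collection/work"  # base path as shown on the slides
NEW_COLUMNS = ["work_uri", "data_uri", "representation_uri"]

# Hypothetical export: column names, titles and the second row are invented.
EXPORT = """inventory_number,title
A256,Example title 1
A257,Example title 2
"""


def add_persistent_uris(rows):
    """Yield each row with three new URI columns; existing columns stay untouched."""
    for row in rows:
        identifier = row["inventory_number"]
        yield {
            **row,
            "work_uri": f"{BASE}/id/{identifier}",
            "data_uri": f"{BASE}/data/{identifier}",
            "representation_uri": f"{BASE}/representation/{identifier}",
        }


def enrich_export(text: str) -> str:
    """Read a CSV export and return it as CSV with the URI columns appended."""
    reader = csv.DictReader(io.StringIO(text))
    fieldnames = list(reader.fieldnames or []) + NEW_COLUMNS
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(add_persistent_uris(reader))
    return out.getvalue()


print(enrich_export(EXPORT))
```

Appending columns instead of overwriting existing ones keeps the partner's original data intact, which matches the bottom-up approach of working on exports.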
OPEN REFINE
EXPECTED RESULTS
- show improved retrieval and enrichment in a demonstrator application
- the new PIDs and the PIDs from external authorities integrated in the collection data
- future websites with linked and enriched data (researchers get more contextual information than the museums themselves have)
[Diagram: Art work, with linked Data and Representation nodes]
BENEFITS FOR MUSEUMS
§ makes registration and online publication of data easier!
§ scientific exposure
§ more control over, and user statistics about, the online use of collection data
§ makes internal research more accessible to external audiences
Concrete future plans:
§ VKC data hub: open data repositories
§ LUKAS representations hub: access to images of works of art
BENEFITS DIGITAL HUMANITIES
§ the data published by museums rank high in search engines (easier for researchers to find)
§ behind the links there is trustworthy information
§ the data are rich in contextual info
§ museum as a switchboard between different data sources
THANK YOU!