Current design issues for digital archives
Robert Munro (presented by David Nathan)
Endangered Languages Archive (ELAR), School of Oriental and African Studies, London
2
Outline
1. Introduction
2. Archive architectures
3. Current issues
   1. value-adding interaction from ‘end’ users
   2. flexibility in access to materials
   3. granularity of description of materials
4. Conclusions
3
Introduction – ELAR
Part of the Hans Rausing Endangered Languages Project (HRELP).
Open for deposits since October 2005.
In the process of designing and implementing key systems.
4
Introduction – ELAR
ELAR will be the first language archive that allows users to:
- add metadata in the language of their choice
- add new metadata (comments, descriptions, links) to existing materials
- translate metadata into a language of their choice
- select language preference(s) for viewing existing metadata
- add metadata to archived materials at different levels of granularity
5
Introduction – current issues
‘End’ users adding value to archive materials:
- who will moderate such additions?
Flexible support of access:
- can an archive explicitly support multilingual users?
Metadata – comments / description of materials:
- should the granularity of description be at the level of files, collections of files, and/or subsections of a file?
6
Archive architectures
The classic ‘silo’ view of an archive:
- little more than disaster-proof backup
[Diagram: Producers, Silo]
7
Archive architectures
The producers are not the only users:
- different dissemination formats are required…
[Diagram: Producers, Silo, Dissemination]
8
Archive architectures
The producers are not the only users:
- different dissemination formats are required…
- …for different user communities
[Diagram: Producers, Silo, Dissemination, Designated communities]
9
Archive architectures
Working formats are not preservation formats:
- materials may need to be transformed on ingest
[Diagram: Producers, Ingestion, Silo, Dissemination, Designated communities]
10
Archive architectures
You cannot rigidly preserve digital data:
- files need to be refreshed and migrated to current formats
[Diagram: Producers, Ingestion, Archive, Dissemination, Designated communities]
11
Archive architectures
…but the objects, metadata and structures are still backed up in disaster-proof silos.
[Diagram: Producers, Ingestion, Archive, Dissemination, Designated communities]
12
Archive architectures
Archives need to define three types of ‘packages’: ingestion, archive and dissemination.
[Diagram: Producers, Ingestion, Archive, Dissemination, Designated communities]
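The relationship between the three package types could be sketched roughly as follows. This is a minimal illustration only: the class names, fields, format labels and the `WORKING_TO_PRESERVATION` mapping are all assumptions made for the example, not a description of ELAR's systems.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from working formats to preservation formats.
# The format labels are illustrative assumptions only.
WORKING_TO_PRESERVATION = {"proprietary-text": "xml"}


@dataclass(frozen=True)
class IngestionPackage:
    """What a producer deposits: files in working formats plus metadata."""
    depositor: str
    files: dict  # file name -> format label
    metadata: dict = field(default_factory=dict)


@dataclass(frozen=True)
class ArchivePackage:
    """What the archive preserves: files in preservation formats."""
    files: dict
    metadata: dict


@dataclass(frozen=True)
class DisseminationPackage:
    """What one designated community receives."""
    community: str
    files: dict
    metadata: dict


def ingest(package: IngestionPackage) -> ArchivePackage:
    """Convert working formats to preservation formats on ingest."""
    files = {
        name: WORKING_TO_PRESERVATION.get(fmt, fmt)
        for name, fmt in package.files.items()
    }
    metadata = dict(package.metadata, depositor=package.depositor)
    return ArchivePackage(files=files, metadata=metadata)


def disseminate(package: ArchivePackage, community: str, wanted: set) -> DisseminationPackage:
    """Select only the files in formats that a community asked for."""
    files = {name: fmt for name, fmt in package.files.items() if fmt in wanted}
    return DisseminationPackage(community, files, dict(package.metadata))
```

Keeping the three types separate means that a change in what producers deposit, or in what a community wants to receive, need not change what is preserved.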
13
Ingestion (Accession) packages
Formats & structures that can be converted to archive formats with minimal effort:
- open file formats
- well-documented structures: XML with schema ideal
The content needs to take into account the many potential uses of the materials:
- high quality sound and video
- a variety of genres
- detailed metadata and structural information
14
Dissemination packages
Many potential users of archived materials:
- researchers
- speakers
- educators
- publishers
With many different requirements:
- access to materials by various methods
- archive services
- continuation of ownership of language materials
15
Current issues – value adding
The current model is fairly uni-directional:
- but users can/should add value to archive materials
[Diagram: Producers, Ingestion, Archive, Dissemination, Designated communities]
16
Current issues – value adding
Users should be able to add to existing materials:
- speakers’ comments on content
- results of recent research
[Diagram: Producers, Ingestion, Archive, Dissemination, Designated communities]
17
Current issues – value adding
The archive needs to trust certain users to add metadata to existing materials:
- should the identity of users be recorded / open?
- should users be able to challenge existing metadata?
Who to trust?
- depositors cannot moderate all comments on objects, especially if comments can be in any language
- but can an archive deny a speaker’s request to add comments to a recording of them speaking?
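One way to make these questions concrete is to look at what a user contribution would have to record. The sketch below is only an illustration: the `Contribution` fields, the `Status` values and the rule that trusted users skip moderation are assumptions for the example, not answers to the questions above.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    PENDING = "pending"        # waiting for a moderator
    PUBLISHED = "published"    # visible alongside the archived object
    CHALLENGED = "challenged"  # another user disputes it


@dataclass
class Contribution:
    object_id: str   # archived object the comment is attached to
    author: str      # recorded identity of the contributing user
    language: str    # language the comment is written in
    text: str
    status: Status = Status.PENDING
    challenged_by: list = field(default_factory=list)


def submit(contribution: Contribution, trusted_users: set) -> Contribution:
    """Publish directly if the author is trusted, otherwise leave it pending."""
    if contribution.author in trusted_users:
        contribution.status = Status.PUBLISHED
    return contribution


def challenge(contribution: Contribution, challenger: str) -> Contribution:
    """Record that a user disputes an existing piece of metadata."""
    contribution.challenged_by.append(challenger)
    contribution.status = Status.CHALLENGED
    return contribution
```

Who belongs in `trusted_users`, and whether `author` is shown publicly, are exactly the policy decisions the slide leaves open.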
18
Current issues – flexibility of access
The archive cannot create different dissemination packages for every language and/or user:
[Diagram: Producers, Ingestion, Archive, Dissemination, Designated communities]
19
Current issues – flexibility of access
Users should be able to personalize access:
- language preference(s) for metadata
- preference on type of materials
[Diagram: Producers, Ingestion, Archive, Dissemination, Designated communities]
20
Current issues – flexibility of access
Flexibility of search / browse:
- keyword ‘search engine’ type search
- rich relationships between objects for browsing
- geographic searches
- research community specific search
21
Current issues – flexibility of access
Flexibility of language:
- most metadata in most archives is in English
- should metadata be multilingual?
22
Current issues – flexibility of access
If a user prefers to speak Quechua, then Spanish, then English:
- rather than accessing via one interface per language…
bread
Photograph by Juan Pérez Martínez
January 2006
OR
pan
Fotografía tomada por Juan Pérez Martínez
Enero 2006
OR …
23
Current issues – flexibility of access
If a user prefers to speak Quechua, then Spanish, then English:
- …users should get all languages at once, according to availability of data and their preferences
label in Quechua: t’anta
photographer in Spanish: Fotografía tomada por Juan Pérez Martínez
date in English: January 2006
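The Quechua, Spanish, English example amounts to a per-field fallback through the user's ordered language preferences. A minimal sketch, assuming metadata is stored as one dictionary of translations per field (the `resolve` function, the record layout and which translations are available are assumptions chosen to reproduce the example):

```python
def resolve(record: dict, preferences: list) -> dict:
    """For each field, pick the value in the most-preferred available language."""
    resolved = {}
    for field_name, translations in record.items():
        for lang in preferences:
            if lang in translations:
                resolved[field_name] = (lang, translations[lang])
                break  # stop at the first preferred language that has data
    return resolved


# Hypothetical record: availability per language is assumed for illustration.
record = {
    "label": {"qu": "t’anta", "es": "pan", "en": "bread"},
    "photographer": {
        "es": "Fotografía tomada por Juan Pérez Martínez",
        "en": "Photograph by Juan Pérez Martínez",
    },
    "date": {"en": "January 2006"},
}

# Quechua first, then Spanish, then English.
view = resolve(record, ["qu", "es", "en"])
```

Here `view` holds the label in Quechua, the photographer credit in Spanish and the date in English. A field with no value in any preferred language is simply omitted, which is one possible choice among several.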
24
Current issues – granularity
Archives tend to treat archived files as ‘atomic’:
- metadata only refers to files as a whole
What about:
- a specific comment about a 20 second subsection of the file?
- a general comment applying to many files?
25
Current issues – granularity
For example, suppose we have an annotated sound recording of some event:
26
Current issues – granularity
Some metadata is about the file as a whole:
- date recorded, speakers, title
27
Current issues – granularity
Some metadata is about sub-segments:
- name of a significant person or place
- specific linguistic phenomena
28
Current issues – granularity
It is likely that users will want to:
- add comments to such subsections
- richly link subsections to other items
- make unambiguous reference to subsections
At the time of deposit, no one can predict which subsections of files will later be significant:
- users need to be able to explicitly define subsections of archive objects
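A user-defined subsection could be represented as a target that names one file plus a time range, while file-level and collection-level metadata use the same structure without a range. The sketch below is an assumption-laden illustration: the `Target` class and its fields are hypothetical, and the `#t=start,end` reference string is only loosely modelled on the temporal syntax of W3C Media Fragments URIs.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Target:
    """What a piece of metadata refers to."""
    file_ids: Tuple[str, ...]        # one file, or several for a collection
    start_s: Optional[float] = None  # segment start in seconds, if any
    end_s: Optional[float] = None    # segment end in seconds, if any

    def __post_init__(self):
        if self.start_s is None and self.end_s is None:
            return  # whole file(s), nothing more to check
        if len(self.file_ids) != 1:
            raise ValueError("a segment must belong to exactly one file")
        if self.start_s is None or self.end_s is None:
            raise ValueError("a segment needs both a start and an end")
        if not 0 <= self.start_s < self.end_s:
            raise ValueError("a segment needs 0 <= start < end")

    def reference(self) -> str:
        """Return an unambiguous string reference to this target."""
        if self.start_s is None:
            return ",".join(self.file_ids)
        return f"{self.file_ids[0]}#t={self.start_s:g},{self.end_s:g}"


# A comment about a 20 second subsection of one recording (hypothetical file name).
segment = Target(("rec-1.wav",), 95.0, 115.0)
```

Because the range is supplied by the user rather than fixed at deposit, any subsection that later turns out to be significant can be described and linked.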
29
Conclusions
Archives are not static repositories:
- an archive supports materials for multiple different user communities in parallel
Value-adding interaction:
- archived materials can be further enriched by users
Flexibility in access to materials:
- personalizable interaction with archive materials
Granularity of description of materials:
- user defined granularity of materials
30
Thank you