Digital Preservation Tools for Repository Managers
A practical course in five parts presented by the KeepIt project in association with
Module 4, Putting storage, format management and preservation planning in the repository
University of Southampton, 18-19 March 2010
Twitter hashtag #dprc (digital preservation repository course)


Mar 28, 2015


Timothy Snyder

Page 2

Course structure

• Module 1. Organisational issues: scoping, selection, assessment, institutional parameters (19 January)

• Module 2. Costs: lifecycle costs for managing digital objects, based on the LIFE approach, and institutional costs (5 February)

• Module 3. Description: describing content for preservation, covering provenance, significant properties and preservation metadata (2 March)

• Module 4. Preservation workflow: tools available in EPrints for format management, risk assessment and storage, linked to the Plato planning tool from Planets (today, 18-19 March)

• Module 5. Trust: trust by others in the repository’s approach to preservation, and trust by the repository in the tools and services it chooses (30 March)

Page 3

Tools in this module

• EPrints preservation apps, including the storage controller: Dave Tarrant and Adam Field, University of Southampton

• Plato, the preservation planning tool from the Planets project: Andreas Rauber and Hannes Kulovits, TU Wien

Page 4

Steve Jobs launches Apple iPad

Picture by curiouslee http://www.flickr.com/photos/curiouslee/4320074421/

Page 5

Steve Jobs launches Apple iPad

Picture by curiouslee http://www.flickr.com/photos/curiouslee/4320074421/

“75 million people already own iPod Touches and iPhones. That's all people who already know how to use the iPad.”

Page 6

Some revision from KeepIt Module 3

• Preservation workflow

Page 7

Preservation workflow: Analyse, Check, Action

• Check: format identification, versioning; file validation; virus check; bit checking and checksum calculation

– Tools, e.g. DROID, JHOVE, FITS

• Analyse: preservation planning; characterisation (significant properties and technical characteristics, provenance, format, risk factors); risk analysis

– Tools: Plato (Planets), PRONOM (TNA), P2 risk registry (KeepIt), INFORM (U Illinois), KB

• Action: migration; emulation; storage selection
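Bit checking and checksum calculation in the Check stage can be illustrated with a minimal Python sketch. It assumes SHA-256 as the checksum algorithm (the slide does not name one), records a digest at ingest, and later recomputes it to detect change. The function names and the file name are hypothetical, and this is not how DROID, JHOVE or FITS work internally.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Read fixed-size chunks until read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path: Path, recorded: str) -> bool:
    """True if the file's current digest matches the digest recorded earlier."""
    return sha256_of(path) == recorded


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        deposit = Path(tmp) / "deposit.pdf"  # hypothetical deposited file
        deposit.write_bytes(b"%PDF-1.4 example content")
        recorded = sha256_of(deposit)  # recorded at ingest
        print(verify_fixity(deposit, recorded))  # True: unchanged
        deposit.write_bytes(b"%PDF-1.4 example c0ntent")  # simulate corruption
        print(verify_fixity(deposit, recorded))  # False: bits changed
```

Reading in chunks keeps memory use flat for large deposits; the recorded digest would normally be stored alongside the object's preservation metadata.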

Page 8

Format risks

• 1000 Ubiquity: degree of adoption of the format

• 1001 Support: number of tools available that can access the format

• 1002 Disclosure: extent to which the format documentation is publicly disclosed

• 1003 Document Quality: completeness of the available documentation

• 1004 Stability: speed and backwards compatibility of version change

• 1005 Ease of identification: ease with which the format can be identified

• 1006 Ease of validation: ease with which the format can be validated

• 1007 Lossiness: whether the format uses lossy compression

• 1008 Intellectual property rights: whether or not the format is encumbered by IPR

• 1009 Complexity: degree of content or behavioural complexity supported

From PRONOM documentation (The National Archives), July 2008
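Risk 1005, ease of identification, can be made concrete with a simplified sketch of signature-based identification: many formats begin with fixed leading bytes. This Python sketch assumes a small hand-written signature table; it is not PRONOM's signature format or DROID's matching algorithm, which handle offsets, versions and container formats far more thoroughly.

```python
# Simplified signature table: leading bytes ("magic numbers") -> format label.
# Illustrative only; registries such as PRONOM hold far richer signatures.
SIGNATURES: list[tuple[bytes, str]] = [
    (b"%PDF-", "PDF"),
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file (e.g. legacy Word .doc)"),
    (b"PK\x03\x04", "ZIP container (e.g. ODF or .docx; needs deeper inspection)"),
    (b"<?xml", "XML"),
]


def identify(header: bytes) -> str:
    """Return the first format whose signature matches the start of the data."""
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return "unknown"


if __name__ == "__main__":
    print(identify(b"%PDF-1.4\n"))                 # PDF
    print(identify(b"PK\x03\x04\x14\x00"))         # the ZIP container label
    print(identify(b"Plain text, no signature"))   # unknown
```

The ZIP case shows why container-based formats can score worse on ease of identification: the leading bytes alone cannot tell an ODF document from a .docx file.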

Page 9

Format risks: Word vs PDF | TIFF vs JPEG | XML vs PDF

1000 Ubiquity 1 1 1

1001 Support 1 1

1002 Disclosure

1003 Document Quality

1004 Stability 1 1

1005 Ease of identification

1006 Ease of validation 1 1

1007 Lossiness 1 1

1008 Intellectual property rights 1

1009 Complexity 1 1 1

The WINNER is: PDF (Word vs PDF), TIFF (TIFF vs JPEG), XML (XML vs PDF)

Page 10

A group task on format risks

1. Choose two formats to compare (e.g. Word vs PDF, Word vs ODF, PDF vs XML, TIFF vs JPEG)

2. Working through the (surviving) list of format risks, select a winner (or a draw) between your chosen formats for each risk category (1 point for a win)

3. Total the scores to find an overall winning format

4. Suggest one reason why the format that wins using this method may not be the one you would choose for your repository
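The scoring in steps 2 and 3 can be sketched in Python as a simple tally. The optional weights are an assumption added here, not part of the exercise; they illustrate one answer to step 4, because a repository that weighs some risks more heavily can reach a different winner. Category names follow the PRONOM list, and the judgments in the usage example are hypothetical.

```python
from __future__ import annotations

from collections import Counter

# Risk categories from the PRONOM list (identifiers 1000-1009).
RISKS = {
    "Ubiquity", "Support", "Disclosure", "Document Quality", "Stability",
    "Ease of identification", "Ease of validation", "Lossiness",
    "Intellectual property rights", "Complexity",
}


def tally(winners: dict[str, str | None],
          weights: dict[str, float] | None = None) -> Counter:
    """Award each category's winner 1 point (or its weight); None records a draw."""
    scores: Counter = Counter()
    for risk, winner in winners.items():
        if risk not in RISKS:
            raise ValueError(f"unknown risk category: {risk}")
        if winner is None:
            continue  # a draw: no points for either format
        scores[winner] += 1 if weights is None else weights.get(risk, 1.0)
    return scores


if __name__ == "__main__":
    # Hypothetical judgments for Word vs PDF, for illustration only.
    judgments = {"Ubiquity": "Word", "Disclosure": "PDF",
                 "Stability": "PDF", "Lossiness": None}
    print(tally(judgments).most_common())  # [('PDF', 2), ('Word', 1)]
    # Weighting ubiquity three times as heavily changes the overall winner.
    print(tally(judgments, weights={"Ubiquity": 3.0}).most_common())  # [('Word', 3.0), ('PDF', 2.0)]
```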

Page 11

Some revision from KeepIt Module 3

• Preservation workflow

– Recognised we have digital objects with formats and other characteristics we need to identify and record. These can change over time, or may need to be changed pre-emptively depending on a risk assessment, using a preservation action. Risk is subjective.

Page 12

Some revision from KeepIt Module 3

• Preservation workflow

– Recognised we have digital objects with formats and other characteristics we need to identify and record. These can change over time, or may need to be changed pre-emptively depending on a risk assessment, using a preservation action. Risk is subjective.

• Significant properties

Page 13

InSPECT SP (significant properties) Assessment Framework

• Builds on Gero’s Function-Behaviour-Structure (FBS) framework

• FBS was developed to assist engineers and designers to create and redesign artefacts

Three categories:

• Function: the design intention or purpose that is performed

• Behaviour: the epistemological outcome, derived from the function and structure, obtained by the stakeholder

• Structure: the structural elements of the object that enable the stakeholder to perform the behaviour

• Artefact construction is a product of the designated function

• Behaviour is the result of interaction between function and structure

Page 14

Exercise overview

• Analyse the content of an email

– Analyse the structure of the email message

– Determine the purpose that each technical property performs

• Consider how the email will be used by stakeholders

– Identify the set of expected behaviours

– Classify the set of behaviours into functions for recording

Page 15

Determine expected behaviours

• What activities would a user (any type of stakeholder) perform when using an email?

• Draw upon the list of property descriptions produced in the previous step, formal standards and specifications, or other information sources.

Task 2: Identify the types of action that a user would be able to perform using the email (groups, 15 mins).

• E.g. establish the name of the person who sent the email

• E.g. the user may want to confirm that the email originated from the stated source

Diagram: InSPECT object analysis steps (Select object type for analysis; Analyse structure; Identify purpose of technical properties; Determine expected behaviours; Classify behaviours into functions; Associate structure with each function; Review & finalise)

Behaviour/Structure worksheet, structure column: recipient local-part, recipient domain-part, trace-route, recipient display-name, sender local-part, sender domain-part, sender display-name, message-id, references, in-reply-to, body text colour, body background, strikethrough, underline, paragraph, line break, message text, subject
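Associating structure with behaviours can be sketched in Python using the standard library's email parser. The sample message, the behaviour names and the behaviour-to-structure mapping are hypothetical examples chosen for illustration, not results of the InSPECT exercise; a behaviour counts as supported only if every structural property it relies on is present.

```python
from __future__ import annotations

from email import message_from_string
from email.utils import getaddresses, parseaddr

# Hypothetical sample message (no Received headers, so no trace-route).
RAW = """\
From: Alice Example <alice@example.org>
To: Bob Example <bob@example.net>
Subject: Draft results
Message-ID: <abc123@example.org>
In-Reply-To: <xyz789@example.net>
References: <xyz789@example.net>

Please find the draft attached.
"""

# Hypothetical mapping: expected behaviour -> structural properties it needs.
BEHAVIOURS: dict[str, list[str]] = {
    "establish name of sender": ["sender display-name"],
    "confirm origin of email": ["sender domain-part", "trace-route"],
    "follow conversation thread": ["message-id", "in-reply-to", "references"],
}


def extract_structure(raw: str) -> dict[str, str]:
    """Extract some of the structural properties named on the slide."""
    msg = message_from_string(raw)
    sender_name, sender_addr = parseaddr(msg.get("From", ""))
    recipients = getaddresses(msg.get_all("To", []))
    rcpt_name, rcpt_addr = recipients[0] if recipients else ("", "")
    s_local, _, s_domain = sender_addr.partition("@")
    r_local, _, r_domain = rcpt_addr.partition("@")
    return {
        "sender display-name": sender_name,
        "sender local-part": s_local,
        "sender domain-part": s_domain,
        "recipient display-name": rcpt_name,
        "recipient local-part": r_local,
        "recipient domain-part": r_domain,
        "trace-route": " | ".join(msg.get_all("Received", [])),
        "message-id": msg.get("Message-ID", ""),
        "in-reply-to": msg.get("In-Reply-To", ""),
        "references": msg.get("References", ""),
        "subject": msg.get("Subject", ""),
        "message text": msg.get_payload(),
    }


def supported_behaviours(structure: dict[str, str]) -> dict[str, bool]:
    """A behaviour is supported only if all the properties it needs are non-empty."""
    return {name: all(structure.get(p) for p in needed)
            for name, needed in BEHAVIOURS.items()}


if __name__ == "__main__":
    for behaviour, ok in supported_behaviours(extract_structure(RAW)).items():
        print(f"{behaviour}: {'supported' if ok else 'NOT supported'}")
```

Because the sample lacks trace-route, confirming the origin is not supported; a migration that dropped Received headers would lose that behaviour in the same way.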

Page 16

1.3 cont. Categories of properties

Five high-level categories:

• Content, e.g. character count

• Context, e.g. date of creation

• Rendering, e.g. bit depth

• Structure, e.g. e-mail attachments

• Behaviour, e.g. hyperlinks
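The five categories can be sketched as a small Python data structure for recording properties. The class names and the example values (other than the character count, which is computed from a hypothetical email body) are assumptions; the sketch only shows one way a repository might tag each recorded property with its category.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    """The five high-level categories of properties."""
    CONTENT = "content"
    CONTEXT = "context"
    RENDERING = "rendering"
    STRUCTURE = "structure"
    BEHAVIOUR = "behaviour"


@dataclass
class Property:
    name: str
    category: Category
    value: object


def group_by_category(props: list[Property]) -> dict[Category, list[str]]:
    """Return property names grouped under each category (empty lists kept)."""
    grouped: dict[Category, list[str]] = {c: [] for c in Category}
    for p in props:
        grouped[p.category].append(p.name)
    return grouped


if __name__ == "__main__":
    body = "Please find the draft attached."  # hypothetical email body
    props = [
        Property("character count", Category.CONTENT, len(body)),
        Property("date of creation", Category.CONTEXT, "2010-03-18"),  # hypothetical
        Property("bit depth", Category.RENDERING, 24),                 # hypothetical
        Property("e-mail attachments", Category.STRUCTURE, 1),         # hypothetical
        Property("hyperlinks", Category.BEHAVIOUR, []),                # hypothetical
    ]
    for category, names in group_by_category(props).items():
        print(category.value, names)
```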

Page 17

Identify stakeholders

• Creator: view, annotate

– Researcher corresponds during research with colleagues, peers, administrators, etc.

• Recipient: reuses content

– Student wants to understand research lifecycles by studying real-world practice

• Custodian: evidential chain

– Maintains a permanent email record for externally funded projects, alongside data and eprint outputs

Diagram: InSPECT stakeholder analysis steps (Select object type(s) for analysis; Identify stakeholder; Determine actual behaviours; Classify behaviours into set of functions; Cross-match functions; Assign acceptable value boundaries; Review & finalise)
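Assigning acceptable value boundaries and checking them per stakeholder can be sketched in Python. The stakeholder roles come from the slide, but the property names, boundaries and migrated values are hypothetical; the point is that the same migrated object can satisfy one stakeholder's boundaries and fail another's.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Boundary:
    """Acceptable numeric range for a property; None leaves that side open."""
    minimum: float | None = None
    maximum: float | None = None

    def accepts(self, value: float) -> bool:
        if self.minimum is not None and value < self.minimum:
            return False
        if self.maximum is not None and value > self.maximum:
            return False
        return True


# Hypothetical boundaries per stakeholder for an email and its derivatives.
BOUNDARIES: dict[str, dict[str, Boundary]] = {
    "creator": {"character count": Boundary(31, 31)},
    "recipient": {"character count": Boundary(minimum=1)},
    "custodian": {"character count": Boundary(31, 31),
                  "attachment count": Boundary(minimum=1)},
}


def failures(props: dict[str, float], bounds: dict[str, Boundary]) -> list[str]:
    """Names of properties that are missing or fall outside their boundary."""
    return [name for name, b in bounds.items()
            if name not in props or not b.accepts(props[name])]


if __name__ == "__main__":
    # Hypothetical migrated version in which the attachment was lost.
    migrated = {"character count": 31, "attachment count": 0}
    for stakeholder, bounds in BOUNDARIES.items():
        print(stakeholder, failures(migrated, bounds))
    # creator [] / recipient [] / custodian ['attachment count']
```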

Page 18

Some revision from KeepIt Module 3

• Preservation workflow

– Recognised we have digital objects with formats and other characteristics we need to identify and record. These can change over time, or may need to be changed pre-emptively depending on a risk assessment, using a preservation action. Risk is subjective.

• Significant properties

– We considered which characteristics might be significant using the function-behaviour-structure (FBS) framework, and by classifying the functions of formatted emails

– We recognised that assessment of behaviour, and so of significance, can vary according to the viewpoint of the stakeholder – e.g. creator, user, archivist

Page 19

Some revision from KeepIt Module 3

• Preservation workflow

– Recognised we have digital objects with formats and other characteristics we need to identify and record. These can change over time, or may need to be changed pre-emptively depending on a risk assessment, using a preservation action. Risk is subjective.

• Significant properties

– We considered which characteristics might be significant using the function-behaviour-structure (FBS) framework, and by classifying the functions of formatted emails

– We recognised that assessment of behaviour, and so of significance, can vary according to the viewpoint of the stakeholder – e.g. creator, user, archivist

• Documentation

– We looked at two means to document these characteristics, and the changes over time:

1. Broad and established (PREMIS)

2. Focussed, and work-in-progress (Open Provenance Model)
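Recording one change over time as a PREMIS event can be sketched with Python's standard ElementTree module. The element names follow the PREMIS 2 event entity (eventIdentifier, eventType, eventDateTime, eventDetail, eventOutcomeInformation, linkingObjectIdentifier), but the identifier scheme, the values and the helper names are hypothetical, and the output has not been validated against the PREMIS schema.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

PREMIS_NS = "info:lc/xmlns/premis-v2"  # PREMIS 2.x namespace
ET.register_namespace("premis", PREMIS_NS)


def _q(tag: str) -> str:
    """Qualify a tag name with the PREMIS namespace (ElementTree's {ns}tag form)."""
    return f"{{{PREMIS_NS}}}{tag}"


def _id_block(parent: ET.Element, name: str, value: str) -> None:
    """Add <name> holding <nameType> and <nameValue> children."""
    block = ET.SubElement(parent, _q(name))
    ET.SubElement(block, _q(name + "Type")).text = "local"  # hypothetical scheme
    ET.SubElement(block, _q(name + "Value")).text = value


def premis_event(event_id: str, event_type: str, detail: str,
                 outcome: str, object_id: str, when: datetime) -> ET.Element:
    """Build a simplified PREMIS event describing one preservation action."""
    event = ET.Element(_q("event"))
    _id_block(event, "eventIdentifier", event_id)
    ET.SubElement(event, _q("eventType")).text = event_type
    ET.SubElement(event, _q("eventDateTime")).text = when.isoformat()
    ET.SubElement(event, _q("eventDetail")).text = detail
    info = ET.SubElement(event, _q("eventOutcomeInformation"))
    ET.SubElement(info, _q("eventOutcome")).text = outcome
    _id_block(event, "linkingObjectIdentifier", object_id)
    return event


if __name__ == "__main__":
    event = premis_event("evt-0001", "migration", "Word document migrated to PDF",
                         "success", "eprint-42/document-1",
                         datetime(2010, 3, 18, tzinfo=timezone.utc))
    print(ET.tostring(event, encoding="unicode"))
```

Each preservation action would add another event linked to the same object identifier, building up the record of changes over time.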

Page 20

Some revision from KeepIt Module 3

• Preservation workflow

– Recognised we have digital objects with formats and other characteristics we need to identify and record. These can change over time, or may need to be changed pre-emptively depending on a risk assessment, using a preservation action. Risk is subjective.

• Significant properties

– We considered which characteristics might be significant using the function-behaviour-structure (FBS) framework, and by classifying the functions of formatted emails

– We recognised that assessment of behaviour, and so of significance, can vary according to the viewpoint of the stakeholder – e.g. creator, user, archivist

• Documentation

– We looked at two means to document these characteristics, and the changes over time:

1. Broad and established (PREMIS)

2. Focussed, and work-in-progress (Open Provenance Model)

• Provenance in action: transmission and recording

Page 21

Provenance: a numbers game

• Transmission: recording vs word-of-mouth

• Identifying what is significant about the information to be transmitted

• Can be self-correcting!
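The point of the game can be sketched in Python: a message passed through several retellings drifts, while a checksum recorded at the outset reveals that drift at the end of the chain. The message and the per-hop changes are hypothetical stand-ins for word-of-mouth errors, and SHA-256 is an assumed choice of checksum.

```python
from __future__ import annotations

import hashlib
from collections.abc import Callable

Hop = Callable[[str], str]


def digest(text: str) -> str:
    """SHA-256 of the UTF-8 encoded text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def transmit(message: str, hops: list[Hop]) -> str:
    """Pass the message through each hop in turn (word-of-mouth)."""
    for hop in hops:
        message = hop(message)
    return message


def transmit_with_record(message: str, hops: list[Hop]) -> tuple[str, bool]:
    """Record a digest first, then report whether the received text still matches it."""
    recorded = digest(message)
    received = transmit(message, hops)
    return received, digest(received) == recorded


# Hypothetical retellings: one faithful, one that loses a detail, one that restyles.
HOPS: list[Hop] = [
    lambda m: m,
    lambda m: m.replace("18-19 March", "March"),
    lambda m: m.lower(),
]

if __name__ == "__main__":
    original = "KeepIt Module 4: Southampton, 18-19 March 2010"
    received, intact = transmit_with_record(original, HOPS)
    print(received)  # keepit module 4: southampton, march 2010
    print(intact)    # False
```

The checksum only detects that something changed; recovering the original still needs the record itself, which is the contrast between recording and word-of-mouth.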

Page 22

Some revision from KeepIt Module 3

• Preservation workflow

– Recognised we have digital objects with formats and other characteristics we need to identify and record. These can change over time, or may need to be changed pre-emptively depending on a risk assessment, using a preservation action. Risk is subjective.

• Significant properties

– We considered which characteristics might be significant using the function-behaviour-structure (FBS) framework, and by classifying the functions of formatted emails

– We recognised that assessment of behaviour, and so of significance, can vary according to the viewpoint of the stakeholder – e.g. creator, user, archivist

• Documentation

– We looked at two means to document these characteristics, and the changes over time:

1. Broad and established (PREMIS)

2. Focussed, and work-in-progress (Open Provenance Model)

• Provenance in action: transmission and recording

– Through a simple game we learned that if we don’t recognise the necessary properties at the outset and maintain a record through all stages of transmission, the information at the end of the chain will likely not be the same as the information we started with