Digital Preservation NOW Andrew Waugh Senior Technical Advisor Public Record Office Victoria

Andrew Waugh

Nov 01, 2014

Transcript
Page 1: Andrew Waugh

Digital Preservation NOW

Andrew Waugh

Senior Technical Advisor

Public Record Office Victoria

Page 2: Andrew Waugh

Goal of session

• To present practical steps that you can take to preserve digital information now, without having a digital archive

Page 3: Andrew Waugh

Outline of session

• Goal of preservation

• Preserving the bit stream

• Preserving accessibility

• Preserving the context

• Conclusions

Page 4: Andrew Waugh

The goal of preservation

• Ensure access to records as long as they are required

• A record is…
– information created, received, and maintained as evidence and information by an organization or person, in pursuance of legal obligations or in the transaction of business (AS ISO 15489.1-2002)

Page 5: Andrew Waugh

The key to records is evidence

• What, where, when, how, who

• Evidence to colleagues (business activity)

• Evidence of accountability (investigations)

• Evidence to courts (legal evidence)

• Evidence to researchers (historical evidence)

Page 6: Andrew Waugh

So what does evidence require?

• That the record was produced as part of a normal business process (authentic)

• That the record can be found and read (accessible)

• That it can be related to the rest of the records (context)

• That it hasn’t been tampered with (integrity)

Page 7: Andrew Waugh

Key issues

• Preserving the bit stream
– If you don’t have the bits, you don’t have anything

• Preserving access to the information
– In the face of fragile applications

• Preserving the context
– The evidence

Page 8: Andrew Waugh

Preserving the bit stream

Page 9: Andrew Waugh

Core issue

• If you don’t have the binary data (files) that make up the record, you cannot preserve anything

• Problems you need to protect against
– Media failures (corruption, crashes)
– Technology obsolescence
– Human error

Page 10: Andrew Waugh

Basically a solved problem

• A core function of your IT department
– Day to day operation of storage systems
– Back-up/restore and disaster recovery
– Periodic replacement of media and technology

Page 11: Andrew Waugh

Recommendations

• Store on at least two pieces of media, ideally two technologies or (less ideally) two brands

• Store in at least two sites

• Information not being accessed must be periodically checked for corruption

• Track individual pieces of media – include brand and batch

• Always use mainstream technology in widespread use

Page 12: Andrew Waugh

Storage media (disc)

• Default storage choice should be on-line (disc) storage unless massive storage is required
– e.g. 3 Terabytes RAID 5 ~$4000

• RAID 1 or 6 (or derivatives) to guard against disc failures. RAID 5 is problematic now.

• Expect to replace each disc within 5 years

• External (USB) discs not recommended for long term storage (> 1 year)

Page 13: Andrew Waugh

Storage media (tape)

• The choice when more storage capacity is needed than is economic with disc
– Be sure to factor in whole of life costs including media replacement and operator costs

• Preferred formats: LTO Ultrium, IBM 3592, T10000

• Tape robots are preferred over manual handling

• Get expert advice on tape solutions as these are no longer common – use only for large organisations

• NEVER EVER choose leading edge technology, always stay within industry standard

Page 14: Andrew Waugh

Storage media (optical)

• Prefer CD-R (phthalocyanine dye)

• Can use CD-R (azo dye) or DVD-R, but monitor carefully

• Do not use CD-RW, DVD-RW, or CD-R (cyanine dye)

• Use ‘name brands’, and archival quality if possible

• Refresh in 2 to 5 years

• Unlikely to be generally economic compared with disc or tape due to high operator cost and low capacity

Page 15: Andrew Waugh

Monitor…

• Recommend statistical sampling of data to
– check for corruption of copies (checksums)
– detect deterioration of media

• Technology watch to guard against obsolete media
– plan for media refresh every 2 to 10 years

• Track individual pieces of media (if used)
– Ensure that none are lost
– Ensure that all are tested and refreshed
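
The checksum sampling recommended above could be sketched roughly as follows. This is a minimal illustration and not a tool described in the presentation: the function names, the choice of SHA-256, and the sampling fraction are all assumptions made for the example.

```python
import hashlib
import random
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record a checksum for every file under root, keyed by relative path."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def check_sample(root: Path, manifest: dict[str, str],
                 fraction: float, seed=None) -> list[str]:
    """Re-hash a random sample of the manifest entries.

    Returns the relative paths that are missing or whose checksum
    no longer matches the recorded value.
    """
    names = sorted(manifest)
    if not names:
        return []
    # Always check at least one file, never more than exist.
    sample_size = min(len(names), max(1, round(len(names) * fraction)))
    rng = random.Random(seed)
    failures = []
    for name in rng.sample(names, sample_size):
        target = root / name
        if not target.is_file() or sha256_of(target) != manifest[name]:
            failures.append(name)
    return sorted(failures)
```

The manifest would be built once, when the copy is known to be good, and stored separately from the media it describes. Running `check_sample` periodically with a modest fraction (say 0.05) against each copy gives early warning of corruption without reading every file each time.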

Page 16: Andrew Waugh

Back-up & disaster recovery

• Ensure that
– Your IT organisation has both a back-up and disaster recovery regime
– It is effective (periodically test restoration)

Page 17: Andrew Waugh

Preserving accessibility

Page 18: Andrew Waugh

Software fragility

• Without software to interpret and display the content, the data is lost
– Software may not run on the current version of the operating system or current computer
– Current software version may not accurately deal with files from older versions
– You may not have the required software

Page 19: Andrew Waugh

Do nothing option

• So far has worked because backwards compatibility is better than we thought
– Operating systems continue to support older programs (Windows, Unix/Linux)
– Modern programs seem to have good support for files from older versions
– This may not last forever…

Page 20: Andrew Waugh

If you are going to do nothing…

• Perform a risk analysis
– Survey your holdings to identify and quantify file formats
  • versions, if possible, ages if not
– Consider risk of loss of access
  • Use criteria from normalisation section
– Identify high value holdings

• Monitor software trends (is risk increasing?)

• Identify contingency plans

• Influence users to use lower risk formats
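
As a rough illustration of the survey step, the sketch below counts files by extension and notes the oldest modification year for each. It is only a starting point under stated assumptions: the function name and the shape of the result are hypothetical, and a file extension is merely a proxy for the real format (it says nothing about the version), so a proper survey would identify formats from file content.

```python
from datetime import datetime, timezone
from pathlib import Path


def survey_formats(root: Path) -> dict[str, dict]:
    """Summarise files under root by extension.

    For each extension, report the number of files, the total size in
    bytes, and the oldest modification year (a stand-in for age when
    the format version is unknown).
    """
    summary: dict[str, dict] = {}
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        ext = path.suffix.lower() or "(none)"
        info = path.stat()
        year = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc).year
        entry = summary.setdefault(
            ext, {"count": 0, "bytes": 0, "oldest_year": year}
        )
        entry["count"] += 1
        entry["bytes"] += info.st_size
        entry["oldest_year"] = min(entry["oldest_year"], year)
    return summary
```

Sorting the result by count or by oldest year gives a first view of which formats dominate the holdings and which are old enough to deserve a closer look in the risk analysis.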

Page 21: Andrew Waugh

Normalisation option

• Proactively convert formats to a long term preservation format (LTPF)

• This is a format that is likely to be usable for the foreseeable future
– Can find replacement software to render data
– Can find software to migrate from LTPF to new format

• Library of Congress sustainability factors
– http://www.digitalpreservation.gov/formats/

Page 22: Andrew Waugh

Characteristics of a good LTPF

• Supports critical features of your data

• Published file format specification

• Independent implementations

• Wide community adoption

• Simple

• Formal standard

• Public domain

• Low risk conversion

Page 23: Andrew Waugh

If you normalise

• Don’t jump out of the frying pan
– Still need to do the analysis presented for ‘do nothing case’
– Just fewer formats

• Develop test regime to test conversion into nominated format
– Suite of ‘typical’ documents illustrating critical features

Page 24: Andrew Waugh

LTPF suggestions

• Documents
– PDF/A, ODF

• Images
– TIFF, JPEG2000, JPEG (if already in JPEG)

• Video
– MPEG2 or MPEG4

Page 25: Andrew Waugh

Normalisation challenges

• Many types of data have no suitable LTPF (e.g. CAD/GIS)

• Long tail of formats (you will never be able to assign an LTPF for all types of digital object)

• Loss of characteristics in the normalisation

• Increasing complexity of digital objects (i.e. formats embedded within formats)

Page 26: Andrew Waugh

Digital rights management

• DRM systems are designed to control (prevent) access to digital objects
– Owner of digital object removes right of access
– May not permit access even though it is required (e.g. investigations)
– DRM system ceases to exist

• DRM systems do not recognise an organisation’s right to use its records

• Trusted Computing and Digital Rights Management Principles and Policies, NZESC
– http://www.e.govt.nz/policy/tc-and-drm/principles-policies-06/tc-drm-0906.pdf

Page 27: Andrew Waugh

Is it evidence?

(Context)

Page 28: Andrew Waugh

Core Issues

• If you cannot find it, it does not exist

• If you can find it, and cannot understand the context, it is meaningless
– Users are interested in the story, not a document

• If you cannot show its authenticity, integrity, and context, it may have low evidential weight

Page 29: Andrew Waugh

It’s all basic records management

• Create the record as part of the business process (authenticity)
– This includes putting it aside

• Putting the record in its context
– Tell the story – who, what, where and when

• Show that the record has not been subsequently modified
– Audit log

Page 30: Andrew Waugh

Key requirements

• Making sure that records are created in their context (business issue)

• Having someplace to put the records and capture their context
– Electronic Document & Records Management System (EDRMS)
– Classification system

Page 31: Andrew Waugh

If you do not have an EDRMS?

• Do whatever you can…

• Set up classification system in
– Email system
– Corporate file server

• Good idea even if you plan to get an EDRMS
– It gets everyone used to using a classification system

Page 32: Andrew Waugh

Why is metadata important?

• Who, what, where and when are answered by metadata associated with the record
– Captured (ideally) by system when record is created
– Entered by user

• Many different metadata standards

Page 33: Andrew Waugh

NAA/ANZ metadata standard

• Proposed basis for an Australian recordkeeping standard

• Australian Government Recordkeeping Metadata Standard version 2.0
– http://www.naa.gov.au/Images/AGRkMS_Final%20Edit_16%2007%2008_Revised_tcm2-12630.pdf

Page 34: Andrew Waugh

Minimum metadata to be kept

• Identifier (unique id referring to this object)

• Name (human readable tag)

• Start date (creation date)

• Contextual link (relation with file, series)

• Change history (demonstrating integrity)

• Disposal (when and how to dispose of record)

• Extent (size)

• Agent (organisation or person associated with record)
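
One way to picture this minimum set is as a simple record structure. The sketch below is illustrative only: the class and field names are hypothetical labels for the eight elements listed above, not element names taken from the NAA/ANZ standard.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ChangeEvent:
    """One entry in the change history (who did what, and when)."""
    when: date
    agent: str
    description: str


@dataclass
class RecordMetadata:
    """The eight minimum elements from the slide, as illustrative fields."""
    identifier: str        # unique id referring to this object
    name: str              # human readable tag
    start_date: date       # creation date
    contextual_link: str   # relation with file or series
    disposal: str          # when and how to dispose of the record
    extent_bytes: int      # size
    agent: str             # organisation or person associated with record
    change_history: list[ChangeEvent] = field(default_factory=list)

    def missing_elements(self) -> list[str]:
        """Return the names of text elements that have been left empty.

        An empty change history is not flagged, since a newly captured
        record may not have been changed yet.
        """
        text_fields = ("identifier", "name", "contextual_link",
                       "disposal", "agent")
        return [f for f in text_fields if not getattr(self, f).strip()]
```

Even without an EDRMS, a check like `missing_elements` could be run over a spreadsheet or export of record descriptions to find entries where, for example, no disposal decision has been captured.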

Page 35: Andrew Waugh

What can you do now – storage

• Make sure that your organisation can preserve the bits
– Survey holdings of media to discover the extent of your problem
– Move records off unmanaged, obsolete, deteriorating media
– Ensure back-up and disaster recovery systems are in place and working
– Sample records to detect corruption and decay
– Plan to migrate to new technology

Page 36: Andrew Waugh

What can you do now – access

• Make sure that your organisation can turn the files into something a human can understand
– Survey holdings of records to understand what formats you have and the importance of the records
– Perform a risk assessment on the formats
– Choose an LTPF and normalise high risk formats
– Encourage use of LTPF for business

Page 37: Andrew Waugh

What can you do now – context

• Make sure that digital objects are records
– Organise the objects so that they have a context (classification)
– Move towards an EDRMS or business application that captures the records, preserves their context, and protects their integrity

Page 38: Andrew Waugh

Questions?