Preservation metadata Andrew Waugh Senior Manager, Standards and Policy Public Record Office Victoria
Page 1: Andrew waugh

Preservation metadata

Andrew Waugh

Senior Manager, Standards and Policy

Public Record Office Victoria


Structure of the talk

• What is preservation metadata?
• Recordkeeping metadata in theory
• NAA/ANZ recordkeeping metadata standard
• PREMIS – standard for preservation metadata
• Practical reading and implementing tips
• Conclusions


What is preservation?

• The ability to access content for as long as it is required

• Access means
  – Being able to find the content
  – Being able to extract information from the content
  – Understanding the context of the content
  – Being confident of the history of the content


Preservation metadata

• Preservation metadata is the information necessary to maintain access to content

• The difference between short- and long-term access is one of degree (how much metadata is needed), not of kind

• As preservation professionals, we are rarely interested in the content itself, only in managing it. Preservation metadata is the basic information we use to do that job


Examples of preservation metadata

• Identifier
• Creation date
• Title
• History information
• Relationships between objects
• Data formats


Recordkeeping Metadata

• The archival profession has been developing recordkeeping metadata (in effect, preservation metadata) for around a decade

• This work provides a useful framework to think about preservation and metadata


RK Metadata Standards

• ISO 23081 Information and documentation – Records management processes – Metadata for records
  – Part 1: Principles
  – Part 2: Conceptual and implementation issues

• National Archives of Australia (and Archives New Zealand): Recordkeeping Metadata Standard Version 2.0
  – http://www.naa.gov.au/Images/AGRkMS_Final%20Edit_16%2007%2008_Revised_tcm2-12630.pdf
• Forthcoming Australian/New Zealand Standard


Metadata from a records view

• Records are content, context, and structure
• Records management metadata is data describing the context, content, and structure of records and their management through time (ISO 15489-1:2001, 3.12)

• Recordkeeping metadata is the key to providing access (and hence preservation)

• In practice, metadata is everything except the actual content of the record


Purpose of recordkeeping metadata

• The purpose of recordkeeping metadata includes
  – Protecting records as evidence
  – Ensuring their accessibility and usability through time
  – Facilitating the ability to understand records
  – Helping ensure the authenticity, reliability and integrity of records
  – Supporting and managing access, privacy, and rights
  – Supporting the migration of records from one (preservation) system to another


Metadata at record capture

• Records are captured into a system, and metadata is created/captured with them

• This metadata documents
  – Environment in which records were created
  – Purpose or business activity being undertaken
  – Relationship with other records or aggregations
  – Physical or technical structure of the record
  – Logical structure of the record


Metadata after record capture

• Metadata captured after record creation documents what happened to a record over time
  – This demonstrates the authenticity, reliability, usability, and integrity of the record

• Answers the basic questions of who, what, when, where, why


Metadata after disposal

• Metadata is itself a record, and some of it may need to be kept after the record has been disposed of, to account for the record's existence, management, and disposition
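A minimal Python sketch of this idea, assuming a hypothetical RecordEntry structure and dispose() helper (neither comes from any standard): disposal destroys the content but keeps the identifying metadata and appends a disposal event, so the record's existence and disposition can still be accounted for.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class RecordEntry:
    """Hypothetical record: content plus metadata about it."""
    identifier: str
    title: str
    content: Optional[bytes]
    events: list = field(default_factory=list)


def dispose(record: RecordEntry, on: date, authority: str) -> RecordEntry:
    """Destroy the content but keep the metadata, adding a disposal event.

    The retained identifier, title and event trail account for the record's
    existence, management and disposition after the content is gone."""
    record.content = None  # the content itself is destroyed
    record.events.append({"type": "disposal", "date": on, "authority": authority})
    return record


rec = RecordEntry("REC-001", "Minutes of board meeting", b"(content)",
                  [{"type": "capture", "date": date(2001, 3, 5)}])
dispose(rec, date(2009, 1, 1), "Disposal authority 123 (hypothetical)")
print(rec.content, [e["type"] for e in rec.events])  # None ['capture', 'disposal']
```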


Four entity model

• Modern Australian recordkeeping metadata models are normally expressed in terms of four entities
  – Records (the objects to be preserved: record, file, series…)
  – Agents (the people who create and use the records)
  – The business transacted
  – Mandates (the rules governing the business)
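As an illustration only, the four entities and the links between them might be modelled as below. The class names, fields, relationship labels, and example values are hypothetical and are not drawn from ISO 23081 or the NAA/ANZ standard.

```python
from dataclasses import dataclass


# Hypothetical classes for the four entities; names and fields are illustrative.
@dataclass(frozen=True)
class Record:
    id: str
    title: str
    aggregation: str  # e.g. "record", "file", "series"


@dataclass(frozen=True)
class Agent:
    id: str
    name: str


@dataclass(frozen=True)
class Business:
    id: str
    activity: str


@dataclass(frozen=True)
class Mandate:
    id: str
    rule: str


@dataclass(frozen=True)
class Link:
    source: str  # identifier of one entity
    target: str  # identifier of the other entity
    kind: str    # e.g. "created by", "documents", "governed by"


def related(entity_id: str, links: list) -> list:
    """Identifiers of entities directly linked to entity_id, in either direction."""
    return [lk.target if lk.source == entity_id else lk.source
            for lk in links if entity_id in (lk.source, lk.target)]


letter = Record("R1", "Grant approval letter", "record")
officer = Agent("A1", "Grants officer")
grants = Business("B1", "Approving community grants")
act = Mandate("M1", "Grants Act (hypothetical)")
links = [
    Link(letter.id, officer.id, "created by"),
    Link(letter.id, grants.id, "documents"),
    Link(grants.id, act.id, "governed by"),
]
print(related("B1", links))  # ['R1', 'M1']
```

Following the links from the record reaches its creator and the business it documents, and through the business, the mandate: this is the context the model is designed to keep.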


Four entity model

• Figure: the four entity model (ISO 23081-2, s6.1)


One, two, three, four entity models

• The four entity model can be flattened to simplify implementation
  – For example, a system could store only one entity (the record), which contains the metadata for agents, business, and mandates
  – This is practical because most metadata is captured at creation; subsequent changes in relationships or information are less relevant
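A sketch of the flattened approach, assuming plain Python dictionaries and a hypothetical flatten() helper: the agent, business, and mandate metadata is copied into the record at capture. This keeps storage simple, but later changes to those entities are not reflected in the record.

```python
def flatten(record: dict, agent: dict, business: dict, mandate: dict) -> dict:
    """Copy context metadata into the record at capture time (hypothetical helper).

    The copies are taken once, so later changes to the agent, business or
    mandate are not reflected: the trade-off the flattened model accepts."""
    flat = dict(record)
    flat["agent"] = dict(agent)
    flat["business"] = dict(business)
    flat["mandate"] = dict(mandate)
    return flat


agent = {"id": "A1", "name": "Grants officer"}
flat = flatten({"id": "R1", "title": "Grant approval letter"}, agent,
               {"id": "B1", "activity": "Approving community grants"},
               {"id": "M1", "rule": "Grants Act (hypothetical)"})
agent["name"] = "Senior grants officer"  # a later change to the agent...
print(flat["agent"]["name"])  # ...is not seen: prints Grants officer
```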


Metadata associated with an entity

• Figure: metadata associated with an entity (ISO 23081-2, s6.1)


Identity metadata

• Distinguishes the entity from all other entities in the domain
  – Entity type (e.g. record, agent)
  – Aggregation (e.g. file, record)
  – Registration identifier (the actual identifier)


Description metadata

• Describes the entity so that users can determine whether it is the entity sought
  – Title
  – Classification
  – Abstract
  – Place
  – External identifiers

• WARNING: description elements are normally business-specific


Use metadata

• Assists long-term access to the entity
  – Technical environment
  – Rights (who may legally use it and under what conditions)
  – Access (access control)
  – Language
  – Integrity
  – Documentary form


Event plan

• Allows the entity to be managed
• Consists of management actions that are planned to occur in the future
  – Appraisal (whether to keep the entity or not)
  – Disposal (implementation of the appraisal decision)
  – Preservation
  – Access control (changes to)
  – Rights (changes to)
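One possible way to hold an event plan is as a list of dated future actions that can be queried for whatever has fallen due. This is a sketch with hypothetical names (PlannedEvent, due_actions) and invented example dates; real schemes define their own elements for these actions.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PlannedEvent:
    """Hypothetical event-plan entry: a management action scheduled for later."""
    action: str  # e.g. "appraisal", "disposal", "preservation", "access control"
    due: date
    description: str


def due_actions(plan: list, today: date) -> list:
    """Planned actions that have fallen due by `today`, earliest first."""
    return sorted((e for e in plan if e.due <= today), key=lambda e: e.due)


plan = [
    PlannedEvent("disposal", date(2030, 1, 1), "Destroy under the applicable disposal authority"),
    PlannedEvent("preservation", date(2012, 6, 30), "Migrate to a long-term format"),
    PlannedEvent("access control", date(2011, 1, 1), "Open the record to public access"),
]
for e in due_actions(plan, date(2012, 12, 31)):
    print(e.due, e.action)
# 2011-01-01 access control
# 2012-06-30 preservation
```

When a planned action is carried out, it would normally be removed from the plan and written to the event history instead.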


Event history

• Documents the trail of past events
• Who, what, when, why
  – Event identifier
  – Event date/time
  – Event type
  – Event description
  – Event relation (mandate, agent)
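The trail of past events can be sketched as an append-only log. The Event fields mirror the elements listed above (identifier, date/time, type, description, related agent and mandate); the class names, the sequential identifiers, and the use of UTC timestamps are assumptions.

```python
import itertools
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Event:
    """Hypothetical event-history entry: who, what, when and why."""
    event_id: int
    when: datetime
    event_type: str
    description: str
    agent: str    # who carried out the event
    mandate: str  # why: the rule that authorised it


class EventHistory:
    """Append-only trail: events can be added and read but not altered."""

    def __init__(self) -> None:
        self._events = []
        self._next_id = itertools.count(1)

    def record(self, event_type: str, description: str, agent: str, mandate: str) -> Event:
        event = Event(next(self._next_id), datetime.now(timezone.utc),
                      event_type, description, agent, mandate)
        self._events.append(event)
        return event

    def events(self) -> tuple:
        return tuple(self._events)  # an immutable snapshot of the trail


history = EventHistory()
history.record("capture", "Captured from email system", "Records officer", "Records policy")
history.record("migration", "Converted to PDF/A", "Digital archivist", "Preservation plan")
for e in history.events():
    print(e.event_id, e.event_type, e.agent)
```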


Relation

• Links two (or more) entities
• Implicitly bi-directional, but need not be implemented this way
• Relationships often have a time span
  – Entity identifiers (from, to)
  – Relationship type
  – Relationship description
  – Relationship date range
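A sketch of a relationship record with a date range, under the assumption that each relationship is stored once in one direction and queried from both ends. The names, the "inverse of" labelling, and the example agencies and series are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Relationship:
    """Hypothetical relationship: from/to identifiers, type, description, date range."""
    from_id: str
    to_id: str
    rel_type: str
    description: str
    start: date
    end: Optional[date] = None  # None means the relationship is still current

    def active_on(self, day: date) -> bool:
        return self.start <= day and (self.end is None or day <= self.end)


def related_on(entity_id: str, rels: list, day: date) -> list:
    """(relationship, other entity) pairs for entity_id on a given day.

    Each relationship is stored once, in one direction, but is searched
    from both ends, which keeps it implicitly bi-directional."""
    found = []
    for r in rels:
        if not r.active_on(day):
            continue
        if r.from_id == entity_id:
            found.append((r.rel_type, r.to_id))
        elif r.to_id == entity_id:
            found.append(("inverse of " + r.rel_type, r.from_id))
    return found


rels = [
    Relationship("AGENCY-1", "SERIES-9", "controls", "Original controlling agency",
                 date(1990, 1, 1), date(2004, 6, 30)),
    Relationship("AGENCY-2", "SERIES-9", "controls", "Successor agency", date(2004, 7, 1)),
]
print(related_on("SERIES-9", rels, date(2000, 1, 1)))  # [('inverse of controls', 'AGENCY-1')]
print(related_on("SERIES-9", rels, date(2010, 1, 1)))  # [('inverse of controls', 'AGENCY-2')]
```

The date range is what lets the same series be answered differently for different points in time, for example when control passes to a successor agency.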


NAA/ANZ metadata standard

• Same content, two standards
• NAA version
  – Recordkeeping Metadata Standard Version 2.0
  – http://www.naa.gov.au/Images/AGRkMS_Final%20Edit_16%2007%2008_Revised_tcm2-12630.pdf
  – Based on five entities (Record, Agent, Business, Mandate, Relationship)
  – Defines 26 elements with 44 sub-elements
  – Includes extensive element schemes


NAA/ANZ Elements

All Entities: Entity Type, Category, Identifier*, Name*, Date Range, Description

Record: Jurisdiction*, Security Class*, Security Caveat*, Rights*, Language*, Coverage*, Keyword*, Disposal*, Format, Extent*, Medium, Integrity Check, Location*, Document Form, Precedence

Agent: Jurisdiction*, Permissions*, Contact*, Position*, Language*

Business: Jurisdiction*, Security Class*, Permissions*

Mandate: Jurisdiction*, Security Class*, Security Caveat*, Coverage*

Relationship: Related Entity*, Change History*

Key: Mandatory Element, Conditional Element, Optional Element


Future Australian Standard

• Work is in progress on an Australian Standard for recordkeeping metadata

• Based on the NAA/ANZ metadata standard
• Focus on relationships


PREMIS

• Preservation metadata is the information a repository uses to support the digital preservation process

• Supports the viability, renderability, understandability, authenticity, and identity of digital objects

• Built on the OAIS reference model
• Data dictionary and supporting materials
  – http://www.loc.gov/standards/premis/


PREMIS scope

• Not intended to define all preservation elements, only those that most repositories are likely to need in order to support digital preservation

• Excludes
  – Format-specific metadata (even for a class of formats)
  – Repository-specific metadata and business rules
  – Descriptive metadata
  – Detailed information about media or hardware
  – Information about agents, apart from the minimum required for identification
  – Information about rights and permissions, except those that directly affect preservation functions


PREMIS Data Model

• Figure from Understanding PREMIS: http://www.loc.gov/standards/premis/understanding-premis.pdf


PREMIS Entities

• Intellectual Entity – a set of content that is considered a single intellectual unit; PREMIS defines no metadata for it

• Object Entity – things actually stored in a repository
  – Representation Object – collection of all file objects necessary to represent an intellectual entity
  – File Object – discrete object on a computer file system
  – Bitstream Object – portion of a file

• Event Entity – contains the history of an object
• Rights Entity – rights and permissions pertaining to an object
• Agent Entity – actors involved in events or rights
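The nesting of PREMIS object entities (a representation made of files, a file containing bitstreams) can be illustrated with Python classes. These names and fields, including the bitstream offset and length, are illustrative assumptions and not the PREMIS semantic units, which are defined in the PREMIS data dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class BitstreamObject:
    identifier: str
    offset: int  # position of the bitstream within its file (assumed field)
    length: int


@dataclass
class FileObject:
    identifier: str
    original_name: str
    size: int  # bytes
    bitstreams: list = field(default_factory=list)


@dataclass
class RepresentationObject:
    identifier: str
    intellectual_entity: str  # an identifier only; PREMIS holds no metadata about it
    files: list = field(default_factory=list)

    def total_size(self) -> int:
        """Bytes needed to store every file in the representation."""
        return sum(f.size for f in self.files)


report = RepresentationObject("rep-1", "ie-annual-report", [
    FileObject("file-1", "report.html", 20_480),
    FileObject("file-2", "chart.png", 51_200, [BitstreamObject("bs-1", 0, 1_024)]),
])
print(report.total_size())  # 71680
```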


Elements for Object Entities

• Unique identifier
• Fixity information
• Size
• Format
• Original name
• Creators
• Inhibitors (features designed to prevent use)
• Significant properties (aspects that must be preserved)
• Environment (infrastructure required to use the object)
• Storage media
• Digital signatures
• Relationships with other entities
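Fixity information is typically a message digest recorded when an object is ingested and recomputed later to detect any change. The sketch below assumes SHA-256 via Python's hashlib (the slide does not name an algorithm), and the function names fixity() and verify() are hypothetical.

```python
import hashlib


def fixity(data: bytes, algorithm: str = "sha256") -> dict:
    """Fixity information for a file object: the algorithm and the digest.

    SHA-256 is an assumed default; any algorithm hashlib supports can be named."""
    return {"algorithm": algorithm, "digest": hashlib.new(algorithm, data).hexdigest()}


def verify(data: bytes, recorded: dict) -> bool:
    """Recompute the digest with the recorded algorithm and compare."""
    return fixity(data, recorded["algorithm"])["digest"] == recorded["digest"]


original = b"Minutes of the board meeting, 5 March 2001"
stored = fixity(original)
print(verify(original, stored))                  # True
print(verify(original + b" (edited)", stored))   # False
```

Storing the algorithm name alongside the digest means the check can still be repeated if the repository later changes its default algorithm.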


NAA/ANZ vs PREMIS

• NAA/ANZ
  – Recordkeeping is about relationships, so the standard includes the context of objects, which is often necessary to understand them
  – Documents the management plan for the object

• PREMIS
  – Deliberately focuses on preserving the files that form a digital object; context is important, but is not documented
  – Documents the critical information necessary to use objects


Reading metadata schemas

Don’t panic at the length…


General observations

• Most metadata schemes are lengthy, but contain relatively little information

• If you understand the typical structure, it is easy to quickly pick out the information you need

• Metadata schemes tend to be aspirational: they describe what the drafters thought you should do, often beyond what you can do or have to do


Metadata schemes

• Typical metadata schemes contain
  – Entities (i.e. the objects modelled)
    • Definition
    • List of valid elements
  – Elements (i.e. specific pieces of information)
    • Definition
    • Mandatory, optional, or conditional flag
    • Repeatable or not
    • Structure (child elements)
  – Element schemes (i.e. controls over the values that can be used)
    • Lists of valid values (e.g. States)
    • Format controls (e.g. dates)
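The parts of a typical scheme can be written down as data and used to check a metadata record. The sketch below is hypothetical throughout: the element names, the ISO date format control, and the list of Australian states and territories stand in for a real scheme's definitions, and conditional elements are not checked because their conditions are scheme-specific.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Optional


@dataclass(frozen=True)
class ElementDef:
    """Hypothetical element definition from a metadata scheme."""
    name: str
    obligation: str  # "mandatory", "conditional" or "optional"
    repeatable: bool = False
    scheme: Optional[Callable[[str], bool]] = None  # control on allowed values


STATES = {"ACT", "NSW", "NT", "QLD", "SA", "TAS", "VIC", "WA"}  # list of valid values


def is_iso_date(value: str) -> bool:  # format control
    try:
        date.fromisoformat(value)
        return True
    except ValueError:
        return False


RECORD_ELEMENTS = [
    ElementDef("Identifier", "mandatory"),
    ElementDef("Title", "mandatory"),
    ElementDef("Date", "mandatory", scheme=is_iso_date),
    ElementDef("Jurisdiction", "optional", repeatable=True, scheme=STATES.__contains__),
]


def validate(metadata: dict, elements: list) -> list:
    """Check a record (element name -> list of values) against the scheme.

    Conditional elements are not enforced: their conditions are scheme-specific."""
    known = {e.name for e in elements}
    problems = [f"unknown element {n}" for n in metadata if n not in known]
    for e in elements:
        values = metadata.get(e.name, [])
        if e.obligation == "mandatory" and not values:
            problems.append(f"missing mandatory element {e.name}")
        if not e.repeatable and len(values) > 1:
            problems.append(f"{e.name} is not repeatable")
        if e.scheme is not None:
            problems += [f"{e.name}: invalid value {v!r}" for v in values if not e.scheme(v)]
    return problems


print(validate({"Identifier": ["R1"], "Date": ["2008-07-16"],
                "Jurisdiction": ["VIC", "Vic."]}, RECORD_ELEMENTS))
# ['missing mandatory element Title', "Jurisdiction: invalid value 'Vic.'"]
```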


Implementation

• Metadata schemes are information models, not implementation instructions

• Adopting a scheme means that your implementation has
  – the mandatory elements
  – the conditional elements (if relevant)
  – (perhaps) some of the optional elements
  – the correct element structure

• Metadata schemes are often associated with a representation standard (e.g. in XML)
  – This is still not an implementation; it is often just for exchange
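To show the separation between internal storage and exchange format, the sketch below keeps metadata in an ordinary dictionary and produces XML only when it is exported, using Python's standard xml.etree.ElementTree. The XML element names and the helper functions are hypothetical, not those of any official NAA/ANZ or PREMIS schema.

```python
import xml.etree.ElementTree as ET

# Internal storage: any structure that suits the system (here, a plain dict).
internal = {"identifier": "R1", "title": "Grant approval letter", "formats": ["PDF", "XML"]}


def to_exchange_xml(meta: dict) -> str:
    """Serialise internal metadata to XML for exchange (hypothetical element names)."""
    root = ET.Element("record")
    ET.SubElement(root, "identifier").text = meta["identifier"]
    ET.SubElement(root, "title").text = meta["title"]
    for fmt in meta["formats"]:  # a repeatable element becomes repeated tags
        ET.SubElement(root, "format").text = fmt
    return ET.tostring(root, encoding="unicode")


def from_exchange_xml(text: str) -> dict:
    """Read exchanged XML back into the internal structure."""
    root = ET.fromstring(text)
    return {
        "identifier": root.findtext("identifier"),
        "title": root.findtext("title"),
        "formats": [e.text for e in root.findall("format")],
    }


xml_text = to_exchange_xml(internal)
print(xml_text)
print(from_exchange_xml(xml_text) == internal)
```

The internal dictionary could be replaced by database tables without changing what is exchanged, which is the point of treating the XML as a representation rather than an implementation.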


Conclusions

• Preservation metadata is simply the information that preservation professionals use to ensure continued access to objects

• What is viewed as essential depends on your discipline (which features is it necessary to preserve?)
  – E.g. archivists are concerned about context, librarians less so


Conclusions (2)

• Typical preservation metadata
  – Identity information
  – Technical details and organisation of the objects to be preserved
  – Rights and access
  – History of the object

• Other common metadata
  – Description
  – Management plans
  – Relationships between objects


Conclusions (3)

• You only have to implement the logical model and the mandatory elements

• Standards are usually aspirational: they include metadata that is nice to have, but not essential

• Specific representations (e.g. XML) are for data exchange; they do not dictate how you must implement the metadata internally