Transcript

Preservation metadata

Andrew Waugh, Senior Manager, Standards and Policy

Public Record Office Victoria

Structure of the talk

• What is preservation metadata?
• Recordkeeping metadata in theory
• NAA/ANZ recordkeeping metadata standard
• PREMIS – standard for preservation metadata
• Practical reading and implementing tips
• Conclusions

What is preservation?

• The ability to access content for as long as it is required

• Access means
  – Being able to find the content
  – Extract information from the content
  – Understand the context of the content
  – Be confident of the history of the content

Preservation metadata

• Preservation metadata is the information necessary to maintain access to content

• Difference between short and long term access is one of degree of metadata, not kind

• As preservation professionals, we are rarely interested in the content, just managing it. Preservation metadata is the basic information that we use to do our job

Examples of preservation metadata

• Identifier
• Creation date
• Title
• History information
• Relationship between objects
• Data formats

Recordkeeping Metadata

• The archival profession has been developing recordkeeping (=preservation) metadata for around a decade

• This work provides a useful framework to think about preservation and metadata

RK Metadata Standards

• ISO 23081 Information and documentation – Records management processes – Metadata for records
  – Part 1: Principles
  – Part 2: Conceptual and implementation issues

• National Archives of Australia (and Archives New Zealand) - Recordkeeping Metadata Standard Version 2.0
  – http://www.naa.gov.au/Images/AGRkMS_Final%20Edit_16%2007%2008_Revised_tcm2-12630.pdf

• Forthcoming Australian/New Zealand Standard

Metadata from a records view

• Records are content, context, and structure

• Record management metadata is data describing the context, content, and structure of records and their management through time (ISO 15489-1:2001, 3.12)

• Recordkeeping metadata is the key to providing access (and hence preservation)

• In practice, metadata is everything except the actual content of the record

Purpose of recordkeeping metadata

• The purpose of recordkeeping metadata includes
  – Protecting records as evidence
  – Ensuring their accessibility and usability through time
  – Facilitating the ability to understand records
  – Helping ensure the authenticity, reliability and integrity of records
  – Supporting and managing access, privacy, and rights
  – Supporting the migration of records from one (preservation) system to another

Metadata at record capture

• Records are captured into a system, and metadata is created/captured with them

• This metadata documents
  – Environment in which records were created
  – Purpose or business activity being undertaken
  – Relationship with other records or aggregations
  – Physical or technical structure of the record
  – Logical structure of the record

Metadata after record capture

• Metadata captured after record creation documents what happened to a record over time
  – Demonstrates authenticity, reliability, usability, and integrity

• Answers the basic questions of who, what, when, where, why

Metadata after disposal

• Metadata is a record itself, and some parts may need to be kept after the record has been disposed of to account for the record's existence, management, and disposition

Four entity model

• Modern Australian recordkeeping metadata models are normally expressed in terms of entities
  – Records (the objects to be preserved: record, file, series…)
  – Agents (people who create and use the records)
  – The business transacted
  – Mandates (the rules governing the business)

Four entity model

• ISO23081-2 s6.1

One, two, three, four entity models

• The four entity model can be flattened to facilitate implementation
  – A system could store only one entity (record), which contains metadata for agents, business, and mandates
  – Practical because most metadata is captured at creation; subsequent changes in relationships or information are less relevant
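The flattening described on this slide can be sketched in a few lines of Python. This is only an illustration: the class names, field names, and sample values are assumptions made for the example and are not taken from ISO 23081 or the NAA/ANZ standard.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    identifier: str
    name: str


@dataclass
class Business:
    identifier: str
    description: str


@dataclass
class Mandate:
    identifier: str
    title: str


@dataclass
class Record:
    identifier: str
    title: str
    agents: list = field(default_factory=list)
    business: list = field(default_factory=list)
    mandates: list = field(default_factory=list)


def flatten(record: Record) -> dict:
    """Collapse the linked entities into one self-contained record entry."""
    return {
        "identifier": record.identifier,
        "title": record.title,
        # Values are copied as they stood at capture; later changes to the
        # agent, business, or mandate are not reflected in the flat copy.
        "agent_names": [a.name for a in record.agents],
        "business_descriptions": [b.description for b in record.business],
        "mandate_titles": [m.title for m in record.mandates],
    }


# Hypothetical sample data.
record = Record(
    identifier="REC-001",
    title="Minutes of planning meeting",
    agents=[Agent("AG-01", "Records Officer")],
    business=[Business("BUS-01", "Project planning")],
    mandates=[Mandate("MAN-01", "Example Records Act")],
)
flat = flatten(record)
print(flat)
```

The trade-off is the one the slide names: the flat entry is simple to store, but it no longer tracks changes made to an agent, business, or mandate after capture.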

Metadata associated with an entity

• ISO23081-2 s6.1

Identity metadata

• Distinguishes entity from all other entities in the domain
  – Entity type (e.g. record, agent)
  – Aggregation (e.g. file, record)
  – Registration Identifier (the actual identifier)

Description metadata

• Describes the entity to allow determination if this is the entity sought
  – Title
  – Classification
  – Abstract
  – Place
  – External Identifiers

• WARNING – description elements are normally business specific

Use metadata

• Assists long-term access to the entity
  – Technical environment
  – Rights (who may legally use it and under what conditions)
  – Access (access control)
  – Language
  – Integrity
  – Documentary form

Event plan

• Allows the entity to be managed
• Consists of management actions that are planned to occur in the future
  – Appraisal (To keep or not)
  – Disposal (Implementation of appraisal decision)
  – Preservation
  – Access Control (Changes to)
  – Rights (Changes to)
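An event plan of this kind can be pictured as a list of dated actions that a system checks periodically. The following is a minimal sketch under that assumption; the `PlannedEvent` class, the `due_events` helper, and the dates are invented for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class PlannedEvent:
    event_type: str   # e.g. "Appraisal", "Disposal", "Preservation"
    due: date         # when the action is planned to occur
    description: str


def due_events(plan, today):
    """Return planned events whose due date has been reached, earliest first."""
    return sorted((e for e in plan if e.due <= today), key=lambda e: e.due)


# Hypothetical plan for one entity.
plan = [
    PlannedEvent("Disposal", date(2030, 1, 1), "Implement appraisal decision"),
    PlannedEvent("Preservation", date(2012, 6, 30), "Review data format"),
    PlannedEvent("Appraisal", date(2011, 3, 1), "Decide whether to keep"),
]
due = due_events(plan, date(2015, 1, 1))
print([e.event_type for e in due])
```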

Event history

• Documents the trail of past events
• Who, what, when, why
  – Event identifier
  – Event date/time
  – Event type
  – Event description
  – Event relation (mandate, agent)
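One way to picture an event history is as an append-only list of immutable entries. The sketch below assumes that design; the class and field names simply mirror the slide's element list and are not drawn from any standard's schema.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Event:
    identifier: str
    when: datetime
    event_type: str
    description: str
    agent_id: Optional[str] = None     # who (relation to an agent)
    mandate_id: Optional[str] = None   # why (relation to a mandate)


class EventHistory:
    """Append-only trail: events can be added and read, never edited."""

    def __init__(self):
        self._events = []

    def record(self, event_type, description, agent_id=None, mandate_id=None):
        event = Event(
            identifier=str(uuid.uuid4()),
            when=datetime.now(timezone.utc),
            event_type=event_type,
            description=description,
            agent_id=agent_id,
            mandate_id=mandate_id,
        )
        self._events.append(event)
        return event

    def trail(self):
        # Return a tuple so callers cannot modify the stored history.
        return tuple(self._events)


# Hypothetical usage.
history = EventHistory()
history.record("Capture", "Record captured into system", agent_id="AG-01")
history.record("Migration", "Converted to a new format", mandate_id="MAN-01")
for event in history.trail():
    print(event.when.isoformat(), event.event_type, event.description)
```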

Relation

• Links two (or more) entities
• Implicitly bi-directional, but need not be implemented this way
• Relationships often have a time span
  – Entity Identifiers (from, to)
  – Relationship type
  – Relationship description
  – Relationship date range
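The points about direction and time span can be illustrated with a relationship that is stored once but can be queried from either end. This is a sketch only; the `Relationship` class, the `related` helper, and the identifiers are assumptions for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Relationship:
    from_id: str
    to_id: str
    relationship_type: str
    description: str = ""
    start: Optional[date] = None   # None means open-ended
    end: Optional[date] = None

    def active_on(self, day):
        after_start = self.start is None or self.start <= day
        before_end = self.end is None or day <= self.end
        return after_start and before_end


def related(relationships, entity_id, day):
    """List (type, other entity) pairs active on a day, looking from either end.

    The relationship type is reported as stored, whichever end is queried.
    """
    found = []
    for r in relationships:
        if not r.active_on(day):
            continue
        if r.from_id == entity_id:
            found.append((r.relationship_type, r.to_id))
        elif r.to_id == entity_id:
            found.append((r.relationship_type, r.from_id))
    return found


# Hypothetical data: one stored link, queried from both entities.
relationships = [
    Relationship("REC-001", "AG-01", "created by",
                 start=date(2010, 1, 1), end=date(2010, 12, 31)),
]
print(related(relationships, "REC-001", date(2010, 6, 1)))
print(related(relationships, "AG-01", date(2010, 6, 1)))
print(related(relationships, "AG-01", date(2011, 6, 1)))
```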

NAA/ANZ metadata standard

• Same content, two standards
• NAA version
  – Recordkeeping Metadata Standard Version 2.0
  – http://www.naa.gov.au/Images/AGRkMS_Final%20Edit_16%2007%2008_Revised_tcm2-12630.pdf
  – Based on five entities (Record, Agent, Business, Mandate, Relationship)
  – Defines 26 elements with 44 sub-elements
  – Includes extensive element schemes

NAA/ANZ Elements

• All Entities: Entity Type, Category, Identifier*, Name*, Date Range, Description
• Record: Jurisdiction*, Security Class*, Security Caveat*, Rights*, Language*, Coverage*, Keyword*, Disposal*, Format, Extent*, Medium, Integrity Check, Location*, Document Form, Precedence
• Agent: Jurisdiction*, Permissions*, Contact*, Position*, Language*
• Business: Jurisdiction*, Security Class*, Permissions*
• Mandate: Jurisdiction*, Security Class*, Security Caveat*, Coverage*
• Relationship: Related Entity*, Change History*

Legend: Mandatory Element, Conditional Element, Optional Element

Future Australian Standard

• Work is in progress on an Australian Standard for recordkeeping metadata
• Based on the NAA/ANZ metadata standard
• Focus on relationships

PREMIS

• Preservation metadata is the information a repository uses to support the digital preservation process

• Supports the viability, renderability, understandability, authenticity, and identity of digital objects

• Built on OAIS reference model
• Data dictionary & supporting materials
  – http://www.loc.gov/standards/premis/

PREMIS scope

• Not intended to define all preservation elements, only those that most repositories are likely to need to know in order to support digital preservation

• Excludes
  – Format specific metadata (even for a class of format)
  – Repository specific metadata and business rules
  – Descriptive metadata
  – Detailed information about media or hardware
  – Information about agents, apart from minimum required for identification
  – Information about rights and permissions, except those that directly affect preservation functions

PREMIS Data Model

• From Understanding PREMIS http://www.loc.gov/standards/premis/understanding-premis.pdf

PREMIS Entities

• Intellectual Entity – set of content that is a single intellectual unit – has no metadata in PREMIS

• Object Entity – things actually stored in a repository
  – Representation Object – collection of all file objects necessary to represent an intellectual entity
  – File Object – discrete object on a computer file system
  – Bitstream Object – portion of a file

• Event Entity – contains the history of an Object
• Rights Entity – rights and permissions about object
• Agent Entity – actors involved in events or rights
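The three kinds of Object Entity nest naturally: a representation groups files, and a file may contain bitstreams. The sketch below shows only that nesting; the class and field names are illustrative assumptions and are not the element names used in the PREMIS data dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class Bitstream:
    identifier: str
    offset: int    # byte position within the containing file
    length: int    # number of bytes


@dataclass
class FileObject:
    identifier: str
    path: str
    bitstreams: list = field(default_factory=list)


@dataclass
class Representation:
    identifier: str
    # Label only: the intellectual entity carries no metadata of its own here.
    intellectual_entity: str
    files: list = field(default_factory=list)


def all_object_ids(rep):
    """Walk a representation and list every object identifier it contains."""
    ids = [rep.identifier]
    for f in rep.files:
        ids.append(f.identifier)
        ids.extend(b.identifier for b in f.bitstreams)
    return ids


# Hypothetical example: a report stored as a document plus an image.
rep = Representation(
    identifier="REP-1",
    intellectual_entity="Annual report",
    files=[
        FileObject("FILE-1", "report.pdf", [Bitstream("BIT-1", 0, 1024)]),
        FileObject("FILE-2", "cover.tif"),
    ],
)
print(all_object_ids(rep))
```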

Elements for Object Entities

• Unique Identifier
• Fixity information
• Size
• Format
• Original Name
• Creators
• Inhibitors (things designed to prevent use)
• Significant properties (aspects that must be preserved)
• Environment (infrastructure required to use)
• Storage media
• Digital signatures
• Relationship with other entities
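Some of these elements can be gathered directly from a file. As a rough sketch, the code below records the original name, size, and fixity (here a SHA-256 digest, chosen as an assumption; the slide does not name an algorithm) and later checks whether the file still matches. The function names and the dictionary layout are invented for the example.

```python
import hashlib
import os
import tempfile


def describe_file(path, algorithm="sha256", chunk_size=65536):
    """Collect name, size, and a message digest for one file."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return {
        "original_name": os.path.basename(path),
        "size": os.path.getsize(path),
        "fixity": {"algorithm": algorithm, "digest": digest.hexdigest()},
    }


def fixity_matches(path, recorded):
    """Recompute the digest and compare it with the recorded value."""
    current = describe_file(path, recorded["fixity"]["algorithm"])
    return current["fixity"]["digest"] == recorded["fixity"]["digest"]


# Hypothetical usage with a temporary file.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "report.txt")
    with open(path, "wb") as handle:
        handle.write(b"preservation metadata")
    recorded = describe_file(path)
    unchanged = fixity_matches(path, recorded)
    with open(path, "ab") as handle:
        handle.write(b"!")          # simulate an unexpected change
    changed = fixity_matches(path, recorded)

print(recorded["original_name"], recorded["size"], unchanged, changed)
```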

NAA/ANZ vs PREMIS

• NAA/ANZ
  – Recordkeeping is about relationships, so includes the context of objects which is often necessary to understand the object
  – Documents the management plan for the object

• PREMIS
  – Deliberately focuses on preserving the files that form a digital object; context is important, but not documented
  – Documents critical information necessary to use objects

Reading metadata schemas

Don’t panic at the length…

General observations

• Most metadata schemes are lengthy, but contain relatively little information

• If you understand the typical structure, it is easy to quickly pick out the information you need

• Metadata schemes tend to be aspirational: what the drafters thought you should do, often beyond what you can do or have to do

Metadata schemes

• Typical metadata schemes contain
  – Entities (i.e. objects modelled)
    • Definition
    • Lists valid elements
  – Elements (i.e. specific pieces of information)
    • Definition
    • Mandatory, optional, conditional flag
    • Repeatable or not
    • Structure (child elements)
  – Element schemas (i.e. controls over the values that can be used)
    • Lists of valid values (e.g. States)
    • Format controls (e.g. dates)
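The typical structure above translates fairly directly into a small checker. The sketch below is an assumption-laden illustration: the `ElementDef` class, the sample scheme, and the list of states are invented, and conditional elements are not checked because their conditions are specific to each scheme.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ElementDef:
    name: str
    obligation: str = "optional"   # "mandatory", "conditional", or "optional"
    repeatable: bool = False
    valid_values: Optional[frozenset] = None   # element scheme, if any


def validate(metadata, scheme):
    """Return a list of problems; metadata maps element name to a list of values."""
    problems = []
    known = {element.name for element in scheme}
    for name in metadata:
        if name not in known:
            problems.append(f"{name}: not defined in the scheme")
    for element in scheme:
        values = metadata.get(element.name, [])
        if element.obligation == "mandatory" and not values:
            problems.append(f"{element.name}: mandatory element missing")
        if not element.repeatable and len(values) > 1:
            problems.append(f"{element.name}: not repeatable")
        if element.valid_values is not None:
            for value in values:
                if value not in element.valid_values:
                    problems.append(f"{element.name}: invalid value {value!r}")
    return problems


# Hypothetical scheme for a single entity.
scheme = [
    ElementDef("Identifier", obligation="mandatory"),
    ElementDef("Title", obligation="mandatory"),
    ElementDef("Keyword", repeatable=True),
    ElementDef("State", valid_values=frozenset({"VIC", "NSW", "QLD"})),
]
good = {"Identifier": ["REC-001"], "Title": ["Minutes"], "Keyword": ["a", "b"]}
bad = {"Identifier": ["REC-001", "REC-002"], "State": ["Atlantis"], "Colour": ["red"]}
print(validate(good, scheme))
print(validate(bad, scheme))
```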

Implementation

• Metadata schemes are information models, not implementation instructions

• Adopting a scheme means that your implementation has
  – the mandatory elements
  – the conditional elements (if relevant)
  – (perhaps) some of the optional elements
  – the correct element structure

• Metadata schemes are often associated with a representation standard (e.g. in XML)
  – Still not an implementation – often just for exchange
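To make the distinction concrete, the sketch below keeps metadata internally as a plain dictionary and produces XML only at the point of exchange. The element names (`record`, `identifier`, `title`) are made up for the example and do not come from any published schema.

```python
import xml.etree.ElementTree as ET


def to_exchange_xml(record):
    """Serialise an internal record (a flat dict) to an XML string for exchange."""
    root = ET.Element("record")
    for name, value in record.items():
        child = ET.SubElement(root, name)
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")


def from_exchange_xml(text):
    """Read the exchange form back into the internal dict form."""
    root = ET.fromstring(text)
    return {child.tag: child.text for child in root}


# Hypothetical internal representation; it could equally be database rows.
internal = {"identifier": "REC-001", "title": "Minutes of planning meeting"}
xml_text = to_exchange_xml(internal)
round_trip = from_exchange_xml(xml_text)
print(xml_text)
```

The internal form is free to change, as long as the system can still produce and read the exchange form.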

Conclusions

• Preservation metadata is simply the information that preservation professionals use to ensure continued access to objects

• What is viewed as essential depends on your discipline (what features is it necessary to preserve?)
  – E.g. archivists are concerned about context, librarians less so

Conclusions (2)

• Typical preservation metadata
  – Identity information
  – Technical details and organisation of the objects to be preserved
  – Rights and access
  – History of object

• Other common metadata
  – Description
  – Management Plans
  – Relationships between objects

Conclusions (3)

• You only have to implement the logical model and the mandatory elements

• Standards are usually aspirational – include metadata that is nice to have, but not essential

• Specific representations (e.g. XML) are for data exchange, not how you must implement them internally
