Digital Preservation Steps 1 & 2: Identify & Select
Mar 31, 2015
Steps
Identify - what digital content do you have?
Select - what portion of that content will be preserved?
Store - what issues are there for long term storage?
Protect - what steps are needed to protect your digital content?
Manage - what provisions are needed for long-term management?
Provide - what considerations are there for long-term access?
DPOE Baseline Modules: Identify, version 2.0, Nov 2011
[Figure: DPOE Baseline Concepts: Identify, Select, Store, Protect, Manage, Provide]
DPOE Baseline Modules: Identify, version 3.0
Problem Summary: Preservation is a resource commitment, so we need to
effectively plan for our current and future preservation needs, because not all digital
content will be preserved.
Solution: An explicit inventory is the best way to identify content.
How will an inventory help? Good preservation decisions are based on a deep understanding of the possible content to be preserved.
[Figure: All Content, Possible to preserve, Actually preserved]
Inventory Considerations
• An inventory’s content is more important than its style and format; however…
• Inventory results should preferably be:
  – Scalable: content will be added during Select
  – Available: accessible to the team, managers, and others
  – Usable: in a simple format that can be sorted, listed, etc.
  – Current: updated periodically
  – Electronic: kept in a dynamic format
  – Documented: the inventory itself needs to be captured
Inventory Tips
• Use available, familiar software to get started
  – What software or tools do you already have?
  – What free or open source tools might be useful?
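As one example of a free tool already present on many systems, Python's standard library can produce a first-pass count of files by type. The sketch below is illustrative only: `iter_files` and `count_by_extension` are hypothetical helper names, and the starting folder "." is a placeholder for wherever your content actually lives.

```python
import os
from collections import Counter


def iter_files(root):
    """Yield the path of every file under `root`, recursing into subfolders."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            yield os.path.join(dirpath, name)


def count_by_extension(paths):
    """Tally paths by lowercase file extension ("(none)" if there is none)."""
    counts = Counter()
    for path in paths:
        ext = os.path.splitext(path)[1].lower() or "(none)"
        counts[ext] += 1
    return counts


if __name__ == "__main__":
    # "." is a placeholder; point this at the folder being inventoried.
    for ext, n in count_by_extension(iter_files(".")).most_common():
        print(f"{ext}\t{n}")
```

Keeping the directory walk separate from the tally makes the counting step easy to reuse on a list of paths exported from another system.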
Be consistent, comprehensive, and concise
Inventory Scope
• What content are we already preserving?
• What other digital content do we have?
• What content do/will our producers create?
• What content are we required to keep?
• What content do we need to review?
Exercise
• Where is all our content?
• At your tables, think about the digital content at your library, where it’s located, and what kinds of files there are
Level of Detail
• Inventories can range from general to detailed
• Determine the appropriate level of detail for you
• Factors in determining level of detail:
  – Extent of content to be inventoried
  – Nature and location of content to be inventoried
  – Resources available to complete the inventory
  – Timeframe and deadlines for completing the inventory
Content Categories
Inventories should include all relevant content categories, e.g.:
• Institutional records
• Special collections & archives
• Scholarly content – licensed and open
• Research data
• Web content
Format Types
An inventory should identify format types within categories of content - examples:
Indicate the range of file types when possible
• Images
• Video
• Audio
• Text
• Maps/geospatial
• Drawings
• Web content
• Structured data
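A rough way to report format types within a category is to roll file extensions up into the broad types listed above and keep the range of extensions seen in each. This is a sketch under an assumption: the extension-to-type table is a hypothetical, incomplete example, and extensions can be missing or misleading, so dedicated format-identification tools may give more reliable results.

```python
from collections import Counter

# Hypothetical, incomplete mapping from extension to the format types above.
FORMAT_TYPES = {
    ".tif": "Images", ".tiff": "Images", ".jpg": "Images", ".png": "Images",
    ".mov": "Video", ".mp4": "Video",
    ".wav": "Audio", ".mp3": "Audio",
    ".txt": "Text", ".pdf": "Text", ".docx": "Text",
    ".shp": "Maps/geospatial", ".kml": "Maps/geospatial",
    ".dwg": "Drawings",
    ".html": "Web content", ".warc": "Web content",
    ".csv": "Structured data", ".xml": "Structured data",
}


def summarize_format_types(extension_counts):
    """Roll per-extension counts up into format types.

    Returns (totals, ranges): file counts per type, and the sorted list of
    extensions seen within each type (the "range of file types").
    """
    totals = Counter()
    ranges = {}
    for ext, n in extension_counts.items():
        ftype = FORMAT_TYPES.get(ext.lower(), "Unclassified")
        totals[ftype] += n
        ranges.setdefault(ftype, set()).add(ext.lower())
    return totals, {t: sorted(exts) for t, exts in ranges.items()}
```

Extensions the table does not know land in "Unclassified", which flags them for a closer look rather than silently dropping them.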
Date Considerations
Inventories should note:
• Date of inventory – and updates to it
• Date of files – when possible
• Dates covered in content – even approximate
• Date created/received – if relevant and possible
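File-system timestamps are one readily available source for the "date of files" field, though copying or migrating files can reset them, so they are best recorded as approximate. The minimal sketch below, with hypothetical function and field names, fills in the date of inventory and the range of file modification dates, leaving coverage and created/received dates for people to supply.

```python
import os
from datetime import date, datetime, timezone


def _utc_date(ts):
    """Convert a POSIX timestamp to a UTC calendar date."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).date()


def file_date_range(timestamps):
    """Return (earliest, latest) dates for a list of timestamps, or None."""
    if not timestamps:
        return None
    return _utc_date(min(timestamps)), _utc_date(max(timestamps))


def date_fields(paths, inventoried_on=None):
    """Build the date-related fields of one inventory entry.

    File modification times stand in for "date of files"; copying or
    migrating files can reset them, so treat them as approximate.
    """
    rng = file_date_range([os.path.getmtime(p) for p in paths])
    return {
        "date_of_inventory": (inventoried_on or date.today()).isoformat(),
        "file_dates": f"{rng[0]} to {rng[1]}" if rng else "unknown",
        # Coverage and created/received dates usually come from curators
        # or creators rather than from the file system.
        "coverage_dates": "",
        "created_received": "",
    }
```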
Location Issues
Locations of content are important – consider:
• Method to specify online/offline location
• General location – e.g., with us, with creator
• Ability to change locations as content moves
• Method storage systems use to note location
Be clear enough without going to extremes…
Sample Basic Inventory
Category: Special Collections - Slides
Title/Description: Circus photographs
Type: Images, digitized
Format: TIFF
Extent: 242 GB, 2250 images
Location: Server (Systems), CDs (Digital Center)
Coverage dates: early 1950s
Creation date: 2010 - 2012
Inventoried: by Andrew Huot, November 2013
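Since an inventory should be electronic, sortable, and easy to share, a flat CSV file is one simple option. The sketch below models the sample record above as a Python dataclass and writes it out as CSV; the field names mirror the slide's labels, while `InventoryEntry` and `to_csv` are hypothetical names chosen for illustration.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class InventoryEntry:
    """One inventory row; fields follow the sample basic inventory."""
    category: str
    title_description: str
    content_type: str
    file_format: str
    extent: str
    location: str
    coverage_dates: str
    creation_date: str
    inventoried: str


def to_csv(entries):
    """Serialize entries to CSV text with a single header row."""
    buf = io.StringIO()
    names = [f.name for f in fields(InventoryEntry)]
    writer = csv.DictWriter(buf, fieldnames=names)
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))
    return buf.getvalue()


circus = InventoryEntry(
    category="Special Collections - Slides",
    title_description="Circus photographs",
    content_type="Images, digitized",
    file_format="TIFF",
    extent="242 GB, 2250 images",
    location="Server (Systems), CDs (Digital Center)",
    coverage_dates="early 1950s",
    creation_date="2010 - 2012",
    inventoried="Andrew Huot, November 2013",
)

if __name__ == "__main__":
    print(to_csv([circus]))
```

A CSV like this opens in any spreadsheet program, which keeps the inventory usable and sortable without special software.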
DPOE Baseline Modules: Select, version 2.0, Nov 2011
Why be selective?
• Storage may be cheap, management is not… especially over time
DPOE Baseline Modules: Select, version 3.0
1 TB hard drive = $100; IT department = $100/hour
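The slide's point can be made concrete with a short calculation using its illustrative figures ($100 per 1 TB drive, $100 per hour of IT time). The staff hours and time span below are hypothetical assumptions, not figures from the source, and the model ignores redundant copies, drive replacement, and other real costs.

```python
DRIVE_COST_PER_TB = 100    # dollars per 1 TB drive, from the slide
STAFF_COST_PER_HOUR = 100  # dollars per hour of IT time, from the slide


def cost_over_years(terabytes, staff_hours_per_year, years):
    """Return (storage_cost, staff_cost) in dollars over `years`.

    Simplification: drives are a one-time purchase, while staff time for
    checking, migrating, and managing content recurs every year.
    """
    storage_cost = terabytes * DRIVE_COST_PER_TB
    staff_cost = staff_hours_per_year * years * STAFF_COST_PER_HOUR
    return storage_cost, staff_cost


# Hypothetical inputs: 2 TB of content, 40 staff hours per year, 5 years.
print(cost_over_years(2, 40, 5))  # (200, 20000)
```

Even with these rough numbers, management time rather than disk space drives the total, which is the argument for being selective.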
Why be selective?
• Quality of content
Why be selective?
• Discovery and dissemination services … scale, scope, performance, sustainability
Why be selective?
• Match mission to content: What kind of content would this organization preserve?
Cottonwood Foundation, a charitable grant-making organization, is dedicated to promoting empowerment of
people, protection of the environment, and respect for cultural diversity.
Terminology for Select
Different terms in different domains:
• Archives – appraisal and scheduling
• Libraries – e.g., selection
• Museums – e.g., acquisition
• Records Management – vital and non-vital
• Commercial media – channelization
Steps to Select
Review your potential digital content
Implement your decisions
Document (and preserve) selection decisions
Define and apply selection criteria
Tons of Review Priorities
• Most significant (producer, content)
• Most extensive
• Most requested
• Easiest (e.g., most familiar)
• Oldest (possible historical importance)
• Newest (possible immediate interest)
• Mandate (local, legislation, etc.)
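Measurable priorities such as extent, demand, or age can be turned into a sort order for the review queue; others, such as significance, ease, or mandate, need human judgment. A minimal sketch, assuming hypothetical inventory entries that carry `size_gb`, `requests`, and `year` fields; the collections and their numbers are invented for illustration.

```python
# Hypothetical inventory entries; only the fields used for sorting are shown.
ENTRIES = [
    {"title": "Collection A", "size_gb": 242, "requests": 4, "year": 1950},
    {"title": "Collection B", "size_gb": 3, "requests": 12, "year": 1998},
    {"title": "Collection C", "size_gb": 90, "requests": 1, "year": 2012},
]

# Each measurable priority maps to a sort key; smaller keys sort first.
PRIORITY_KEYS = {
    "most_extensive": lambda e: -e["size_gb"],
    "most_requested": lambda e: -e["requests"],
    "oldest": lambda e: e["year"],
    "newest": lambda e: -e["year"],
}


def review_queue(entries, priority):
    """Return titles in the order they would be reviewed under `priority`."""
    return [e["title"] for e in sorted(entries, key=PRIORITY_KEYS[priority])]


print(review_queue(ENTRIES, "most_requested"))
# ['Collection B', 'Collection A', 'Collection C']
```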
Another layer: Audience / Stakeholders
Quick Review Tool
Stop if or when the answer is ‘no’…
1. Content – does the content have value (consider stakeholders)? Does it fit your scope?
2. Technical – is it feasible for you to preserve the content?
3. Access – is it possible to make the content available?
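The stop-at-the-first-"no" logic of the Quick Review Tool can be written as a short function. This is a sketch under stated assumptions: reviewers record a yes/no answer per stage, an unanswered stage counts as "no", and `quick_review` is a hypothetical name.

```python
# The three review stages, in the order they are asked.
QUESTIONS = [
    ("Content", "Does the content have value to stakeholders, and does it fit your scope?"),
    ("Technical", "Is it feasible for you to preserve the content?"),
    ("Access", "Is it possible to make the content available?"),
]


def quick_review(answers):
    """Apply the questions in order and stop at the first 'no'.

    `answers` maps each stage name to True/False; a missing answer is
    treated as 'no'. Returns (selected, stopped_at), where stopped_at is
    the stage that ended the review, or None if every stage passed.
    """
    for stage, _question in QUESTIONS:
        if not answers.get(stage, False):
            return False, stage
    return True, None
```

Returning the stage that stopped the review gives a ready-made note for documenting the selection decision.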
To Select or Not to Select?
Stakeholders
• Archive researchers
• Television viewers
• Stockholders
• Web site users
Policy questions
• Is it held in other archives?
• Does it have use value beyond this one project?
• Does it fit our policy?
Selection can
Project Management
• Treat selection as an ongoing, structured project to plan and coordinate the process
Augment Inventory
• Add descriptions – more granular
  – Not item level, but enough to specify categories
  – Additional information from the creator
Oh by the way, that whole part about me being Italian in my diary is only
half true.
Augment Inventory
Supplement the inventory from Identify:
• Extent
  – How much content is there/will there be? (number of files, megabytes, number of subfolders)
  – When will content no longer be active (disposition)?
A file full of footage has several extents: Number of files, length of footage, number of
gigabytes, number of reels digitized, number of subseries, etc.
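File-based extents (number of files, size, number of subfolders) can be measured automatically, while others, such as length of footage or number of reels digitized, have to come from the content or its creators. A minimal sketch with a hypothetical `measure_extent` helper; it reports decimal megabytes (1 MB = 1,000,000 bytes), which is an assumption about the unit your inventory uses.

```python
import os


def measure_extent(root):
    """Return the file count, total megabytes, and subfolder count under `root`."""
    files = folders = total_bytes = 0
    for dirpath, dirnames, filenames in os.walk(root):
        folders += len(dirnames)
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path):  # skip broken links and special entries
                total_bytes += os.path.getsize(path)
                files += 1
    return {
        "files": files,
        "megabytes": round(total_bytes / 1_000_000, 1),
        "subfolders": folders,
    }
```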
Outcomes for identifying & selecting content to preserve
• Expand inventories of content
• Permit agreements with producers, such as retention schedules, acquisition lists, and submission agreements
Objectives
• Gain control of possible content for planning
• Develop a sustainable program
• Identify potential digital content you may need to preserve
• Treat the inventory as a management tool that grows as your program grows
• Use it as a planning tool to prepare for, e.g., staff, training, annual growth
• Provide a basis for acquiring content, defining submission agreements, and plans