Unified Digital Format Registry
a semantic registry for digital preservation
UDFR: A Semantic Registry for Format Representation Information
Lisa Dawn Colvin, Abhishek Salve, Stephen Abrams
UC Curation Center, California Digital Library
Digital Library Federation Forum, Baltimore, October 31-November 2, 2011
Outline
What
Why
How
When
Why formats?
“Format” is the dividing line between bits and information
ffd8ffe000104a46494600010201008300830000ffed0fb050686f746f73686f7020332e30003842494d03e90a5072696e7420496e666f000000007800000000004800480000000002f40240ffeeffee030602520347052803fc00020000004800480000000002d802280001000000640000000100030...
Why formats?
There are many necessary preservation activities that can be usefully performed on bits qua bits
But to preserve information you must act on formatted bits and know what those formats mean
• Preservation of syntax and semantics
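As a rough illustration of the step from bits to information, the sketch below interprets the first 20 bytes of the hex dump from the previous slide. The byte layout it relies on (a JPEG start-of-image marker followed by a JFIF APP0 segment) is standard, but the function name `describe_header` and the shape of the returned dictionary are assumptions made for this example; they are not part of UDFR.

```python
# First 20 bytes of the hex dump shown on the "Why formats?" slide.
HEADER = bytes.fromhex("ffd8ffe000104a46494600010201008300830000")

# JFIF density unit codes.
DENSITY_UNITS = {0: "none", 1: "dots per inch", 2: "dots per cm"}


def describe_header(data: bytes) -> dict:
    """Interpret leading bytes as a JPEG/JFIF header, if they match."""
    info = {"format": "unknown"}
    # JPEG streams start with the start-of-image marker FF D8.
    if data[:2] != b"\xff\xd8":
        return info
    info["format"] = "JPEG"
    # A JFIF file continues with an APP0 segment (FF E0) tagged "JFIF\0".
    if data[2:4] == b"\xff\xe0" and data[6:11] == b"JFIF\x00":
        info["wrapper"] = "JFIF"
        info["version"] = f"{data[11]}.{data[12]:02d}"
        info["density_units"] = DENSITY_UNITS.get(data[13], "unknown")
        info["x_density"] = int.from_bytes(data[14:16], "big")
        info["y_density"] = int.from_bytes(data[16:18], "big")
    return info


print(describe_header(HEADER))
```

Without knowledge of the format, the same 20 bytes are only a sequence of numbers; with it, they state an image type, a version, and a resolution.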
Unified Digital Format Registry
“A reliable, publicly accessible, and sustainable knowledge base of file format representation information for use by the digital preservation community”
• “Unification” of the function and holdings of PRONOM and GDFR
  http://www.nationalarchives.gov.uk/PRONOM
  http://gdfr.info/
• Open source platform / GPL
• Semantic wiki
• Funded by the Library of Congress
Representation information
What you need to know about something in order to exploit that thing meaningfully [OAIS/ISO 14721]
Information that lets you answer important preservation questions
• What format is it?
• What are its significant properties?
• Is it valid?
• Is it at risk?
• How can I render/play/read it?
• What can it be transformed into?
• And how?
Why semantic?
Everyone wants to say something about everything
• The semantic web lets anyone say anything about anything
• Understandable to both people and machines
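One way to picture "anyone can say anything about anything" is as subject-predicate-object statements from independent sources that merge into a single graph. The sketch below uses plain Python tuples rather than a real RDF store, and every identifier in it (the `example.org` URIs, the vocabulary terms, the two contributing institutions) is a hypothetical placeholder, not a UDFR or PRONOM identifier.

```python
from collections import namedtuple

Triple = namedtuple("Triple", "subject predicate object")

# Hypothetical identifiers, for illustration only.
JPEG = "http://example.org/format/jpeg"
VOCAB = "http://example.org/vocab/"

# Two independent contributors describe the same thing.
archive_says = [
    Triple(JPEG, VOCAB + "name", "JPEG File Interchange Format"),
    Triple(JPEG, VOCAB + "extension", "jpg"),
]
library_says = [
    Triple(JPEG, VOCAB + "extension", "jpeg"),
    Triple(JPEG, VOCAB + "compression", "lossy"),
]

# Because both use the same subject identifier, merging is a set union.
graph = set(archive_says) | set(library_says)


def match(graph, subject=None, predicate=None, obj=None):
    """Return triples matching the given pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.object == obj)
    )


for t in match(graph, predicate=VOCAB + "extension"):
    print(t.subject, "has extension", t.object)
```

The same statements are readable by a person and queryable by a program, which is the property the slide points to.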
Data modeling
[Diagram: UDFR data model. Classes: Abstract Base, Abstract Product, Abstract Format, File Format, Character Encoding, Compression Algorithm, Media, Hardware, Software, Document, File, Agent, IPR, Holding, Process, Abstract Signature, External Signature, Internal Signature, Digest, Assessment, Grammar, Controlled Vocabulary … Relations: specification, reference, file, holder, owner, creator, maintainer, ipr, embodies, product, input / output, dependency, signature, digest, grammar, assessment.]
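To make the signature part of the model more concrete, here is a minimal sketch of how a file format with internal and external signatures might be used for identification. The class names are borrowed from the diagram, but the fields, the `identify` function, and the two-entry registry are assumptions for illustration; the slide does not specify attributes, and this is not the UDFR schema. It follows the common convention that an external signature is a filename extension and an internal signature is a byte sequence at a known offset.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class InternalSignature:
    """Byte sequence expected at a fixed offset inside the file."""
    offset: int
    magic: bytes

    def matches(self, data: bytes) -> bool:
        return data[self.offset:self.offset + len(self.magic)] == self.magic


@dataclass(frozen=True)
class ExternalSignature:
    """Filename extension conventionally associated with the format."""
    extension: str

    def matches(self, filename: str) -> bool:
        return filename.lower().endswith("." + self.extension)


@dataclass
class FileFormat:
    name: str
    internal: list = field(default_factory=list)
    external: list = field(default_factory=list)


# Hypothetical two-entry registry; the magic numbers themselves are real.
REGISTRY = [
    FileFormat("JPEG", [InternalSignature(0, b"\xff\xd8\xff")],
               [ExternalSignature("jpg"), ExternalSignature("jpeg")]),
    FileFormat("PDF", [InternalSignature(0, b"%PDF-")],
               [ExternalSignature("pdf")]),
]


def identify(filename: str, data: bytes) -> list:
    """Return (format name, evidence) pairs, preferring content evidence."""
    hits = []
    for fmt in REGISTRY:
        by_content = any(s.matches(data) for s in fmt.internal)
        by_name = any(s.matches(filename) for s in fmt.external)
        if by_content or by_name:
            hits.append((fmt.name, "internal" if by_content else "external"))
    return hits


print(identify("mislabeled.pdf", b"\xff\xd8\xff\xe0"))
```

A file whose name and content disagree yields two candidates with different evidence, which is one reason a registry would keep both kinds of signature.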
Provenance
“Trust, but verify”
• Complete change history at the assertion level, including
  – Who made the assertion, and when?
  – Confidence based on personal and institutional reputation
• Imprimatur by technically knowledgeable reviewers
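Assertion-level change history can be sketched as an append-only log in which every statement carries its author, timestamp, and optional reviewer. The names below (`Assertion`, `ChangeLog`, the field names, and the sample contributors) are hypothetical and chosen for this example; the slides do not describe how UDFR stores provenance.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Assertion:
    subject: str
    predicate: str
    value: str
    asserted_by: str                    # who made the assertion
    asserted_at: datetime               # and when
    reviewed_by: Optional[str] = None   # reviewer imprimatur, if any


class ChangeLog:
    """Append-only log: nothing is overwritten, so history is complete."""

    def __init__(self):
        self._log = []

    def record(self, assertion: Assertion) -> None:
        self._log.append(assertion)

    def history(self, subject: str, predicate: str) -> list:
        """All assertions about one property, oldest first."""
        return [a for a in self._log
                if a.subject == subject and a.predicate == predicate]

    def current(self, subject: str, predicate: str) -> Optional[Assertion]:
        """Most recent assertion about one property, or None."""
        past = self.history(subject, predicate)
        return past[-1] if past else None


log = ChangeLog()
log.record(Assertion("format/jpeg", "extension", "jpg", "alice@example.org",
                     datetime(2011, 10, 1, tzinfo=timezone.utc)))
log.record(Assertion("format/jpeg", "extension", "jpeg", "bob@example.org",
                     datetime(2011, 10, 15, tzinfo=timezone.utc),
                     reviewed_by="carol@example.org"))

latest = log.current("format/jpeg", "extension")
print(latest.value, "asserted by", latest.asserted_by,
      "reviewed by", latest.reviewed_by)
```

Keeping the author and reviewer on each assertion is what would let a consumer weigh confidence per statement rather than per record.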
Demo
Lessons learned
People with semantic experience are scarce
Too much time evaluating/prototyping potential technology choices
More difficulty than anticipated integrating disparate open source products
0.x software is often numbered that for a reason
Feature lists aren’t (always)
Lessons learned
Availability of a worldwide selection of products is a good thing (except when you don’t read German)
• Excellent support from AKSW/Universität Leipzig
Modeling differences
• RDF (non-)standards
VM deployment
• Disparate IT organizations supporting dev/prod instances
Next steps
Long-term governance and operational support
Technical maintenance and enhancement
Replication/synchronization
Building contributor and reviewer communities