DRS 2 Metadata Migration June 25, 2013
Dec 15, 2015
Agenda
• Introduction
• Preliminary results - content analysis
• Metadata options
• Next steps
• Questions
INTRODUCTION
Reason for metadata migration
• Different data model
  – File -> Object (a coherent set of content that is considered a single intellectual unit for purposes of description, use and/or management: for example, a particular book, web harvest, serial or photograph)
• Different metadata schemas
  – Many locally-defined -> community-standard
• Different packaging of metadata
  – Use of METS in some cases -> consistent use of METS
Path to metadata migration
• Analysis
  – Metadata
  – Content
  – Users
• Prototype
  – Proof-of-concept
  – Time estimates
• Migration plan
  – Sequence
  – Schedule
• Develop tools
  – Dashboard
  – Object builders
• Metadata migration
We are here
Key feedback points
• Technical options
• Process options
• Timing
Next 3 months
What does it involve?
• Aggregate DRS1 files into objects
  – Different object types = content models
• Generate an object descriptor per object
Document example
• New object (content model = DOCUMENT)
  – PDF file
  – Descriptor file
Still image example
• New object (content model = STILL IMAGE)
  – Archival master image file
  – Production master image file
  – Deliverable image file
  – Descriptor file
Aggregate DRS1 files into objects
• One content file per object
  – Color profile
  – Document
  – Google document container 1
  – Google document container 2
  – Google document container 3
  – Opaque container
  – Text
Aggregate DRS1 files into objects
• Multiple content files per object
  – Audio
  – Web harvest
  – Biomedical image
  – PDS document
  – Target image
  – MOA2
  – Still image
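The aggregation step might be sketched roughly as follows. This is an illustration only: the record fields (`file_id`, `group_key`, `role`), the idea of grouping on a shared key, and the `build_objects` helper are all hypothetical and are not taken from the DRS1 schema.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class FileRecord:
    file_id: str    # hypothetical DRS1 file identifier
    group_key: str  # hypothetical key that links related files
    role: str       # e.g. "archival master", "production master", "deliverable"


@dataclass
class DrsObject:
    content_model: str
    files: list = field(default_factory=list)


def build_objects(records, content_model):
    """Group file records that share a group_key into one object each."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record.group_key].append(record)
    return {key: DrsObject(content_model, files) for key, files in grouped.items()}


records = [
    FileRecord("f1", "img-001", "archival master"),
    FileRecord("f2", "img-001", "production master"),
    FileRecord("f3", "img-001", "deliverable"),
    FileRecord("f4", "img-002", "archival master"),
]
objects = build_objects(records, "STILL IMAGE")
```

For single-file content models the same sketch applies, with each group holding exactly one file.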
Generate object descriptors
• METS format
  – Embedded schemas (PREMIS, MODS, MIX, etc.)
• Metadata sources
  – DRS1 database
  – DRS1 METS files where they exist
  – Examining the content files
  – Catalog records?
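To give a feel for what a generated descriptor could look like, here is a skeletal sketch using Python's standard XML library. It only builds a bare METS shell with a file section and a structural map; using the `TYPE` attribute to carry the content model, the `make_descriptor` helper, and the sample identifiers are assumptions for illustration, and a real descriptor would also embed PREMIS, MODS, MIX, etc.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)


def make_descriptor(object_label, content_model, file_ids):
    """Build a skeletal METS document listing an object's content files."""
    # Assumption: the content model is recorded in the root TYPE attribute.
    mets = ET.Element(f"{{{METS_NS}}}mets",
                      {"LABEL": object_label, "TYPE": content_model})
    file_sec = ET.SubElement(mets, f"{{{METS_NS}}}fileSec")
    file_grp = ET.SubElement(file_sec, f"{{{METS_NS}}}fileGrp")
    struct_map = ET.SubElement(mets, f"{{{METS_NS}}}structMap")
    div = ET.SubElement(struct_map, f"{{{METS_NS}}}div")
    for file_id in file_ids:
        # One file entry plus a pointer to it from the structural map.
        ET.SubElement(file_grp, f"{{{METS_NS}}}file", {"ID": file_id})
        ET.SubElement(div, f"{{{METS_NS}}}fptr", {"FILEID": file_id})
    return mets


descriptor = make_descriptor("Example object", "DOCUMENT", ["FILE_1"])
xml_text = ET.tostring(descriptor, encoding="unicode")
```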
PRELIMINARY RESULTS: CONTENT ANALYSIS
Preliminary content analysis
• Conceptually “built” objects for 13 of 14 content models (~36 million of 44 million files)
  – All but still image
  – Order helps!
Still Image
MOA2
Biomedical Image
PDS Document
Preliminary content analysis
• 1,091,670 objects from 36,190,120 files
  – ~33 files per object
• Relatively few surprises, but content analysis is not complete
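The files-per-object figure follows directly from the two counts above; a quick check, with variable names chosen here for illustration:

```python
object_count = 1_091_670
file_count = 36_190_120

# Average number of DRS1 files aggregated into each conceptual object.
files_per_object = file_count / object_count
rounded = round(files_per_object)
```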
Content cleanup
• MOA2 files (8,024)
• Index maps (2,686)
• Entity files (1)
• Merged PDS descriptors (22,203)
Content cleanup
• Orphaned target image (5), target description files (4)
• Orphaned audio files (71)
METADATA OPTIONS
DRS1
• FILE INFO: e.g., billingCode, ownerCode, accessFlag, tech metadata, owner-suppliedName, role, purpose, quality, usageClass
• METS: Object Label, MODS, PDS info, etc.
DRS2
• FILE INFO: accessFlag, tech metadata, owner-suppliedName, role, processing, quality, usageClass
• OBJECT INFO: billingCode, ownerCode, owner-suppliedName, caption unit name, view text
• DESCRIPTOR: Object Label, Object-level MODS
Objects
• Owner supplied name (OSN) is required
• Need to generate during migration
• Four cases
  – A METS file exists
  – New object will be built from a single content file
  – New object will be built from multiple content files
  – No OSN (potential case)
• Proposal for most cases:
  – Add prefix or suffix to METS or content file owner supplied name
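The four cases and the prefix/suffix proposal could be sketched along these lines. The suffix value, the `object_osn` helper, and the rule of taking the first named file when there are several are all assumptions made for the sketch; the slides leave these choices open.

```python
from typing import Optional, Sequence

OBJECT_SUFFIX = "_object"  # hypothetical; the actual prefix or suffix is undecided


def object_osn(mets_osn: Optional[str], file_osns: Sequence[str]) -> Optional[str]:
    """Derive an object-level owner supplied name for a migrated object."""
    if mets_osn:
        # Case 1: a METS file exists, so reuse its name.
        return mets_osn + OBJECT_SUFFIX
    named = [osn for osn in file_osns if osn]
    if named:
        # Cases 2 and 3: built from one or several content files.
        # Assumption: with several files, use the first one that has a name.
        return named[0] + OBJECT_SUFFIX
    # Case 4: no OSN available; this needs a separate rule.
    return None
```

Returning `None` for the last case keeps the "no OSN" objects easy to list for manual review.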
Objects
• Other required object elements
  – insertionDate
    • date of earliest file?
  – captionBehavior
    • for existing objects, set based on billing code
    • prospectively, set by depositor
  – viewText
    • available for all objects, not just PDS
    • default to off
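A minimal sketch of how these defaults might be filled in for an existing object, assuming the "earliest file date" option is chosen for insertionDate. The billing code and caption behavior values in the lookup table are placeholders, not real DRS values, and `required_object_elements` is a hypothetical helper.

```python
from datetime import date

# Placeholder lookup; the real billing-code-to-caption mapping is not given here.
CAPTION_BY_BILLING_CODE = {"EXAMPLE.BILLING": "example-behavior"}


def required_object_elements(file_dates, billing_code):
    """Supply required object-level elements for a migrated object."""
    return {
        # Assumption: use the date of the earliest file in the object.
        "insertionDate": min(file_dates),
        # For existing objects, caption behavior is set from the billing code.
        "captionBehavior": CAPTION_BY_BILLING_CODE.get(billing_code),
        # viewText is available for all objects and defaults to off.
        "viewText": False,
    }


elements = required_object_elements(
    [date(2010, 5, 1), date(2008, 1, 15)], "EXAMPLE.BILLING"
)
```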
Objects
• Descriptive metadata
  – Take MODS from existing METS as is or import new
    • From Aleph
    • From Finding Aid
  – If re-imported, update METS label or not?
  – Import from OLIVIA based on owner supplied name for the file?
Objects from existing METS
• Identifiers for Harvard metadata
  – Identify finding aid identifiers
  – Convert “Old HOLLIS” numbers
  – Aleph IDs: include check digit or not?
  – Convert to URIs or actionable URNs from plain IDs
    • Could DRS format such URIs for new DRS2 input?
Objects from existing METS
• PDS elements
  – PDF owner text becomes caption unit name
  – viewOcr function becomes viewText
  – goto function will be automatically determined by presence of structMap/div attributes
• Caption behavior
  – for existing objects, set by billing code
Files
• Run automated processes to identify, validate and characterize file technical characteristics
• Extract technical metadata
Files
• isFirstGenerationinDrs
  – Values: yes, no, unspecified
  – Should we supply “yes” for archival masters and/or top of derivation chain?
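If the answer to that question is "yes", the rule might look roughly like the sketch below. The `role` and `derived_from` parameters and the fallback to "unspecified" are assumptions for illustration; the slides do not settle how other files should be labeled.

```python
from typing import Optional


def is_first_generation(role: str, derived_from: Optional[str]) -> str:
    """Return a value for isFirstGenerationinDrs: yes, no, or unspecified."""
    # Archival masters, and files at the top of a derivation chain
    # (nothing recorded as their source), are treated as first generation.
    if role == "archival master" or derived_from is None:
        return "yes"
    # Assumption: everything else is left unspecified rather than set to "no".
    return "unspecified"
```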
Image Files
• Converting from local scheme to MIX
• Local field questions
  – Methodology
  – History
  – Source
  – Enhancements
Text files
• Converting from local scheme to textMD
• Descriptor_type will be absorbed into different places in DRS2
• Extracted metadata can supply
  – markup_basis
  – markup_language for specific schemas
  – possibly other elements
Audio files
• Moving from local schema to AES57-2011: Audio object structures for preservation and restoration
Versioned metadata
• History will be tracked for key administrative elements:
  – Access flag
  – Admin flag (new)
  – Billing code
  – Owner code
• What values to assign for required creation date and agent for migrated content?
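One way to picture versioned metadata is a value that keeps every past setting along with its creation date and agent. The sketch below is hypothetical: the `VersionedValue` class, the sample flag values, and the choice of a "migration" agent with the presentation date are placeholders, since the creation date and agent for migrated content are still an open question.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class VersionedValue:
    """An administrative element whose history of values is retained."""
    history: list = field(default_factory=list)  # (value, created, agent) tuples

    def record(self, value, created, agent):
        """Append a new version instead of overwriting the old one."""
        self.history.append((value, created, agent))

    @property
    def current(self):
        """Most recent value, or None if nothing has been recorded."""
        return self.history[-1][0] if self.history else None


access_flag = VersionedValue()
# Placeholder values: the agent and date for migrated content are undecided.
access_flag.record("restricted", date(2013, 6, 25), "migration")
access_flag.record("public", date(2014, 1, 10), "depositor")
```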
NEXT STEPS
Next steps
• Continue analysis and development of technical requirements
• Build prototype
• September check-in on progress
• Create metadata migration plan
• Open meeting to review plan
OPEN FOR QUESTIONS