Introduction to Archivematica
Midwest Archives Conference – May 7, 2015 – Lexington, Kentucky
Courtney C. Mumma, MAS/MLIS, US and International Community Development
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License.
– normalization to sustainable formats on ingest + preservation of the original file
– include or add metadata, including PREMIS rights and restrictions
– storage agnostic
– bagged AIP with logs and metadata (METS.xml)
the AIP: so much bigger on the inside
value added to storage: metadata, logs, formats and structure to protect against software obsolescence
the METS.xml file
• <dmdSec> (descriptive metadata)
― Dublin Core XML
• <amdSec> (administrative metadata)
― <techMD> PREMIS: object
― <digiprovMD> PREMIS: events, PREMIS: agents
― <rightsMD> PREMIS: rights
• <fileSec> (a list of the files and their roles and relationships)
• <structMap> (a representation of the physical structure of the AIP)
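The sections above can be sketched as a minimal METS skeleton. This is illustrative only: the wrapped Dublin Core and PREMIS content is abbreviated to comments, and a real Archivematica METS.xml carries many more sections, attributes and IDs.

```xml
<mets xmlns="http://www.loc.gov/METS/"
      xmlns:xlink="http://www.w3.org/1999/xlink">
  <dmdSec ID="dmdSec_1">
    <mdWrap MDTYPE="DC"><xmlData><!-- Dublin Core fields --></xmlData></mdWrap>
  </dmdSec>
  <amdSec ID="amdSec_1">
    <techMD ID="techMD_1">
      <mdWrap MDTYPE="PREMIS:OBJECT"><xmlData><!-- object characteristics --></xmlData></mdWrap>
    </techMD>
    <digiprovMD ID="digiprovMD_1">
      <mdWrap MDTYPE="PREMIS:EVENT"><xmlData><!-- e.g. virus check, normalization --></xmlData></mdWrap>
    </digiprovMD>
    <rightsMD ID="rightsMD_1">
      <mdWrap MDTYPE="PREMIS:RIGHTS"><xmlData><!-- rights statements --></xmlData></mdWrap>
    </rightsMD>
  </amdSec>
  <fileSec>
    <fileGrp USE="original">
      <file ID="file_1"><FLocat xlink:href="objects/letter.txt"/></file>
    </fileGrp>
  </fileSec>
  <structMap TYPE="physical">
    <div><fptr FILEID="file_1"/></div>
  </structMap>
</mets>
```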
Let's get knee deep into computers
(we're going to log in now)
identify your test content
✔ What
✔ Where
✔ How much
what types of digital content?
• born-digital
― government and university records, student artwork, e-theses and dissertations
― diverse formats: audiovisual, textual, geospatial, websites, presentations, images, databases
• digitized
― books, newspapers, images, video from vendors
― pre-made access and preservation copies
• submission documentation & metadata
― permission forms, accession records, pictures of digital media, etc.
― descriptive MD from other systems
where is your digital content?
• stored locally
• in other systems
― e.g. CONTENTdm, DSpace, DuraCloud, Islandora
• on detached media
― floppies, hard drives, CDs, DVDs, USB sticks, etc.
• packaged
― Bagged using Library of Congress BagIt specification
― Forensic images
― Zipped or tarballed
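A bag following the BagIt specification pairs payload files under data/ with checksum manifests, which is what makes packaged content verifiable on arrival. A minimal sketch of manifest checking in Python (the helper name is hypothetical; real workflows typically use the Library of Congress bagit tool):

```python
import hashlib
import tempfile
from pathlib import Path

def verify_bag_payload(bag_dir: str) -> bool:
    """Recompute each checksum listed in manifest-sha256.txt and compare."""
    bag = Path(bag_dir)
    for line in (bag / "manifest-sha256.txt").read_text().splitlines():
        expected, relpath = line.split(maxsplit=1)
        actual = hashlib.sha256((bag / relpath).read_bytes()).hexdigest()
        if actual != expected:
            return False
    return True

# Build a tiny example bag on disk and check it.
with tempfile.TemporaryDirectory() as d:
    payload = Path(d) / "data"
    payload.mkdir()
    (payload / "letter.txt").write_bytes(b"hello")
    digest = hashlib.sha256(b"hello").hexdigest()
    (Path(d) / "manifest-sha256.txt").write_text(f"{digest}  data/letter.txt\n")
    print(verify_bag_payload(d))  # True
```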
how much is there?
• Size: gigabytes, terabytes, petabytes
― Sum total of all material
― Size of distinct content sets
― Biggest single digital objects
• Quantity
― Sum total of all files
― Number of files in distinct content sets
• Resource capacity
― Space allocated to processing and storage locations
― Consider ideal transfer, SIP and AIP sizes
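A quick way to answer these sizing questions is to walk a candidate transfer directory and tally bytes and files. A minimal Python sketch (the function name and example layout are illustrative):

```python
import tempfile
from pathlib import Path

def survey(root: str) -> tuple[int, int]:
    """Return (total bytes, file count) for everything under root."""
    files = [p for p in Path(root).rglob("*") if p.is_file()]
    return sum(p.stat().st_size for p in files), len(files)

# Example: a throwaway directory with two small files.
with tempfile.TemporaryDirectory() as d:
    (Path(d) / "a.txt").write_bytes(b"12345")
    sub = Path(d) / "images"
    sub.mkdir()
    (sub / "b.tif").write_bytes(b"1234567890")
    print(survey(d))  # (15, 2)
```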
asking questions of your content
• descriptive metadata?
― needs preserving? already exists, or needs to be added? complex or simple objects?
• submission documentation?
― donor agreements, pictures of physical media, licenses, etc
• access copies?
― already have them? what system to send/store?
• generate preservation copies?
― already have them?
• service masters?
asking questions of your content
• directory structure important (Original Order)?
• keep the package AND the content, or just one?
• rights information?
• is content Bagged? in DSpace? a forensic image? (Transfer type)
• how large should my archival packages be?
• will my archival packages have a 1:1 relationship with my transferred digital content? will my content be arranged into multiple packages or combined into one? (Arrangement workflow)
processing in Archivematica
• determine readiness by pilot testing content streams using the methods just described
• prepare content for transfer:
– put it in a folder in a transfer source directory
– prepare a metadata CSV for simple or complex objects
– prepare submission documentation
– identify pre-made access, preservation and/or service copies
– select the right workflow: standard, DSpace, or forensic image transfer, plus any pre-configured processing settings (more on this soon)
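For the metadata CSV step above, Archivematica looks for a metadata.csv in the transfer's metadata folder, with a first column named filename pointing at each object's path; the Dublin Core columns below are just example fields. A minimal Python sketch of generating one:

```python
import csv
import tempfile
from pathlib import Path

# Example rows: one per digital object, paths relative to the transfer root.
rows = [
    {"filename": "objects/letter.txt", "dc.title": "Letter, 1923", "dc.creator": "Doe, Jane"},
    {"filename": "objects/photo.tif", "dc.title": "Portrait", "dc.creator": "Doe, Jane"},
]

with tempfile.TemporaryDirectory() as d:
    out = Path(d) / "metadata.csv"
    with out.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["filename", "dc.title", "dc.creator"])
        writer.writeheader()
        writer.writerows(rows)
    print(out.read_text().splitlines()[0])  # filename,dc.title,dc.creator
```

In a real transfer this file would be written into metadata/ alongside the objects, not a temporary directory.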
now let's see it in action and discuss your own workflows!