Introduction to Archivematica
Midwest Archives Conference – May 7, 2015 – Lexington, Kentucky
Courtney C. Mumma, MAS/MLIS, US and International Community Development
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License.
– normalization to sustainable formats on ingest + preservation of the original file
– include or add metadata, including PREMIS rights and restrictions
– storage agnostic
– bagged AIP with logs and metadata (METS.xml)
the AIP: so much bigger on the inside
value added to storage: metadata, logs, formats and structure to protect against software obsolescence
the METS.xml file
• <dmdSec> (descriptive metadata)
― Dublin Core XML
• <amdSec> (administrative metadata)
― <techMD> PREMIS: object
― <digiprovMD> PREMIS: events, PREMIS: agents
― <rightsMD> PREMIS: rights
• <fileSec> (a list of the files and their roles and relationships)
• <structMap> (a representation of the physical structure of the AIP)
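The sections above can be sketched as a minimal METS skeleton. This is illustrative only: the wrapped Dublin Core and PREMIS content is abbreviated to comments, and a real Archivematica METS.xml carries many more sections, attributes and IDs.

```xml
<mets xmlns="http://www.loc.gov/METS/"
      xmlns:xlink="http://www.w3.org/1999/xlink">
  <dmdSec ID="dmdSec_1">
    <mdWrap MDTYPE="DC"><xmlData><!-- Dublin Core fields --></xmlData></mdWrap>
  </dmdSec>
  <amdSec ID="amdSec_1">
    <techMD ID="techMD_1">
      <mdWrap MDTYPE="PREMIS:OBJECT"><xmlData><!-- object characteristics --></xmlData></mdWrap>
    </techMD>
    <digiprovMD ID="digiprovMD_1">
      <mdWrap MDTYPE="PREMIS:EVENT"><xmlData><!-- e.g. virus check, normalization --></xmlData></mdWrap>
    </digiprovMD>
    <rightsMD ID="rightsMD_1">
      <mdWrap MDTYPE="PREMIS:RIGHTS"><xmlData><!-- rights statements --></xmlData></mdWrap>
    </rightsMD>
  </amdSec>
  <fileSec>
    <fileGrp USE="original">
      <file ID="file_1"><FLocat xlink:href="objects/letter.txt"/></file>
    </fileGrp>
  </fileSec>
  <structMap TYPE="physical">
    <div><fptr FILEID="file_1"/></div>
  </structMap>
</mets>
```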
Let's get knee deep into computers
(we're going to log in now)
identify your test content
✔ What
✔ Where
✔ How much
what types of digital content?
• born-digital
― government and university records, student artwork, e-theses and dissertations
― diverse formats: audiovisual, textual, geospatial, websites, presentations, images, databases
• digitized
― books, newspapers, images, video from vendors
― pre-made access and preservation copies
• submission documentation & metadata
― permission forms, accession records, pictures of digital media, etc.
― descriptive MD from other systems
where is your digital content?
• stored locally
• in other systems
― e.g. CONTENTdm, DSpace, DuraCloud, Islandora
• on detached media
― floppies, hard drives, CDs, DVDs, USB sticks, etc.
• packaged
― Bagged using Library of Congress BagIt specification
― Forensic images
― Zipped or tarballed
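A bag following the BagIt specification pairs payload files under data/ with checksum manifests, which is what makes packaged content verifiable on arrival. A minimal sketch of manifest checking in Python (the helper name is hypothetical; real workflows typically use the Library of Congress bagit tool):

```python
import hashlib
import tempfile
from pathlib import Path

def verify_bag_payload(bag_dir: str) -> bool:
    """Recompute each checksum listed in manifest-sha256.txt and compare."""
    bag = Path(bag_dir)
    for line in (bag / "manifest-sha256.txt").read_text().splitlines():
        expected, relpath = line.split(maxsplit=1)
        actual = hashlib.sha256((bag / relpath).read_bytes()).hexdigest()
        if actual != expected:
            return False
    return True

# Build a tiny example bag on disk and check it.
with tempfile.TemporaryDirectory() as d:
    payload = Path(d) / "data"
    payload.mkdir()
    (payload / "letter.txt").write_bytes(b"hello")
    digest = hashlib.sha256(b"hello").hexdigest()
    (Path(d) / "manifest-sha256.txt").write_text(f"{digest}  data/letter.txt\n")
    print(verify_bag_payload(d))  # True
```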
how much is there?
• Size: gigabytes, terabytes, petabytes
― Sum total of all material
― Size of distinct content sets
― Biggest single digital objects
• Quantity
― Sum total of all files
― Number of files in distinct content sets
• Resource capacity
― Space allocated to processing and storage locations
― Consider ideal transfer, SIP and AIP sizes
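A quick way to answer these sizing questions is to walk a candidate transfer directory and tally bytes and files. A minimal Python sketch (the function name and example layout are illustrative):

```python
import tempfile
from pathlib import Path

def survey(root: str) -> tuple[int, int]:
    """Return (total bytes, file count) for everything under root."""
    files = [p for p in Path(root).rglob("*") if p.is_file()]
    return sum(p.stat().st_size for p in files), len(files)

# Example: a throwaway directory with two small files.
with tempfile.TemporaryDirectory() as d:
    (Path(d) / "a.txt").write_bytes(b"12345")
    sub = Path(d) / "images"
    sub.mkdir()
    (sub / "b.tif").write_bytes(b"1234567890")
    print(survey(d))  # (15, 2)
```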
asking questions of your content
• descriptive metadata?
― needs preserving? already exists, or needs to be added? complex or simple objects?
• submission documentation?
― donor agreements, pictures of physical media, licenses, etc
• access copies?
― already have them? what system to send/store?
• generate preservation copies?
― already have them?
• service masters?
asking questions of your content
• directory structure important (Original Order)?
• keep the package AND the content, or just one?
• rights information?
• is content Bagged? in DSpace? a forensic image? (Transfer type)
• how large should my archival packages be?
• will my archival packages have a 1:1 relationship with my transferred digital content? will my content be arranged into multiple packages or combined into one? (Arrangement workflow)
processing in Archivematica
• determine readiness by pilot testing content streams using the methods just described
• prepare content for transfer:
– put it in a folder in a transfer source directory
– prepare a metadata CSV for simple or complex objects
– prepare submission documentation
– identify pre-made access, preservation and/or service copies
– select the right workflow: standard, DSpace, or forensic image transfer, plus any pre-configured processing settings (more on this soon)
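For the metadata CSV step above, Archivematica looks for a metadata.csv in the transfer's metadata folder, with a first column named filename pointing at each object's path; the Dublin Core columns below are just example fields. A minimal Python sketch of generating one:

```python
import csv
import tempfile
from pathlib import Path

# Example rows: one per digital object, paths relative to the transfer root.
rows = [
    {"filename": "objects/letter.txt", "dc.title": "Letter, 1923", "dc.creator": "Doe, Jane"},
    {"filename": "objects/photo.tif", "dc.title": "Portrait", "dc.creator": "Doe, Jane"},
]

with tempfile.TemporaryDirectory() as d:
    out = Path(d) / "metadata.csv"
    with out.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["filename", "dc.title", "dc.creator"])
        writer.writeheader()
        writer.writerows(rows)
    print(out.read_text().splitlines()[0])  # filename,dc.title,dc.creator
```

In a real transfer this file would be written into metadata/ alongside the objects, not a temporary directory.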
now let's see it in action and discuss your own workflows!