Developing an Ingest Service for Fedora
Ryan Scherle, Muzaffer Ozakca


Dec 27, 2015


Austen Howard


IUDL infrastructure project

• 2-year project funded by University Information Technology Services to reengineer digital library infrastructure around Fedora

• Builds on experience with Fedora in the context of the EVIA Digital Archive (ethnomusicology video)

• 2 full-time staff, plus part-time contributions from many others

• Dozens of legacy collections with roughly 100,000 objects

• New collections: some content-focused, some research-focused


Diversity

• Multiple media types

• Multiple brands

• Multiple tools


The goal

Ingest



Required features

• Ingest common content types:
  ▫ Images
  ▫ Paged documents
  ▫ Textual documents

• Allow for easy creation of new content types

• Must support several workflows:
  ▫ Metadata or media may be primary
  ▫ Most objects include derived media
  ▫ Systematic changes to metadata may be desired
  ▫ May need to connect with external tools for metadata generation, validation, etc.
  ▫ A workflow engine may sit on top of the ingest system


Existing Ingest Tools


Criteria

• Ease of install

• Native content models

• Custom content models (e.g. paged)

• Workflow neutrality, including object modification

• Batch ingest

Remember, we're evaluating object ingest only, not object delivery!


But first, some disclaimers…

• This is not an objective evaluation, just our experiences

• We're not experts in these systems

• We're evaluating ingest only, not delivery!

• We're evaluating ingest with a focus on our needs

• We believe in community


Fedora admin client

• Comes with Fedora

• Geared towards admins rather than end users

• No systematic way of entering data or attaching files

• Very flexible

• The only way to create disseminators

• Tedious


Fez

• End-to-end GUI system

• Highly customizable content models, workflow, and security

• Customizable role- and group-based access control

• Growing community

• Originally developed as an institutional repository

• Many preset content models

• Can create "extension" metadata based on an XSD

• External MySQL database for workflow/vocabulary data

• GPL


Fez - ingest

• Single object ingest:
  ▫ Through Web UI
  ▫ ImageMagick/JHOVE integration

• Bulk ingest:
  ▫ Upload files to a directory
  ▫ Can also import existing Fedora objects in bulk
  ▫ Templates for metadata common to all objects, manual updates for the rest
  ▫ Batches possible, but only one file per object

• No disseminators

• Custom metadata can be stored as a simple XML file

• Objects must use the "compound" content model



Fez – object organization


Elated overview

• Complete end-to-end system for digital collections

• Supports simple, customizable metadata and a simple workflow

•GPL

“Elated is a lightweight, general-purpose application for managing digital files. ELATED is built on top of the Fedora Repository System, and could be used as a digital assets management system, an institutional repository, or to meet other collection archiving, publishing and searching needs.”


Elated ingest

• Single object ingest:
  ▫ Through Web UI
  ▫ Focused on DC metadata; custom fields can be added

• Multi-object ingest via zipped folders and files:
  ▫ Metadata template + manual entry
  ▫ Batches possible, but only one file per object

• Simple content model

• Manually attached disseminators



Elated object organization


Valet for ETDs

•A component of the VTLS VITAL product focused on ETD submission

• Supports thesis submission and a simple approval workflow

•Part of a larger framework

•Highly focused on ETDs


DirIngest overview

• Ingests objects from a structured ZIP file

• Highly flexible

• User must create METS structure by hand

• Doesn't handle disseminators

• Can create some RELS-EXT data, but not fully flexible

• Cannot modify existing objects/collections

• Easy to use OhioLink Bulk Ingest


DirIngest

[Diagram: Zip archive, METS.xml, and Crules.xml → Fedora]


Batch modify

•A method of controlling API-M with simple XML statements

•Can create “empty” objects and change them in systematic ways.

•Requires manual (or programmatic) creation of the modify scripts

•Can be used in conjunction with other tools…


Summary

[Comparison table: Fez, Elated, Valet, DirIngest, Batch Modify, and the Admin Client, each rated on ease of install, native content models, custom content models, workflow neutrality, and batch ingest]


Indiana Ingest Tool


Indiana Ingest Tool

• A structured interface between a workflow management or repository management GUI and the Fedora repository

• Focused on simple input formats for maximum flexibility

• Keeps the tools independent of the repository architecture

• Builds the FOXML, rather than requiring a full structure to be pre-built

• Binds disseminators

• Creates RELS-EXT relationships

• Can create and/or alter items in a collection

• Auto-generates technical metadata with JHOVE or XSLT.
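To make the "builds the FOXML" and "creates RELS-EXT relationships" points concrete, here is a minimal Python sketch using only the standard library. It is not the Indiana tool's code: the function name `build_foxml`, the example PID `iudl:demo-1`, and the use of `isMemberOf` as the collection relationship are assumptions for illustration. The FOXML, model, and relations-external namespaces are the ones Fedora uses, and `iudl:6` is the collection PID from the sample collection config in this talk.

```python
import xml.etree.ElementTree as ET

# Namespaces used by Fedora FOXML and RELS-EXT.
FOXML = "info:fedora/fedora-system:def/foxml#"
MODEL = "info:fedora/fedora-system:def/model#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
REL = "info:fedora/fedora-system:def/relations-external#"

ET.register_namespace("foxml", FOXML)
ET.register_namespace("rdf", RDF)
ET.register_namespace("rel", REL)


def build_foxml(pid: str, label: str, collection_pid: str) -> str:
    """Build a minimal FOXML object whose RELS-EXT places it in a collection.

    Hypothetical helper for illustration; a real ingest tool would also add
    descriptive, technical, and media datastreams.
    """
    obj = ET.Element(f"{{{FOXML}}}digitalObject", {"PID": pid})

    # Object-level properties: state and label.
    props = ET.SubElement(obj, f"{{{FOXML}}}objectProperties")
    ET.SubElement(props, f"{{{FOXML}}}property",
                  {"NAME": MODEL + "state", "VALUE": "Active"})
    ET.SubElement(props, f"{{{FOXML}}}property",
                  {"NAME": MODEL + "label", "VALUE": label})

    # Inline XML (control group "X") datastream holding the RDF relationships.
    ds = ET.SubElement(obj, f"{{{FOXML}}}datastream",
                       {"ID": "RELS-EXT", "CONTROL_GROUP": "X", "STATE": "A"})
    version = ET.SubElement(ds, f"{{{FOXML}}}datastreamVersion",
                            {"ID": "RELS-EXT.0", "MIMETYPE": "text/xml",
                             "LABEL": "Relationships"})
    content = ET.SubElement(version, f"{{{FOXML}}}xmlContent")
    rdf = ET.SubElement(content, f"{{{RDF}}}RDF")
    desc = ET.SubElement(rdf, f"{{{RDF}}}Description",
                         {f"{{{RDF}}}about": f"info:fedora/{pid}"})
    ET.SubElement(desc, f"{{{REL}}}isMemberOf",
                  {f"{{{RDF}}}resource": f"info:fedora/{collection_pid}"})

    return ET.tostring(obj, encoding="unicode")


print(build_foxml("iudl:demo-1", "Letter 1", "iudl:6"))
```

Submitting the result to Fedora is left out; the point of the sketch is that the tool generates this structure from simple inputs instead of requiring users to hand-build it, as DirIngest does with METS.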


[Diagram: Image Cataloging Tool and Sheet Music Cataloging Tool; MODS, EAD, PDF, JPG, and SIP inputs; Ingest Tool producing FOXML and datastreams for Fedora]


Performing an ingest

• Place source metadata in an accessible location (filesystem, website)

• Place media files (both master and derivative) in an accessible location

• Define the "collection configuration"

• Run the ingest process

• Receive report
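The last two steps, running the ingest and receiving a report, can be sketched as a loop that decides what to do with each item and records the outcome. This is a hypothetical Python sketch, not the tool's implementation: `run_ingest`, `IngestReport`, and the PIDs are invented for illustration, the actual Fedora calls are left as comments, and the `if_exists` switch mirrors the `fedoraItemExists action="alter"` setting in the sample collection config (treating any other value as "skip" is an assumption).

```python
from dataclasses import dataclass, field


@dataclass
class IngestReport:
    """Summary returned at the end of a run (the 'Receive report' step)."""
    created: list[str] = field(default_factory=list)
    altered: list[str] = field(default_factory=list)
    skipped: list[str] = field(default_factory=list)
    errors: dict[str, str] = field(default_factory=dict)


def run_ingest(items: dict[str, list[str]], existing_pids: set[str],
               if_exists: str = "alter") -> IngestReport:
    """Decide per item whether to create, alter, or skip, and record the outcome.

    `items` maps a hypothetical PID to its media files; `if_exists` mirrors the
    collection config's fedoraItemExists action.
    """
    report = IngestReport()
    for pid, files in sorted(items.items()):
        if not files:
            report.errors[pid] = "no media files found"
        elif pid not in existing_pids:
            report.created.append(pid)   # would build FOXML and ingest here
        elif if_exists == "alter":
            report.altered.append(pid)   # would modify existing datastreams here
        else:
            report.skipped.append(pid)   # assumed behavior for other actions
    return report
```

For example, `run_ingest({"iudl:a": ["a.tif"], "iudl:b": ["b.tif"]}, {"iudl:b"})` would report `iudl:a` as created and `iudl:b` as altered. Keeping the decision in one place is what lets the same run both create new items and systematically change existing ones.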


Sample collection config file

    <cc:collectionName>Hoagy Carmichael Correspondence</cc:collectionName>
    <cc:contentModel>paged</cc:contentModel>
    <cc:collectionID>hoagy</cc:collectionID>
    <cc:collectionPid>iudl:6</cc:collectionPid>
    <cc:existingItem>
      <cc:fedoraItemExists action="alter"/>
    </cc:existingItem>
    <cc:masterContent type="image" subtype="tiff">
      <cc:source location="localfs">{path to master images}</cc:source>
      <cc:extension>.tif</cc:extension>
    </cc:masterContent>
    <cc:derivedContent derivativeType="images">
      <cc:source location="localfs">{path to derivative images here}</cc:source>
      <cc:extension item="thumb">-thumb.jpg</cc:extension>
      <cc:extension item="screen">-screen.jpg</cc:extension>
      <cc:extension item="large">-full.jpg</cc:extension>
    </cc:derivedContent>
    <cc:descriptiveMetadata>
      <cc:metadataItem type="ead" authoritative="true" level="collection">
        <cc:source location="localfs">{path to ead}</cc:source>
      </cc:metadataItem>
      ...
    <cc:technicalMetadata>
      <cc:metadataItem type="mix" authoritative="true" level="masterContent">
      </cc:metadataItem>
      ...

Slide callouts label the parts: collection definition; what to do if the item exists; file definitions; descriptive metadata; technical metadata.
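A config like this only pays off if the ingest run can read it back. The Python sketch below parses a trimmed, well-formed version of the sample and derives the expected derivative filenames for one master file. Assumptions: the sample omits its root element and namespace declaration, so the `cc:collectionConfig` root and the `urn:example:collection-config` URI are placeholders, and the rule "derivative name = master stem + derivative suffix" is inferred from the extensions shown, not confirmed by the slide. `read_config` and `expected_derivatives` are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI; the sample config does not show the real one.
CC = {"cc": "urn:example:collection-config"}

SAMPLE = """<cc:collectionConfig xmlns:cc="urn:example:collection-config">
  <cc:collectionPid>iudl:6</cc:collectionPid>
  <cc:contentModel>paged</cc:contentModel>
  <cc:existingItem><cc:fedoraItemExists action="alter"/></cc:existingItem>
  <cc:masterContent type="image" subtype="tiff">
    <cc:extension>.tif</cc:extension>
  </cc:masterContent>
  <cc:derivedContent derivativeType="images">
    <cc:extension item="thumb">-thumb.jpg</cc:extension>
    <cc:extension item="screen">-screen.jpg</cc:extension>
    <cc:extension item="large">-full.jpg</cc:extension>
  </cc:derivedContent>
</cc:collectionConfig>"""


def read_config(text: str) -> dict:
    """Pull the settings an ingest run needs out of a collection config."""
    root = ET.fromstring(text)
    return {
        "collection_pid": root.findtext("cc:collectionPid", namespaces=CC),
        "content_model": root.findtext("cc:contentModel", namespaces=CC),
        "if_exists": root.find("cc:existingItem/cc:fedoraItemExists", CC).get("action"),
        "master_ext": root.findtext("cc:masterContent/cc:extension", namespaces=CC),
        "derivatives": {e.get("item"): e.text
                        for e in root.findall("cc:derivedContent/cc:extension", CC)},
    }


def expected_derivatives(master_name: str, cfg: dict) -> dict:
    """Assumed naming rule: derivative = master stem + derivative suffix."""
    if not master_name.endswith(cfg["master_ext"]):
        raise ValueError(f"{master_name} is not a master file")
    stem = master_name[: -len(cfg["master_ext"])]
    return {kind: stem + suffix for kind, suffix in cfg["derivatives"].items()}
```

With the sample, `expected_derivatives("letter01.tif", read_config(SAMPLE))` would map `thumb` to `letter01-thumb.jpg`, `screen` to `letter01-screen.jpg`, and `large` to `letter01-full.jpg`.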


Example – Sheet Music

[Diagram: the Ingest Tool builds FOXML with Images, METS, and RELS-EXT datastreams and ingests it into Fedora]


Example – preservation package

[Diagram: SIP → Ingest Tool → FOXML with Images, METS, and RELS-EXT datastreams → Fedora]


Summary

[Comparison table: Fez, Elated, Valet, DirIngest, Batch Modify, the Admin Client, and the IU Tool, each rated on ease of install, native content models, custom content models, workflow neutrality, and batch ingest]


Major difficulties in any ingest tool

• Providing flexibility in the "style" of content model

• Matching filenames with metadata records

• Indicating the sequence of files in complex objects

• Abstracting over differing local metadata standards (even in our own collections)
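Two of these difficulties, matching filenames to metadata records and indicating the sequence of files, often reduce to a naming convention plus a numeric sort. The Python sketch below assumes a hypothetical convention of `<itemID>_<page>.<ext>` (for example `hoagy0012_003.tif`); neither the convention nor the `group_files` helper comes from the talk. It also reports the two failure cases an ingest report would need: files with no matching record, and records with no files.

```python
import re
from collections import defaultdict

# Hypothetical convention: <itemID>_<page>.<ext>, e.g. "hoagy0012_003.tif".
PATTERN = re.compile(r"^(?P<item>[A-Za-z0-9]+)_(?P<seq>\d+)\.(?P<ext>\w+)$")


def group_files(filenames, record_ids):
    """Match files to metadata records by item ID and order them by page number.

    Returns (pages per record, unmatched filenames, records with no files).
    """
    pages = defaultdict(list)
    unmatched = []
    for name in filenames:
        m = PATTERN.match(name)
        if m and m["item"] in record_ids:
            pages[m["item"]].append((int(m["seq"]), name))
        else:
            unmatched.append(name)
    # Sort numerically so page 10 follows page 9, not page 1.
    ordered = {item: [n for _, n in sorted(files)] for item, files in pages.items()}
    missing = sorted(set(record_ids) - set(ordered))
    return ordered, unmatched, missing
```

Sorting on the parsed integer rather than the raw filename matters for paged objects: a plain string sort would place `a1_10.tif` before `a1_2.tif`.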


Topics for future discussion

• What is the best structure for an ingest tool?
  ▫ Is our tool of interest to others?
  ▫ Would it be better to combine our capabilities with an existing tool?

• Can we agree on some core content models?


Thank You!

• Infrastructure project wiki:
  ▫ http://wiki.dlib.indiana.edu/confluence/display/INF

• Contact info:
  ▫ Ryan Scherle [email protected]
  ▫ Muzaffer Ozakca [email protected]