Focus on Your Content, Not on Ingesting Your Content
Terry Brady, Applications Programmer Analyst, Georgetown University Library
twb27@georgetown.edu
https://github.com/organizations/Georgetown-University-Libraries

Mar 27, 2015

Page 1:

Focus on Your Content, Not on Ingesting Your Content

Terry Brady

Applications Programmer Analyst

Georgetown University Library

twb27@georgetown.edu

https://github.com/organizations/Georgetown-University-Libraries

Page 2:

Goals of our Repository Managers

Create new collections

Grow collections

Accurately describe collection contents

Showcase our repository content

Page 3:

Our story

Using simple tools to facilitate these goals

Page 4:

Imagine that you have content to load into your repository

Page 5:

Scenario: One Item to Add to DSpace

Page 6:

One Item to Add: Item Submission

Click through 7 item submission screens, authoring metadata as you go

Page 7:

Scenario: Three Items to Add to DSpace

Page 8:

Three Items to Add: Item Submission

Click through 3 x 7 item submission screens, authoring metadata as you go

Page 9:

50 Items

Scenario: 50 newspaper issues to add to DSpace (very similar metadata)

Page 10:

50 Items to Add: Individual Item Submission is impractical

Page 11:

Next Option

DSpace Bulk Ingest Process

Page 12:

DSpace Bulk Ingest

50 Items

Page 13:

Ingest Folder

Media File

Thumbnail (optional)

Contents File

Metadata File

License File (optional)
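
The parts of an ingest folder listed above can be checked with a few lines of code. The following is a minimal sketch, not part of the original slides. It assumes the file names used by DSpace's Simple Archive Format (`contents`, `dublin_core.xml`, `license`); the slide itself names only the roles. The function name `describe_item_folder` is hypothetical.

```python
from pathlib import Path

# File names assumed from DSpace's Simple Archive Format; the slide only
# names the roles (media, thumbnail, contents, metadata, license).
CONTENTS_NAME = "contents"
METADATA_NAME = "dublin_core.xml"
LICENSE_NAME = "license"


def describe_item_folder(folder: Path) -> dict:
    """Report which parts of one ingest folder are present."""
    names = {p.name for p in folder.iterdir() if p.is_file()}
    reserved = {CONTENTS_NAME, METADATA_NAME, LICENSE_NAME}
    return {
        "contents_file": CONTENTS_NAME in names,
        "metadata_file": METADATA_NAME in names,
        "license_file": LICENSE_NAME in names,    # optional
        "other_files": sorted(names - reserved),  # media file and thumbnail
    }
```

Calling `describe_item_folder(Path("ingest/issue1"))` on a hypothetical folder returns a small dictionary that shows at a glance whether the required files are in place.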

Page 14:

Bulk Ingest: Build a Metadata Spreadsheet

50 Items

Page 15:

Bulk Ingest: Build Ingest Folders

50 Items

Page 16:

Bulk Ingest: For Each Item

Copy Item to Folder

50 Items

.PDF

Page 17:

Bulk Ingest: For Each Item

Create a unique Contents File

50 Items

.PDF

.TXT

Page 18:

Bulk Ingest: For Each Item

Create a Dublin Core File

50 Items

.PDF

.TXT

.XML

Page 19:

Bulk Ingest: Initiate Import from a Terminal Window

50 Items

.TXT

.PDF

.XML
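
For readers who have not seen the terminal step, here is a hedged sketch of how the import command line might be assembled. The slide does not show the command. The options below follow the DSpace item importer as documented (`--add`, `--eperson`, `--collection`, `--source`, `--mapfile`) and should be verified against your DSpace version. Every path, e-mail address, and handle in the usage line is a made-up placeholder.

```python
import shlex


def build_import_command(dspace_bin, eperson, collection, source_dir, mapfile):
    """Assemble (but do not run) a DSpace bulk import command line."""
    return [
        dspace_bin, "import", "--add",
        f"--eperson={eperson}",        # account that will own the new items
        f"--collection={collection}",  # handle of the target collection
        f"--source={source_dir}",      # directory holding the ingest folders
        f"--mapfile={mapfile}",        # records which folder became which item
    ]


# Placeholder values for illustration only.
command = build_import_command(
    "/dspace/bin/dspace", "librarian@example.edu",
    "123456789/42", "/ingest/newspapers", "/ingest/newspapers.map",
)
print(shlex.join(command))
```

Typing this by hand for every batch is exactly the step the rest of the talk tries to hide from content owners.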

Page 20:

Bulk Ingest: For Each Item

Create a Dublin Core File

50 Items

.TXT

.PDF

.XML

What if you make a mistake?

What if you need to refine the metadata?

Page 21:

The Challenge

Want to grow the collections

But, the ingest process is daunting

Page 22:

The conversation focused on HOW to ingest the content rather than on the content itself

Page 23:

Our Approach

Page 24:

Our Approach: Empower Content Owners

• Automate the tedious tasks

• Make metadata entry the focus of the effort

• Hide the command line from content owners

Page 25:

Our Approach: Simple Tools

Work around the tedious steps

Without constructing a complex workflow

Page 26:

Our Tools

• File Analyzer

o Desktop Application for File System Traversal

• DSpace QC Tools

o Web application for Batch Process Submission

Both of these tools are available on GitHub

• Georgetown-University-Libraries

Page 27:

File Analyzer

Desktop Application for File Processing

Page 29:

What we need

50 Items

Page 30:

Step 1: Automatically Generate an Ingest Inventory based on existing files

50 Items
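
Step 1 can be illustrated with a short script. This is a sketch of the idea only, not File Analyzer's actual implementation (File Analyzer is a Java desktop application). The column names and the choice to seed the title from the file name are assumptions made for the example.

```python
import csv
from pathlib import Path

# Hypothetical inventory columns: one folder per item plus two metadata fields.
COLUMNS = ["folder", "file", "dc.title", "dc.date.issued"]


def build_inventory(source_dir: Path) -> list:
    """Create one inventory row per PDF found in source_dir."""
    rows = []
    for path in sorted(source_dir.glob("*.pdf")):
        rows.append({
            "folder": path.stem,     # name of the ingest folder to create
            "file": path.name,       # media file to copy into that folder
            "dc.title": path.stem,   # starting value, to be edited by hand
            "dc.date.issued": "",    # left blank for the content owner
        })
    return rows


def write_inventory(rows: list, csv_path: Path) -> None:
    """Save the inventory so it can be opened as a spreadsheet."""
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)
```

The resulting CSV opens directly in a spreadsheet application, which is where the metadata work then takes place.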

Page 32:

Export the Generated Inventory

Page 33:

Step 2: Edit the Ingest Inventory as a Spreadsheet

Page 34:

Step 3: Generate the Ingest Folders from the Inventory Spreadsheet

• Generate Contents File

• Generate Dublin Core Metadata File

• Include custom thumbnails if applicable
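
The two generated files are small enough to sketch. The example below assumes the DSpace Simple Archive Format: a `dublin_core.xml` made of `dcvalue` elements with `element` and `qualifier` attributes, and a contents file with one file name per line, where a thumbnail is marked with a tab followed by `bundle:THUMBNAIL`. The function names and the `dc.element.qualifier` spreadsheet column convention are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def dublin_core_xml(metadata: dict) -> str:
    """Turn spreadsheet columns such as 'dc.date.issued' into dublin_core.xml text."""
    root = ET.Element("dublin_core")
    for field, value in metadata.items():
        if not value:
            continue  # skip empty spreadsheet cells
        parts = field.split(".")  # "dc.date.issued" -> ["dc", "date", "issued"]
        qualifier = parts[2] if len(parts) > 2 else "none"
        node = ET.SubElement(root, "dcvalue", element=parts[1], qualifier=qualifier)
        node.text = value
    return ET.tostring(root, encoding="unicode")


def contents_text(media_file: str, thumbnail: str = None) -> str:
    """List the item's files, one per line, flagging a custom thumbnail."""
    lines = [media_file]
    if thumbnail:
        lines.append(f"{thumbnail}\tbundle:THUMBNAIL")
    return "\n".join(lines) + "\n"
```

Because both files are derived from a spreadsheet row, correcting a title means editing one cell and regenerating, not opening 50 XML files.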

Page 36:

Create Ingest Folders

• An error message will appear if files are missing (or misspelled)

• Process can be rerun if the metadata spreadsheet needs to change
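
The two behaviors above (report missing files, allow reruns) might look like the following. This is an illustrative sketch rather than File Analyzer's code. The row keys `folder` and `file` and the error message wording are assumptions.

```python
import shutil
from pathlib import Path


def create_ingest_folders(rows: list, source_dir: Path, ingest_dir: Path) -> list:
    """Build one ingest folder per inventory row and return any error messages."""
    errors = []
    for row in rows:
        source = source_dir / row["file"]
        if not source.is_file():
            # A missing or misspelled file name is reported, not silently skipped.
            errors.append(f"Missing file: {row['file']}")
            continue
        item_dir = ingest_dir / row["folder"]
        item_dir.mkdir(parents=True, exist_ok=True)  # existing folders are reused
        shutil.copy2(source, item_dir / source.name)
        (item_dir / "contents").write_text(source.name + "\n", encoding="utf-8")
    return errors
```

Running the function a second time simply overwrites the generated files, so the spreadsheet can be corrected and the folders rebuilt as often as needed.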

Page 37:

Ingest Folder Creation Report

Page 38:

Step 4: Validate Ingest Folders

• Identify Missing Files

• Required Metadata

• Validate Files

o Contents

o Dublin Core
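
The checks listed above can be sketched as a single function. This is an approximation for illustration, not the validation rule shipped with File Analyzer. It assumes the Simple Archive Format file names `contents` and `dublin_core.xml`, and the set of required elements is a placeholder.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

REQUIRED_ELEMENTS = ("title",)  # placeholder; a real collection may require more


def validate_item_folder(folder: Path) -> list:
    """Return a list of problems found in one ingest folder (empty if none)."""
    problems = []
    contents = folder / "contents"
    if not contents.is_file():
        problems.append("missing contents file")
    else:
        for line in contents.read_text(encoding="utf-8").splitlines():
            name = line.split("\t")[0].strip()  # drop any bundle annotation
            if name and not (folder / name).is_file():
                problems.append(f"listed file not found: {name}")
    metadata = folder / "dublin_core.xml"
    if not metadata.is_file():
        problems.append("missing dublin_core.xml")
        return problems
    try:
        root = ET.fromstring(metadata.read_bytes())
    except ET.ParseError:
        problems.append("dublin_core.xml is not well-formed")
        return problems
    present = {n.get("element") for n in root.iter("dcvalue") if (n.text or "").strip()}
    for element in REQUIRED_ELEMENTS:
        if element not in present:
            problems.append(f"required metadata missing: {element}")
    return problems
```

Running such a check before upload catches problems while they are still cheap to fix on the desktop.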

Page 40:

Validation Status Report

Page 41:

Step 5: Move Ingest Folders to Server and Initiate Bulk Ingest

Page 42:

Web Tools

for Batch Process Submission

Page 44:

Web Tools, Tutorials co-located with tools

Page 45:

Collection

Folder Location

Page 46:

Processes run by Bulk Ingest

• import

• filter-media [collection]

• update-discovery-index

• oai-import

• stats-util

Content is visible, searchable, and thumbnails are present!
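
The chain of processes can be pictured as a simple ordered runner. The sketch below is an assumption about the control flow, not the DSpace QC Tools code (which is PHP). The step labels are copied from the slide and are not verified command lines. `run_step` is a hypothetical callback that would launch the real process and report success.

```python
from typing import Callable, List, Sequence

# Labels as they appear on the slide; real invocations will differ by DSpace version.
STEPS = (
    "import",
    "filter-media [collection]",
    "update-discovery-index",
    "oai-import",
    "stats-util",
)


def run_bulk_ingest(run_step: Callable[[str], bool],
                    steps: Sequence[str] = STEPS) -> List[str]:
    """Run each step in order, stopping at the first failure.

    Returns the list of steps that completed successfully.
    """
    completed = []
    for step in steps:
        if not run_step(step):
            break  # later steps depend on earlier ones, so do not continue
        completed.append(step)
    return completed
```

Chaining the steps means the person who submits the batch does not have to remember, or even know about, the follow-up tasks that make new items visible and searchable.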

Page 48:

Results

Empowered Librarians

Iterative metadata refinement

At the right point of the workflow

Significant growth in repository content

Decreasing IT involvement

Rapid development of support tools

Page 49:

Derived Tools

Generate Ingest Folders for ProQuest ETDs

Filter Media

Page 50:

Ingest ETDs from ProQuest

Page 51:

ProQuest ETD Ingest Rule

Page 52:

Filter Media Tool

for Items Submitted One by One

Collection

Filter Media Tasks

Re-index?

Page 53:

Benefits

Companion tools easy to learn

Users are very comfortable with them

De-mystify DSpace-specifics

Users trained other users!

Page 54:

Other Tools Created

Automation

• Undo Bulk Ingest

• Update Metadata

• Move Community/Collection

Reporting

• Data Quality Reports

• Statistics Reports

Page 55:

More Tools (time permitting)

Page 56:

Data Quality Reports

• Items with multiple media files

• Non-PDF Document Items

• Items missing a Thumbnail

• "Non-standard" Media Types

• Items modified last 30 days

• Items with Embargo

• Items missing a metadata field

• Item metadata containing a URL
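
Several of these reports reduce to simple per-item checks. The sketch below shows the idea against an in-memory record. The real reports query the DSpace database, and the record layout used here (`media_files`, `thumbnail`, `last_modified`, `metadata`) is entirely hypothetical.

```python
import re
from datetime import date, timedelta

URL_PATTERN = re.compile(r"https?://", re.IGNORECASE)


def quality_flags(item: dict, today: date) -> list:
    """Return the data quality flags that apply to one item record."""
    flags = []
    media = item.get("media_files", [])
    if len(media) > 1:
        flags.append("multiple media files")
    if any(not name.lower().endswith(".pdf") for name in media):
        flags.append("non-PDF document")
    if not item.get("thumbnail"):
        flags.append("missing thumbnail")
    if today - item["last_modified"] <= timedelta(days=30):
        flags.append("modified in last 30 days")
    if any(URL_PATTERN.search(value) for value in item.get("metadata", {}).values()):
        flags.append("metadata contains a URL")
    return flags
```

Grouping flagged items by collection gives a collection-level summary, with a drill-down to individual items.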

Page 57:

Collection QC Report

Page 58:

Item QC Report

Page 59:

Usage Statistics Reports

• Not confident in the out of the box reports

• Wanted to understand underlying data

• Filter Stats

o On campus

o Within the library
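
One plausible way to filter usage statistics by location is to classify each request's IP address against known network ranges. The slides do not say how the filtering is done, so the sketch below is an assumption. The network ranges are reserved documentation addresses standing in for real campus and library networks.

```python
import ipaddress
from collections import Counter

# Placeholder ranges (reserved for documentation), not real campus networks.
CAMPUS_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]
LIBRARY_NETWORKS = [ipaddress.ip_network("192.0.2.128/25")]


def classify(ip_text: str) -> str:
    """Label a request as coming from the library, the campus, or elsewhere."""
    ip = ipaddress.ip_address(ip_text)
    if any(ip in network for network in LIBRARY_NETWORKS):
        return "library"  # checked first because it is the narrower range
    if any(ip in network for network in CAMPUS_NETWORKS):
        return "campus"
    return "off campus"


def count_by_location(ip_addresses: list) -> Counter:
    """Tally requests per location label."""
    return Counter(classify(ip) for ip in ip_addresses)
```

Separating in-library and on-campus traffic from outside traffic helps show how much of the reported usage comes from staff working on the repository itself.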

Page 61:

Try it yourself

GitHub: Georgetown-University-Libraries

• File Analyzer & Metadata Harvester

o Just need a Java Compiler

o Contains several utilities for digitization workflows

o Links to tutorials

• DSpace QC Tools

o PHP Code

o Sample code, not ready to run

o Links to tutorials

Please let me know how these work for you!

Page 62:

Terry Brady

Applications Programmer Analyst

Georgetown University Library

twb27@georgetown.edu

https://github.com/organizations/Georgetown-University-Libraries