METS at UC Berkeley Generating METS Objects

METS at UC Berkeley Generating METS Objects. Background Kinds of materials: –primarily imaged content & tei encoded content archival materials: manuscripts.

Jan 02, 2016

Erica Shelton

METS at UC Berkeley

Generating METS Objects

Background

• Kinds of materials:
  – primarily imaged content & TEI-encoded content
    • archival materials: manuscripts and pictorial collections
    • oral histories

• Kinds of metadata:
  – Structural metadata: physical structure
  – Descriptive metadata
  – Basic technical metadata about digital files and how they were produced

Tools For Producing METS Objects

• GenDB
  – Gathers structural, descriptive and technical metadata

• GenX
  – Generates METS objects from GenDB

GenDB

• Consists of:
  – Relational database (currently SQL Server)
  – Locally developed software for gathering metadata and facilitating digital processing

GenDB Database Structure: Structural Metadata

[Figure: Structural Md Table. Each object's divs are stored as rows that point to a parent div. Object 1: Div 1 (root), Div 2 (parent = Div 1), Div 3 (parent = Div 1). Object 2: Div 1 (root) with Div 2 to Div 4 beneath it; Div 4 (parent = Div 2).]
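
The parent-pointer layout above can be sketched in Java, the language of GenDB's servlet and server components. This is a minimal illustration, not GenDB code: the class and field names (DivTree, DivRow, parentId) are hypothetical, and it assumes each object's rows are loaded in reading order with exactly one root div.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: DivRow/Node are illustrative names, not GenDB's schema.
class DivTree {
    // One row of the Structural Md Table: a div id plus its parent's id
    // (null marks the root div of an object).
    static final class DivRow {
        final String id;
        final String parentId;
        final String label;

        DivRow(String id, String parentId, String label) {
            this.id = id;
            this.parentId = parentId;
            this.label = label;
        }
    }

    // In-memory tree node; children keep the order in which rows were read.
    static final class Node {
        final DivRow row;
        final List<Node> children = new ArrayList<>();

        Node(DivRow row) {
            this.row = row;
        }
    }

    // Rebuilds one object's div hierarchy from its flat parent-pointer rows.
    static Node build(List<DivRow> rows) {
        Map<String, Node> byId = new LinkedHashMap<>();
        for (DivRow r : rows) {
            byId.put(r.id, new Node(r));
        }
        Node root = null;
        for (Node n : byId.values()) {
            if (n.row.parentId == null) {
                if (root != null) {
                    throw new IllegalArgumentException("more than one root div");
                }
                root = n;
            } else {
                Node parent = byId.get(n.row.parentId);
                if (parent == null) {
                    throw new IllegalArgumentException("unknown parent: " + n.row.parentId);
                }
                parent.children.add(n);
            }
        }
        if (root == null) {
            throw new IllegalArgumentException("no root div");
        }
        return root;
    }

    // Renders a subtree as nested labels, e.g. "Div 1(Div 2,Div 3)".
    static String render(Node n) {
        if (n.children.isEmpty()) {
            return n.row.label;
        }
        StringBuilder sb = new StringBuilder(n.row.label).append('(');
        for (int i = 0; i < n.children.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(render(n.children.get(i)));
        }
        return sb.append(')').toString();
    }
}
```

Fed the three rows for Object 1 (Div 1 as root, Div 2 and Div 3 pointing at Div 1), render(build(rows)) returns "Div 1(Div 2,Div 3)". Storing only a parent id per row lets one table hold hierarchies of any depth; the tree is rebuilt only when needed, for example when writing a METS structMap.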

GenDB Database Structure: Descriptive Metadata

[Figure: Each div of Object 1 (Div 1 to Div 3) and Object 2 (Div 1 to Div 4) in the Structural Md Table has its own Core Desc Md row; repeatable values are held in a Name Table (Name 1 to Name 3) and Note Tables (Note 1 to Note 3).]
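
A minimal sketch of how the descriptive tables could be joined back together per div, assuming each Name or Note row carries the id of the div it describes. All names are hypothetical, and only a title is modeled for the core record.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the Core Desc Md / Name / Note table relationship.
class DescriptiveMd {
    // One Core Desc Md row: exactly one per div (only a title is modeled).
    static final class Core {
        final String divId;
        final String title;
        Core(String divId, String title) { this.divId = divId; this.title = title; }
    }

    // One row of the Name table or a Note table; many rows may share a div.
    static final class Child {
        final String divId;
        final String value;
        Child(String divId, String value) { this.divId = divId; this.value = value; }
    }

    // The assembled descriptive record for a single div.
    static final class Record {
        final String title;
        final List<String> names;
        final List<String> notes;
        Record(String title, List<String> names, List<String> notes) {
            this.title = title;
            this.names = names;
            this.notes = notes;
        }
    }

    // Groups repeatable rows by the div they describe, preserving row order.
    static Map<String, List<String>> groupByDiv(List<Child> rows) {
        Map<String, List<String>> out = new HashMap<>();
        for (Child c : rows) {
            out.computeIfAbsent(c.divId, k -> new ArrayList<>()).add(c.value);
        }
        return out;
    }

    // Joins every core row with its names and notes; divs without any get empty lists.
    static Map<String, Record> assemble(List<Core> cores, List<Child> names, List<Child> notes) {
        Map<String, List<String>> namesByDiv = groupByDiv(names);
        Map<String, List<String>> notesByDiv = groupByDiv(notes);
        List<String> none = Collections.emptyList();
        Map<String, Record> out = new LinkedHashMap<>();
        for (Core c : cores) {
            out.put(c.divId, new Record(c.title,
                    namesByDiv.getOrDefault(c.divId, none),
                    notesByDiv.getOrDefault(c.divId, none)));
        }
        return out;
    }
}
```

Keeping names and notes in their own tables means a div can carry any number of them without widening the core row.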

GenDB Database Structure: Content File/Technical Md

[Figure: Divs of Object 1 (Div 1 to Div 3) in the Structural Md Table are linked to content files in the Master Image Table (Mstr 1, Mstr 2) and the Derivative Image Table (Drv 1 to Drv 4); each master and derivative file has its own Technical Md.]
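
A sketch of collecting each div's content files from the two image tables, for example to list the files a METS structMap div would point to. The columns (fileId, divId, mimeType, width, height) are hypothetical stand-ins for the real technical metadata fields.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the Master Image / Derivative Image tables.
class ContentFiles {
    // A row from either image table, with a few assumed technical metadata values.
    static final class ImageFile {
        final String fileId;
        final String divId;
        final String mimeType;
        final int width;
        final int height;

        ImageFile(String fileId, String divId, String mimeType, int width, int height) {
            this.fileId = fileId;
            this.divId = divId;
            this.mimeType = mimeType;
            this.width = width;
            this.height = height;
        }

        // Short technical summary, e.g. "image/tiff 3000x4000".
        String techSummary() {
            return mimeType + " " + width + "x" + height;
        }
    }

    // Lists the file ids attached to each div; within a div, masters come before derivatives.
    static Map<String, List<String>> filesByDiv(List<ImageFile> masters, List<ImageFile> derivatives) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        for (List<ImageFile> table : List.of(masters, derivatives)) {
            for (ImageFile f : table) {
                out.computeIfAbsent(f.divId, k -> new ArrayList<>()).add(f.fileId);
            }
        }
        return out;
    }
}
```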

Populating the Database Tables

• Web interface: manual input of structural and descriptive metadata

• Digitization Management modules

– Generate work orders to guide digitization process

– Import content file information and technical metadata coming out of digitization process

• Batch loader: batch input from TEI encodings and legacy metadata

Web Interface: WebGenDB

[Figure: WebGenDB components: Web Interface, Java Servlet, Java Server, SQL Server Database and XML Config Files; connections are labeled RMI and JDBC.]

Digitization Management Modules

[Figure: Web Interface, Java Servlet, Java Server and SQL Server Database, with Imaging/Transcription Work Orders and Technical MD Spreadsheets passing between GenDB and the Vendor.]
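
The slides do not describe the spreadsheet format, so the following is only a sketch: it assumes the vendor's technical metadata arrives as comma-separated text with a header row and no quoted fields, and turns each row into a column-to-value map ready for loading. The class name and any column names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical importer for a vendor technical-metadata sheet saved as CSV.
// Assumes a header row and no quoted or embedded commas.
class TechMdImport {
    static List<Map<String, String>> parse(Reader in) throws IOException {
        BufferedReader br = new BufferedReader(in);
        String header = br.readLine();
        List<Map<String, String>> rows = new ArrayList<>();
        if (header == null) {
            return rows; // empty sheet
        }
        String[] cols = header.split(",", -1);
        String line;
        int lineNo = 1;
        while ((line = br.readLine()) != null) {
            lineNo++;
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] vals = line.split(",", -1);
            if (vals.length != cols.length) {
                throw new IOException("line " + lineNo + ": expected " + cols.length + " fields");
            }
            Map<String, String> row = new LinkedHashMap<>();
            for (int i = 0; i < cols.length; i++) {
                row.put(cols[i].trim(), vals[i].trim());
            }
            rows.add(row);
        }
        return rows;
    }
}
```

Rejecting rows with the wrong field count, rather than padding them, surfaces vendor errors before anything reaches the database.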

Batch Loader

[Figure: Batch loader components: TEI Docs, XSLT, XML Batch Load File, Java Batch Loader and SQL Server Database, alongside the Web Interface, Java Servlet and Java Server.]
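
One plausible reading of this pipeline is that an XSLT stylesheet reduces TEI documents to an XML batch load file, which the Java batch loader then reads. The sketch below shows only the transformation step, using the JDK's built-in JAXP API (javax.xml.transform). The output elements (batch, object, div) are invented for illustration, and it assumes TEI P4-style input without a namespace.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical TEI-to-batch-load step; the output vocabulary is illustrative only.
class TeiToBatchLoad {
    // Pulls the TEI header title and each top-level body div's heading
    // into a flat structure a loader could insert as object and div rows.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/'>"
        + "<batch>"
        + "<object title='{normalize-space(//teiHeader//titleStmt/title)}'>"
        + "<xsl:for-each select='//body/div'>"
        + "<div label='{normalize-space(head)}'/>"
        + "</xsl:for-each>"
        + "</object>"
        + "</batch>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies the stylesheet to one TEI document and returns the batch XML.
    static String transform(String teiXml) throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(teiXml)), new StreamResult(out));
        return out.toString();
    }
}
```

Putting the TEI-specific logic in XSLT keeps the Java loader generic: a new encoding practice needs a new stylesheet, not new loader code.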

WebGenDB

The concepts that drove the design:
• Shielding the user from METS complexity
• Highly configurable
• Unicode support
• Access driven by login privileges
• Use of Open Source software and components
• Distributed approach

XML Configuration Files

• Three levels:
  – Elements common to all projects
  – Elements common to all screens in a project
  – Elements specific to a screen in a project

• Define fields common to all projects
• Define fields used in a specific project
• Define screens by project & object type

Relation among XML files

AlProjects.xml
  Proj1.xml
    ObjectType1.xml
    ObjectType2.xml
  Proj2.xml
    ObjectType1.xml
    ObjectType2.xml
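
The three configuration levels suggest a lookup that tries the most specific definition first. The sketch below assumes that precedence (screen, then project, then all projects), which the slides do not state; the class and method names are hypothetical, and each level is held in memory rather than read from the XML files.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical three-level field lookup; precedence is an assumption.
class LayeredConfig {
    // Level 1: fields common to all projects (AlProjects.xml).
    private final Map<String, String> allProjects = new HashMap<>();
    // Level 2: fields shared by all screens of one project (e.g. Proj1.xml).
    private final Map<String, Map<String, String>> byProject = new HashMap<>();
    // Level 3: fields specific to one screen, keyed by "project/objectType".
    private final Map<String, Map<String, String>> byScreen = new HashMap<>();

    private static String screenKey(String project, String objectType) {
        return project + "/" + objectType;
    }

    void defineCommon(String field, String label) {
        allProjects.put(field, label);
    }

    void defineForProject(String project, String field, String label) {
        byProject.computeIfAbsent(project, k -> new HashMap<>()).put(field, label);
    }

    void defineForScreen(String project, String objectType, String field, String label) {
        byScreen.computeIfAbsent(screenKey(project, objectType), k -> new HashMap<>()).put(field, label);
    }

    // Most specific level wins: screen, then project, then all projects.
    Optional<String> label(String project, String objectType, String field) {
        Map<String, String> screen = byScreen.getOrDefault(screenKey(project, objectType), Map.of());
        if (screen.containsKey(field)) {
            return Optional.of(screen.get(field));
        }
        Map<String, String> proj = byProject.getOrDefault(project, Map.of());
        if (proj.containsKey(field)) {
            return Optional.of(proj.get(field));
        }
        return Optional.ofNullable(allProjects.get(field));
    }
}
```

With this ordering a project can relabel a shared field such as Title without touching AlProjects.xml, and a screen can add fields that no other screen sees.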

Project XML file example

<ObjectType>
  <name>workorder</name>
  <fileLocation>/data/_w/GenDB/WEB-INF/classes/edu/berkeley/library/propertyFiles/CalCultureWorkOrderScreensFile.xml</fileLocation>
</ObjectType>

<Field><name>Image</name><type>checkbox</type><label>Image </label><size>1</size></Field>
<Field><name>Text</name><type>checkbox</type><label>Text </label><size>1</size></Field>
<Field><name>Title</name><type>text</type><label>Title </label><size>60</size></Field>
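
A sketch of reading <Field> definitions like the ones above with the JDK's DOM parser. It assumes the fragments are wrapped in one root element, since a file with several top-level <Field> elements would not be well-formed XML, and it trims values so that a label such as "Image " is read as "Image". The class names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical reader for screen field definitions.
class FieldConfig {
    static final class Field {
        final String name;
        final String type;
        final String label;
        final int size;

        Field(String name, String type, String label, int size) {
            this.name = name;
            this.type = type;
            this.label = label;
            this.size = size;
        }
    }

    // Reads every <Field> element in the document, in document order.
    static List<Field> parse(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList nodes = doc.getElementsByTagName("Field");
        List<Field> out = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element e = (Element) nodes.item(i);
            out.add(new Field(text(e, "name"), text(e, "type"), text(e, "label"),
                    Integer.parseInt(text(e, "size"))));
        }
        return out;
    }

    // Returns the trimmed text of the first child element with the given tag.
    private static String text(Element parent, String tag) {
        NodeList l = parent.getElementsByTagName(tag);
        if (l.getLength() == 0) {
            throw new IllegalArgumentException("missing <" + tag + ">");
        }
        return l.item(0).getTextContent().trim();
    }
}
```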

Software used

• MSSQL running on NT
• Tomcat 4.1.2 implementing Servlet 2.3
• JSDK 1.4
• Xalan 2.4
• Xerces 1.0.3
• FOP 0.12.1
• JDOM beta 8
• Opta 2000

Relationship of GenDB to METS

• Metadata not directly stored in METS, MODS or MIX schema formats
  – Much of the database structure was developed before these standards emerged
  – Database structure and content adjusted to be compatible with all these formats

GenX: From GenDB to METS

• Allows Digital Publishing Group staff to select the objects in the GenDB database that are ready for export and to export them as METS objects.

GenX Architecture

[Figure: GenX components: App Interface; Java Application connected to GenDB via JDBC; METS XML Repository.]

GenX Output

• METS output corresponding to version 1.3

• Descriptive metadata exported to METS dmdSec in MODS 2.0 format

• Technical metadata exported to METS techMD in MIX format

• Planned:
  – Text technical metadata to METS techMD in NYU textMD
  – Rights to METS rightsMD in an ODRL subset
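
A minimal sketch of the kind of document GenX writes: a METS file whose dmdSec wraps a MODS title and whose structMap root div points back at it through DMDID. It uses the JDK's DOM and identity transformer rather than the JDOM/Xalan stack listed earlier, uses the MODS 3 namespace URI for illustration, and leaves out the MIX techMD, fileSec and per-div dmdSecs that real output would carry. Class and method names are hypothetical.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical METS writer sketch, not GenX itself.
class MetsSketch {
    static final String METS = "http://www.loc.gov/METS/";
    static final String MODS = "http://www.loc.gov/mods/v3";
    static final String XMLNS = "http://www.w3.org/2000/xmlns/";

    // Builds one dmdSec (MODS title) and a structMap whose root div references it.
    static String build(String title, String[] divLabels) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        Document doc = f.newDocumentBuilder().newDocument();

        Element mets = doc.createElementNS(METS, "mets:mets");
        mets.setAttributeNS(XMLNS, "xmlns:mets", METS);
        mets.setAttributeNS(XMLNS, "xmlns:mods", MODS);
        doc.appendChild(mets);

        // dmdSec > mdWrap MDTYPE="MODS" > xmlData > mods:mods/titleInfo/title
        Element dmd = doc.createElementNS(METS, "mets:dmdSec");
        dmd.setAttribute("ID", "DMD1");
        Element wrap = doc.createElementNS(METS, "mets:mdWrap");
        wrap.setAttribute("MDTYPE", "MODS");
        Element xmlData = doc.createElementNS(METS, "mets:xmlData");
        Element mods = doc.createElementNS(MODS, "mods:mods");
        Element titleInfo = doc.createElementNS(MODS, "mods:titleInfo");
        Element t = doc.createElementNS(MODS, "mods:title");
        t.setTextContent(title);
        titleInfo.appendChild(t);
        mods.appendChild(titleInfo);
        xmlData.appendChild(mods);
        wrap.appendChild(xmlData);
        dmd.appendChild(wrap);
        mets.appendChild(dmd);

        // structMap: one root div pointing at DMD1, with one child div per label.
        Element structMap = doc.createElementNS(METS, "mets:structMap");
        Element root = doc.createElementNS(METS, "mets:div");
        root.setAttribute("LABEL", title);
        root.setAttribute("DMDID", "DMD1");
        for (String label : divLabels) {
            Element child = doc.createElementNS(METS, "mets:div");
            child.setAttribute("LABEL", label);
            root.appendChild(child);
        }
        structMap.appendChild(root);
        mets.appendChild(structMap);

        Transformer tr = TransformerFactory.newInstance().newTransformer();
        tr.setOutputProperty(OutputKeys.INDENT, "yes");
        StringWriter out = new StringWriter();
        tr.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }
}
```

The dmdSec precedes the structMap because the METS schema orders its top-level sections that way; a techMD in MIX would go in an amdSec between the two.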

Links

• GenDB Web Interface Demo
  – http://sunsite2.berkeley.edu/GenD
  – login: demo
  – password: demo
