Dienst: Distributed Networked Publishing. Carl Lagoze, Digital Library Scientist, Cornell University

Jan 15, 2016

Page 1: Dienst Distributed Networked Publishing

Dienst: Distributed Networked Publishing

Carl Lagoze, Digital Library Scientist

Cornell University

Page 2: Dienst Distributed Networked Publishing

Cornell Digital Library Research Group (CDLRG)

• Research and Development of Component-Ware Digital Library Infrastructure

• Developed out of DARPA-funded Computer Science Technical Reports Projects (CS-TR)

Page 3: Dienst Distributed Networked Publishing

Component-Ware Digital Libraries

• Service-based infrastructure
  – Interface (protocol) of each service
  – Interactions between services
  – Aggregations into logical collections and libraries

• Layered approach accommodates the requirements of a varied clientele
  – Research libraries: high integrity, quality of service, security
  – Informal collections, e.g., the Web

Page 4: Dienst Distributed Networked Publishing

CDLRG Research Projects

• FEDORA

• Distributed Searching and Resource Discovery

• Digital Library Collection Definition

• Metadata (Dublin Core and Warwick Framework)

• Networked Computer Science Technical Reports Project (www.ncstrl.org)

Page 5: Dienst Distributed Networked Publishing

What is NCSTRL?

A Vehicle and Testbed for Digital Library Interoperability

A Vehicle for Exploring Policy and Organization

A Production Digital Collection

Page 6: Dienst Distributed Networked Publishing

A Production Digital Collection

• A growing collection of CS research reports

• A service relied on by users and publishers

• Motivates solving hard, real-world problems: IPR, quality of service, federation of publishers

Page 7: Dienst Distributed Networked Publishing

A Testbed for Technology

• Create a modular system based on a standard open architecture

• Provide a testbed for demonstrating and testing new digital library components

• Work with a variety of researchers: DLI, ERCIM, Los Alamos

Page 8: Dienst Distributed Networked Publishing

A Vehicle for Exploring Policy and Organization

• Creating a self-sustaining international federated digital collection

• Extending the domain and scope while maintaining a coherent collection

• Policy issues: charging, IPR, liability, technical quality, relationship to other DL organizations

Page 9: Dienst Distributed Networked Publishing

Origins of NCSTRL

• DARPA-funded CS-TR Project
  – CNRI, Berkeley, CMU, Cornell, MIT, Stanford

• NSF-funded WATERS Project
  – Old Dominion, SUNY Buffalo, Virginia, Virginia Tech

• Other CS Tech Reports Efforts
  – Harvest, UCSTRI, NZDL

Page 10: Dienst Distributed Networked Publishing

NCSTRL Project Participants

• NCSTRL Steering Committee

• NCSTRL Working Group

• Cornell Digital Library Research Group

• The Collection

Page 11: Dienst Distributed Networked Publishing

NCSTRL Steering Committee

• Responsible for policy direction, oversight

• Determining how to extend interoperability efforts to the broader community

Page 12: Dienst Distributed Networked Publishing

NCSTRL Working Group

• Responsible for operational oversight of the current system

• Membership from the CS-TR and WATERS projects

Page 13: Dienst Distributed Networked Publishing

Cornell Digital Library Research Group

• Responsible for day-to-day support and maintenance of existing system

• Clearing house for technical collaborations

• Evolution and Research Directions

Page 14: Dienst Distributed Networked Publishing

Contributing Institutions

105 institutions in the US, Europe, and Asia

Page 15: Dienst Distributed Networked Publishing

Dienst

• A protocol and reference implementation of a distributed digital library service, in which a network of services provides:
  – World Wide Web browser access
  – Uniform search over distributed indexes
  – Multi-formatted documents

Page 16: Dienst Distributed Networked Publishing

Dienst document model

Figure: the Dienst document model: a document identified by its Document Handle (URN), with multiple representations (ASCII, TIFF, PostScript, metadata) and physical and logical decompositions.
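
As a rough illustration, the Dienst document model on this slide can be written down as a small data structure. This is a minimal Python sketch, not Dienst's implementation: it assumes each representation holds its own physical decomposition (a list of pages) and that the logical decomposition maps part names to page ranges. The class names, field names, and the handle `ncstrl.example/TR-0001` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Representation:
    """One format of a document (hypothetical keys such as 'postscript', 'ascii')."""
    mime_type: str
    pages: list[bytes]  # physical decomposition: one entry per page


@dataclass
class Document:
    """A document identified by its handle (URN)."""
    urn: str
    metadata: dict[str, str]
    representations: dict[str, Representation] = field(default_factory=dict)
    # Logical decomposition: part name -> (first page, last page), 1-based and inclusive.
    logical_parts: dict[str, tuple[int, int]] = field(default_factory=dict)

    def formats(self) -> list[str]:
        """List the available representations."""
        return sorted(self.representations)

    def part_pages(self, part: str, fmt: str) -> list[bytes]:
        """Return the pages of a logical part in the given representation."""
        first, last = self.logical_parts[part]
        return self.representations[fmt].pages[first - 1:last]


doc = Document(
    urn="ncstrl.example/TR-0001",  # hypothetical handle
    metadata={"title": "An Example Report"},
    representations={
        "postscript": Representation("application/postscript", [b"%!PS 1", b"%!PS 2", b"%!PS 3"]),
        "ascii": Representation("text/plain", [b"page 1", b"page 2", b"page 3"]),
    },
    logical_parts={"abstract": (1, 1), "body": (2, 3)},
)
print(doc.formats())                    # ['ascii', 'postscript']
print(doc.part_pages("body", "ascii"))  # [b'page 2', b'page 3']
```

Keeping the logical parts as page ranges lets one logical decomposition serve every representation that shares the same pagination.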

Page 17: Dienst Distributed Networked Publishing

Exposing the Model through the Protocol

• Documents addressable through their URNs

• Document service requests
  – Get document metadata
  – Get document formats
  – Get document in a given format
  – Get document partition (page) in a given format
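
The four document service requests can be sketched as methods on a repository keyed by URN. This is a hypothetical in-memory model: the method names (`get_metadata`, `get_formats`, `get_document`, `get_page`) and the storage layout are assumptions, not Dienst's actual protocol verbs or wire format. Each document-returning call pairs the content with its MIME type.

```python
from typing import NamedTuple


class Reply(NamedTuple):
    """A MIME-typed response body."""
    mime_type: str
    body: bytes


class Repository:
    """In-memory store: URN -> (metadata, {format: (MIME type, pages)})."""

    def __init__(self) -> None:
        self._docs: dict[str, tuple[dict[str, str], dict[str, tuple[str, list[bytes]]]]] = {}

    def add(self, urn: str, metadata: dict[str, str],
            formats: dict[str, tuple[str, list[bytes]]]) -> None:
        self._docs[urn] = (metadata, formats)

    def get_metadata(self, urn: str) -> dict[str, str]:
        return self._docs[urn][0]

    def get_formats(self, urn: str) -> list[str]:
        return sorted(self._docs[urn][1])

    def get_document(self, urn: str, fmt: str) -> Reply:
        mime, pages = self._docs[urn][1][fmt]
        return Reply(mime, b"".join(pages))

    def get_page(self, urn: str, fmt: str, page: int) -> Reply:
        mime, pages = self._docs[urn][1][fmt]
        if not 1 <= page <= len(pages):  # pages are numbered from 1
            raise IndexError(f"{urn} has no page {page} in {fmt}")
        return Reply(mime, pages[page - 1])


repo = Repository()
urn = "ncstrl.example/TR-0001"  # hypothetical handle
repo.add(urn, {"title": "An Example Report"},
         {"postscript": ("application/postscript", [b"%!PS p1\n", b"%!PS p2\n"])})
print(repo.get_formats(urn))                     # ['postscript']
print(repo.get_page(urn, "postscript", 2).body)  # b'%!PS p2\n'
```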

Page 18: Dienst Distributed Networked Publishing

Dienst Services

Figure: a WWW browser sends a search request to the Dienst User Interface, which sends site-specific search requests to several Index servers, receives their hit lists, and returns a unified hit list; document requests go to the Repositories, which return MIME-typed documents.
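
The search path on the Dienst Services slide amounts to a fan-out and merge. The sketch below assumes each index server can be modeled as a function from a query string to a list of URNs, and that the unified hit list is simply the deduplicated, sorted union; the slides do not say how Dienst orders or merges hits, and every name here is hypothetical. It also records which index servers raised an error, so one failing site does not sink the whole search.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Hypothetical model of an index server: query string -> list of matching URNs.
IndexServer = Callable[[str], list[str]]


def unified_search(query: str, indexes: dict[str, IndexServer]) -> tuple[list[str], list[str]]:
    """Query every index in parallel; return (unified hit list, names of failed indexes)."""
    hits: set[str] = set()
    failed: list[str] = []
    with ThreadPoolExecutor(max_workers=max(1, len(indexes))) as pool:
        futures = {name: pool.submit(search, query) for name, search in indexes.items()}
        for name, future in futures.items():
            try:
                hits.update(future.result())
            except Exception:
                failed.append(name)  # skip a failed site instead of failing the whole search
    return sorted(hits), failed


# Stand-in index servers (hypothetical data).
def cornell(q: str) -> list[str]:
    return ["ncstrl.cornell/TR-1"] if "library" in q else []


def mit(q: str) -> list[str]:
    return ["ncstrl.mit/TR-7"]


def offline(q: str) -> list[str]:
    raise ConnectionError("index server unreachable")


hits, failed = unified_search("digital library", {"cornell": cornell, "mit": mit, "offline": offline})
print(hits)    # ['ncstrl.cornell/TR-1', 'ncstrl.mit/TR-7']
print(failed)  # ['offline']
```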

Page 19: Dienst Distributed Networked Publishing

Exposing the Services through the Protocol

• All protocol requests are service-specific, so the functionality of any service can be accessed by another service or by a new service.
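
One way to picture service-specific requests is a registry in which every request names its target service and verb, so the user interface, other services, and services added later all use the same call path. This is a hypothetical in-process sketch: the transport is not modeled, and the service names, verbs, and the example `Citation` service are illustrative assumptions.

```python
from typing import Any, Callable

Handler = Callable[..., Any]


class ServiceRegistry:
    """Routes requests addressed as (service, verb) to their handlers."""

    def __init__(self) -> None:
        self._handlers: dict[tuple[str, str], Handler] = {}

    def register(self, service: str, verb: str, handler: Handler) -> None:
        self._handlers[(service, verb)] = handler

    def request(self, service: str, verb: str, **args: Any) -> Any:
        try:
            handler = self._handlers[(service, verb)]
        except KeyError:
            raise LookupError(f"{service} does not support {verb}") from None
        return handler(**args)


registry = ServiceRegistry()

# An existing repository service (hypothetical verbs and data).
docs = {"ncstrl.example/TR-0001": {"title": "An Example Report", "formats": ["ascii", "postscript"]}}
registry.register("Repository", "Metadata", lambda urn: {"title": docs[urn]["title"]})
registry.register("Repository", "Formats", lambda urn: docs[urn]["formats"])


# A new service added later, reusing the repository through the same request path.
def citation(urn: str) -> str:
    meta = registry.request("Repository", "Metadata", urn=urn)
    return f'{meta["title"]} [{urn}]'


registry.register("Citation", "Format", citation)
print(registry.request("Citation", "Format", urn="ncstrl.example/TR-0001"))
# An Example Report [ncstrl.example/TR-0001]
```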

Page 20: Dienst Distributed Networked Publishing

Gateways to Non-Conforming Sites

Figure: FTP/HTTP “Repositories” at non-conforming sites are connected through a Gateway Server, alongside Standard Servers, to the User Interface.
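
A gateway can be read as an adapter: it answers repository requests on behalf of a site that only exposes plain files over FTP or HTTP. In this hypothetical sketch, the gateway is given a list of file URLs per report, infers each file's format from its extension with Python's `mimetypes` module, and fetches content through an injected `fetch` function (in practice this could wrap `urllib.request.urlopen`). The class name, URLs, and handle are assumptions, not part of Dienst.

```python
import mimetypes
from typing import Callable
from urllib.parse import urlparse

# Hypothetical: fetch(url) -> bytes, e.g. a wrapper around urllib.request.urlopen.
Fetch = Callable[[str], bytes]


class GatewayRepository:
    """Answers repository requests for a site that only serves plain files."""

    def __init__(self, files: dict[str, list[str]], fetch: Fetch) -> None:
        self._files = files  # URN -> URLs of that report's files, one per format
        self._fetch = fetch

    def _by_format(self, urn: str) -> dict[str, str]:
        """Map each file's MIME type, guessed from its extension, to its URL."""
        by_format: dict[str, str] = {}
        for url in self._files[urn]:
            mime, _encoding = mimetypes.guess_type(urlparse(url).path)
            by_format[mime or "application/octet-stream"] = url
        return by_format

    def get_formats(self, urn: str) -> list[str]:
        return sorted(self._by_format(urn))

    def get_document(self, urn: str, mime: str) -> tuple[str, bytes]:
        """Fetch the file for the requested format and return it MIME-typed."""
        return mime, self._fetch(self._by_format(urn)[mime])


# Stand-in for a remote FTP site (hypothetical URLs and contents).
site = {
    "ftp://ftp.example.edu/pub/tr/TR-0001.ps": b"%!PS-Adobe-2.0 ...",
    "ftp://ftp.example.edu/pub/tr/TR-0001.txt": b"An Example Report ...",
}
gateway = GatewayRepository({"example/TR-0001": list(site)}, fetch=site.__getitem__)
print(gateway.get_formats("example/TR-0001"))  # ['application/postscript', 'text/plain']
print(gateway.get_document("example/TR-0001", "text/plain"))
```

Guessing formats from file extensions is a simplification; a real gateway would also need a way to obtain metadata for sites that publish none.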

Page 21: Dienst Distributed Networked Publishing

Use by External Services

Figure: Dienst used by external services, e.g., a User Interface and a Search Engine (Z39.50).

Page 22: Dienst Distributed Networked Publishing

Publishing Using Dienst: Retrospective Conversion

• Scanning of legacy documents
  – Cornell
  – MIT
  – Stanford

• Conversion to common formats
  – GIFs
  – Thumbnails
  – PostScript

Page 23: Dienst Distributed Networked Publishing

Publishing with Dienst: Digital Originals

• PostScript as lingua franca
  – “thanks Microsoft”

• Form submission
  – Author-generated descriptive metadata

• Clerical clearing-house

• Automatic format conversion

Page 24: Dienst Distributed Networked Publishing

Collection Definition in Digital Libraries

• Multiple levels of selection
  – Authors “publish”
  – Repositories have submission policies
  – Search engines index
  – Objects in search engines are aggregated into collections
  – User interface gateways provide access to multiple collections

• What is “in” a digital library is defined by what can be found using its resource discovery tools
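
The idea that a library's contents are whatever its discovery tools can find can be made concrete by treating each selection level as a filter. This toy Python model covers only two of the levels listed (a repository submission policy and index coverage); the policies, field names, and report handles are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    urn: str
    site: str           # contributing institution (hypothetical)
    has_metadata: bool  # whether descriptive metadata was supplied


def repository_accepts(r: Report) -> bool:
    """Hypothetical submission policy: reject reports without descriptive metadata."""
    return r.has_metadata


def index_covers(r: Report, indexed_sites: set[str]) -> bool:
    """A report is discoverable only if some index in the collection covers its site."""
    return r.site in indexed_sites


def in_collection(published: list[Report], indexed_sites: set[str]) -> list[str]:
    """A report is 'in' the collection only if it passes every level and can be found."""
    return [r.urn for r in published if repository_accepts(r) and index_covers(r, indexed_sites)]


published = [
    Report("cornell/TR-1", "cornell", True),
    Report("cornell/TR-2", "cornell", False),  # rejected by the submission policy
    Report("mit/TR-7", "mit", True),           # published, but no index in this collection covers MIT
]
print(in_collection(published, indexed_sites={"cornell"}))  # ['cornell/TR-1']
```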

Page 25: Dienst Distributed Networked Publishing

Defining the Collection: Collection Service

Page 26: Dienst Distributed Networked Publishing

Regional Structure

Figure: regional structure with a central collection server

Page 27: Dienst Distributed Networked Publishing

Connectivity Regions and Collection Views

Page 28: Dienst Distributed Networked Publishing

Improvements to the Protocol - Dienst 5

• Incremental enhancement to existing interoperability framework

• Improved document model
  – Versions
  – Hierarchical part specification
  – Binders (multi-part documents)

• Implementation currently under development

Page 29: Dienst Distributed Networked Publishing

Dienst 5 Document Structure

• Structure Request
  – Reveal, in XML, the full or collapsed structure of a document, e.g., chapters, sections, figures
  – Describe multiple views of a document, e.g., bibliography, content, thumbnails
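
The difference between a full and a collapsed Structure response can be sketched with Python's standard `xml.etree.ElementTree`. The element names, the `label` and `collapsed` attributes, and reading "collapsed" as a depth limit are assumptions made for illustration; the slides do not give the Dienst 5 XML schema.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical structure node: (element kind, label, child nodes).
Part = tuple[str, str, list]


def structure_xml(part: Part, depth: Optional[int] = None) -> ET.Element:
    """Render a part as XML; depth limits expansion (None means the full structure)."""
    kind, label, children = part
    elem = ET.Element(kind, label=label)
    if depth is None or depth > 0:
        next_depth = None if depth is None else depth - 1
        for child in children:
            elem.append(structure_xml(child, next_depth))
    elif children:
        elem.set("collapsed", str(len(children)))  # count of hidden child parts
    return elem


report = ("document", "TR-0001", [
    ("chapter", "1", [("section", "1.1", []), ("figure", "1.1", [])]),
    ("chapter", "2", [("page", "5", [])]),
])
print(ET.tostring(structure_xml(report), encoding="unicode"))           # full structure
print(ET.tostring(structure_xml(report, depth=1), encoding="unicode"))  # chapters collapsed
```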

Page 30: Dienst Distributed Networked Publishing

Dienst 5 Document Dissemination

• Disseminate Request
  – Access to the component(s) described by Structure, e.g., disseminate chapter 2, page 5 in PostScript
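
A Disseminate request can be pictured as walking a document's structure to the named component and returning it in the requested format. This is a hypothetical sketch: the `Part` type, the (kind, label) path encoding, and the format keys are assumptions, not the Dienst 5 request syntax.

```python
from dataclasses import dataclass, field


@dataclass
class Part:
    """A component of a document's structure (hypothetical model)."""
    kind: str   # e.g. "chapter", "page"
    label: str  # e.g. "2", "5"
    children: list["Part"] = field(default_factory=list)
    content: dict[str, bytes] = field(default_factory=dict)  # format -> bytes


def disseminate(doc: Part, path: list[tuple[str, str]], fmt: str) -> bytes:
    """Follow a (kind, label) path through the structure and return that component in fmt."""
    node = doc
    for kind, label in path:
        child = next((c for c in node.children if c.kind == kind and c.label == label), None)
        if child is None:
            raise KeyError(f"no {kind} {label} under {node.kind} {node.label}")
        node = child
    if fmt not in node.content:
        raise KeyError(f"{node.kind} {node.label} is not available in {fmt}")
    return node.content[fmt]


page5 = Part("page", "5", content={"postscript": b"%!PS chapter 2, page 5"})
doc = Part("document", "TR-0001", [Part("chapter", "1"), Part("chapter", "2", [page5])])
print(disseminate(doc, [("chapter", "2"), ("page", "5")], "postscript"))
```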

Page 31: Dienst Distributed Networked Publishing

Supporting Multiple Collections

• NCSTRL is currently a single collection

• Other users of the Dienst protocol
  – European gray literature, thesis, and dissertation collections
  – NASA space science
  – Mediterranean environment data and software
  – Los Alamos preprints

• Expanding the technology to multiple collections through regions

Page 32: Dienst Distributed Networked Publishing

Lessons Learned and Work to be Done

• Intellectual property

• Quality
  – Quality of collection (reviewing)
  – Quality of metadata
  – Quality of service

• Resisting information entropy

• Richer “documents”

• Archiving and preservation