Digital Libraries Made Easy
2004 SAMLA Convention, Roanoke, Virginia, November 12, 2004
Edward A. Fox, Digital Library Research Laboratory & Dept. of Computer Science, Virginia Tech, Blacksburg, VA 24061
[email protected] http://fox.cs.vt.edu http://fox.cs.vt.edu/talks/2004/
• National projects can preserve antiquities and heritage: cultural, historical, linguistic, scholarly
• Knowledge and information are essential to economic and technological growth, education
• DL - a domain for international collaboration
• wherein all can contribute and benefit
• which leverages investment in networking
• which provides useful content on Internet & WWW
• which will tie nations and peoples together more strongly and through deeper understanding
Libraries of the Future, J.C.R. Licklider, 1965, MIT Press
World
Nation
State
City
Community
5S Definition: Digital Libraries are complex systems that
• help satisfy info needs of users (societies)
• provide info services (scenarios)
• organize info in usable ways (structures)
• present info in usable ways (spaces)
• communicate info with users (streams)
Synchronous Scholarly Communication
Same time, Same or different place
Asynchronous, Digital Library Mediated Scholarly Communication
Different time and/or place
Locating Digital Libraries in Computing and Communications Technology Space
[Figure: axes are Computing (flops), Communications (bandwidth, connectivity), and Digital content (less to more)]
Digital Libraries technology trajectory: intellectual access to globally distributed information
Note: we should consider 4 dimensions: computing, communications, content, and community (people)
CS -> CSTC -> CRIM
• NSF and ACM Education Committee are funding a 2 year project “A Computer Science Teaching Center” - CSTC - http://www.cstc.org/
• College of NJ, U. Ill. Springfield, Virginia Tech
• Focus initially on labs, visualization, multimedia
• Multimedia part is also supported by a 2nd grant to Virginia Tech and The George Washington University: http://www.cstc.org/~crim/ (with curricular guidelines also under development)
CS Teaching Center (CSTC)
• Instead of building large, expensive multimedia packages, that become obsolete and are difficult to re-use, concentrate on small knowledge units.
• Learners benefit from having well-crafted modules that have been reviewed and tested.
• Use digital libraries to build a powerful base of support for learners, upon which a variety of courses, self-study tutorials & reference resources can be built.
• ACM support led to Journal of Educational Resources in Computing (JERIC), accessible from www.cstc.org
Browsing (1)
Browsing (2)
Computing and Information Technology Interactive Digital Educational Library (CITIDEL)
Digital library architecture for local and interoperable CITIDEL services
CITIDEL Technology Features
• Component architecture (Open Digital Library)
• Re-use and compose re-deployable digital library components.
• Built using open standards & technologies
• OAI: used to collect DL resources and for DL interoperability
• XSL and XML: interface rendering with multi-lingual, community-based translation of screens and content (Spanish, …)
• Perl: component integration
• ESSEX: search engine functionality
• Very fast, utilizing in-memory processing
• Includes snap-shots for persistence
• Multi-scheming
• Integrates multiple classifications / views through maps, closure
Cluster Search Results from CITIDEL
Cluster NDLTD-Computing
CITIDEL -> NSDL
• A collection project in the
• National STEM (science, technology, engineering, and mathematics) education Digital Library – NSDL
• National Science Digital Library
• www.nsdl.org
• (Next slides courtesy Lee Zia, NSF)
Supports:
• Users (profiles): Learning communities
• Content (metadata): Customizable collections
• Tools (protocols): Application services
Enables:
• Environments for Communication, Collaboration, Creation, Validation, Evaluation, Recognition, ...
AND
• Discovery, Stability, Reliability, Reusability, Interoperability, Customizability, ... of Resources
NSDL Program Tracks
• Core Integration: coordinate a distributed alliance of resource collection and service providers; and ensure reliable and extensible access to and usability of the resulting network of learning environments and resources
• Collections: aggregate and actively manage a subset of the digital library’s content within a coherent theme / specialty
• Services: increase the impact, reach, efficiency, and value of the digital library in its fully operational form
• Targeted (Applied) Research: have immediate impact on one or more of the other three tracks
• Pathways: large efforts across broad ranges of areas or approaches or users
NSDL Information Architecture
Essentially as developed by the Technical Infrastructure Workgroup
• Recall the 5S approach
• Minimal digital library
• Metamodel for minimal digital library
• Metamodel for “born digital standard” DLs
• Metamodel for architectural DL
• Here, focus on key concepts in minimal DL
Digital Objects (DOs)
• Born digital
• Digitized version of “real” object
• Is the DO version the same, better, or worse?
• Decision for ETDs: structured + rendered
• Surrogate for “real” object
• Not covered explicitly in metamodel for a minimal DL
• Crucial in metamodel for archaeology DL
Stream: text; video; audio; image. Describes properties of the DL content such as encoding and language for textual material or particular forms of multimedia data.
• Internet middleware
• Independent system / part of federation
• Decompositions vary
• search engine, browser, DBMS, MM support
• repository, handle server, client
• information resources + mediators, bus or agent collection + client with workspace/environment
• Metrics: e.g., for federated search
Clusters
• How can computer clusters scale with collections and user communities to achieve cost-effective solutions for DLs?
• Paul Mather dissertation by early 2005
• Modeling and simulation
• Cluster size
• Communication fabric and patterns
• Disks and nodes
• Characterize DL collections: file sizes
• Characterize user workload: logs
• Special considerations:
• Linear hashing of names
• Replication of popular objects
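The "linear hashing of names" consideration can be illustrated with a minimal sketch of the classic linear hashing address calculation, used here to place named objects on cluster nodes. This is an assumption-laden illustration, not the method from the dissertation mentioned above: the class name `LinearHashPlacer`, the use of `zlib.crc32` as the hash, and the example object names are all hypothetical.

```python
import zlib


class LinearHashPlacer:
    """Maps object names to node indices using linear hashing.

    Hypothetical illustration; not taken from the dissertation cited above.
    """

    def __init__(self, initial_nodes: int = 2) -> None:
        self.initial_nodes = initial_nodes  # node count at level 0
        self.level = 0                      # number of completed doublings
        self.split = 0                      # next node due to be split

    @property
    def node_count(self) -> int:
        return self.initial_nodes * (2 ** self.level) + self.split

    def node_for(self, name: str) -> int:
        """Return the index of the node that should hold `name`."""
        key = zlib.crc32(name.encode("utf-8"))
        size = self.initial_nodes * (2 ** self.level)
        index = key % size
        if index < self.split:
            # That node was already split in this round, so the
            # next-level hash function decides between old and new node.
            index = key % (size * 2)
        return index

    def add_node(self) -> None:
        """Grow the cluster by one node by splitting the node at `split`."""
        self.split += 1
        if self.split == self.initial_nodes * (2 ** self.level):
            self.level += 1
            self.split = 0


placer = LinearHashPlacer(initial_nodes=2)
names = [f"etd-{i}" for i in range(200)]   # hypothetical object names
before = {n: placer.node_for(n) for n in names}
placer.add_node()
after = {n: placer.node_for(n) for n in names}
moved = [n for n in names if before[n] != after[n]]
```

The point of the design is that adding one node relocates only objects that lived on the node being split (here node 0, whose objects either stay or move to the new node 2), so the cluster can grow incrementally without rehashing every name.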
Also called: digital rep., digital asset rep., institutional repository
Stores and maintains digital objects (assets)
Provides external interface for Digital Objects
Creation, Modification, Access
Enforces access policies
Provides for content type disseminations
Adapted from Slide by V. Chachra, VTLS
Goals of Institutional Repositories (by Stevan Harnad, U. Southampton)
Self Archiving of Institutional Research
Theses and Dissertations (VTLS NDLTD Project)
Article preprints and postprints
Internal documents and maps
Management of digital collections
Preservation of materials – decentralized approach
Housing of teaching materials
Electronic Publishing of journals, books, posters, maps, audio, video and other multimedia objects
Adapted from Slide by V. Chachra, VTLS
Fedora™ Digital Object Architecture
• Persistent ID (PID): globally unique persistent id
• Disseminators: public view, access methods for obtaining “disseminations” of digital object content
• System Metadata: internal view, metadata necessary to manage the object
• Datastreams: protected view, content that makes up the “basis” of the object
• Datastream examples: EAD, TEI, DC, MARC, VRA Core, MIX, etc.; Images, E-books, E-journals, Music, Video, etc.
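The four parts of the digital object described on this slide can be mirrored in a small data structure. The sketch below is only a conceptual model under stated assumptions: it is not the Fedora API, and the class names, the `demo:1` identifier, and the `get_dc` dissemination name are all made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Datastream:
    """Protected view: a piece of content that makes up the object."""
    ds_id: str
    mime_type: str
    content: bytes


@dataclass
class DigitalObject:
    """Conceptual model of a repository object (not the Fedora API)."""
    pid: str                                                # persistent id
    system_metadata: Dict[str, str] = field(default_factory=dict)  # internal view
    datastreams: Dict[str, Datastream] = field(default_factory=dict)
    disseminators: Dict[str, Callable[["DigitalObject"], str]] = field(
        default_factory=dict
    )                                                       # public view

    def add_datastream(self, ds: Datastream) -> None:
        self.datastreams[ds.ds_id] = ds

    def disseminate(self, name: str) -> str:
        """Run a named access method against this object's content."""
        return self.disseminators[name](self)


# Hypothetical example object with one Dublin Core datastream.
obj = DigitalObject(pid="demo:1", system_metadata={"state": "active"})
obj.add_datastream(Datastream("DC", "text/xml", b"<title>Sample ETD</title>"))
obj.disseminators["get_dc"] = lambda o: o.datastreams["DC"].content.decode("utf-8")
```

Callers only use `disseminate`, which keeps the stored datastreams behind the public access methods, as in the slide's separation of public and protected views.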
The Mellon Fedora Project
Adapted from Slide by V. Chachra, VTLS
Fedora™ Repository
[Figure: Fedora repository architecture. Labels recoverable from the diagram:
• Clients: Client Application, Batch Program, Server Application, Web Browser (HTTP, SOAP)
• Web Service Exposure Layer: Manage, Access, Search, OAI Provider
• Management Subsystem: Component Mgmt, Object Mgmt, Object Validation, PID Generation
• Access Subsystem: Object Dissemination, Object Reflection
• Security Subsystem: Session Management, User Authentication, Policy Enforcement, Policy Mgmt, Policies, Users/Groups
• Storage Subsystem: Digital Objects, Datastreams, Content, XML Files, Relational DB
• External Content Retriever and External Content Sources (HTTP, FTP); Remote Service, Local Service]
Adapted from Slide by V. Chachra, VTLS
[Figure sequence: digital objects (programs, documents, images, video) drawn as bit strings]
• users ? digital objects
• digital library: monolithic and/or custom-built web-based application
• componentized digital library (components still unknown, shown as "?")
open digital library
[Figure: the same digital objects, now served by a network of components labeled OA that communicate over links labeled PMH and XPMH]
Legend:
• Open Digital Library Protocol: Extended OAI-PMH (XPMH)
• PMH: Protocol for Metadata Harvesting
• Open Digital Library Component: Extended Open Archive
• OA: Open Archive
Open Digital Library Deployments
• NDLTD (www.ndltd.org)
• Computer Science Teaching Center (www.cstc.org)
• Computing and Information Technology Interactive Digital Educational Library (www.citidel.org)
• Open Archives Distributed (NSF, DFG) – enhancements to PhysNet
• OCKHAM
• Open to others through DL-in-a-box
Open Digital Library
• Network of Extended Open Archives where each node acts as either a provider of data, services or both.
• Component = Node
• Protocol = Arc
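The "Component = Node, Protocol = Arc" idea can be sketched as a few composable classes. This is a simplified, in-process illustration under assumptions: real ODL components exchange records over extended OAI-PMH via HTTP, whereas here a plain method call (`list_records`) stands in for each arc. All class names, record fields, and sample records are hypothetical.

```python
from typing import Dict, List

Record = Dict[str, str]


class Component:
    """A node in the network; `list_records` plays the role of a harvest."""

    def __init__(self, name: str) -> None:
        self.name = name

    def list_records(self) -> List[Record]:
        raise NotImplementedError


class Archive(Component):
    """Data provider backed by an in-memory list of records."""

    def __init__(self, name: str, records: List[Record]) -> None:
        super().__init__(name)
        self._records = records

    def list_records(self) -> List[Record]:
        return list(self._records)


class Union(Component):
    """Merges records harvested from several sources, dropping duplicates."""

    def __init__(self, name: str, sources: List[Component]) -> None:
        super().__init__(name)
        self.sources = sources

    def list_records(self) -> List[Record]:
        merged: Dict[str, Record] = {}
        for source in self.sources:
            for record in source.list_records():  # one arc per source
                merged.setdefault(record["identifier"], record)
        return list(merged.values())


class Search(Component):
    """Service provider offering a naive title search over one source."""

    def __init__(self, name: str, source: Component) -> None:
        super().__init__(name)
        self.source = source

    def list_records(self) -> List[Record]:
        return self.source.list_records()

    def query(self, term: str) -> List[Record]:
        needle = term.lower()
        return [r for r in self.list_records() if needle in r["title"].lower()]


# Hypothetical sample data: two archives feeding a union, then a search node.
vt = Archive("vt", [{"identifier": "vt:1", "title": "Digital Libraries"}])
br = Archive("br", [{"identifier": "br:1", "title": "Bibliotecas digitais"}])
search = Search("search", Union("union", [vt, br]))
hits = search.query("libraries")
```

Because every component exposes the same harvesting operation, a node can be both a consumer and a provider, which is what lets a union feed a search service in the same way an archive would.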
Open Digital Library Components
• Running now
• XML-File (data provider from file system)
• Search: simple or in-memory (Essex) or generalized
• Union, browse, recent, filter
• E-journal/review, Submit, Edit, Annotation
• Recommender, Rating; Mirroring (see JCDL’02)
• Working with NCSA: from DB, unstructured text
• Others in process
• Classification/categorization
• Registry (and other connections with web services)
Example Open Digital Library
ETD DL for the Networked Digital Library of Theses and Dissertations (www.ndltd.org)
[Figure: ETD collections (ETD-1 to ETD-4, shown among programs, documents, images, and video) are connected over PMH to ODL components (ODLUnion, ODLSearch, ODLBrowse, ODLRecent, and Filter); a user interface offering Search, Browse, and Recent serves students and researchers]
OAI, ODL, DL-in-a-box
• Open Archives Initiative
• since 1999, www.openarchives.org
• Open Digital Libraries
• since 2001, from www.dlib.vt.edu
• with Hussein Suleman (now U. Cape Town)
• DL-in-a-box
• NSDL support since 2001
• Aimed to help new collections / services projects
• http://dlbox.nudl.org
Outline
1. Introduction
2. Historical Perspective
3. Topical Perspective
4. Software Solutions
• Open Source: Greenstone, eprints, Kepler, DSpace, Fedora, ETD-DB, ODL
• Commercial: IBM Content Manager, VTLS’ VITAL
• Comparison: by capability - institutional repository, by environment (library, WWW, personal use)
• Evaluation, usability
5. Advanced Issues
Commercial DL Examples
• IBM Digital Library
• Virtua (www.vtls.com)
• Fedora -> VITAL
• Some systems from NSF DLI projects
• Google
Conceptual Category | Feature Name
Discovery Tools | Searching; Browsing; Syndication & Notification
Aggregation Tools | Personal Collections; Content Aggregator and Packaging Tool
Community & Evaluation | Evaluation System; Context Usage Illustrators; Wish Lists
Source: WCET LOR Study, 2004
Case Study: NCSTRL Costs/Benefits
Stakeholders | Sample Potential Cost | Sample Potential Benefit
Providers: Faculty | Lower value for P&T | Faster publishing
Providers: Students | Less recognition | Broader set of outlets
Providers: Practitioners | Limited relevance | Ease of publishing, > quantity
Users: Faculty | Lower quality of work | Broader access to resources
Users: Students | Higher access costs (vs. department available material) | Lower access costs (vs. journal available material)
Departments | New maintenance costs | Broader visibility
University libraries | Additional access costs | Access to new resources
Practitioners | More difficult access | Access to new resources
Outline
1. Introduction
2. Historical Perspective
3. Topical Perspective
4. Software Solutions
5. Advanced Issues
• Challenges, open problems
• Promising approaches
Digital Libraries --- Objectives
• World Lit.: 24hr / 7day / from desktop
• Integrated “super” information systems: 5S:
• DL industry - critical mass by covering libraries, archives, museums, corporate info, govt info, personal info - “quality WWW” integrating IR, HT, MM, ...
• Need tools & methods to make them easier to build
NDLTD: How can a university get involved?
• Select planning/implementation team
• Graduate School
• Library
• Computing / Information Technology
• Institutional Research / Educ. Tech.
• Join online, give us contact names
• www.ndltd.org/join
• Adapt Virginia Tech or other proven approach
• Build interest and consensus
• Start trial / allow optional submission
[Figure: ETD submission workflow]
Student Gets Committee Signatures and Submits ETD
Signed
Grad School
Library Catalogs ETD, Access is Opened to the New Research
WWW
NDLTD
ETD Union Collection (OAI)
[Figure: ETD archives (Virginia Tech ETD Archive, Brazil ETD Archive, OCLC ETD Archive, …) are harvested via OAI into a Merged Metadata Collection, ODL (VT); services shown include VIRTUA; future: recommender, …]
Legend: OAI Data Provider, OAI Service Provider, OAI Harvesting
Union catalog: OCLC
• OCLC will expand OAI data provider on TDs.
• Is getting data from WorldCat (so, from many sites!).
• Will harvest from all others who contact them.
• Need DC and either ETD-MS or MARC.
• Has a set for ETDs.
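Harvesting a set of ETD records from an OAI data provider comes down to issuing `ListRecords` requests. The sketch below only builds the request URLs; the verb and the `metadataPrefix`, `set`, `from`, and `resumptionToken` arguments come from OAI-PMH 2.0, while the base URL `https://example.org/oai` and the set name `etd` are placeholders, not OCLC's actual endpoint or set name.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(
    base_url: str,
    metadata_prefix: str = "oai_dc",
    set_spec: Optional[str] = None,
    from_date: Optional[str] = None,
) -> str:
    """Build an OAI-PMH ListRecords request URL.

    `oai_dc` (unqualified Dublin Core) is the format every OAI-PMH
    repository must support; prefixes for other formats vary by repository.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        params["set"] = set_spec        # selective harvesting by set
    if from_date is not None:
        params["from"] = from_date      # incremental harvesting by datestamp
    return f"{base_url}?{urlencode(params)}"


def resume_url(base_url: str, token: str) -> str:
    """Build the follow-up request for the next chunk of a long list.

    resumptionToken is an exclusive argument, so only the verb accompanies it.
    """
    params = {"verb": "ListRecords", "resumptionToken": token}
    return f"{base_url}?{urlencode(params)}"


# Placeholder endpoint and set name, for illustration only.
first_request = list_records_url("https://example.org/oai", set_spec="etd")
```

A harvester would fetch the first URL, parse the returned XML, and keep calling `resume_url` while the response carries a non-empty resumption token.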
OCLC SRU Interface
Union catalog: VTLS, VT
• VTLS will enhance search/browse service for ETDs
• Will harvest from OCLC’s set of ETD records
• Will receive through other mechanisms
• Will work with MARC-21 and ETD-MS
• VT will continue to offer experimental services
ETD Union Search Mirror Site in China (CALIS) (http://ndltd.calis.edu.cn – popular site!)
VTLS Union Catalog Content Languages
The VTLS NDLTD Union Catalog has data in 6 different languages. These are: English, German, Greek, Korean, Portuguese, Spanish
Examples follow
Language = German; hits = 137
Full record display
Complex to Simple
MARC ($50) Dublin Core (DC)
+thesis
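Going from complex MARC records to simple Dublin Core is essentially a crosswalk: many MARC fields collapse into a handful of DC elements. The sketch below maps a small subset of fields, following the commonly used Library of Congress MARC to Dublin Core crosswalk. It is an illustration only: the input format (tag, subfield code, value triples) is an assumption made for brevity, the sample record is invented, and thesis-specific elements such as those in ETD-MS are not handled.

```python
from typing import Dict, Iterable, List, Tuple

# (MARC tag, subfield code) -> Dublin Core element; a deliberately small subset.
MARC_TO_DC: Dict[Tuple[str, str], str] = {
    ("100", "a"): "creator",
    ("245", "a"): "title",
    ("260", "b"): "publisher",
    ("260", "c"): "date",
    ("520", "a"): "description",
    ("650", "a"): "subject",
}


def marc_to_dc(fields: Iterable[Tuple[str, str, str]]) -> Dict[str, List[str]]:
    """Convert (tag, subfield, value) triples to repeatable DC elements.

    Fields without a mapping are dropped, which is where the
    "complex to simple" loss of detail happens.
    """
    dc: Dict[str, List[str]] = {}
    for tag, code, value in fields:
        element = MARC_TO_DC.get((tag, code))
        if element is not None:
            dc.setdefault(element, []).append(value)
    return dc


# Invented sample record for illustration.
record = [
    ("100", "a", "Doe, Jane"),
    ("245", "a", "A Sample Thesis"),
    ("260", "c", "2004"),
    ("650", "a", "Digital libraries"),
    ("650", "a", "Archaeology"),
    ("300", "a", "120 p."),  # physical description: no mapping in this subset
]
dc_record = marc_to_dc(record)
```

Keeping the mapping in a table rather than in code makes it easy to extend with further fields, including thesis-specific ones, without touching the conversion function.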
Why ETD? Short Answer
• For Students:
• Gain knowledge and skills for the Information Age
• Richer communication (digital information, multimedia, …)
• For Universities:
• Easy way to enter the digital library field and benefit thereby
• For the World:
• Global digital library – large, useful, many services
• General:
• Save time and money
• Increased visibility for all associated with research results
ETANA-DL: 5S Extension
• 5S and component architecture to allow handling of very complex DL applications: archaeology
• Information visualization, clustering
• Mappings across streams, structure, spaces
Case Study (Archaeology): ETANA
• NSF ITR with CWRU (and Vanderbilt …)
• Faster DL development
• for complex application domains,
• with suitable tailoring
• Approach
• ODL – pool of components
• 5S – theory-based generation of systems
ETANA Website
Lahav Website
Megiddo Opening Screen
Locus Screen: Pictures
View all
Area Screen: Distribution of Artifacts
ETANA-DL Website
Archaeology DL – Approach
• Solve the following DL problems:
• interoperability,
• making primary data available,
• data preservation
• Modeling archaeological information systems
• using 5S theory to design system and services
• Rapidly prototyping DLs that handle
• heterogeneous archaeological data using
• componentized frameworks