Building a large-scale digital library for education. Carl Lagoze, Common Solutions Group, January 16, 2003



Page 1

Building a large-scale digital library for education

Carl Lagoze, Common Solutions Group, January 16, 2003

Page 2

What is the NSDL?

A library of exemplary collections and services with practical educational value

A center of innovation in digital libraries applied to education

A community center, focused on digital-library-enabled science education

A network of NSDL-funded projects

Page 3

Building service, collaboration, and knowledge layers over a variety of resources for a variety of users

[Figure: service layers (browsing, searching, annotating, curriculum building, filtering, quality rating) built over resources (Open Access Web, Publishers, NSF-funded Collections)]

Page 4

Short History of the NSDL

1996 Vision articulated by NSF's Division of Undergraduate Education

1997 National Research Council workshop

1998 Preliminary grants through Digital Libraries Initiative 2

1998 SMETE-Lib workshop

1999 NSDL Solicitation

2000 6 Core Integration demonstration projects + 23 others funded

2001 1 large Core Integration System project funded

2002 More than 80 independent projects funded

2003 Core Integration funding fixed until 2006

Page 5

NSF Grant Structure

http://www.nsf.gov/pubs/2002/nsf02054/nsf02054.html

Collections: develop and maintain content

Services: for users, collection providers, core integration

Targeted research

Core Integration: organizational, economic, technical; US$5M of a total budget of US$25M

Page 6

NSDL CI Technical Organization

A collaborative project: University Corporation for Atmospheric Research (Dave Fulker), Cornell University (William Arms), Columbia University (Kate Wittenberg)

With additional partners: Eastern Michigan University, Syracuse University, U Mass-Amherst, UC-Santa Barbara, San Diego Supercomputer Center

Director of Technology: Carl Lagoze

Page 7

Core Integration Philosophy

It is possible to build a very large digital library with a small staff.

But ...

Every aspect of the library must be planned with scalability in mind.

Some compromises will be made.

Automation is key.

Page 8

Perspective on the Budget

Page 9

Resources for Core Integration

Budget: $4-6 million

Staff: 25-30

Management: diffuse

How can a small team, without direct management control, create a very large-scale digital library?

Page 10

NSDL technical mantras

Aggregation rather than collection: the Core Integration team will not manage any collections

Spectrum of interoperability: accommodate diversity of participation models

Open interfaces and standards permitting plug-in of an array of value-added services

One library, many portals: accommodate multiple quality and selection metrics; tailor presentation of content and nature of services to audience needs

Open toolkit of software and services for library building

Page 11

Spectrum of interoperability

Level | Agreements | Example

Federation | Strict use of standards (syntax, semantic, and business) | AACR, MARC, Z 39.50

Harvesting | Digital libraries expose metadata; simple protocol and registry | Open Archives metadata harvesting

Gathering | Digital libraries do not cooperate; services must seek out information | Web crawlers and search engines

Page 12

Translating to first release goals

This is a big task that no one has done before! Work on the priorities.

Focus on one point on the spectrum of interoperability: metadata harvesting

Incorporate NSF funded collections and selected other collections

Leverage existing (or at least emerging) technologies and protocols: OAI, uPortal, Shibboleth, SDLIP, InQuery

Provide reliable base level services: Search and Discovery, Access Management, User Profiles, Exemplary Portals, Persistence

Plant some seeds for the future: machine-assisted metadata generation, automated collection aggregation, Web gathering strategies

Page 13

Metadata Repository

Central storage of all metadata about all resources in the NSDL: defines the extent of the NSDL collection; metadata includes collections, items, annotations, etc.

MR main functions: aggregation, normalization, redistribution

Ingest of metadata by various means: harvesting, manual, automatic, cross-walking

Open access to MR contents for service builders via OAI-PMH
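
A service builder reading the MR over OAI-PMH would issue ListRecords requests and follow resumption tokens. The sketch below is illustrative only: it builds the request URL and parses a response using the namespaces defined by OAI-PMH 2.0 and Dublin Core. The endpoint `http://example.org/oai`, the sample record, and the function names are assumptions, not part of the NSDL system.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    if resumption_token:
        # In the protocol, resumptionToken is an exclusive argument.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (records, resumption_token) from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        identifier = rec.findtext("oai:header/oai:identifier", namespaces=NS)
        titles = [t.text for t in rec.iterfind(".//dc:title", NS)]
        records.append({"identifier": identifier, "titles": titles})
    token = root.findtext(".//oai:resumptionToken", namespaces=NS)
    return records, (token or None)


# Hypothetical response body, shortened to the parts the parser reads.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Plate Tectonics</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>abc123</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

records, token = parse_list_records(SAMPLE_RESPONSE)
```

A harvester would keep requesting `list_records_url(base, resumption_token=token)` until the response carries no token.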

Page 14

Metadata Strategy

Collect and redistribute any native (XML) metadata format

Provide crosswalks to Dublin Core from eight standard formats: Dublin Core, DC-GEM, LTSC (IMS), ADL (SCORM), MARC, FGDC, EAD

Concentrate on collection-level metadata

Use automatic generation to augment item-level metadata
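
A crosswalk of the kind listed above can be pictured as a field-mapping table applied to each native record. The following is a minimal sketch, assuming a record is a dictionary of field name to list of values; the `MARC_TO_DC` table is a small hypothetical subset chosen for illustration, not the crosswalk the NSDL used.

```python
# Hypothetical subset of a MARC-to-Dublin-Core mapping (illustration only).
MARC_TO_DC = {
    "245": "title",
    "100": "creator",
    "650": "subject",
    "520": "description",
}


def crosswalk(native_record, mapping):
    """Map native fields to Dublin Core elements.

    Returns (dc, unmapped): the Dublin Core view plus any native fields
    that have no mapping, so nothing from the native record is lost.
    """
    dc = {}
    unmapped = {}
    for field, values in native_record.items():
        element = mapping.get(field)
        target = dc if element else unmapped
        target.setdefault(element or field, []).extend(values)
    return dc, unmapped


record = {
    "245": ["Intro to Geology"],
    "100": ["Smith, J."],
    "650": ["Geology", "Earth science"],
    "999": ["local note"],
}
dc, unmapped = crosswalk(record, MARC_TO_DC)
```

Keeping the unmapped fields alongside the Dublin Core view matches the strategy of redistributing the native format as well as the normalized one.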

Page 15

Importing metadata into the MR

[Figure: Collections -> Harvest -> Staging area -> Cleanup and crosswalks -> Database load -> Metadata Repository]
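
The import stages on this slide (harvest, staging, cleanup and crosswalks, database load) can be sketched as a small pipeline. This is an assumption-laden illustration: the table layout, the field mapping, and all function names are hypothetical, and SQLite stands in for whatever database the MR actually used.

```python
import sqlite3


def clean(record):
    """Cleanup step: trim whitespace and drop empty values."""
    return {k: v.strip() for k, v in record.items() if v and v.strip()}


def to_dublin_core(record):
    """Crosswalk step, using a hypothetical native-to-DC field mapping."""
    mapping = {"name": "title", "author": "creator"}
    return {mapping.get(k, k): v for k, v in record.items()}


def load(conn, collection, records):
    """Database load step: one row per (record, element, value)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metadata "
        "(collection TEXT, identifier TEXT, element TEXT, value TEXT)"
    )
    for rec in records:
        identifier = rec["identifier"]
        for element, value in rec.items():
            if element != "identifier":
                conn.execute(
                    "INSERT INTO metadata VALUES (?, ?, ?, ?)",
                    (collection, identifier, element, value),
                )
    conn.commit()


def import_collection(conn, collection, harvested):
    """Run harvested records through staging, cleanup, crosswalk, and load."""
    staging = list(harvested)  # staging area: hold the raw harvest
    normalized = [to_dublin_core(clean(r)) for r in staging]
    load(conn, collection, normalized)
    return len(normalized)
```

For example, `import_collection(sqlite3.connect(":memory:"), "demo", harvested_records)` loads a harvest into an in-memory database for experimentation.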

Page 16

Exporting metadata from the MR

[Figure: Metadata Repository -> SQL queries -> Create OAI server tables -> OAI server -> Harvest -> NSDL services]
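
The export path (SQL queries that populate OAI server tables, which the OAI server then exposes to harvesters) might look roughly like the sketch below. The table and column names are hypothetical, SQLite is a stand-in database, and the rendered XML is a simplified fragment rather than a complete OAI-PMH response.

```python
import sqlite3
from xml.sax.saxutils import escape


def build_oai_table(conn):
    """Flatten the repository into one row per record for the OAI server."""
    conn.execute("DROP TABLE IF EXISTS oai_records")
    conn.execute(
        "CREATE TABLE oai_records AS "
        "SELECT identifier, "
        "MAX(CASE WHEN element = 'title' THEN value END) AS title "
        "FROM metadata GROUP BY identifier"
    )


def render_records(conn):
    """Render rows as simplified <record> elements (not a full response)."""
    parts = []
    query = "SELECT identifier, title FROM oai_records ORDER BY identifier"
    for identifier, title in conn.execute(query):
        parts.append(
            "<record><header><identifier>%s</identifier></header>"
            "<metadata><title>%s</title></metadata></record>"
            % (escape(identifier), escape(title or ""))
        )
    return "".join(parts)
```

Precomputing the server tables keeps harvest requests cheap, since the OAI server never has to run the normalization queries while a harvester waits.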

Page 17

Metadata Triage

Page 18

Searching

What to Index?

When possible, full text indexing is excellent, but full text indexing is not possible for all materials (non-textual, no access for indexing).

Comprehensive metadata is an alternative, but it is available for very few of the materials.

What Architecture to Use?

Few collections support an established search protocol (e.g., Z39.50)

Page 19

Search system general features

Implement a query language that includes most features that are common in commercial and Web search engines.

Periodically harvest the MR (via OAI-PMH) to incorporate the latest changes in the library.

Allow search on resources’ metadata as well as textual content, when available.

Communication with portals is done via the Simple Digital Library Interoperability Protocol (SDLIP).
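
Searching metadata together with textual content, when it is available, can be illustrated with a toy inverted index. This is only a sketch of the idea: the class and its methods are hypothetical, the query language is reduced to an implicit AND of terms, and it says nothing about how the search engine named in these slides works internally.

```python
import re
from collections import defaultdict


class SearchIndex:
    """Toy inverted index over metadata values and optional full text."""

    def __init__(self):
        self._postings = defaultdict(set)  # token -> set of identifiers

    @staticmethod
    def _tokens(text):
        return re.findall(r"[a-z0-9]+", text.lower())

    def add(self, identifier, metadata, content=None):
        """Index metadata values, plus textual content when available."""
        texts = list(metadata.values())
        if content:
            texts.append(content)
        for text in texts:
            for token in self._tokens(text):
                self._postings[token].add(identifier)

    def search(self, query):
        """Return sorted identifiers matching every query term."""
        terms = self._tokens(query)
        if not terms:
            return []
        hits = set.intersection(
            *(self._postings.get(t, set()) for t in terms)
        )
        return sorted(hits)


index = SearchIndex()
index.add("r1", {"title": "Plate Tectonics"},
          content="Earthquakes occur along plate boundaries.")
index.add("r2", {"title": "Volcano images"})  # metadata only, no text
```

Here `r1` is findable through its content as well as its title, while `r2` is findable through metadata alone.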

Page 20

Search Architecture

[Figure: Search and Discovery Server containing a Search Engine, SDLIP Wrapper, OAI Harvester, http/ftp Harvester, and "Document" generator. The OAI Harvester reads the Metadata Repository via OAI, the http/ftp Harvester reads Content via http/ftp, and Portals connect via SDLIP.]

Page 21

Persistent Archive for the NSDL

Provide a persistent copy of the resources identified in the NSDL repository

Provide a mechanism to retrieve prior versions of resources

Verify availability of on-line digital resources that have presence in MR

Page 22

Persistent Archive Approach

Use data grid technology to: implement a persistent logical name space for registering resources; manage archiving of modules on distributed storage systems

Use OAI harvesting to extract metadata from the NSDL repository

Crawl the web to retrieve resources

Provide OAI interface for reporting validation results

Manage the persistent archive through a separate information repository
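
The archive's core duties (a logical name space, retrievable prior versions, and availability checks) can be sketched in a few lines. This is an illustration under stated assumptions, not the data grid implementation: `PersistentArchive`, its methods, and the in-memory storage are hypothetical, and the `fetch` callable is injected so the example runs without network access.

```python
import hashlib


class PersistentArchive:
    """Toy archive keyed by logical name, keeping every distinct version."""

    def __init__(self, fetch):
        self._fetch = fetch   # callable: url -> bytes, raises OSError on failure
        self._versions = {}   # logical name -> list of (digest, content)

    def archive(self, logical_name, url):
        """Fetch a resource; store a new version only if its content changed.

        Returns the number of versions held for the logical name.
        """
        content = self._fetch(url)
        digest = hashlib.sha256(content).hexdigest()
        versions = self._versions.setdefault(logical_name, [])
        if not versions or versions[-1][0] != digest:
            versions.append((digest, content))
        return len(versions)

    def prior_version(self, logical_name, index):
        """Return the stored content of version number `index` (0-based)."""
        return self._versions[logical_name][index][1]

    def is_available(self, url):
        """Report whether the resource can currently be retrieved."""
        try:
            self._fetch(url)
            return True
        except OSError:
            return False


# Stand-in for the web: a dictionary of URL -> bytes.
pages = {"http://example.org/a": b"v1"}


def fake_fetch(url):
    if url not in pages:
        raise OSError("unreachable: " + url)
    return pages[url]


archive = PersistentArchive(fake_fetch)
```

A real fetcher could wrap `urllib.request.urlopen`, whose `URLError` is a subclass of `OSError`, so the availability check would work unchanged.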

Page 23

Access Management

Authentication: user identity established by origin servers at home institution; NSDL central will run an origin if no other home is available

Authorization: access classes of users, collections, & services established by NSDL community

Anonymous and pseudo-anonymous access available

Internet2 “Shibboleth” framework satisfies these requirements

Page 24

Access Management Flow

[Figure: a browser and a collection, with the institution’s authentication and authorization service (e.g., Kerberos & LDAP) inside an organizational boundary]

1. attempt to access collection

2. redirected back to local login

3. login to local jurisdiction

4. attempt access again

5. confirm request valid
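
The five steps of this flow can be walked through with two toy classes. This is a conceptual simulation only, not the Shibboleth API: `HomeInstitution`, `Collection`, the handle format, and the return strings are all hypothetical. The point it illustrates is that the collection sees an opaque handle and an access class, never the user's identity.

```python
class HomeInstitution:
    """Stands in for the institution's authentication/authorization service."""

    def __init__(self, users):
        self._users = users     # username -> (password, access class)
        self._sessions = {}     # opaque handle -> access class

    def login(self, username, password):
        """Step 3: authenticate locally and issue an opaque handle."""
        stored = self._users.get(username)
        if stored is None or stored[0] != password:
            return None
        handle = "handle-%d" % (len(self._sessions) + 1)
        self._sessions[handle] = stored[1]
        return handle

    def confirm(self, handle):
        """Step 5: tell a collection which access class a handle carries."""
        return self._sessions.get(handle)


class Collection:
    def __init__(self, institution, allowed_classes):
        self._institution = institution
        self._allowed = set(allowed_classes)

    def access(self, handle=None):
        if handle is None:
            # Steps 1-2: no credentials yet, send the browser to local login.
            return "redirect-to-login"
        # Steps 4-5: confirm the request with the home institution.
        access_class = self._institution.confirm(handle)
        return "granted" if access_class in self._allowed else "denied"


home = HomeInstitution({"alice": ("pw", "student")})
collection = Collection(home, ["student", "faculty"])
```

A first call to `collection.access()` yields the redirect; after `home.login(...)` the returned handle is presented on the second attempt.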

Page 25

User Interfaces

The Problem

Cannot handcraft every web page

Must be usable on a very wide range of equipment and with a very diverse group of users

The Solution

Data driven portals using channels (components that encapsulate a library function).

Current NSDL portal technology is uPortal, a free, shareable portal being developed by a college and university consortium.

Initial NSDL channels will include simple and advanced Search, Browse, News, Exhibits, Help, and Login/Registration.
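
The idea of a data-driven portal assembled from channels can be shown with a short sketch. It does not reflect uPortal's actual (Java) channel interface; the `Channel` class, `render_portal`, and the layout-as-list convention are assumptions made for illustration.

```python
class Channel:
    """A component that encapsulates one library function (hypothetical)."""

    def __init__(self, name, render):
        self.name = name
        self.render = render  # callable: user context dict -> text


def render_portal(layout, channels, context):
    """Assemble a page from a data-driven layout (a list of channel names)."""
    sections = []
    for name in layout:
        channel = channels[name]
        sections.append("[%s] %s" % (channel.name, channel.render(context)))
    return "\n".join(sections)


channels = {
    "Search": Channel("Search", lambda ctx: "query box"),
    "News": Channel("News", lambda ctx: "news for %s" % ctx["audience"]),
}
page = render_portal(["Search", "News"], channels, {"audience": "teachers"})
```

Because the page is driven by the layout list and the user context, a different portal is a change of data rather than a hand-crafted page.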

Page 26

Demonstration

http://nsdl.org

Page 27

We have only just begun…

Funding through 2006

Provide infrastructure that both: advances state-of-the-art of digital libraries; reliably delivers services and resources to targeted users

Making this possible through: integration of work of partners (NSDL and external); co-development with partners; internal development

Page 28

Long-term technical capabilities: Facilities for Collaboration

All users can contribute resources to the library: collections (favorites), value added enhancements (curricula), original contributions

Community formation, long and short term

Persistence of results of community formation

Page 29

Long-term technical capabilities: Management of Entities

Resources, Services, Relationships, Users

Page 30

Long-term technical capabilities: Discovery of Entities

Capabilities for humans and agents

Searching through structured queries

Browsing of indexes, vocabularies, classifications

Page 31

Long-term technical capabilities: Relationship Management

Relationships are first-class objects: annotations, collections, equivalence, inclusion

Facilities: identification, discovery, persistence, evolution

Relationships of relationships
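
Treating relationships as first-class objects means each one has its own identifier and can itself be the subject of another relationship. The sketch below is one possible modeling, with hypothetical class names and an in-memory store; it is not drawn from the NSDL design.

```python
from dataclasses import dataclass, field
from itertools import count

_ids = count(1)


@dataclass(frozen=True)
class Entity:
    identifier: str


@dataclass(frozen=True)
class Relationship:
    kind: str       # e.g. "annotation", "inclusion", "equivalence"
    source: object  # an Entity or another Relationship
    target: object  # an Entity or another Relationship
    identifier: str = field(default_factory=lambda: "rel-%d" % next(_ids))


class RelationshipStore:
    """Keeps relationships by identifier and supports simple discovery."""

    def __init__(self):
        self._by_id = {}

    def add(self, rel):
        self._by_id[rel.identifier] = rel
        return rel

    def find(self, kind=None, about=None):
        """Find relationships by kind and/or by a participating object."""
        return [
            r for r in self._by_id.values()
            if (kind is None or r.kind == kind)
            and (about is None or about in (r.source, r.target))
        ]


lesson, image = Entity("lesson-1"), Entity("image-7")
store = RelationshipStore()
inclusion = store.add(Relationship("inclusion", lesson, image))
# A relationship of a relationship: a review annotating the inclusion itself.
note = store.add(Relationship("annotation", Entity("review-3"), inclusion))
```

Because `inclusion` has an identity of its own, the annotation can target it exactly as it would target a resource.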

Page 32

Long-term technical capabilities: Knowledge layered on data

Ontologies, classification schemes, taxonomies, standards, and authority lists

Organize resources within concept spaces

Cross-walk and establish relationships among concept spaces

Page 33

Long-term technical capabilities: Control of entities

Access management for controlling the dissemination of intellectual property.

Mechanisms controlling disclosure of information with the goal of protecting privacy (e.g., COPPA)

Mechanisms for limiting inappropriate actions and entities

Page 34

Long-term technical capabilities: Customization and Personalization

Portals that provide specialized user interfaces and aggregation of collections and services in the library.

Mechanisms for users and communities to specialize their library experience.

Mechanisms to automatically adapt library behavior to user needs and abilities.

Page 35

Long-term technical capabilities: Accessibility

Platform, Connectivity, Physical Ability, Language

Page 36

Long-term technical capabilities: Measurement

Usage of the main NSDL portal and supported portals.

Performance of core services and network connections.

Popularity of various resources.

Reliability of access to various resources.

Data and metadata quality.

User demographics (where possible)

Page 37

Realizing Goals and Capabilities: Building & supporting infrastructure

Maintain and evolve the metadata repository

Maintain and evolve the main portal

Define, disseminate and support a service integration architecture

Develop, integrate, support core services: search and discovery; persistence; metadata and data normalization & enhancement; authentication; annotation; resource access

Page 38

Realizing Goals and Capabilities: Defining and building exemplars

General theme: collaborative spaces for specialized communities, disciplines, resources

Motivations: develop real products meeting needs of real audiences; extrapolate from special cases to general infrastructure; build essential partnerships

Page 39

Realizing Goals and Capabilities: Defining and building exemplars

Primary life science education: Eisenhower National Clearinghouse

Undergraduate math education: Math Forum

Secondary geospatial education: Alexandria Digital Library

Page 40

How do we do this:

Constructing targeted portals/libraries: primary life science education, undergraduate mathematics education, secondary geospatial education

To build generalized architecture: collaborative spaces, knowledge management, automatic data and metadata management

Page 41

Some Closing Thoughts

Difficulty of building stability on shifting sands

What is low-barrier infrastructure? Barriers to ‘simple’ OAI and Dublin Core have been relatively high

Multiple problems with metadata from distributed sources: correctness, trust, information content

Resource granularity and identity

Automation is the key to success