1 William Y. Arms September 26, 2002 A Research Program for Information Science with the NSDL as an Example

Dec 22, 2015

Page 1:

William Y. Arms

September 26, 2002

A Research Program for Information Science with the NSDL as an Example

Page 2:

A Scenario

A faculty member wished to find a paper for students to read in a class. He began by asking an expert. She suggested the original research paper as suitable.

Later, he typed a few terms into Google, browsed the hits, selected one that led to ResearchIndex, found the paper, and downloaded a PDF version from the author's web site.

Page 3:

Computer Science

Internet

Web

Google

ResearchIndex

PDF

[Diagram label: Computer Science]

Page 4:

Human Computer Interaction

Browsing

Searching

User interface design

[Diagram labels: HCI, Computer Science]

Page 5:

HCI: Eye Tracking

Page 6:

Cognitive Studies

Roles of expert/instructor/student

Cognitive psychology

Linguistics

Natural language processing

[Diagram labels: Cognitive Studies, HCI, Computer Science]

Page 7:

Page 8:

Society

Organizational change

Economics

Ethics

Social culture

Law

[Diagram labels: Society, Cognitive Studies, HCI, Computer Science]

Page 9:

Information Science

[Diagram labels: Society, Cognitive Studies, HCI, Computer Science, Applications]

Page 10:

Open Access to Scientific, Scholarly and Professional Information

Page 11:

Before the Web

Access to scientific, medical, legal information

In the United States:

excellent if you belonged to a rich organization (e.g., a major university)

very poor otherwise

In many countries of the world:

very poor for everybody

Page 12:

Some Light Reading

William Y. Arms, "Economic models for open-access publishing." iMP, March 2000. http://www.cisp.org/imp/march_2000/03_00arms.htm

William Y. Arms, "Automated digital libraries." D-Lib Magazine, July/August 2000. http://www.dlib.org/dlib/july20/07contents.html

William Y. Arms, "What are the alternatives to peer review? Quality control in scholarly publishing on the web." Journal of Electronic Publishing, 8(1), August 2002. http://www.press.umich.edu/jep/08-01/arms.html

Page 13:

Research Libraries are Expensive

library materials

buildings & facilities

staff

Page 14:

Baumol's Cost Disease

[Figure: price versus year (1900 to 2050) for a bundle of goods and services, labor-intensive services, and manufactured goods]

Page 15:

Baumol's Cost Disease

[Figure: price versus year (1900 to 2050) for a bundle of goods and services, labor-intensive services, and manufactured goods, with Moore's Law added]

Page 16:

Brute Force Computing

Few people really understand Moore's Law

Computing power doubles every 18 months

Increases 100 times in 10 years

Increases 10,000 times in 20 years

Simple algorithms

plus

immense computing power

can outperform human intelligence
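The growth figures quoted for Moore's Law on this slide follow from compounding a doubling every 18 months. A minimal sketch of that arithmetic is below; the function name `growth_factor` is hypothetical and the 18-month doubling period is taken from the slide.

```python
def growth_factor(years: float, doubling_months: float = 18.0) -> float:
    """Return the multiplicative growth after `years` when capacity
    doubles once every `doubling_months` months."""
    doublings = years * 12.0 / doubling_months
    return 2.0 ** doublings


if __name__ == "__main__":
    # 10 years is about 6.7 doublings; 20 years is about 13.3 doublings.
    for years in (10, 20):
        print(f"{years} years: roughly {growth_factor(years):,.0f} times")
```

The results land close to the rounded figures on the slide: roughly 100 times after 10 years and roughly 10,000 times after 20 years.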

Page 17:

Example: Catalogs and Indexes

Cost disease: catalogs and indexes

Catalog, index and abstracting records are very expensive when created by skilled professionals

Moore's Law: automatic indexing of full text

Retrieval using automatic indexing can be at least as effective as manual indexing with controlled vocabularies

(Cleverdon 1967, reporting on experiments by Salton)
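Automatic indexing of full text can be illustrated with a small inverted index. This is a sketch only, not the method used in the experiments cited above: the tokenizer, the function names (`tokenize`, `build_index`, `search_all`) and the sample documents are all assumptions made for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map every term to the set of document ids whose text contains it.
    No human cataloger is involved: the index comes from the text alone."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search_all(index, query):
    """Return ids of documents that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


if __name__ == "__main__":
    # Hypothetical sample documents.
    docs = {
        "d1": "Automated digital libraries",
        "d2": "Economic models for open-access publishing",
        "d3": "Quality control in digital publishing",
    }
    index = build_index(docs)
    print(sorted(search_all(index, "digital publishing")))
```

The cost of building such an index is machine time rather than skilled labor, which is the contrast the slide draws.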

Page 18:

Resistance to Change

"I used to be a heavy user of INSPEC. Now I use Google instead."

Page 19:

Information Discovery: 1992 and 2002

                     1992          2002

Content              print         digital
Computing            expensive     inexpensive
Choice of content    selective     comprehensive
Index creation       human         automatic
Frequency            one time      monthly
Vocabulary           controlled    not controlled
Query                Boolean       ranked retrieval
Users                trained       untrained

Page 20:

Brute Force Computing: Substitutes for Human Intelligence

Automated algorithms for information discovery

Similarity of two documents

Vector space and statistical methods

(Salton, Sparck Jones, et al.)
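One common vector space measure of document similarity is the cosine of the angle between term-frequency vectors. The sketch below shows that measure under simplifying assumptions: raw term counts with no weighting or stemming, and hypothetical function names `term_vector` and `cosine_similarity`.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Represent a document as a bag of lower-cased terms with counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors (0.0 to 1.0)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty document is similar to nothing
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    x = term_vector("digital library research")
    y = term_vector("research on the digital library")
    z = term_vector("baumol cost disease")
    print(cosine_similarity(x, y), cosine_similarity(x, z))
```

Documents that share no terms score 0.0, and a document compared with itself scores 1.0 up to rounding.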

Importance of digital object

Rank importance of web pages by analysis of the graph of web links

(Kleinberg, Page, et al.)
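Ranking by link analysis can be sketched with a simplified PageRank-style power iteration. This is an illustration, not the algorithm as deployed by any search engine: the damping value 0.85, the iteration count, the handling of pages without out-links, and the function name `rank_pages` are all assumptions.

```python
def rank_pages(links, damping=0.85, iterations=50):
    """Estimate page importance from a link graph.

    `links` maps each page to the list of pages it links to.
    Returns a dict of page -> score; the scores sum to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    if n == 0:
        return {}
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score equally along its out-links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no out-links spreads its score over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical three-page web: a and b both link to c, and c links to a.
    scores = rank_pages({"a": ["c"], "b": ["c"], "c": ["a"]})
    for page in sorted(scores, key=scores.get, reverse=True):
        print(page, round(scores[page], 3))
```

In this small example the page with the most in-links ends up with the highest score, and the page nobody links to ends up with the lowest.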

Page 21:

Brute Force Computing: Automated Metadata Extraction

Informedia (Carnegie Mellon)

Automatic processing of segments of video, e.g., television news.

Algorithms for:

dividing raw video into discrete items

generating short summaries

indexing the sound track using speech recognition

recognizing faces

(Wactlar, et al.)

Page 22:

Page 23:

Simple algorithms

plus

immense computing power

plus

the intelligence of the user

can replace labor-intensive services

[Diagram labels: Cognitive Studies, HCI, Low Cost Information, Computer Science]

Page 24:

The National Science Digital Library (NSDL)

Page 25:

Scope

All digital information relevant to any level of education in any branch of science.

Scientific and technical information

Materials used in education

Materials tailored to education

Page 26:

How Big Might the NSDL Be?

All branches of science, all levels of education, very broadly defined:

Five year targets

1,000,000 different users

10,000,000 digital objects

10,000 to 100,000 independent sites

Page 27:

Resources

Integration team

Budget: $4-6 million

Staff: 25-30

Management: diffuse

How can a small team, without direct management control, create a very large-scale digital library?

Page 28:

Philosophy

It is possible to build a very large digital library with a small staff.

But ...

Every aspect of the library must be planned with scalability in mind.

Some compromises will be made.

Page 29:

Basic Assumptions

The integration team will not manage any collections

The integration team will not create any metadata

Page 30:

The Integration Task ...

... to provide a coherent set of collections and services across great diversity

Page 31:

Interoperability

The Problem

Conventional approaches require partners to support agreements (technical, content, and business)

But NSDL needs thousands of very different partners

... most of whom are not directly part of the NSDL program

The challenge is to create incentives for independent digital libraries to adopt agreements

Page 32:

Function Versus Cost of Acceptance

[Figure: function plotted against cost of acceptance, with regions labeled "Many adopters" and "Few adopters"]

Page 33:

Example: Textual Mark-up

[Figure: function plotted against cost of acceptance, with points labeled ASCII, HTML, XML, and SGML]

Page 34:

The Spectrum of Interoperability

Level        Agreements                             Example

Federation   Strict use of standards                AACR, MARC, Z39.50
             (syntax, semantic, and business)

Harvesting   Digital libraries expose metadata;     Open Archives
             simple protocol and registry           metadata harvesting

Gathering    Digital libraries do not cooperate;    Web crawlers
             services must seek out information     and search engines
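The harvesting level can be illustrated with the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), in which a harvester issues an HTTP request such as `verb=ListRecords` and receives XML records. The sketch below builds such a request URL and reads titles from a response. It makes no network call: the base URL `http://example.org/oai`, the sample response, and the function names are hypothetical, and the code is not the NSDL implementation.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hand-written sample response, for illustration only.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:item-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample lesson plan</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build a ListRecords request for a repository's base URL."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


def extract_titles(response_xml):
    """Return (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(response_xml)
    pairs = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        title = record.findtext(".//dc:title", default="", namespaces=NS)
        pairs.append((identifier, title))
    return pairs


if __name__ == "__main__":
    print(list_records_url("http://example.org/oai"))
    print(extract_titles(SAMPLE_RESPONSE))
```

The collection only has to expose records in this form; everything else is done by the harvester, which is what keeps the cost of acceptance low compared with federation.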

Page 35:

Searching

What to Index?

Full text indexing is excellent, but full text indexing is not possible for all materials (non-textual, no access for indexing).

Comprehensive metadata is an alternative, but it is available for very few of the materials.

What Architecture to Use?

Few collections support an established search protocol (e.g., Z39.50).

Page 36:

Broadcast Searching Does Not Scale

[Diagram labels: User, User interface server, Collections]
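One way to see why broadcast searching scales poorly is to ask how often every collection answers a query. The sketch below assumes, purely for illustration, that each collection is independently available 99% of the time; neither that figure nor the independence assumption comes from the slides, and the function name is hypothetical.

```python
def all_respond_probability(collections: int, availability: float = 0.99) -> float:
    """Probability that every collection answers one broadcast query,
    assuming each is independently up with probability `availability`."""
    return availability ** collections


if __name__ == "__main__":
    # Each query also costs one request per collection.
    for n in (10, 100, 1000, 10000):
        print(f"{n:>6} collections: {n} requests per query, "
              f"complete answer with probability {all_respond_probability(n):.3g}")
```

Under these assumptions a complete answer is already unlikely at a hundred collections, and the number of requests per query grows linearly with the number of collections.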

Page 37:

The Metadata Repository

[Diagram labels: Users, Services, Metadata repository, Collections]

The metadata repository is a resource for service providers.

It holds information about every collection and item known to the NSDL.
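A repository that holds one record per collection and per item, and that services query instead of contacting the collections, might be modeled as below. This is a minimal in-memory sketch; the class names, field names, and sample data are hypothetical and do not describe the actual NSDL schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ItemRecord:
    """Metadata about one item (hypothetical fields)."""
    identifier: str
    collection_id: str
    title: str


@dataclass
class MetadataRepository:
    """Central store of collection-level and item-level metadata."""
    collections: dict = field(default_factory=dict)  # collection_id -> description
    items: dict = field(default_factory=dict)        # identifier -> ItemRecord

    def add_collection(self, collection_id: str, description: str) -> None:
        self.collections[collection_id] = description

    def add_item(self, item: ItemRecord) -> None:
        # Every item must belong to a collection the repository knows about.
        if item.collection_id not in self.collections:
            raise KeyError(f"unknown collection: {item.collection_id}")
        self.items[item.identifier] = item

    def items_in(self, collection_id: str) -> list:
        """What a service provider might call to list a collection's items."""
        return [i for i in self.items.values() if i.collection_id == collection_id]


if __name__ == "__main__":
    repo = MetadataRepository()
    repo.add_collection("c1", "Sample earth science collection")
    repo.add_item(ItemRecord("item-1", "c1", "Sample lesson plan"))
    print([i.title for i in repo.items_in("c1")])
```

A search or portal service reads from this one store, so adding a collection does not add another system to contact at query time.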

Page 38:

Search Architecture

[Diagram labels: Portal (three instances), Search and Discovery Services, Collections, Metadata repository; protocols: SDLIP, OAI, http]

James Allan, Bruce Croft (University of Massachusetts, Amherst)

Page 39:

Other Topics

User interfaces: data driven portals using a channel architecture

Selection: selective web crawling, machine learning

Quality measures: ???

Page 40:

The Mortal behind the Portal

[This space left intentionally blank.]

Page 41:

Acknowledgement

The NSDL is a program of the National Science Foundation's Directorate for Education and Human Resources, Division of Undergraduate Education.

The NSDL Core Integration is a collaboration between the University Corporation for Atmospheric Research (Dave Fulker), Columbia University (Kate Wittenberg) and Cornell University (Bill Arms). The Technical Director is Carl Lagoze (Cornell University).

Page 42:

Information Science

[Diagram labels: Society, Cognitive Studies, HCI, Computer Science, Applications]