Why are there so many catalogs and what can we do about it? Robin Wendler (and Dale Flecker) November 2, 2000 Tufts Metadata Conference

Jan 11, 2016


Evan Townsend
Why are there so many catalogs and what can we do about it?

Robin Wendler (and Dale Flecker)

November 2, 2000

Tufts Metadata Conference


Catalogs galore

Traditional library materials

Social science data sets

Art and Cultural Images

Archives

Botanical specimens

Biomedical Images

Geo-spatial Data

Networked Resources


REASONS FOR MULTIPLE CATALOGS

• Desire for autonomy

• Varying functional requirements

• Community-specific conventions, terminology

• Different metadata formats appropriate for different materials or in different contexts


DESIRE FOR AUTONOMY

Catalogs operated by different administrative units such as

– libraries

– museums

– archives

– herbaria

– academic departments

– research labs

– hospitals

– ...

…units which may have more interest in interoperating with their fellows across institutional boundaries than with other kinds of organizations within the institution


FUNCTIONAL NEEDS DIFFER

• Library catalogs:

– support circulation; placing holds, recalls, or requests from remote storage

– optimized for searching a large database and browsing large result sets

– draw a line between finding and using material

– use standards to support large scale exchange of metadata

– standard metadata lends itself to automated processing (e.g., authority control, identifying duplicates, merging records, creating well-ordered result lists)


FUNCTIONAL NEEDS DIFFER

• Image catalogs: integrate display of images with the catalog; “light table”, image comparison tools

• Geospatial catalogs: search via “bounding polygon” interface and determine relevance based on proportion of overlap, support “preview” rendering of data

• Statistical data catalogs: order datasets from ICPSR, exploratory statistical modeling

• Biomedical image catalogs: link between research projects, supporting images and resulting publications
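
The geospatial bullet above, relevance based on proportion of overlap, can be sketched in a few lines. This is a minimal illustration that assumes axis-aligned bounding boxes instead of general polygons and scores each dataset by the fraction of the query box it covers. The `Box` type, the scoring rule, and the function names are assumptions made for the example, not taken from any particular geospatial catalog.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in degrees (an assumed simplification of a polygon)."""
    west: float
    south: float
    east: float
    north: float

    def area(self) -> float:
        # An "inverted" box (east < west or north < south) has zero area.
        return max(0.0, self.east - self.west) * max(0.0, self.north - self.south)


def overlap_fraction(query: Box, dataset: Box) -> float:
    """Fraction of the query box covered by the dataset's box (0.0 to 1.0)."""
    if query.area() == 0:
        return 0.0
    intersection = Box(
        max(query.west, dataset.west),
        max(query.south, dataset.south),
        min(query.east, dataset.east),
        min(query.north, dataset.north),
    )
    return intersection.area() / query.area()


def rank(query: Box, datasets: dict[str, Box]) -> list[tuple[str, float]]:
    """Order datasets by descending overlap, dropping those with no overlap."""
    scored = [(name, overlap_fraction(query, box)) for name, box in datasets.items()]
    return sorted((s for s in scored if s[1] > 0), key=lambda s: s[1], reverse=True)
```

Other scoring rules are equally plausible, for example dividing by the dataset's area so that small, tightly matching datasets rank higher; the choice depends on what "relevance" should mean to the catalog's users.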


TERMINOLOGY AND CONVENTIONS DIFFER

• For people, organizations, places, topics...

– Libraries use Library of Congress and Medical Subject Headings

– VIA uses the Art and Architecture Thesaurus and Union List of Artists Names

– Herbaria use standardized botanical names and form personal names according to centuries-old practice

– Geodesy uses conventional notations for geographic coordinates


METADATA DIFFERS...

• Because of historically different practices

– Library standards require describing the object in hand

– Photo collection standards describe the object pictured

– Archives describe collective materials as they are organized

And these differences are reflected in the formats used to record the descriptions


METADATA DIFFERS...

• With the structure of what is being described

– Image cataloging is often hierarchic, with many pictures of a single described object, site, etc.

– The cataloging for an archival collection is structured to replicate the logical arrangement of the collection

– Dataset descriptions include variables and their locations


METADATA DIFFERS...

• With community schemes and standards

– Libraries use MARC and AACR2

– The GIS community uses FGDC

– The archival community uses EAD

– The survey data community will be using DDI

– The image community will use VRA Core

– The text encoding community uses TEI

• Start with elements, move toward rules


MORE REASONS FOR MULTIPLE CATALOGS

• Smaller catalogs are easier to use

– 1.8% of all HOLLIS searches exceed maximum result set limit (126,659 searches of 7 million in FY99)

– fewer functions to learn, but those used more often

• Specific catalogs can be tailored to targeted audiences

– increasing precision of search results

– providing richer (or more frequently needed) functionality


BUT...

• Multiple catalogs are confusing

– How does a user know where to look?

• Multiple catalogs are inconvenient

– Need to repeat a search multiple times


SOME POSSIBLE SOLUTIONS

• Replicated descriptions

• Distributed search

• Super-catalog

• Links


REPLICATED DESCRIPTIONS

• Same material described in more than one catalog

– MARC AMC records and EAD finding aids

– MARC and the library Portal

– MARC for ICPSR datasets and Harvard/MIT Data Center records

• Geodesy to experiment with single point of metadata creation/maintenance feeding two catalogs (HOLLIS and Geodesy)


REPLICATED DESCRIPTIONS

• Issues

– Can be labor intensive

– Added maintenance burden

– Mapping between metadata standards doesn’t work well

• ALWAYS involves some loss (of data, of meaning, of specificity, and/or of accuracy)

• may be extremely difficult, e.g., hierarchical VIA records or EAD finding aids would not map well into MARC
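
The loss involved in mapping between standards can be made concrete with a small sketch. It assumes a hypothetical hierarchical record (one described work with several child image records) and a hypothetical flat target schema that has only a title, a creator, and a repeatable note. Neither schema is real VIA, EAD, or MARC; the field names are invented for illustration.

```python
def flatten(record: dict) -> dict:
    """Map a hierarchical record to a flat one (hypothetical target schema)."""
    flat = {"title": record["title"], "creator": record.get("creator", "")}
    # The flat schema has no notion of child records, so each child's
    # description collapses into an undifferentiated note.
    flat["notes"] = [child["title"] for child in record.get("children", [])]
    return flat


def lost_fields(record: dict, flat: dict) -> set[str]:
    """Names of source fields (at any level) with no counterpart in the flat record."""
    seen = set(record) | {key for child in record.get("children", []) for key in child}
    return seen - set(flat) - {"children"}
```

Even when no field name is dropped, the parent/child relationship itself disappears: after flattening, nothing records which note belonged to which image.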


DISTRIBUTED SEARCH

[Diagram: (1) the user's query goes to the search front end; (2) the front end sends a query to each of Systems 1, 2, and 3; (3) each system returns a response; (4) the front end returns a summary or consolidated response.]


DISTRIBUTED SEARCH

• Front-end query interface

– Reformats user query as appropriate for each target system

• May allow user to choose which target(s) to query

– Sends queries in parallel

– Handles search results

• May consolidate results into single set

• May simply summarize number of hits, and pass user to specific target system to display results
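
The front-end behaviour listed above (reformat the query per target, send in parallel, then summarize hits) can be sketched as follows. This is a minimal illustration, not a description of any deployed system: each target is modelled as a pair of plain Python functions, where a real front end would speak Z39.50, HTTP, or a vendor protocol. All names here are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Each target is a pair: a function that rewrites the user's query into that
# system's syntax, and a function that runs the rewritten query. Both are
# stand-ins for real protocol clients.
Target = tuple[Callable[[str], str], Callable[[str], list[str]]]


def search_all(query: str, targets: dict[str, Target]) -> dict[str, list[str]]:
    """Send the query to every target in parallel and collect results by target."""
    def run(name: str) -> tuple[str, list[str]]:
        reformat, execute = targets[name]
        return name, execute(reformat(query))

    with ThreadPoolExecutor(max_workers=len(targets) or 1) as pool:
        return dict(pool.map(run, targets))


def summarize(results: dict[str, list[str]]) -> dict[str, int]:
    """Hit counts per target, for the 'summarize and hand off' style of response."""
    return {name: len(hits) for name, hits in results.items()}
```

Because `search_all` waits for every target before returning, the slowest system sets the response time for the whole search; a production front end would likely add per-target timeouts.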


DISTRIBUTED SEARCH -- ISSUES

• Front-end system is complex

– Need to understand each target system

• Search syntax

• Results responses and formats

• Easier if all targets support Z39.50

– Constant maintenance is required as target systems are modified

• Performance sensitive to weakest link


DISTRIBUTED SEARCH -- ISSUES

• Target systems frequently have non-parallel functions or use different terminology

• “find author” vs “find person”

• “cancer” vs “neoplasms”

• Consolidating results into a single set is difficult

– How to de-duplicate when same item is described in more than one system

– How to order heterogeneous result sets

– How to display heterogeneous data formats
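
To show why de-duplication across systems is hard, here is a deliberately naive sketch. It assumes every system returns records with `title` and `creator` fields and treats two records as the same item when those fields match after lowercasing and stripping punctuation. The field names and the matching rule are assumptions for illustration; real matching would need identifiers, dates, and authority-controlled names.

```python
import re


def match_key(record: dict) -> tuple[str, str]:
    """Crude matching key: lowercased title and creator with punctuation removed."""
    def norm(text: str) -> str:
        return re.sub(r"[^a-z0-9 ]", "", text.lower()).strip()

    return norm(record.get("title", "")), norm(record.get("creator", ""))


def consolidate(result_sets: dict[str, list[dict]]) -> list[dict]:
    """Merge per-system results, keeping one entry per key and noting its sources."""
    merged: dict[tuple[str, str], dict] = {}
    for system, records in result_sets.items():
        for record in records:
            entry = merged.setdefault(match_key(record), {**record, "sources": []})
            entry["sources"].append(system)
    return list(merged.values())
```

The rule fails in both directions: it merges distinct editions that share a title and creator, and it misses duplicates whenever two systems form a name differently, which is exactly the terminology problem noted above.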


SUPER-CATALOG

[Diagram: (1) Systems 1, 2, and 3 each contribute metadata to the super-catalog; (2) the user's query goes to the super-catalog; (3) the super-catalog returns the response.]


SUPER-CATALOG

• Union catalog of data from separate systems

– Data collected through contribution or via “harvesting”

• Data may require homogenizing

– Format

– Data elements

– Terminology


SUPER-CATALOG -- ISSUES

• Homogenizing can be complex

– Terminology particularly difficult

• Homogenizing tends towards least-common-denominator

– If one contributor only labels “person”, cannot offer “author” search

• Likely to produce a catalog of “apples and oranges”

– Single photographs/whole archival collections
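
The least-common-denominator effect can be illustrated with a short sketch. It assumes each contributor supplies records as simple field/value dictionaries and that the super-catalog can only offer searches on fields every record has. The `BROADER` table and all field names are hypothetical.

```python
def common_fields(contributions: dict[str, list[dict]]) -> set[str]:
    """Fields present in every record from every contributor."""
    field_sets = [set(rec) for recs in contributions.values() for rec in recs]
    return set.intersection(*field_sets) if field_sets else set()


# Assumed generalization table: specific roles collapse to the broader label.
BROADER = {"author": "person", "photographer": "person"}


def homogenize(record: dict) -> dict:
    """Rename specific fields to their broader equivalents (the role is lost)."""
    out: dict = {}
    for field, value in record.items():
        out.setdefault(BROADER.get(field, field), []).append(value)
    return out
```

Before homogenizing, a library record with `author` and an image record with `person` share only `title`; afterwards both have `person`, so a person search becomes possible, but the distinction that made an author search possible is gone.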


RELATED IDEA: “ACADEMIC LYCOS”

• Super catalog built from data in many academic research catalogs across institutions

• Built on Internet search engine technology

– Based on familiar concepts and interfaces

• Being explored by DLF with Mellon Foundation encouragement


LINKS

• Supports navigation and assistance for sequential searching of multiple systems

• After searching one catalog, user given options of pursuing same query in other sources

• Primary exemplar is SFX system


SFX SYSTEM

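[Diagram: (1) the user sends a query to System 1; (2) System 1 returns results with a “LINKS?” button; (3) the user clicks the “LINKS?” button, which calls the SFX system; (4) the SFX system returns a page with multiple link option buttons; (5) a button generates a pre-formatted search in System 2; (6) System 2 returns results.]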


LINKS -- ISSUES

• Each source system must be modified to provide appropriate “LINKS?” button

• Links server must understand data formats and search syntax for each linked system

• Does not address problems of non-parallel terminology and search functionality

• Potential user frustration, as many links will be dead ends
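
The requirement that a links server understand each target's search syntax can be sketched as a table of per-target URL patterns. This is a generic illustration, not the SFX implementation: the target names, the `example.org` URLs, and the parameter names are all hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical targets: base URL plus the parameter name each one expects for
# an author search. A target with no author search is simply omitted, and a
# listed target may still return nothing, which is the "dead end" problem.
TARGETS = {
    "library catalog": ("https://catalog.example.org/search", "author"),
    "image catalog": ("https://images.example.org/find", "person"),
}


def link_options(author: str) -> dict[str, str]:
    """Build one pre-formatted search URL per target for the given author."""
    return {
        name: f"{base}?{urlencode({param: author})}"
        for name, (base, param) in TARGETS.items()
    }
```

Each entry in `TARGETS` has to be kept in step with the system it points at, so the maintenance burden grows with the number of linked systems.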


THEREFORE….

• Many approaches, no ideal solution

– Fundamental problem in digital libraries

– Problem and solutions being widely analyzed today