Information Discovery Library Catalogs 2

Information Discovery Library Catalogs 2. What to Catalog: IFLA Model Work A work is the underlying abstraction, e.g., The Iliad The Computer Science.

Jan 17, 2016


Information Discovery

Library Catalogs 2


What to Catalog: IFLA Model

Work. A work is the underlying abstraction, e.g.,

• The Iliad

• The Computer Science departmental web site

• Beethoven’s Fifth Symphony

• Unix operating system

• The 1996 U.S. census

This is roughly equivalent to the concept of “literary work” used in copyright law.


IFLA Model

Expression. A work is realized through an expression, e.g.,

• The Iliad has oral expressions and written expressions.

• A musical work has score and performance(s).

• Software has source code and machine code.

Many works have only a single expression, e.g. a web page, or a book.


IFLA Model

Manifestation. An expression is given form in one or more manifestations, e.g.,

• The text of The Iliad has been manifest in numerous manuscripts and printed books.

• A musical performance can be distributed on CD, or broadcast on television.

• Software is manifest as files, which may be stored or transmitted in any digital medium.


IFLA Model

Item. When many copies are made of a manifestation, each is a separate item, e.g.,

• a specific copy of a book

• a computer file
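The four levels of the IFLA model nest inside one another, which can be sketched as a small data structure. This is only an illustrative sketch: the class and field names (`kind`, `form`, `location`, `count_items`) and the shelf locations are assumptions made for the example, not part of the IFLA model itself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    """A single copy, e.g. one specific book or one computer file."""
    location: str  # hypothetical field, for illustration only


@dataclass
class Manifestation:
    """A form given to an expression, e.g. a printed book or a CD."""
    form: str
    items: List[Item] = field(default_factory=list)


@dataclass
class Expression:
    """A realization of a work, e.g. a written text or a performance."""
    kind: str
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    """The underlying abstraction, e.g. The Iliad."""
    title: str
    expressions: List[Expression] = field(default_factory=list)


def count_items(work: Work) -> int:
    """Count every item reachable from a work through its expressions."""
    return sum(
        len(m.items)
        for e in work.expressions
        for m in e.manifestations
    )


# Example data; the locations are invented for the sketch.
iliad = Work("The Iliad", [
    Expression("written text", [
        Manifestation("printed book", [Item("shelf A"), Item("shelf B")]),
        Manifestation("manuscript", [Item("archive")]),
    ]),
    Expression("oral recitation"),
])
```

A catalog record can then be attached at whichever level it describes: a statement about The Iliad as a whole belongs to the work, while a shelf location belongs to a single item.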


Cataloguing Online Materials: DC

Dublin Core is an attempt to apply cataloguing methods to online materials, notably the Web.

History

It was anticipated that the methods of full text indexing that were used by the early Web search engines, such as Lycos, would not scale up.

“... [automated] indexes are most useful in small collections within a given domain. As the scope of their coverage expands, indexes succumb to problems of large retrieval sets and problems of cross disciplinary semantic drift. Richer records, created by content experts, are necessary to improve search and retrieval.” Weibel 1995


Dublin Core

Simple set of metadata elements for online information

• 15 basic elements

• intended for all types and genres of material

• all elements optional

• all elements repeatable

Developed by an international group chaired by Stuart Weibel since 1995.


Dublin Core elements

Element Name: Title

Definition: A name given to the resource.

Comment: Typically, Title will be a name by which the resource is formally known.

Element Name: Creator

Definition: An entity primarily responsible for making the content of the resource.

Comment: Examples of Creator include a person, an organization, or a service. Typically, the name of a Creator should be used to indicate the entity.


Dublin Core elements

Element Name: Subject

Definition: A topic of the content of the resource.

Comment: Typically, Subject will be expressed as keywords, key phrases or classification codes that describe a topic of the resource. Recommended best practice is to select a value from a controlled vocabulary or formal classification scheme.

Element Name: Description

Definition: An account of the content of the resource.

Comment: Examples of Description include, but are not limited to: an abstract, table of contents, reference to a graphical representation of content, or a free-text account of the content.


Dublin Core elements

Element Name: Publisher

Definition: An entity responsible for making the resource available.

Comment: Examples of Publisher include a person, an organization, or a service. Typically, the name of a Publisher should be used to indicate the entity.

Element Name: Contributor

Definition: An entity responsible for making contributions to the content of the resource.

Comment: Examples of Contributor include a person, an organization, or a service. Typically, the name of a Contributor should be used to indicate the entity.


Dublin Core elements

Element Name: Date

Definition: A date of an event in the lifecycle of the resource.

Comment: Typically, Date will be associated with the creation or availability of the resource. Recommended best practice for encoding the date value is defined in a profile of ISO 8601 [W3CDTF] and includes (among others) dates of the form YYYY-MM-DD.
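As a rough illustration of the recommended date encoding, the sketch below checks whether a string is a date-only W3CDTF value (YYYY, YYYY-MM, or YYYY-MM-DD). It deliberately ignores the forms of the profile that include a time of day, and the function name `is_w3cdtf_date` is an assumption made for the example.

```python
import re
from datetime import date

# Date-only forms of the profile: YYYY, YYYY-MM, YYYY-MM-DD.
# Forms with a time of day are out of scope for this sketch.
_W3CDTF_DATE = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")


def is_w3cdtf_date(value: str) -> bool:
    """Return True if value is a date-only W3CDTF string naming a real date."""
    m = _W3CDTF_DATE.fullmatch(value)
    if not m:
        return False
    year, month, day = m.groups()
    try:
        # Missing month or day defaults to 1 so the calendar check still runs.
        date(int(year), int(month or 1), int(day or 1))
    except ValueError:
        return False
    return True
```

Under this check "2004-10-05" and "1997-11" are accepted, while a free-text value such as "January 2000" is rejected.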


Dublin Core elements

Element Name: Type

Definition: The nature or genre of the content of the resource.

Comment: Type includes terms describing general categories, functions, genres, or aggregation levels for content. Recommended best practice is to select a value from a controlled vocabulary (for example, the DCMI Type Vocabulary [DCT1]). To describe the physical or digital manifestation of the resource, use the FORMAT element.


Dublin Core elements

Element Name: Format

Definition: The physical or digital manifestation of the resource.

Comment: Typically, Format may include the media-type or dimensions of the resource. Format may be used to identify the software, hardware, or other equipment needed to display or operate the resource. Examples of dimensions include size and duration. Recommended best practice is to select a value from a controlled vocabulary (for example, the list of Internet Media Types [MIME] defining computer media formats).


Dublin Core elements

Element Name: Identifier

Definition: An unambiguous reference to the resource within a given context.

Comment: Recommended best practice is to identify the resource by means of a string or number conforming to a formal identification system. Formal identification systems include but are not limited to the Uniform Resource Identifier (URI) (including the Uniform Resource Locator (URL)), the Digital Object Identifier (DOI) and the International Standard Book Number (ISBN).


Dublin Core elements

Element Name: Source

Definition: A reference to a resource from which the present resource is derived.

Comment: The present resource may be derived from the Source resource in whole or in part. Recommended best practice is to identify the referenced resource by means of a string or number conforming to a formal identification system.


Dublin Core elements

Element Name: Language

Definition: A language of the intellectual content of the resource.

Comment: Recommended best practice is to use RFC 3066 [RFC3066] which, in conjunction with ISO 639 [ISO639], defines two- and three-letter primary language tags with optional subtags. Examples include "en" or "eng" for English, "akk" for Akkadian, and "en-GB" for English used in the United Kingdom.

Element Name: Relation

Definition: A reference to a related resource.

Comment: Recommended best practice is to identify the referenced resource by means of a string or number conforming to a formal identification system.


Dublin Core elements

Element Name: Coverage

Definition: The extent or scope of the content of the resource.

Comment: Typically, Coverage will include spatial location (a place name or geographic coordinates), temporal period (a period label, date, or date range) or jurisdiction (such as a named administrative entity). Recommended best practice is to select a value from a controlled vocabulary (for example, the Thesaurus of Geographic Names [TGN]) and to use, where appropriate, named places or time periods in preference to numeric identifiers such as sets of coordinates or date ranges.


Dublin Core elements

Element Name: Rights

Definition: Information about rights held in and over the resource.

Comment: Typically, Rights will contain a rights management statement for the resource, or reference a service providing such information. Rights information often encompasses Intellectual Property Rights (IPR), Copyright, and various Property Rights. If the Rights element is absent, no assumptions may be made about any rights held in or over the resource.
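Because every one of the fifteen elements is optional and repeatable, a simple Dublin Core record behaves like a mapping from element name to a list of values. The following is a minimal sketch of that idea; the class name `DCRecord` and its methods are assumptions made for illustration and do not correspond to any standard library.

```python
# The fifteen element names defined on the slides above, in lower case.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


class DCRecord:
    """A record in which every element is optional and repeatable."""

    def __init__(self):
        self._values = {}

    def add(self, element, value):
        """Append one more value for an element (repeatable)."""
        if element not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {element}")
        self._values.setdefault(element, []).append(value)

    def get(self, element):
        """Return all values for an element; empty list if absent (optional)."""
        return list(self._values.get(element, []))
```

An absent element simply yields an empty list, so a client never has to treat a missing Rights or Coverage value as an error.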



Dublin Core

contributor: Dublin Core Metadata Initiative

description: The Dublin Core Metadata Initiative is an open forum engaged in the development of interoperable online metadata standards that support a broad range of purposes and business models...

title: Dublin Core Metadata Initiative (DCMI) Home Page

date: 2004-10-05

format: text/html (MIME type)

language: en (English)


Representations of DC: Meta Tags

<meta name="DC.title" content="Dublin Core Metadata Initiative (DCMI) Home Page" />

<meta name="DC.description" content="The Dublin Core Metadata Initiative is an open forum engaged in the development of interoperable online metadata standards that support a broad range of purposes and business models..." />

<meta name="DC.date" content="2004-10-05" />

<meta name="DC.format" content="text/html" />

<meta name="DC.contributor" content="Dublin Core Metadata Initiative" />

<meta name="DC.language" content="en" />
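Meta tags of this shape can be generated mechanically from a record. The sketch below assumes the record is held as a plain dictionary from element name to a list of values; the function name `to_meta_tags` is hypothetical, and `html.escape` is used so that quotation marks in a value cannot break the attribute.

```python
from html import escape


def to_meta_tags(record):
    """Render a {element: [values]} mapping as DC <meta> tags, one per value."""
    lines = []
    for element, values in record.items():
        for value in values:
            content = escape(value, quote=True)
            lines.append(f'<meta name="DC.{element}" content="{content}" />')
    return "\n".join(lines)


# Values taken from the DCMI home page example above.
record = {
    "title": ["Dublin Core Metadata Initiative (DCMI) Home Page"],
    "date": ["2004-10-05"],
    "format": ["text/html"],
    "language": ["en"],
}
print(to_meta_tags(record))
```

Repeated elements simply produce repeated tags with the same name, which matches the rule that all elements are repeatable.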


Qualifiers

Example: element qualifier

Example: Date

DC.Date.Created 1997-11-01

DC.Date.Issued 1997-11-15

DC.Date.Available 1997-12-01/1998-06-01

DC.Date.Valid 1998-01-01/1998-06-01

A qualifier refines the element name to add specificity


Qualifiers

Example: value qualifiers

Example: Subject

DC.Subject.DDC 509.123 (Dewey Decimal Classification)

DC.Subject.LCSH Digital libraries-United States (Library of Congress Subject Heading)


Dumbing Down Principle

"The theory behind this principle is that consumers of metadata should be able to strip off qualifiers and return to the base form of a property. ... this principle makes it possible for client applications to ignore qualifiers in the context of more coarse-grained, cross-domain searches."

Lagoze 2001


Dumbing Down Principle

Qualified version

DC.Date.Created 1997-11-01

DC.Subject.LCSH Digital libraries-United States

Dumbed-down version

DC.Date 1997-11-01 (a valid date)

DC.Subject Digital libraries-United States (a valid subject description)
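Dumbing down amounts to stripping everything after the base element name. A minimal sketch, assuming qualified names are written in the dotted form used on these slides (`DC.Element.Qualifier`); the function names are hypothetical.

```python
def dumb_down(qualified_name: str) -> str:
    """Strip qualifiers: 'DC.Date.Created' becomes 'DC.Date'."""
    parts = qualified_name.split(".")
    # Keep only the namespace prefix and the base element name.
    return ".".join(parts[:2])


def dumb_down_record(pairs):
    """Apply dumb_down to every (name, value) pair, leaving values unchanged."""
    return [(dumb_down(name), value) for name, value in pairs]


qualified = [
    ("DC.Date.Created", "1997-11-01"),
    ("DC.Subject.LCSH", "Digital libraries-United States"),
]
simple = dumb_down_record(qualified)
```

The values are left untouched, which is why the principle only works if each qualified value is still meaningful as a value of the unqualified element.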


Dublin Core with qualifiers

See the next two slides for an example of a Dublin Core record for a web site prepared by a professional cataloguer at the Library of Congress.

Note that the record does not follow the principle of dumbing-down.

[The two slides showing the Library of Congress record are not reproduced in this transcript.]

Flat v. linked records

Flat record

All information about an item is held in a single Dublin Core record, including information about related items

convenient for access and preservation

information is repeated -- maintenance problem

Linked record

Related information is held in separate records with a link from the item record

less convenient for access and preservation

information is stored once

Compare with normal forms in relational databases
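The trade-off can be sketched with two small dictionaries. This is an illustration only: the key names (`issn`, `serial`) and the helper `flatten` are assumptions, and the serial details are borrowed from the D-Lib Magazine example elsewhere in these slides.

```python
# Linked form: serial details are stored once and referenced by ISSN.
serials = {"1082-9873": {"serial-name": "D-Lib Magazine"}}

linked_article = {
    "title": "Digital Libraries and the Problem of Purpose",
    "issn": "1082-9873",  # link to the serial record
    "volume": "6",
    "issue": "1",
}


def flatten(article, serials):
    """Return a flat copy of a linked record with the serial details inlined."""
    flat = {k: v for k, v in article.items() if k != "issn"}
    flat["serial"] = {"issn": article["issn"], **serials[article["issn"]]}
    return flat


flat_article = flatten(linked_article, serials)

# A later correction to the serial record reaches the linked article
# automatically, but the flat copy keeps the old value.
serials["1082-9873"]["serial-name"] = "D-Lib Magazine (renamed)"
```

After the last line, the flat record is self-contained but stale, while the linked record is current but useless without the serial record: the same tension that normalization addresses in relational databases.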


Representations of DC: XML (with qualifiers)

<title>Digital Libraries and the Problem of Purpose</title>

<creator>David M. Levy</creator>

<publisher>Corporation for National Research Initiatives</publisher>

<date date-type = "publication">January 2000</date>

<type resource-type = "work">article</type>

<identifier uri-type = "DOI">10.1045/january2000-levy</identifier>

<identifier uri-type = "URL">http://www.dlib.org/dlib/january00/01levy.html</identifier>

<language>English</language>

<rights>Copyright (c) David M. Levy</rights>

to be continued


Dublin Core with flat record extension

Continuation of D-Lib Magazine record

<relation rel-type = "InSerial">

<serial-name>D-Lib Magazine</serial-name>

<issn>1082-9873</issn>

<volume>6</volume>

<issue>1</issue>

</relation>
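A record in this XML form can be read with Python's standard `xml.etree.ElementTree` module. The fragment on the slides has no enclosing element, so the sketch below wraps part of it in a hypothetical `<record>` root; that root element is an assumption, not part of the original record.

```python
import xml.etree.ElementTree as ET

# Part of the D-Lib Magazine record, wrapped in an assumed <record> root.
XML = """<record>
  <title>Digital Libraries and the Problem of Purpose</title>
  <creator>David M. Levy</creator>
  <identifier uri-type = "DOI">10.1045/january2000-levy</identifier>
  <identifier uri-type = "URL">http://www.dlib.org/dlib/january00/01levy.html</identifier>
  <relation rel-type = "InSerial">
    <serial-name>D-Lib Magazine</serial-name>
    <issn>1082-9873</issn>
    <volume>6</volume>
    <issue>1</issue>
  </relation>
</record>"""

root = ET.fromstring(XML)

# Repeated elements are distinguished by their qualifier attribute.
identifiers = {e.get("uri-type"): e.text for e in root.findall("identifier")}

# The flat-record extension nests the serial details inside <relation>.
serial = root.find("relation[@rel-type='InSerial']")
serial_name = serial.findtext("serial-name")
```

A client that does not understand the `InSerial` extension can still read the title, creator and identifiers, and simply skip the nested relation.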


Limits of DC and MARC: Complex Objects

Complex objects

• Article within a journal

• Page within a Web site

• A thumbnail of another image

• The March 28 final edition of a newspaper

[Figure labels: complete object, sub-objects, metadata records]


Limits of Dublin Core and MARC: Events

[Figure labels: Version 1, new material, Version 2]

Should Version 2 have its own record or should extra information be added to the Version 1 record?

How are these represented in Dublin Core or MARC?


Using Catalog Data for IR

The basic operation of information retrieval is to match the way that a user describes an information requirement (a query), against the way that items are described (an index).

The success of conventional catalogs (e.g., MARC + Anglo-American Cataloguing Rules) or indexing services (e.g., Medline) comes from the use of precise language to describe items combined with trained and experienced users to formulate queries.
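The matching operation can be sketched as a tiny fielded index: each catalog field is indexed separately, and a query names the field it is matched against. This is a simplified illustration and not how any particular catalog system is implemented; the function names, the record identifiers and the whitespace tokenization are all assumptions.

```python
from collections import defaultdict


def build_index(records):
    """Map (field, term) to the set of record ids containing that term."""
    index = defaultdict(set)
    for rec_id, fields in records.items():
        for field_name, values in fields.items():
            for value in values:
                for term in value.lower().split():
                    index[(field_name, term)].add(rec_id)
    return index


def search(index, field_name, query):
    """Return ids of records whose field contains every term of the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get((field_name, terms[0]), set()))
    for term in terms[1:]:
        result &= index.get((field_name, term), set())
    return result


# Two records built from examples in these slides; the ids are invented.
records = {
    "levy2000": {
        "title": ["Digital Libraries and the Problem of Purpose"],
        "creator": ["David M. Levy"],
    },
    "dcmi-home": {
        "title": ["Dublin Core Metadata Initiative (DCMI) Home Page"],
        "contributor": ["Dublin Core Metadata Initiative"],
    },
}
index = build_index(records)
```

Searching the creator field for "levy" finds the article, while the same word in the title field finds nothing: the precision of a catalog comes from the user knowing which field to ask about.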


Why isn’t DC used to index & search the web?

Technology: The methods used in early Infoseek, Lycos and Altavista have been greatly enhanced.

(Note that these methods provide quite good precision at the expense of recall.)

Users: The typical user who searches the Web has limited training and does not understand catalogs.

Economics: The size of the Web makes human indexing of every important site impossible. The rate of change requires frequent re-indexing.
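The note on precision and recall under Technology can be made concrete with the usual definitions: precision is the fraction of retrieved items that are relevant, and recall is the fraction of relevant items that are retrieved. The numbers below are invented purely to illustrate a high-precision, low-recall search.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for two collections of item ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Invented example: 4 pages returned, 3 of them relevant,
# out of 10 relevant pages that exist in the collection.
p, r = precision_recall(
    retrieved={1, 2, 3, 4},
    relevant={1, 2, 3, 5, 6, 7, 8, 9, 10, 11},
)
```

Here three of the four results are useful, yet seven relevant pages were never found, which is the pattern the note describes.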


Dublin Core in Many Languages

See:

Thomas Baker, Languages for Dublin Core, D-Lib Magazine, December 1998, http://www.dlib.org/dlib/december98/12baker.html