Controlling values The equivalence relationship

Controlling values The equivalence relationship. The vocabulary problem What is this?

Dec 14, 2015

Taryn Justin
Page 1: Controlling values The equivalence relationship. The vocabulary problem What is this?

Controlling values

The equivalence relationship

The vocabulary problem

What is this?

Synonymy

Restroom, bathroom, toilet, loo, facilities, WC, ladies’ room, men’s room, little girls’ room, little boys’ room...

Synonymy: Using different words to identify the same concept.

Another vocabulary problem

What is mercury?

What is bank?

What is python?

What is java?

Polysemy

Polysemy: Using the same word (morphologically speaking) to identify different concepts.

Java: Island in Indonesia, variety of coffee bean, generic term for coffee, object-oriented programming language.

Yet more vocabulary problems

The White House has been lobbying Congress to support the proposed budget. . .

Freedom of the press is an important value in the United States. . .

I’m tired of taking the bus; I need some new wheels. . .

Metonymy and synecdoche

Metonymy: Using a related concept to stand for another concept.

Synecdoche: Using the word for part of something to stand for the entire thing.

Do people label consistently?

No.

Furnas and colleagues asked people (including subject experts) to label a variety of items (recipes, text editing operations, “common content objects”). Surprise: there was little agreement among the names that participants submitted.

Conclusion: “The idea of an ‘obvious,’ ‘self-evident,’ or ‘natural’ term is a myth! Since even the best possible name is not very useful, it follows that there can exist no rules, guidelines or procedures for choosing a good name, in the sense of ‘accessible to the unfamiliar user.’”

What to do?

Furnas and colleagues suggest that interface designers:

• Implement unlimited aliasing.

• Disambiguate terms that can be used in multiple senses by presenting possibilities to users and asking them to select the appropriate one.

Limitations of Furnas study

• Participants were asked to label objects, not how they would search for objects.

• The study assumes a search interface, not a browsing (or menu-driven) interface.

In a search interface, users must recall or guess an object’s name. In a browsing interface, users merely need to recognize the appropriate term.

Vocabulary problems and information systems

Designers of information organization systems have long grappled with the ambiguities of language.

Synonymy, polysemy, and so on complicate the goal of collocating, or bringing together, like items in an information system.

Vocabulary control

In LIS, vocabulary control is similar to Furnas’s idea of aliasing: concepts are associated with their synonyms.

One term is designated as preferred: this is the term used in a display. Other labels associated with the concept are used in searching.

Example: Search Nordstrom.com for “frock” and get “dresses” instead.

Example of a controlled term

Preferred term: bathroom

Equivalent terms: restroom, loo, toilet, WC, ladies’ room, mens’ room, little girls’ room, little boys’ room, ladies room, ladys room, lady’s room, ladie’s room, ladys’ room...
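
As a rough illustration of how a preferred term and its equivalents might be stored, here is a minimal Python sketch. The names VOCABULARY, build_lookup, and preferred_term are hypothetical, the term list is shortened from the slide, and a real vocabulary tool would likely handle spelling variants and punctuation more carefully.

```python
# Hypothetical controlled vocabulary: each preferred term maps to its equivalents.
VOCABULARY = {
    "bathroom": ["restroom", "loo", "toilet", "WC", "ladies' room", "men's room"],
}


def build_lookup(vocabulary):
    """Map every label (preferred or equivalent) to its preferred term."""
    lookup = {}
    for preferred, equivalents in vocabulary.items():
        for label in [preferred, *equivalents]:
            # casefold() makes matching case-insensitive
            lookup[label.casefold()] = preferred
    return lookup


LOOKUP = build_lookup(VOCABULARY)


def preferred_term(query):
    """Return the preferred term for a query, or None if the query is unknown."""
    return LOOKUP.get(query.strip().casefold())


print(preferred_term("Loo"))
print(preferred_term("sauna"))
```

Any equivalent typed by a searcher resolves to the single label used for display, which is the behavior in the "frock" and "dresses" example.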

Equivalence can be relative

Similar concepts may be treated as equivalents; this is a design decision by the vocabulary creator.

Example

Vocabulary includes this preferred term: Beer

These terms are designated as equivalents: ale, porter, stout, pilsner, bock, IPA, malt liquor, barley wine.

Disambiguation in vocabularies

Polysemous terms are often identified by adding qualifying terms in parentheses.

Mercury (chemical element)

Mercury (god in Roman mythology)

Search engines may ask users to select the sense they want.
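
One way to picture qualifier-based disambiguation is the sketch below. It assumes a hypothetical list of terms written in the "Label (qualifier)" style; TERMS, base_label, and senses are illustrative names, not part of any real search engine.

```python
import re

# Hypothetical vocabulary terms; polysemous ones carry a parenthetical qualifier.
TERMS = [
    "Mercury (chemical element)",
    "Mercury (planet)",
    "Java (island)",
    "Java (programming language)",
    "Bathroom",
]

# Matches "Label (qualifier)" and captures the label part.
QUALIFIED = re.compile(r"^(?P<label>.+?)\s*\((?P<qualifier>[^)]*)\)$")


def base_label(term):
    """Strip a trailing parenthetical qualifier, if the term has one."""
    match = QUALIFIED.match(term)
    return match.group("label") if match else term


def senses(query, terms=TERMS):
    """Return every term whose unqualified label matches the query."""
    wanted = query.strip().casefold()
    return [term for term in terms if base_label(term).casefold() == wanted]


candidates = senses("mercury")
if len(candidates) > 1:
    # More than one sense: present the options and let the user choose.
    for number, term in enumerate(candidates, start=1):
        print(number, term)
```

When a query matches more than one sense, the interface can list the qualified terms and let the user pick, rather than guessing.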

Digression into the library catalog

Library catalogs have three traditional access points: author, title, and subject. In the old card catalog, these were the three ways that users could search.

Each of these access points has associated vocabulary control.

Control of names

In library cataloging, controlled vocabularies for authors, titles, and subjects are called authority files.

Authority files both disambiguate names that identify multiple people or items and group variations for the same person or item (that is, they deal with polysemy and synonymy).

Authority file examples

In the UT author authority file, the headings for Patricia Williams show that:

• Names are disambiguated by using middle initials and dates of birth.

• Cross references are used for some authors.

• There may still be two headings for one person.

Fun digression: Pseudonyms in the catalog

The current catalog maintains pseudonymous identities (in older catalogs, everything went under the author’s real name).

For example, “Carolyn Keene,” the name used by multiple people as the author for the Nancy Drew novels, is maintained as an author entity in the authority file.

Thesauri

Thesauri are a type of controlled vocabulary that includes equivalence, hierarchical, and associative relationships. Thesauri can also be faceted (that is, represent multiple aspects of a concept; we will discuss facets in depth later).

Thesauri are often developed to deal with subjects of documents, and we will talk a lot about this beginning in a few weeks.

Example thesaurus entry

Dark chocolate
BT Chocolate
RT Single-origin chocolate
UF Semisweet chocolate
   Baker’s chocolate
   Sweet chocolate
SN Chocolate without milk solids and with less than 70 percent chocolate mass.

BT: broader term, one level up in a hierarchy

RT: related term, in another facet or hierarchical branch

UF: Use for; synonyms, or non-preferred terms

SN: Scope note; definitions or usage guidelines
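
The entry above could be represented as a small record, as in this hedged Python sketch. ThesaurusEntry, format_entry, and resolve are hypothetical names chosen for illustration; real thesaurus software typically stores more (identifiers, narrower terms, facets).

```python
from dataclasses import dataclass, field


@dataclass
class ThesaurusEntry:
    preferred: str
    broader: list = field(default_factory=list)   # BT
    related: list = field(default_factory=list)   # RT
    use_for: list = field(default_factory=list)   # UF (non-preferred terms)
    scope_note: str = ""                          # SN


dark_chocolate = ThesaurusEntry(
    preferred="Dark chocolate",
    broader=["Chocolate"],
    related=["Single-origin chocolate"],
    use_for=["Semisweet chocolate", "Baker's chocolate", "Sweet chocolate"],
    scope_note="Chocolate without milk solids and with less than "
               "70 percent chocolate mass.",
)


def format_entry(entry):
    """Render an entry with one relationship per line (BT, RT, UF, SN)."""
    lines = [entry.preferred]
    for code, values in (("BT", entry.broader),
                         ("RT", entry.related),
                         ("UF", entry.use_for)):
        lines.extend(f"{code} {value}" for value in values)
    if entry.scope_note:
        lines.append(f"SN {entry.scope_note}")
    return "\n".join(lines)


def resolve(label, entries):
    """Return the preferred term for a label, following UF equivalents."""
    wanted = label.casefold()
    for entry in entries:
        labels = [entry.preferred, *entry.use_for]
        if wanted in (item.casefold() for item in labels):
            return entry.preferred
    return None


print(format_entry(dark_chocolate))
```

Here the UF list carries the equivalence relationship, while BT and RT carry the hierarchical and associative relationships.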

Controlled vocabulary example: MeSH and PubMed

Medical Subject Headings (MeSH) are used to index journal articles for the PubMed database.

Keyword searches in PubMed are automatically expanded with MeSH. Searches can also be explicitly limited to MeSH terms, which can increase precision.

The comparison to a system like Google Scholar is illuminating.

Summary

• Controlled vocabularies increase precision and recall in searching by identifying equivalent terms and disambiguating polysemous ones.

• Authority files are a type of controlled vocabulary.

• Thesauri are subject-based controlled vocabularies that include hierarchical and associative relationships in addition to equivalence relationships. Thesauri can also be used as browsing interfaces.
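
To make the recall claim concrete, the sketch below computes precision and recall for a toy search with and without equivalent terms. The four-document collection, the relevance judgments, and the names DOCS, RELEVANT, EQUIVALENTS, and search are all invented for illustration.

```python
def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved; recall = hits / relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical collection: document id -> the label its author happened to use.
DOCS = {1: "restroom", 2: "bathroom", 3: "loo", 4: "kitchen"}
RELEVANT = {1, 2, 3}  # documents that are actually about bathrooms
EQUIVALENTS = {"bathroom", "restroom", "loo"}


def search(terms):
    """Return ids of documents whose label is one of the search terms."""
    return {doc_id for doc_id, label in DOCS.items() if label in terms}


plain = search({"bathroom"})    # literal match on the user's word only
expanded = search(EQUIVALENTS)  # query expanded with equivalent terms

print(precision_recall(plain, RELEVANT))
print(precision_recall(expanded, RELEVANT))
```

In this toy case the literal search finds one of three relevant documents (recall 1/3), while the expanded search finds all three (recall 1) without retrieving the irrelevant one.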