Interfaces for Selecting and Understanding Collections.

Selecting from Collections

Collections are sets of documents that have been coalesced by a human or system.

Traditional collections:
– NLM’s MedLine
– ACM Digital Library
– LEXIS-NEXIS
– Library/museum resources from a particular donor

How do people with information needs locate and identify the appropriate collections?

Does it Matter?

Web search engines (e.g. Google) get us the information we need …
– well, maybe

Web search drops users into the middle of a collection without any understanding of the collection and its overall characteristics.

Web search misses
– Lots of more structured materials
• “the hidden web”
– Subscription-based content
• Which is likely the best edited, most accurate, and most valuable in specialized domains

Interfaces over Multiple Collections

Interfaces for Selecting and Understanding Collections
– Lists
– Overviews
– Examples
– Automated source selection

Lists of Collections

Usually just provides a list of collection names.
– Difficult to select from if the user does not know the collections beforehand
– Over time people bookmark collections of value

Need tools for helping users who are outside of their areas of expertise

Example

Example

Examples

Overviews of Collections

Overviews provide a sense of what is in a collection

Overviews can be
– Based on a category or directory structure
– Automatically derived from the collection

Presentation of an overview is often a form of information visualization

Category-based Overviews

MedLine – biomedical collection
– Medical Subject Headings (MeSH) consists of 18,000 categories in a directed acyclic graph

ACM Digital Library – computer science collection
– Hierarchy of 1200 category (keyword) labels

Yahoo – the Web
– Graph of directories (probably a DAG)

Humans have to place documents in categories
– Authors for ACM DL, subject experts for MedLine, surfers for Yahoo
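A minimal sketch of why the DAG structure matters for a category-based overview: a category can sit under more than one parent, so a document must be counted only once when totals are rolled up. The class name `CategoryDAG`, its methods, and the category names are all illustrative assumptions, not taken from MeSH, the ACM DL, or Yahoo.

```python
from collections import defaultdict

class CategoryDAG:
    """Toy category graph in which a category may have several parents."""

    def __init__(self):
        self.children = defaultdict(set)  # parent -> child categories
        self.docs = defaultdict(set)      # category -> documents placed there by a human

    def add_edge(self, parent, child):
        self.children[parent].add(child)

    def assign(self, doc_id, category):
        self.docs[category].add(doc_id)

    def documents_under(self, category):
        """All documents in `category` or in any category reachable below it."""
        seen, stack, found = set(), [category], set()
        while stack:
            node = stack.pop()
            if node in seen:          # reached again through a second parent
                continue
            seen.add(node)
            found |= self.docs[node]
            stack.extend(self.children[node])
        return found

# Hypothetical categories: "Eye Neoplasms" has two parents.
dag = CategoryDAG()
dag.add_edge("Diseases", "Eye Diseases")
dag.add_edge("Diseases", "Neoplasms")
dag.add_edge("Eye Diseases", "Eye Neoplasms")
dag.add_edge("Neoplasms", "Eye Neoplasms")
dag.assign("doc1", "Eye Neoplasms")
dag.assign("doc2", "Neoplasms")

# Document counts per category, as an overview display might show them.
overview = {c: len(dag.documents_under(c))
            for c in ["Diseases", "Eye Diseases", "Neoplasms"]}
```

Collecting documents into a set keeps `doc1` from being counted twice under "Diseases", even though it is reachable along two paths.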

MeSH Browser

HiBrowse Browser

ConeTrees

Radial Views

Hyperbolic Views

MediaMetro

Automatically Derived Overviews

Apply clustering algorithms to document collection
– Remember Automatic Global Analysis
• Use of co-occurrence and co-citation
• Use of distance-based clustering approaches like hierarchic agglomerative clustering

Need methods to determine labels for clusters
– Could be a document
• Identification of centroid (document most similar to all others)
• Identification of hubs (document most mentioned by cluster)
– Could be one or more terms
• Use most common / best differentiator (using TF-IDF)

No human intervention required – but people are likely to be valuable as editors
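The clustering and labeling steps above can be sketched in a few lines of plain Python. This is one possible reading, under stated assumptions: whitespace tokenization, cosine similarity over TF-IDF vectors, average-link merging, and a label chosen as the term with the largest summed TF-IDF weight in the cluster. The function names and the four toy documents are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def tfidf_vectors(docs):
    """Return one {term: tf * idf} dict per whitespace-tokenized document."""
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter(term for tokens in tokenized for term in set(tokens))
    n = len(docs)
    return [{t: tf * math.log(n / df[t]) for t, tf in Counter(tokens).items()}
            for tokens in tokenized]

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = (math.sqrt(sum(w * w for w in a.values()))
            * math.sqrt(sum(w * w for w in b.values())))
    return dot / norm if norm else 0.0

def agglomerate(vectors, k):
    """Average-link agglomerative clustering; stops when k clusters remain."""
    clusters = [[i] for i in range(len(vectors))]

    def avg_sim(c1, c2):
        total = sum(cosine(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        # Merge the most similar pair of clusters.
        a, b = max(combinations(range(len(clusters)), 2),
                   key=lambda p: avg_sim(clusters[p[0]], clusters[p[1]]))
        clusters[a] += clusters[b]  # a < b, so deleting b keeps index a valid
        del clusters[b]
    return [sorted(c) for c in clusters]

def label(cluster, vectors):
    """Pick the term with the highest summed TF-IDF weight in the cluster."""
    totals = Counter()
    for i in cluster:
        totals.update(vectors[i])
    return max(totals, key=totals.get)

# Hypothetical toy collection.
docs = [
    "heart attack treatment heart",
    "heart surgery recovery",
    "compiler parsing grammar compiler",
    "compiler optimization grammar",
]
vectors = tfidf_vectors(docs)
clusters = agglomerate(vectors, k=2)
labels = [label(c, vectors) for c in clusters]
```

On this toy input the two medical documents and the two compiler documents end up in separate clusters, labeled "heart" and "compiler". A real collection would need stop-word handling and a faster merge strategy, and, as the slide notes, human editors would likely still improve the labels.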

Scatter Gather

Evaluation of Scatter-Gather

Scatter-Gather
– Scatter-Gather conveyed an overview of collection contents
– Scatter-Gather without search was less effective than a basic search
– Need to combine clustering with search

Themescapes

More Themescapes

Kohonen Maps

Evaluation of Graphical Overviews

One study found that non-experts found the clustering results difficult to use (worse than text-based views like Scatter-Gather)

Comparison of Kohonen map and Yahoo
– 11 of 15 subjects found “interesting” page using Kohonen
• 8 were able to find same page using Yahoo
– 14 of 16 subjects found “interesting” page using Yahoo
• 2 were able to find same page using Kohonen
– Subjects liked ability to jump between categories without backing out of current category

Unsupervised thematic overviews are probably better for giving a gist of what is in a collection than for search.

Examples, Dialogs, Wizards

Retrieval by reformulation
– Start with example queries
• Rabbit, Helgon
– Can be difficult to find appropriate starting query

Wizards
– Found to help users without the necessary domain knowledge get through multi-step processes
– Not helpful when the wizard is not accompanied by help
– Not useful when the goal is teaching how to use the interface

Guided tours
– Present a logical sequence of navigation choices for accomplishing a goal (e.g. Walden's Paths)
– Not evaluated with regard to information access

Automated Source Selection

Selecting collection automatically (but explicitly)

Need a model of each collection
– What it covers, need model of topics
– What it is good at, need metric for good

Develop a model of the user’s information need

Match the information need to the most valuable collections for that topic

Used in meta-search – interesting area of research

Could be starting point for interactive collection selection.
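As a rough illustration of the matching step, the sketch below models each collection by the relative term frequencies of a few sample documents (the "what it covers" model) and multiplies the topical match by a per-collection quality weight (a stand-in for the "metric for good"). The scoring formula, function names, collection names, and sample texts are all assumptions made for the example, not a method described in the slides.

```python
from collections import Counter

def build_model(sample_docs):
    """Summarize a collection as relative term frequencies of sample documents."""
    counts = Counter(t for doc in sample_docs for t in doc.lower().split())
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}

def rank_collections(query, models, quality):
    """Score = sum of the query terms' frequencies in the model * quality weight."""
    terms = query.lower().split()
    scores = {name: sum(model.get(t, 0.0) for t in terms) * quality.get(name, 1.0)
              for name, model in models.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical collections, each described by two sample documents.
models = {
    "medical": build_model(["heart disease treatment", "heart surgery outcomes"]),
    "computing": build_model(["compiler design", "heart rate sensor software"]),
}
quality = {"medical": 1.0, "computing": 1.0}  # placeholder quality weights

ranking = rank_collections("heart treatment", models, quality)
```

Here the query is the model of the user's information need, and `ranking` lists the collections from best to worst match. Showing that ranked list to the user, rather than silently picking the top entry, is one way it could serve as a starting point for interactive collection selection.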
