Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007

Dec 21, 2015

Page 1: Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007.

Interaction

LBSC 796/INFM 718R

Douglas W. Oard

Week 4, October 1, 2007

Page 2: Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007.

Moore’s Law

[Figure: computer performance (transistors, speed, storage, ...) plotted over time, 1950 to 2030]

Page 3: Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007.

Human Cognition

[Figure: human performance plotted over time, 1950 to 2030]

Page 4: Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007.

Where is the bottleneck?

Slide idea by Bill Buxton

system vs. human performance

Page 5: Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007.

Interaction Points

[Diagram: process stages (Source Selection, Query Formulation, Search, Selection, Examination, Delivery), the artifacts passed between them (Resource, Query, Ranked List, Documents), and feedback loops (source reselection; system discovery, vocabulary discovery, concept discovery, document discovery)]

Help users decide where to start

Help users formulate queries

Help users make sense of results and navigate information space

Page 6: Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007.

Information Needs

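[Diagram: a real information need (RIN0), perceived information needs (PIN0 … PINm), requests (r0, r1, … rn), and queries (q0, q1, q2, q3, … qr)]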

Stefano Mizzaro. (1999) How Many Relevances in Information Retrieval? Interacting With Computers, 10(3), 305-322.

Real information needs (RIN) = visceral need

Perceived information needs (PIN) = conscious need

Request = formalized need

Query = compromised need

Page 7: Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007.

Anomalous State of Knowledge

• Belkin: Searchers do not clearly understand
  – The problem itself
  – What information is needed to solve the problem

• The query results from a clarification process

• Dervin’s “sense making”:

[Diagram labels: Need, Gap, Bridge]

Page 8: Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007.

Bates’ “Berry Picking” Model

[Diagram: a search path through successive queries Q0, Q1, Q2, Q3, Q4, Q5]

A sketch of a searcher… “moving through many actions towards a general goal of satisfactory completion of research related to an information need.”

Page 9: Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007.

Broder’s Web Query Taxonomy

• Navigational (~20%)
  – Reach a particular site (“known item”)

• Informational (~50%)
  – Acquire static information (“topical”)

• Transactional (~30%)
  – Perform a Web-mediated activity (“service”)

Andrei Broder, SIGIR Forum, Fall 2002

Page 10: Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007.

Some Desirable Features

• Make exploration easy

• Relate documents to the reasons they were retrieved

• Highlight relationships between documents

Page 11: Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007.

Agenda

Query formulation

• Selection

• Examination

• Source selection

• Project 3

Page 12: Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007.

Query Formulation

• Command Language

• Form Fill-in

• Menu Selection

• Direct Manipulation

• Natural Language

Ben Shneiderman, 1997

Page 13: Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007.

WESTLAW® Query Examples

• What is the statute of limitations in cases involving the federal tort claims act?

– LIMIT! /3 STATUTE ACTION /S FEDERAL /2 TORT /3 CLAIM

• What factors are important in determining what constitutes a vessel for purposes of determining liability of a vessel owner for injuries to a seaman under the “Jones Act” (46 USC 688)?

– (741 +3 824) FACTOR ELEMENT STATUS FACT /P VESSEL SHIP BOAT /P (46 +3 688) “JONES ACT” /P INJUR! /S SEAMAN CREWMAN WORKER

• Are there any cases which discuss negligent maintenance or failure to maintain aids to navigation such as lights, buoys, or channel markers?

– NOT NEGLECT! FAIL! NEGLIG! /5 MAINT! REPAIR! /P NAVIGAT! /5 AID EQUIP! LIGHT BUOY “CHANNEL MARKER”

• What cases have discussed the concept of excusable delay in the application of statutes of limitations or the doctrine of laches involving actions in admiralty or under the “Jones Act” or the “Death on the High Seas Act”?

– EXCUS! /3 DELAY /P (LIMIT! /3 STATUTE ACTION) LACHES /P “JONES ACT” “DEATH ON THE HIGH SEAS ACT” (46 +3 761)
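The queries above use Westlaw's terms-and-connectors syntax, in which `!` expands a word root, `/n` requires two terms to fall within n words of each other, and a space between terms means OR. As a rough illustration only, the sketch below implements just those three pieces over a tokenized string. The function names (`tokenize`, `matches`, `within`) and the sample sentence are made up for this example, and nothing here reflects how Westlaw itself evaluates queries.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def matches(term, token):
    """Match one query term against one token; a trailing '!' matches any suffix."""
    term = term.lower()
    if term.endswith("!"):
        return token.startswith(term[:-1])
    return token == term

def within(text, left_terms, right_terms, n):
    """True if any left term occurs within n tokens of any right term (either order).

    Each list is treated as a set of alternatives (OR), as a space is in the queries.
    """
    tokens = tokenize(text)
    left = [i for i, t in enumerate(tokens) if any(matches(q, t) for q in left_terms)]
    right = [i for i, t in enumerate(tokens) if any(matches(q, t) for q in right_terms)]
    return any(i != j and abs(i - j) <= n for i in left for j in right)

# Simplified form of: LIMIT! /3 STATUTE ACTION
doc = "The statute of limitations bars the claim."
print(within(doc, ["LIMIT!"], ["STATUTE", "ACTION"], 3))
```

The `/S` (same sentence), `/P` (same paragraph), and `+n` (ordered proximity) connectors would need sentence and paragraph boundaries and an ordering check, which this sketch leaves out.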

Page 14: Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007.

Form-Based Query Specification

Credit: Marti Hearst

Page 15: Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007.

Direct Manipulation Spec.: VQUERY (Jones 98)

Credit: Marti Hearst

Page 16: Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007.

The “Back” Button

• Behavior is counterintuitive to many users

[Diagram: links among pages A, B, C, and D]

You hit “back” twice from page D. Where do you end up?
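One way to see why the answer surprises people is to model the history as a stack that discards "forward" pages whenever a new link is followed. The sketch below is a minimal model of that behavior, not any particular browser's code. The navigation path it uses (A, B, C, back to B, then D) is an assumption for illustration, since the arrows of the slide's diagram are not available here.

```python
class BrowserHistory:
    """Stack-style history: following a link discards any 'forward' pages."""

    def __init__(self, start):
        self.pages = [start]
        self.pos = 0

    def visit(self, page):
        # Drop everything after the current position, then append the new page.
        self.pages = self.pages[: self.pos + 1] + [page]
        self.pos += 1

    def back(self):
        # Step back one page if possible and report where we are.
        if self.pos > 0:
            self.pos -= 1
        return self.pages[self.pos]

h = BrowserHistory("A")
h.visit("B")
h.visit("C")
h.back()          # now at B
h.visit("D")      # C is dropped from the history
h.back()          # now at B
print(h.back())   # A, even though C was viewed more recently than A
```

A user who thinks of history as "pages in the order I saw them" would expect to pass through C on the way back; the stack model has already forgotten it.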

Page 17: Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007.

PadPrints

• Tree-based history of recently visited Web pages
  – History map placed to left of browser window
  – Node = title + thumbnail
  – Visually shows navigation history

• Zoomable: ability to grow and shrink sub-trees

Page 18: Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007.

Visual Browsing History in PadPrints

Page 19: Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007.

PadPrints Thumbnails

Page 20: Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007.

Alternate Query Modalities

• Spoken queries
  – Used for telephone and hands-free applications
  – Reasonable performance with limited vocabularies
    • But some error correction method must be included

• Handwritten queries
  – Palm Pilot Graffiti, touch-screens, …
  – Fairly effective if some form of shorthand is used
    • Ordinary handwriting often has too much ambiguity

Page 21: Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007.

Agenda

• Query formulation

Selection

• Examination

• Source selection

• Project 3

Page 22: Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007.

A Selection Interface Taxonomy

• One dimensional lists
  – Content: title, source, date, summary, ratings, ...
  – Order: retrieval status value, date, alphabetic, ...
  – Size: scrolling, specified number, score threshold

• Two dimensional displays
  – Construction: clustering, starfield, projection
  – Navigation: jump, pan, zoom

• Three dimensional displays
  – Contour maps, fishtank VR, immersive VR

Page 23: Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007.

Google: KeyWord In Context (KWIC)

Query: University of Maryland College Park
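A keyword-in-context display shows each query-term hit surrounded by a few words of the original text. The sketch below is a minimal version of that idea; it is not how Google builds its snippets. The function name `kwic`, the `width` parameter, and the sample sentence are assumptions made for this example.

```python
import re

def kwic(text, query, width=4):
    """Return snippets showing each query-term hit with `width` words of context."""
    words = text.split()
    terms = {t.lower() for t in query.split()}
    snippets = []
    for i, word in enumerate(words):
        # Compare on a punctuation-free, lowercase form of the word.
        if re.sub(r"\W", "", word).lower() in terms:
            lo, hi = max(0, i - width), min(len(words), i + width + 1)
            # Uppercase the hit so it stands out, as highlighting would.
            context = [w.upper() if j == i else w
                       for j, w in enumerate(words[lo:hi], lo)]
            snippets.append(" ".join(context))
    return snippets

sentence = "The University of Maryland is located in College Park."
for snippet in kwic(sentence, "Maryland", width=2):
    print(snippet)
```

A production snippet generator would also merge overlapping windows and skip stopwords such as "of" when matching, which this sketch does not attempt.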

Page 24: Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007.

Summarization

Page 25: Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007.

Indicative vs. Informative

• Terms often applied to document abstracts
  – Indicative abstracts support selection
    • They describe the contents of a document
  – Informative abstracts support understanding
    • They summarize the contents of a document

• Applies to any information presentation
  – Presented for indicative or informative purposes

Page 26: Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007.

Selection/Examination Tasks

• “Indicative” tasks
  – Recognizing what you are looking for
  – Determining that no answer exists in a source
  – Probing to refine mental models of system operation

• “Informative” tasks
  – Vocabulary acquisition
  – Concept learning
  – Information use

Page 27: Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007.

Generated Summaries

• Fluent summaries for a specific domain

• Define a knowledge structure for the domain
  – Frames are commonly used

• Analysis: process documents to fill the structure
  – Studied separately as “information extraction”

• Compression: select which facts to retain

• Generation: create fluent summaries
  – Templates for initial candidates
  – Use language model to select an alternative

Page 28: Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007.

Extraction-Based Summarization

• Robust technique for making disfluent summaries

• Four broad types:
  – Query-biased vs. generic
  – Term-oriented vs. sentence-oriented

• Combine evidence for selection:
  – Salience: similarity to the query
  – Specificity: IDF or chi-squared
  – Emphasis: title, first sentence
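To make the three kinds of evidence concrete, here is a minimal sketch of a sentence-oriented, query-biased extractor. Salience is counted as query-term overlap, specificity as mean IDF, and emphasis as a bonus for the first sentence and for title words. The weights (2.0, 1.0, 0.5), the function names, and the sample texts are arbitrary assumptions for illustration, not values from the lecture.

```python
import math
import re

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def summarize(document, query, collection, title="", n=1):
    """Pick the n highest-scoring sentences of `document`, in document order.

    collection: list of other document strings, used only to estimate IDF.
    """
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    docs = [set(tokenize(d)) for d in collection + [document]]

    def idf(term):
        df = sum(term in d for d in docs)
        return math.log(len(docs) / df) if df else 0.0

    query_terms, title_terms = set(tokenize(query)), set(tokenize(title))
    scored = []
    for pos, sentence in enumerate(sentences):
        terms = set(tokenize(sentence))
        salience = len(terms & query_terms)                    # similarity to the query
        specificity = sum(idf(t) for t in terms) / max(len(terms), 1)  # mean IDF
        emphasis = (1.0 if pos == 0 else 0.0) + len(terms & title_terms)
        scored.append((2.0 * salience + specificity + 0.5 * emphasis, pos, sentence))
    top = sorted(scored, key=lambda s: -s[0])[:n]
    return [s for _, _, s in sorted(top, key=lambda s: s[1])]  # restore document order

text = ("Clustering groups documents. "
        "K-means is a clustering algorithm that uses centroids. "
        "The weather was nice.")
others = ["The weather today.", "Documents are grouped."]
print(summarize(text, "centroids", others))
```

Because whole sentences are lifted out of context, the result can read as disfluent, which is the trade-off the slide points to.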

Page 29: Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007.

Goldilocks and the Three Summaries…

• The entire document: too much!

• The exact answer: too little!

• The surrounding paragraph: just right…

Overall Interface Condition Preferences:
  Exact Answer: 3.33%
  Sentence: 20.00%
  Paragraph: 53.33%
  Document: 23.33%

“It occurred on July 4, 1776.” What does this pronoun refer to?

Jimmy Lin, Dennis Quan, Vineet Sinha, Karun Bakshi, David Huynh, Boris Katz, and David R. Karger. (2003) What Makes a Good Answer? The Role of Context in Question Answering. Proceedings of INTERACT 2003.

Page 30: Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007.

Ask: Suggested Query Refinements

Page 31: Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007.

Open Directory Project

http://www.dmoz.org

Page 32: Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007.

SWISH

List Interface Category Interface

Query: jaguar

Hao Chen and Susan Dumais. (2000) Bringing Order to the Web: Automatically Categorizing Search Results. Proceedings of CHI 2000.

Page 33: Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007.

Text Classification

• Problem: automatically sort items into bins

• Machine learning approach
  – Obtain a training set with ground truth labels
  – Use a machine learning algorithm to “train” a classifier
    • kNN, Bayesian classifier, SVMs, decision trees, etc.
  – Apply classifier to new documents
    • System assigns labels according to patterns learned in the training set

Page 34: Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007.

Machine Learning

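[Diagram: Training: training examples labeled label1, label2, label3, label4 pass through a representation function to a supervised machine learning algorithm, which produces a text classifier. Testing: an unlabeled document passes through the representation function to the text classifier, which decides among label1? label2? label3? label4?]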

Page 35: Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007.

k Nearest Neighbor (kNN) Classifier

Page 36: Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007.

kNN Algorithm

• Select k most similar labeled documents

• Have them “vote” on the best label:
  – Each document gets one vote, or
  – More similar documents get a larger vote

• How can similarity be defined?
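The steps above can be sketched in a few lines. This example answers the similarity question with cosine similarity over bag-of-words counts, which is one common choice and an assumption here rather than the lecture's prescribed answer. The function names and the tiny "jaguar" training set are invented for illustration.

```python
import math
from collections import Counter

def vectorize(text):
    """Represent a document as a bag of lowercase word counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def knn_classify(doc, labeled_docs, k=3, weighted=False):
    """Label `doc` by a vote among its k most similar labeled documents.

    labeled_docs: list of (text, label) pairs.
    weighted=False gives each neighbor one vote; True weights votes by similarity.
    """
    query = vectorize(doc)
    scored = [(cosine(query, vectorize(text)), label) for text, label in labeled_docs]
    neighbors = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    votes = Counter()
    for sim, label in neighbors:
        votes[label] += sim if weighted else 1
    return votes.most_common(1)[0][0]

training = [
    ("jaguar car engine speed", "autos"),
    ("jaguar sedan car dealer", "autos"),
    ("jaguar cat jungle prey", "animals"),
    ("big cat jungle habitat", "animals"),
]
print(knn_classify("fast car engine", training, k=3))
```

With `weighted=True`, neighbors that share no terms with the document contribute nothing, so a distant neighbor cannot tip a close vote.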

Page 37: Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007.

Cat-a-Cone

Page 38: Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007.

Cat-a-Cone

• Key Ideas:
  – Separate documents from category labels
  – Show both simultaneously
  – Link the two for iterative feedback
  – Integrate searching and browsing

• Distinguish between:
  – Searching for documents
  – Searching for categories

Page 39: Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007.

Cat-a-Cone Architecture

[Diagram: the Collection, the Retrieved Documents, and the Category Hierarchy, connected by browse and search links; query terms drive search]

Page 40: Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007.

The Cluster Hypothesis

“Closely associated documents tend to be relevant to the same requests.”

van Rijsbergen 1979

Page 41: Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007.

Vivisimo: Clustered Results

http://www.vivisimo.com

Page 42: Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007.

Kartoo’s Cluster Visualization

http://www.kartoo.com/

Page 43: Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007.

Clustering Result Sets

• Advantages:
  – Topically coherent document sets are presented together
  – User gets a sense for the themes in the result set
  – Supports browsing retrieved hits

• Disadvantages:
  – May be difficult to understand the theme of a cluster based on summary terms
  – Clusters themselves might not “make sense”
  – Computational cost

Page 44: Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007.

Visualizing Clusters

Centroids

Page 45: Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007.

Hierarchical Agglomerative Clustering

Page 46: Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007.

Another Way to Look at H.A.C.

[Dendrogram over documents A, B, C, D, E, F, G, H]

Page 47: Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007.

The H.A.C. Algorithm

• Start with each document in its own cluster

• Until there is only one cluster:
  – Determine the two most similar clusters ci and cj
  – Replace ci and cj with a single cluster ci ∪ cj

• The history of merging forms the hierarchy

Page 48: Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007.

Cluster Similarity

• Assume a similarity function that determines the similarity of two instances: sim(x,y)
  – What’s appropriate for documents?

• What’s the similarity between two clusters?
  – Single Link: similarity of two most similar members
  – Complete Link: similarity of two least similar members
  – Group Average: average similarity between members
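The H.A.C. merge loop and the three cluster-similarity definitions fit together as in the sketch below. It is a deliberately naive version that recomputes every pairwise similarity on each pass, and it takes the group average over cross-cluster pairs only, which is one of several conventions and an assumption here. The function name `hac` and the one-dimensional example data are made up for illustration.

```python
def hac(items, sim, linkage="single"):
    """Agglomerative clustering; returns the merge history as (cluster_a, cluster_b) pairs.

    items: list of instances; sim(x, y): similarity of two instances.
    """
    def cluster_sim(a, b):
        sims = [sim(x, y) for x in a for y in b]
        if linkage == "single":      # two most similar members
            return max(sims)
        if linkage == "complete":    # two least similar members
            return min(sims)
        if linkage == "average":     # mean over cross-cluster pairs
            return sum(sims) / len(sims)
        raise ValueError(f"unknown linkage: {linkage}")

    clusters = [(x,) for x in items]   # start with each item in its own cluster
    history = []
    while len(clusters) > 1:
        # Determine the two most similar clusters ci and cj.
        i, j = max(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda p: cluster_sim(clusters[p[0]], clusters[p[1]]),
        )
        ci, cj = clusters[i], clusters[j]
        history.append((ci, cj))
        # Replace ci and cj with their union.
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)] + [ci + cj]
    return history

# Toy data: numbers on a line, with similarity as negative distance.
merges = hac([1, 2, 6, 7, 20], sim=lambda x, y: -abs(x - y))
for a, b in merges:
    print(a, "+", b)
```

Reading the merge history from first to last gives the hierarchy bottom-up; cutting it off early yields a flat clustering with more clusters.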

Page 49: Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007.

K-Means Clustering

[Animation: pick seeds → reassign clusters → compute centroids → reassign clusters → compute centroids → reassign clusters → converged!]

Page 50: Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007.

K-Means

• Each cluster is characterized by its centroid (center of gravity):

$\mu(c) = \frac{1}{|c|} \sum_{x \in c} x$

• Reassignment of documents to clusters is based on distance to the current cluster centroids

Page 51: Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007.

K-Means Algorithm

• Let d be the distance measure between documents

• Select k random instances {s1, s2, …, sk} as seeds

• Until clustering converges:
  – Assign each instance xi to the cluster cj such that d(xi, sj) is minimal
  – Update the seeds to the centroid of each cluster
    • For each cluster cj, sj = μ(cj)
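The loop above can be written out directly. The sketch below assumes documents are already represented as equal-length numeric tuples and uses squared Euclidean distance for d; both are assumptions for illustration, as are the `max_iters` cap and the fixed random `seed` used to make runs repeatable.

```python
import random

def kmeans(points, k, max_iters=100, seed=0):
    """Cluster equal-length numeric tuples; returns (centroids, assignments)."""
    def d(x, y):
        # Squared Euclidean distance stands in for the distance measure d.
        return sum((a - b) ** 2 for a, b in zip(x, y))

    def centroid(cluster):
        # Mean vector: the sum of the members divided by the cluster size.
        return tuple(sum(dim) / len(cluster) for dim in zip(*cluster))

    rng = random.Random(seed)
    seeds = rng.sample(points, k)          # select k random instances as seeds
    assignments = None
    for _ in range(max_iters):
        # Assign each instance to the cluster whose seed is nearest.
        new = [min(range(k), key=lambda j: d(x, seeds[j])) for x in points]
        if new == assignments:             # no instance moved: converged
            break
        assignments = new
        # Move each seed to the centroid of its cluster (unchanged if the cluster is empty).
        for j in range(k):
            members = [x for x, a in zip(points, assignments) if a == j]
            if members:
                seeds[j] = centroid(members)
    return seeds, assignments

data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(data, k=2)
print(labels)
```

Changing `seed` changes which instances start as seeds, which is a simple way to observe how the result can depend on initialization.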

Page 52: Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007.

K-Means: Discussion

• How do you select k?

• Results can vary based on random seed selection
  – Some seeds can result in poor convergence rate, or convergence to sub-optimal clusters

Page 53: Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007.

Scatter/Gather

Query = “star” on encyclopedic text

[Figure: initial clusters:]
  – symbols (8 docs)
  – film, tv (68 docs)
  – astrophysics (97 docs)
  – astronomy (67 docs)
  – flora/fauna (10 docs)

[Re-clustered, 68 docs:]
  – sports (14 docs)
  – film, tv (47 docs)
  – music (7 docs)

[Re-clustered, 97 docs:]
  – stellar phenomena (12 docs)
  – galaxies, stars (49 docs)
  – constellations (29 docs)
  – miscellaneous (7 docs)

Clustering and re-clustering is entirely automated

Page 54: Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007.

Scatter/Gather

• System clusters documents into “themes”
  – Displays clusters by showing:
    • Topical terms
    • Typical titles

• User chooses a subset of the clusters

• System re-clusters documents in the selected clusters
  – New clusters have different, more refined, “themes”

Marti A. Hearst and Jan O. Pedersen. (1996) Reexamining the Cluster Hypothesis: Scatter/Gather on Retrieval Results. Proceedings of SIGIR 1996.

Page 55: Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007.
Page 56: Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007.
Page 57: Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007.

Summary: Clustering

• Advantages:
  – Provides an overview of main themes in search results
  – Helps overcome polysemy

• Disadvantages:
  – Documents can be clustered in many ways
  – Not always easy to understand the theme of a cluster
  – What is the correct level of granularity?
  – More information to present

Page 58: Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007.

Recap

• Clustering
  – Automatically group documents into clusters

• Classification
  – Automatically assign labels to documents

Page 59: Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007.

Agenda

• Query formulation

• Selection

Examination

• Source selection

• Project 3

Page 60: Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007.

Examining Individual Documents

Page 61: Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007.

Document lens

Robertson & Mackinlay, UIST'93, Atlanta, 1993

Page 62: Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007.

Distorting Reality

Bifocal Perspective Wall

Fisheye

Page 63: Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007.

1-D Fisheye Menu

http://www.cs.umd.edu/hcil/fisheyemenu/fisheyemenu-demo.shtml

Page 64: Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007.

1-D Fisheye Document Viewer

Page 65: Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007.

SeeSoft

[Eick 94]

Page 66: Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007.

TileBars

Topic: reliability of DBMS (database systems)
Query terms: DBMS, reliability

[Figure: four TileBars, each with one row for “DBMS” and one row for “reliability”, for documents that are:]
  – Mainly about both DBMS and reliability
  – Mainly about DBMS, discusses reliability
  – Mainly about, say, banking, with a subtopic discussion on DBMS/Reliability
  – Mainly about high-tech layoffs
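The data behind a TileBars-style display is a small grid: one row per query term, one column per stretch of the document, with darkness showing how often the term occurs there. The sketch below builds that grid using fixed-size word windows as tiles, which is a simplifying assumption (the tiles could instead follow topical segment boundaries). The function names, the `tile_size` parameter, and the sample text are invented for this example.

```python
def tilebar(text, query_terms, tile_size=5):
    """Return {term: [count per tile]} for consecutive tiles of `tile_size` words."""
    words = [w.strip(".,;:!?()\"'").lower() for w in text.split()]
    tiles = [words[i:i + tile_size] for i in range(0, len(words), tile_size)]
    return {t: [tile.count(t.lower()) for tile in tiles] for t in query_terms}

def render(rows):
    """Draw one text row per term, using denser characters for higher counts."""
    shades = " .#"   # 0 hits, 1 hit, 2 or more hits
    return "\n".join(
        f"{term:<12}|" + "".join(shades[min(c, 2)] for c in counts) + "|"
        for term, counts in rows.items()
    )

sample = ("DBMS reliability matters. A DBMS must recover from failure. "
          "Banks lay off staff. Reliability of the DBMS is measured.")
rows = tilebar(sample, ["DBMS", "reliability"])
print(render(rows))
```

Columns where both rows are dark mark passages that discuss the two terms together, which is the pattern the four example documents on the slide are meant to contrast.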

Page 67: Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007.

U Mass: Scrollbar-Tilebar

Page 68: Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007.

Agenda

• Query formulation

• Selection

• Examination

Source selection

• Project 3

Page 69: Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007.

ThemeView

Pacific Northwest National Laboratory http://www.pnl.gov/infoviz/technologies.html

Page 70: Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007.

WebTheme

Page 71: Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007.

Ben Shneiderman’s ‘Seamless Interface’ Principles

• Informative feedback

• Easy reversal

• User in control
  – Anticipatable outcomes
  – Explainable results
  – Browsable content

• Limited working memory load
  – Query context
  – Path suspension

• Alternatives for novices and experts

• Scaffolding

Page 72: Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007.

My ‘Synergistic Interaction’ Principles

• Interdependence with process (“interaction models”)
  – Co-design with search strategy
  – Speed

• System initiative
  – Guided process
  – Exposing the structure of knowledge

• Support for reasoning
  – Representation of uncertainty
  – Meaningful dimensions

• Synergy with features used for search
  – Weakness of similarity, Strength of language

• Easily learned
  – Familiar metaphors (timelines, ranked lists, maps)

Page 73: Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007.

Some Good Ideas

• Show the query in the selection interface
  – It provides context for the display

• Suggest options to the user
  – Query refinements, for example

• Explain what the system has done
  – Highlight query terms in the results, for example

• Complement what the system has done
  – Users add value by doing things the system can’t
  – Expose the information users need to judge utility

Page 74: Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007.

Agenda

• Query formulation

• Selection

• Examination

• Source selection

Project 3

Page 75: Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007.

Expertise@Maryland

• Goal
  – Create a system to help research administration identify faculty members with specific research interests

• Design Criteria
  – Maximize reliance on available information
  – Help the user, but don’t try to replace them
  – Offer immediate utility to untrained users

Page 76: Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007.

[Diagram labels: Faculty Activity Report; Papers; Grant proposals; University database; Multiple repositories; List of papers; Descriptive terms; Expertise search engine]

Page 77: Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007.

[Pipeline diagram:]

• Faculty Activity DB: get faculty publications

• Bibliographic Reference Extractor: extract publication author, title, journal, date from faculty activity DB entries

• PDFs from Web resources: obtain digital copies of publications from Library e-resources

• Format Conversion: extract content words from PDF

• Build Index / Search Engine: use terms to find faculty members strongly connected to those terms

• Interface: enter search terms, examine “hit list”, refine search terms, …

Automatically associate descriptive terms with faculty members
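The indexing and search stages of this pipeline can be illustrated with a very small inverted index that maps content words to the faculty members whose publications use them. This is a sketch under stated assumptions, not the project's design: the function names, the stopword list, the raw-count scoring, and the placeholder names "Prof. A" and "Prof. B" are all hypothetical.

```python
from collections import Counter, defaultdict

def build_index(publications):
    """Map each content word to per-faculty occurrence counts.

    publications: list of (faculty_name, extracted_text) pairs.
    """
    stopwords = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}
    index = defaultdict(Counter)
    for faculty, text in publications:
        for word in text.lower().split():
            word = word.strip(".,;:()\"'")
            if word and word not in stopwords:
                index[word][faculty] += 1
    return index

def find_experts(index, query, n=5):
    """Rank faculty members by how often their publications use the query terms."""
    scores = Counter()
    for term in query.lower().split():
        scores.update(index.get(term, {}))
    return [name for name, _ in scores.most_common(n)]

# Hypothetical records standing in for text extracted from publications.
pubs = [
    ("Prof. A", "Clustering of retrieval results"),
    ("Prof. A", "Interactive retrieval interfaces"),
    ("Prof. B", "Speech retrieval evaluation"),
    ("Prof. B", "Speech recognition for telephone queries"),
]
index = build_index(pubs)
print(find_experts(index, "speech retrieval"))
```

Raw counts favor prolific authors, so a fuller version would likely normalize by publication volume or weight terms by how distinctive they are.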

Page 78: Interaction LBSC 796/INFM 718R Douglas W. Oard Week 4, October 1, 2007.

One Minute Paper

• When examining documents in the selection and examination interfaces, which type of information need (visceral, conscious, formalized, or compromised) guides the user’s decisions? Please justify your answer.