Amanda Spink : Analysis of Web Searching and Retrieval Larry Reeve INFO861 - Topics in Information Science Dr. McCain - Winter 2004

Feb 08, 2016

Page 1: Amanda Spink :  Analysis of Web Searching and Retrieval

Amanda Spink : Analysis of Web Searching and Retrieval

Larry Reeve
INFO861 - Topics in Information Science
Dr. McCain - Winter 2004

Page 2: Amanda Spink :  Analysis of Web Searching and Retrieval

Background

Amanda Spink
Self-described areas of work:
  Information Retrieval
  Web Retrieval
  Human Information Behavior / Information Seeking
  Medical Informatics

Ph.D. 1993 – Rutgers University
  Thesis - Feedback in Information Retrieval
  Studied under Tefko Saracevic

Page 3: Amanda Spink :  Analysis of Web Searching and Retrieval

Background

Amanda Spink
Over 140 papers published
5th in journal article production, 18th in citation production among U.S. IS faculty
Institute for Scientific Information (ISI) – most highly cited paper in Web Retrieval:
  Real Life, Real Users, and Real Needs: A Study and Analysis of User Queries on the Web (2000)

Page 4: Amanda Spink :  Analysis of Web Searching and Retrieval

Background

Amanda Spink
Associate Professor at University of Pittsburgh
  School of Information Sciences

Prior faculty positions
  Pennsylvania State University
    School of Information Science & Technology
    Web Research Group
  University of North Texas
    School of Library and Information Sciences

Page 5: Amanda Spink :  Analysis of Web Searching and Retrieval

Background

Tefko Saracevic
Associate Dean
  School of Communication, Information and Library Studies, Rutgers University

Related research
  Test and Evaluation of IR systems
  Relevance in Information Science
  Analysis of web queries

Page 6: Amanda Spink :  Analysis of Web Searching and Retrieval

Web Searching and Retrieval

Analyze user queries
  Important for building future IR systems on Web

Focus on search terms
  Failure analysis in query construction
  Term Relevance Feedback (TRF)
  Topics / Classification
  Use of language

Page 7: Amanda Spink :  Analysis of Web Searching and Retrieval

Studies Conducted

U.S. – Excite (www.excite.com)
  “51K study”
    51,473 queries
    18,113 users
    March 9, 1997
  “1M study”
    1,025,910 queries
    211,063 users
    September 16, 1997

Page 8: Amanda Spink :  Analysis of Web Searching and Retrieval

Studies Conducted

European - AllTheWeb.com
  1 million queries
  200,000 users
  Logs from two days:
    February 6, 2001
    May 28, 2002
  Most users from Norway and Germany

Page 9: Amanda Spink :  Analysis of Web Searching and Retrieval

Studies Conducted

Issues with Web transaction logs
  Where does a session start and end?
    Temporal boundary – Spink found 15 min average; others found 5 min, 12 min, 32 min, and 2 hours
    Numerical boundary – 100 entries
  How to eliminate non-individual users
    Meta-search engines, other agents
  No insight into the user’s process
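
The temporal-boundary question can be made concrete with a small sketch. It assumes a log reduced to one user's query timestamps and treats the 15-minute figure as a cutoff between sessions; the function name `split_sessions`, the `gap` parameter, and the sample timestamps are all hypothetical and not taken from the studies.

```python
from datetime import datetime, timedelta

def split_sessions(timestamps, gap=timedelta(minutes=15)):
    """Group one user's query timestamps into sessions.

    A new session starts whenever the pause since the previous
    query exceeds `gap` (the temporal boundary, an assumption here).
    """
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= gap:
            sessions[-1].append(ts)   # still inside the current session
        else:
            sessions.append([ts])     # pause too long: open a new session
    return sessions

# Hypothetical log entries for a single user ID.
log = [
    datetime(1997, 3, 9, 10, 0),
    datetime(1997, 3, 9, 10, 5),
    datetime(1997, 3, 9, 10, 40),
    datetime(1997, 3, 9, 10, 50),
]

for minutes in (5, 15, 120):
    n = len(split_sessions(log, gap=timedelta(minutes=minutes)))
    print(f"{minutes:>3} min boundary -> {n} session(s)")
```

The same four queries count as three sessions, two sessions, or one session depending on the boundary, which is why the choice of cutoff affects every per-session statistic.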

Page 10: Amanda Spink :  Analysis of Web Searching and Retrieval

Findings

Relevance Feedback
Advanced Search Techniques
Term Characteristics
Query Classification
American vs. European

Page 11: Amanda Spink :  Analysis of Web Searching and Retrieval

Findings: Relevance Feedback

Term Relevance Feedback (TRF) rarely used
  51K study: 1,597 queries from 823 users (<5% of queries)
Those using TRF had longer sessions
Successful 60% of time
Implications:
  Failure rate of 40% may be too high
  IR designers could automatically perform TRF
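
As a quick check on the "<5%" figure, the shares can be worked out from the counts quoted for the 51K study. This assumes the 1,597 TRF queries and 823 TRF users are measured against that study's totals of 51,473 queries and 18,113 users; the helper name `share` is hypothetical.

```python
def share(part, whole):
    """Return part/whole as a fraction."""
    return part / whole

# Counts as quoted on the slides for the 51K study.
trf_queries, total_queries = 1_597, 51_473
trf_users, total_users = 823, 18_113

print(f"TRF queries: {share(trf_queries, total_queries):.1%}")  # 3.1%
print(f"TRF users:   {share(trf_users, total_users):.1%}")      # 4.5%
```

Under that assumption both the query share and the user share fall below 5%.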

Page 12: Amanda Spink :  Analysis of Web Searching and Retrieval

Findings: Relevance Feedback

Mediated searching
  11% of search terms come from TRF
  37% from users, 63% from mediators
  2/3 of TRF contributed positively

Page 13: Amanda Spink :  Analysis of Web Searching and Retrieval

Findings: Relevance Feedback

Identified 6 session states
  Initial Query, Modified Query, Next Page, New Query, Relevance Feedback, Prev Query
Identified 4 session patterns
  Using the 6 session states
Implication: IR designers should accommodate these states and patterns

Page 14: Amanda Spink :  Analysis of Web Searching and Retrieval

Findings: Relevance Feedback

[Figure: Relevance Feedback Session Patterns]

Page 15: Amanda Spink :  Analysis of Web Searching and Retrieval

Findings: Advanced Search Techniques

Includes:
  Boolean operators
  Modifiers +, -
  Quotes (phrases)
Not often used by Web users, but used more in mediated search
  Boolean <10%, modifiers 9%, phrases 6%
Used incorrectly
  Boolean: AND 50%, OR 28%, AND NOT 19%
  Modifiers: 75% of time
  Phrases: 8%
Users and advanced techniques do not get along!

Page 16: Amanda Spink :  Analysis of Web Searching and Retrieval

Findings: Advanced Search Techniques

Boolean, most common problems:
  Not capitalizing AND
  Confusing ‘AND’ operator with ‘and’ conjunction
    e.g. Science and Technology vs. Science AND Technology
Modifiers, most common problems:
  Modifiers are prefix operators, not mathematical infix
    +news +weather rather than news+weather
  No space between modifier and term, whereas Boolean operators require spaces
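
The two most common problems listed above lend themselves to a simple heuristic check on raw query strings. The sketch below is an illustration only: the function `likely_misuse` and its warning labels are hypothetical, and a lowercase "and" may be a genuine conjunction, so a flag marks a candidate for review rather than a confirmed error.

```python
import re

def likely_misuse(query):
    """Return a list of heuristic warnings for one raw query string."""
    warnings = []
    # Lowercase and/or between two words may be an intended Boolean operator.
    if re.search(r"\w\s+(and|or)\s+\w", query):
        warnings.append("lowercase operator")
    # '+' glued between two terms instead of prefixed to each term.
    if re.search(r"\w[+]\w", query):
        warnings.append("infix modifier")
    return warnings

for q in ["Science and Technology", "Science AND Technology",
          "news+weather", "+news +weather"]:
    print(f"{q!r}: {likely_misuse(q)}")
```

Only the `+` modifier is checked in infix position, because a `-` between two words is usually an ordinary hyphen (as in "e-commerce").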

Page 17: Amanda Spink :  Analysis of Web Searching and Retrieval

Findings: Term Characteristics

Terms per query
  1: 26.6%, 2: 31.5%, 3: 18.2%, >7: 1.8%
  Mediated searching: 7-15 terms
Distribution of terms not quite Zipf:
  Top terms account for 10% of all terms
  Single-use terms account for 9% of all terms
  Not understood why this occurs
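
Both measurements on this slide can be derived from a plain list of query strings. The sketch below assumes whitespace tokenization and reads "single-use terms account for 9% of all terms" as a share of term occurrences; the function `term_stats` and the sample queries are hypothetical, and the studies may have counted terms differently.

```python
from collections import Counter

def term_stats(queries):
    """Summarise query length and term-frequency concentration."""
    tokenized = [q.lower().split() for q in queries]
    # Share of queries having each length (in terms).
    lengths = Counter(len(t) for t in tokenized)
    length_share = {n: c / len(tokenized) for n, c in lengths.items()}
    # How often each term occurs across all queries.
    freq = Counter(term for t in tokenized for term in t)
    total = sum(freq.values())
    # Share of all term occurrences contributed by terms used exactly once.
    single_use_share = sum(1 for c in freq.values() if c == 1) / total
    return length_share, freq.most_common(), single_use_share

# Hypothetical queries, not taken from the Excite logs.
queries = ["weather", "news weather", "free music download", "news"]
length_share, ranked, single_use_share = term_stats(queries)
print(length_share)
print(ranked[:2])
print(round(single_use_share, 2))
```

For a Zipfian distribution, plotting log(rank) against log(frequency) from `ranked` gives a roughly straight line, so departures from that line are one way to see the "not quite Zipf" behaviour.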

Page 18: Amanda Spink :  Analysis of Web Searching and Retrieval

Findings: Query Classification

Classification of queries based on Rutgers’ Web Classification

Page 19: Amanda Spink :  Analysis of Web Searching and Retrieval

Findings: Query Classification

What users are looking for is not what is on the Web:
  Distribution of content: 83% Commercial, 6% Educational, 3% Health
  Example: 10% of searches are for Health
Searchers find classifications understandable
  IR system presentation design

Page 20: Amanda Spink :  Analysis of Web Searching and Retrieval

Findings: American & European Searching

Commonalities:
  Three or fewer terms
    American: 80%, European: 85%
  Predominantly use English terms
  Relevance judgments: less than 15 minutes viewing retrieved documents
  Information seeking sessions short

Page 21: Amanda Spink :  Analysis of Web Searching and Retrieval

Findings: American & European Searching

Differences
  Categories
    American: Entertainment, Sex, Commerce
    European: People-places-things, Computers, Commerce
  American searchers spent more time searching e-commerce sites than European counterparts
Did not examine:
  Use of advanced techniques
  Relevance feedback
  First in initial set of studies?

Page 22: Amanda Spink :  Analysis of Web Searching and Retrieval

Findings: Summary

Number of query terms is about 2
TRF is not used often
Boolean operators and modifiers not used often – difficulty in using them correctly
Users do not spend much time making relevancy judgments
Term frequency distribution is a few terms used often, many terms used only once

Page 23: Amanda Spink :  Analysis of Web Searching and Retrieval

Findings: Summary

Most users had single query only and did not follow up with successive queries
Average viewing of 2 pages
50% did not access beyond first page; more than 75% did not go beyond 2 pages

Page 24: Amanda Spink :  Analysis of Web Searching and Retrieval

Implications / Further Research

Improve use of advanced search techniques
  UI changes, Venn Diagrams
Improve use of relevance feedback
  Automatic generation of TRF results
Improve classification of results
  UI changes, result overview
Improve understanding of language use
  Adapt IR designs to language
Examine cultural differences
  TRF, advanced search techniques (same or different)

Page 25: Amanda Spink :  Analysis of Web Searching and Retrieval

Amanda Spink - Web Searching and Retrieval

Questions