© Tefko Saracevic, Rutgers University. 1. Discussion 2. Information retrieval (IR) model (the traditional models). 3. The review of the readings. Announcement Feb. 3, 2003

Dec 19, 2015


1. Discussion

2. Information retrieval (IR) model (the traditional models).

3. The review of the readings.

Announcement Feb. 3, 2003


Information retrieval (IR): traditional model

Definition of IR

System & user components

Exact match & best match searches

Strengths & weaknesses of the two match models


IR: problems addressed - original definition

Calvin Mooers first introduced this term, “information retrieval”, into the literature of documentation in 1950. (Swanson, 1988)

“Information retrieval embraces the intellectual aspects of the description of information and its specification for search, and also whatever systems, techniques, or machines are employed to carry out the operation.”

Calvin Mooers, 1951


IR: another definition

• “Information retrieval is often regarded as being synonymous with document retrieval and nowadays, with text retrieval, implying that the task of an IR system is to retrieve documents or texts with information content that is relevant to a user’s information need” (Sparck Jones & Willett, 1997)


IR: Objective & problems

Provide the users with effective access to & interaction with information resources.

Problems addressed:

1. How to organize information intellectually?

2. How to specify search & interaction intellectually?

3. What systems & techniques to use for those processes?


IR models

• Model depicts, represents what is involved - a choice of features, processes, things for consideration

• Several IR models used over time
– traditional: oldest, most used, shows basic elements involved
– interactive: more realistic, favored now, shows also interactions involved; several models proposed

• Each has strengths, weaknesses

• We start with traditional model to illustrate many points - from general to specific examples


Traditional IR model

• The classic information retrieval model (Bates, 1989)

[Diagram: Document → Document representation → Match ← Query ← Information need]


Traditional IR model

• The “standard” IR model (Belkin, 1993)

[Diagram. User side: Information need, Representation, Query. Text side: Texts, Representation, Surrogate. Both sides meet in Comparison, followed by Retrieved texts, Judgment, Modification.]


Traditional IR model

System:
– Acquisition (documents, objects)
– Representation (indexing, ...)
– File organization (indexed documents)

User:
– Problem (information need)
– Representation (question)
– Query (search formulation)

System & user:
– Matching (searching)
– Retrieved objects
– feedback


A few questions about the traditional models

• 1. What are the similarities and differences among these three models?

• 2. What do you learn about IR from them?

• 3. What are the weaknesses and strengths of the traditional IR model? If possible, critique these models drawing on your own experience.


Acquisition (system)

• Content: What is in databases
– In DIALOG first part of blue sheets: File Description, Subject Coverage
• Selection of documents & other objects from various sources
– In blue sheets: Sources
• Mostly text based documents
– Full texts, titles, abstracts ...
– But also: data, statistics, images (e.g. maps, trade marks) ...

Importance: Determines contents of databases. Key to file selection !!!


Representation of documents, objects (system)

• Indexing:
– controlled vocabulary - thesaurus
– free text terms (even in full texts)
• Abstracting; annotating
• Bibliographic description:
– author, title, source, date … metadata
• Classifying, clustering, ranking
– Basic Index, Additional Index. Limits
• Organization in fields & limits
• Manual & automatic techniques
– advantages & disadvantages

Basic to what is available for searching & displaying


File organization (system)

• Sequential – record (document) by record
• Inverted – term by term; list of records under each term
• Combination: indexes inverted, documents sequential
• When citation retrieved only, need for document files
• Large file approaches
– for efficient retrieval by computers

Enables searching & interplay
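To make the contrast between sequential and inverted files concrete, here is a minimal sketch. The record ids, texts, and function names are hypothetical, chosen only for illustration; real systems such as DIALOG use far more elaborate file structures.

```python
from collections import defaultdict

# Hypothetical toy records; ids and texts are made up for illustration.
records = {
    1: "apples from florida",
    2: "oranges from florida",
    3: "apples and oranges",
}

def sequential_search(records, term):
    """Scan record by record and collect ids whose text contains the term."""
    return [rec_id for rec_id, text in records.items() if term in text.split()]

def build_inverted_file(records):
    """Build a term-by-term file: each term maps to the set of record ids."""
    inverted = defaultdict(set)
    for rec_id, text in records.items():
        for term in text.split():
            inverted[term].add(rec_id)
    return inverted

inverted = build_inverted_file(records)
print(sequential_search(records, "apples"))  # [1, 3], found by scanning every record
print(sorted(inverted["apples"]))            # [1, 3], found by one lookup under the term
```

Both approaches return the same records; the inverted file pays the scanning cost once, when the file is built, rather than at every search.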


Problem (user)

• Related to task situation at hand
• Varies in specificity, clarity
• Produces information need
• Ultimate criterion for effectiveness of retrieval
• Information need for the same problem may change, evolve, shift during the IR process - adjustment in searching
• Often more than one search for same problem over time

Critical for examination in interview


Problem (user)

• A question:

• Why may the information need for the same problem change? Have you had this experience? Tell us your story.


Representation - question (user & possibly system)

• Non-mediated: end user alone
• Mediated: intermediary + user
– interviews; human-human interaction
• Question analysis: selection, elaboration of terms
• Focus toward search terms & logic; selection of databases
• Subject to feedback changes
• Various tools: thesaurus ...
• Roles of intermediary

Determines contents of searching - dynamic


Query - search statement (user & system)

• Translation into systems requirements & limits
– start of human-computer interaction
• Selection of databases
• Search strategy - selection of:
– search terms & logic
– possible fields, delimiters
– controlled & uncontrolled vocabulary
– variations in effectiveness tactics
• Reiterations from feedback
– several feedback types: relevance feedback, magnitude feedback ...
– query expansion & modification

What & how of actual searching
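One simple way to picture query expansion driven by relevance feedback is sketched below. This is an illustrative assumption, not a method named in the slides: the function `expand_query`, the frequency-based term selection, and the sample texts are all hypothetical.

```python
from collections import Counter

def expand_query(query_terms, relevant_texts, extra_terms=2):
    """Add the most frequent new terms from documents the user judged relevant."""
    counts = Counter(
        term
        for text in relevant_texts
        for term in text.lower().split()
        if term not in query_terms
    )
    # Sort by descending count, then alphabetically, so the result is deterministic.
    candidates = sorted(counts, key=lambda t: (-counts[t], t))
    return list(query_terms) + candidates[:extra_terms]

query = ["citrus"]
judged_relevant = ["citrus oranges florida", "oranges grapefruit citrus"]
print(expand_query(query, judged_relevant, extra_terms=1))  # ['citrus', 'oranges']
```

A practical version would at least filter out common words and weight terms rather than count them, so treat this only as a picture of the feedback loop: judged output feeds back into a modified query.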


Matching - searching (user & system)

• Process of matching, comparing
– search: what documents in the file match the query as stated?
• Various search algorithms:
– exact match - Boolean
   • still most prevalent
– best match - ranking by relevance
   • increasingly used e.g. on the web
– hybrids incorporating both
   • e.g. Target, Rank in DIALOG
• Each has strengths, weaknesses
– no ‘perfect’ method exists

Search interactions


Retrieved documents (from system to user)

• Various order of output:
– Last In First Out (LIFO); sorted
– ranked by relevance
– ranked by other characteristics
• Various forms of output
– In DIALOG: Output options
• When citations only: linkage to document delivery
• Base for relevance, utility evaluation by users
• Relevance feedback

What a user sees, gets, judges


Exact match - Boolean search

• You retrieve exactly what you ask for in the query:
– all documents that have the term(s) with logical connection(s), and possible other restrictions (e.g. to be in titles) as stated in the query
– exactly: nothing less, nothing more
• Based on matching following rules of Boolean algebra, or algebra of sets
– ‘new algebra’
– presented by circles in Venn diagrams
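The "nothing less, nothing more" behavior, including a restriction to titles, can be sketched as follows. The documents, field names, and helper functions are hypothetical and only illustrate the idea; this is not DIALOG's command syntax.

```python
# Hypothetical fielded records; ids, fields and texts are made up.
docs = {
    1: {"title": "apples and oranges", "abstract": "fruit trade in florida"},
    2: {"title": "florida fruit", "abstract": "apples and oranges compared"},
    3: {"title": "apples", "abstract": "orchards"},
}

def has_term(doc, term, field=None):
    """True if the term occurs in the given field, or in any field when field is None."""
    fields = [field] if field else list(doc)
    return any(term in doc[f].split() for f in fields)

def exact_match_and(docs, terms, field=None):
    """Return ids of documents containing every term, optionally restricted to one field."""
    return [doc_id for doc_id, doc in docs.items()
            if all(has_term(doc, t, field) for t in terms)]

print(exact_match_and(docs, ["apples", "oranges"]))                 # [1, 2]
print(exact_match_and(docs, ["apples", "oranges"], field="title"))  # [1]
```

Document 3 mentions apples but not oranges, so it is excluded outright: an exact match has no notion of a partial or near hit.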


Boolean algebra & Venn diagrams

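Four basic operations:

[Diagram, repeated for each operation: two overlapping circles A and B; region 1 = A only, region 2 = overlap of A and B, region 3 = B only]

• A alone. All documents that have A. Shade 1 & 2. E.g. apples
• A AND B. Shade 2. E.g. apples AND oranges
• A OR B. Shade 1, 2, 3. E.g. apples OR oranges
• A NOT B. Shade 1. E.g. apples NOT oranges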
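Because Boolean searching is the algebra of sets, the four operations map directly onto set operations. A minimal sketch, assuming hypothetical sets of document ids for each term:

```python
# Hypothetical posting sets: ids of the documents that contain each term.
apples = {1, 2, 3, 4}
oranges = {3, 4, 5, 6}

a_alone = apples             # A: regions 1 and 2
a_and_b = apples & oranges   # A AND B: region 2 (intersection)
a_or_b = apples | oranges    # A OR B: regions 1, 2 and 3 (union)
a_not_b = apples - oranges   # A NOT B: region 1 (difference)

print(sorted(a_alone))  # [1, 2, 3, 4]
print(sorted(a_and_b))  # [3, 4]
print(sorted(a_or_b))   # [1, 2, 3, 4, 5, 6]
print(sorted(a_not_b))  # [1, 2]
```

AND narrows the output, OR broadens it, and NOT removes documents even when they are otherwise good matches for A.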


Venn diagrams … cont.

Complex statements allowed, e.g.:

[Diagram: three overlapping circles A, B and C, with regions numbered 1 to 7]

(A OR B) AND C
Shade 4, 5, 6
(apples OR oranges) AND Florida

(A OR B) NOT C
Shade what?
(apples OR oranges) NOT Florida


Venn diagrams cont.

• Complex statements can be made
– as in ordinary algebra, e.g. (2+3)x4
• As in ordinary algebra: watch for parentheses:
– 2+(3 x 4) is not the same as (2+3)x4
– (A AND B) OR C is not the same as A AND (B OR C)
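The effect of parentheses can be checked with a small example. The sets A, B and C below are hypothetical sets of document ids, picked so that the two groupings give different answers:

```python
# Ordinary algebra: the grouping changes the result.
print((2 + 3) * 4)   # 20
print(2 + (3 * 4))   # 14

# Hypothetical posting sets for three terms A, B and C.
A = {1, 2, 3}
B = {2, 3, 4}
C = {3, 5}

left = (A & B) | C    # (A AND B) OR C
right = A & (B | C)   # A AND (B OR C)

print(sorted(left))   # [2, 3, 5]
print(sorted(right))  # [2, 3]
print(left == right)  # False
```

Document 5 contains only C, so it is retrieved by (A AND B) OR C but not by A AND (B OR C), where A is required in every case.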


Best match searching

• You retrieve documents ranked by how similar (close) they are to a query (as calculated by the system)
– similarity assumed as relevance
– thus, documents as answers are presented from those that are most likely relevant downwards to less & less likely relevant - can be cut at any desired number - e.g. first 10
• Algorithms (formulas) used to determine similarity
– using statistical &/or linguistic properties

• Web outputs are mostly ranked

• But DIALOG allows ranking as well, with special commands


Best match ... cont.

• Best match process:
– compares a set of query terms with the sets of terms in documents
– calculates a similarity between query & each document based on common terms
– sorts the documents in order of similarity
– assumes that the higher ranked documents have a higher probability of being relevant
– allows for cut-off at a chosen number
• BIG issue: What representation & similarity measures are best?
– considerable research & many tests
– many proprietary algorithms
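The steps of the best match process can be sketched in a few lines. The similarity measure used here (Jaccard overlap of term sets) is just one assumed choice among many, and the documents and function names are hypothetical; which representation and measure work best is exactly the open issue noted above.

```python
def jaccard(query_terms, doc_terms):
    """Similarity as shared terms divided by all distinct terms (an assumed measure)."""
    union = query_terms | doc_terms
    return len(query_terms & doc_terms) / len(union) if union else 0.0

def best_match(query, documents, cutoff):
    """Rank documents by similarity to the query and keep the top `cutoff`."""
    query_terms = set(query.lower().split())
    scored = [
        (jaccard(query_terms, set(text.lower().split())), doc_id)
        for doc_id, text in documents.items()
    ]
    # Highest similarity first; ties broken by document id for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(doc_id, round(score, 3)) for score, doc_id in scored[:cutoff]]

# Hypothetical documents, made up for illustration.
documents = {
    "d1": "florida oranges",
    "d2": "apples and oranges from florida",
    "d3": "washington apples",
}
ranked = best_match("florida oranges", documents, cutoff=2)
print(ranked)  # [('d1', 1.0), ('d2', 0.4)]
```

Note that d2 contains both query terms yet scores lower than d1 because it also contains other terms; a different measure could rank them differently, which is why the choice of measure matters.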


Boolean vs. best match

• Boolean
– allows for logic
– provides all that has been matched
BUT
– has no particular order of output
– treats all retrievals equally - from the most to least relevant ones
– often requires examination of large outputs

• Best match
– allows for free terminology
– provides for a ranked output
– provides for cut-off - any size output
BUT
– does not include logic
– ranking method (algorithm) not transparent
   • whose relevance?
– where to cut off?


Boolean vs. best match

• Questions about best match (just thinking).

• 1. If you are a user, do you trust the judgment of the algorithm if you do not read the hits?

• 2. Is a document that is judged only 10% relevant to your query definitely less useful for resolving your information problem than one judged 40% relevant?


Strengths of traditional IR model

• Lists major components in both system & user branches

• Suggests:
– What to explain to users about system, if needed
– What to ask of users for more effective searching (problem ...)

• Selection of component(s) for concentration
– mostly ever better representation

• Provides a framework for evaluation of (static) aspects


Weaknesses

• Does not address nor account for interaction & judgment of results by users
– identifies interaction with search only
– interaction is a much richer process

• Many types of & variables in interaction not reflected

• Feedback has many types & functions - also not shown

• Evaluation thus one-sided

IR is a highly interactive process - thus additional model(s) needed