


Special Topics in Computer Science

The Art of Information Retrieval

Chapter 1: Introduction

Alexander Gelbukh

www.Gelbukh.com


Motivation

Information: representation, storage, organization, access
Search engines (IR systems)
User information need
o Plain English description, which must be turned into a query

First for libraries, but now for the WWW!

Modern IR:
o modeling
o classification, categorization, filtering
o system architecture
o user interfaces, visualization, query languages


Data vs. Information Retrieval

Data Retrieval
o Precise description
o Well-structured data
o Precise results
o Yes-or-no results
Science

Information Retrieval
o Vague information need
o Natural language, images, ...
o Semantic interpretation
o Approximate results
o Relevance ranking
Art!
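The contrast between yes-or-no results and relevance ranking can be made concrete with a small Python sketch. The toy collection `DOCS` and the functions `boolean_retrieve` and `ranked_retrieve` are hypothetical, and the overlap score (the fraction of query terms a document contains) is a deliberately crude stand-in for a real relevance model, not a method taken from the book.

```python
# Toy collection (hypothetical data, for illustration only).
DOCS = {
    "d1": "information retrieval ranks documents by relevance",
    "d2": "a database query returns exact matches",
    "d3": "search engines are information retrieval systems",
}


def tokens(text: str) -> set[str]:
    """Lowercase whitespace tokenization; a real system does much more."""
    return set(text.lower().split())


def boolean_retrieve(query: str) -> set[str]:
    """Data-retrieval style: a document either contains all query terms or not."""
    q = tokens(query)
    return {doc_id for doc_id, text in DOCS.items() if q <= tokens(text)}


def ranked_retrieve(query: str) -> list[tuple[str, float]]:
    """IR style: every document gets a graded score; results are ordered, not filtered."""
    q = tokens(query)
    if not q:
        return []  # empty query: nothing to rank
    scored = [(doc_id, len(q & tokens(text)) / len(q)) for doc_id, text in DOCS.items()]
    # Highest score first; ties broken by document id for a stable order.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))
```

For the query "information retrieval relevance", `boolean_retrieve` returns only `{"d1"}`, while `ranked_retrieve` still lists all three documents, ordering `d1` (all terms) before `d3` (two of three terms) and `d2` (no terms).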


Basic Concepts

User task (search)
o Can formulate what they need: Retrieval (classical)
o Cannot formulate it, or does not know what is needed: Browsing (new to IR)
Still not very well integrated
o Filtering (user passive, contents active)

Logical view of docs
o ... (added linguistic info)
o Full text
o Text operations: reduce complexity to index terms
Keywords, stopwords
Stemming, noun groups. Linguistic processing!
o Categories
Richer views (top of the list): slow, good
Simpler views (bottom of the list): fast, bad
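The text operations listed above (stopword removal, stemming) can be sketched as a small pipeline that turns full text into index terms. This is a minimal Python illustration under stated assumptions: the stopword list is a tiny hypothetical sample, and `stem` is a crude suffix-stripping rule invented for this example, not the Porter stemmer or any library's API.

```python
import re

# Tiny, hypothetical stopword list; production lists are much longer.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "and", "or", "is", "are", "to", "for"}

# Crude (suffix, replacement) rules, checked in order; NOT the Porter stemmer.
SUFFIX_RULES = (("ing", ""), ("ies", "y"), ("ed", ""), ("s", ""))


def stem(word: str) -> str:
    """Apply the first matching suffix rule, keeping a stem of at least 3 letters."""
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)] + replacement
    return word


def index_terms(text: str) -> list[str]:
    """Full text -> index terms: lowercase, tokenize, drop stopwords, stem."""
    words = re.findall(r"[a-z]+", text.lower())
    return [stem(w) for w in words if w not in STOPWORDS]
```

For example, `index_terms("The indexing of documents in libraries")` yields `["index", "document", "library"]`. Fewer, more general terms give a smaller and faster index at the price of lost detail, which is the "slow, good" versus "fast, bad" tradeoff on the slide.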


Past, Present, and Future

Since clay tablets
o Alphabetical index (formal)
o Table of contents (by order)
o Classifications (by meaning)

Libraries
o Automation of classical techniques. Catalogs.
o Search by fields (author, title, keywords)

Web. Digital libraries: interactive
o Lower cost: huge amounts of data
o Networks: remote access, wider audience
o Free publishing: unprepared, heterogeneous data

Artificial Intelligence and linguistic methods


Main concerns

Open audience
o Help people formulate their information need
o Improve retrieval quality. Intelligent methods

Efficiency (speed)
o Development of fast techniques

Interaction
o Watch user behavior to improve quality
o Privacy!

Open content
o Legal issues. Copyright. Responsibility for info quality
o Intelligent methods


Retrieval process

Database
o Define the logical view: text operations, text model
Index (e.g., inverted file)
User query
o Query operations (users are not good at this!)
Retrieved docs
o Ranked by likelihood (relevance)
Feedback cycle
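The index and ranking steps of the retrieval process can be illustrated with a minimal inverted file in Python. The function names are hypothetical, tokenization is plain whitespace splitting, and the tf-idf score (term frequency times log(N / document frequency)) is one common weighting assumed here for illustration; the slide itself only says that documents are ranked by likelihood of relevance.

```python
import math
from collections import Counter, defaultdict


def build_inverted_file(docs: dict[str, str]) -> dict[str, dict[str, int]]:
    """Map each index term to its postings: {doc_id: term frequency}."""
    index: dict[str, dict[str, int]] = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(text.lower().split()).items():
            index[term][doc_id] = tf
    return dict(index)


def search(index: dict[str, dict[str, int]], n_docs: int, query: str) -> list[tuple[str, float]]:
    """Rank documents by a simple tf-idf sum over the distinct query terms."""
    scores: dict[str, float] = defaultdict(float)
    for term in set(query.lower().split()):
        postings = index.get(term, {})
        if not postings:
            continue  # term absent from the collection: contributes nothing
        idf = math.log(n_docs / len(postings))
        # Only documents in this term's postings list are ever touched.
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Highest score first; ties broken by document id.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

With `docs = {"d1": "inverted file index", "d2": "index of a book", "d3": "user query feedback"}`, the query "inverted index" ranks `d1` above `d2`, and `d3`, which shares no term with the query, is never examined. Avoiding a scan of every document in this way is what makes the inverted file efficient.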


The Book

Topics
o Text IR
o Interfaces
o Multimedia IR
o Applications

We will not consider:
o Parallel and Distributed IR
o Multimedia IR: Models and Languages
o Multimedia IR: Indexing and Searching


Chapters: Text IR

Models and Evaluation
o Modeling (basic concepts)
o Retrieval Evaluation

Improvements on Retrieval
o Query Languages
o Query Operations
o Text Languages and Properties
o Text Operations

Efficiency
o Indexing and Searching


Chapters: Interfaces, Applications

Interfaces
o User Interfaces and Visualization

Applications
o Searching the Web
o Libraries and Bibliographical Systems
o Digital Libraries


Book’s web page

sunsite.dcc.uchile.cl/irbook/
o Errata
o Test data
o Other courses, papers, and a lot more

Korean version is NOT recommended.

Read in English!


Conferences

General conferences on text processing
o ACL
o COLING
o CICLing
o DEXA (databases)
o NLDB

Conferences on IR
o ACM SIGIR
o TREC
o SPIRE


Conclusions

User information need
o Vague
o Semantic, not formal

Document relevance
o Rank (order) documents, do not just retrieve them

Huge amount of information
o Efficiency concerns
o Tradeoffs

Art more than science


Thank you!

Till September 25
