Special Topics in Computer Science: The Art of Information Retrieval
Chapter 1: Introduction
Alexander Gelbukh, www.Gelbukh.com

Mar 27, 2015

Hayden Bolton

Motivation

Information: representation, storage, organization, access
Search engines (IR systems)
User information need
o From a plain-English description to a query
First for libraries, but now the WWW!
Modern IR:
o Modeling
o Classification, categorization, filtering
o System architecture
o User interfaces, visualization, query languages

Data vs. Information Retrieval

Data Retrieval (science)
o Precise description
o Well-structured data
o Precise results
o Yes-or-no results

Information Retrieval (art!)
o Vague information need
o Natural language, images, ...
o Semantic interpretation
o Approximate results
o Relevance ranking
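
The contrast between the two columns can be made concrete with a toy sketch. The three documents and both functions are hypothetical; the "approximate" scorer (fraction of query words a document contains) is a deliberately crude stand-in chosen for illustration, not a retrieval model from the book.

```python
# Hypothetical toy collection; document IDs and texts are illustrative only.
DOCS = {
    "d1": "information retrieval ranks documents by relevance",
    "d2": "a database returns exact rows for a precise query",
    "d3": "search engines help users find relevant information",
}

def data_retrieval(query: str) -> list[str]:
    """Exact match: a document either contains every query word or it is not returned."""
    words = set(query.lower().split())
    return sorted(d for d, text in DOCS.items() if words <= set(text.split()))

def information_retrieval(query: str) -> list[tuple[str, float]]:
    """Approximate match: score each document by the fraction of query words
    it contains, and return all partial matches ordered by that score."""
    words = set(query.lower().split())
    scored = []
    for d, text in DOCS.items():
        overlap = len(words & set(text.split())) / len(words)
        if overlap > 0:
            scored.append((d, overlap))
    # Highest score first; ties broken by document ID for a stable order.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))

print(data_retrieval("relevance ranking"))         # → []
print(information_retrieval("relevance ranking"))  # → [('d1', 0.5)]
```

With the query "relevance ranking", exact matching returns nothing because no document contains both words, while the approximate scorer still offers d1 as a partial, ranked match.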

Basic Concepts

User task (search)
o Can formulate what they need: retrieval (classical)
o Cannot (or do not know how to): browsing (new to IR)
  Retrieval and browsing are still not very well integrated
o Filtering (user passive, contents active)
Logical view of documents
o ... (added linguistic information)
o Full text
o Text operations: reduce complexity to index terms
  Keywords, stopwords
  Stemming, noun groups
  Linguistic processing!
o Categories
From the top of this list to the bottom: slow but good → fast but bad
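
The text operations listed above can be sketched as a small pipeline that reduces full text to index terms. This is a minimal sketch assuming a hypothetical stopword list and a crude suffix-stripping rule; it is not Porter's stemmer or any library's implementation, and it performs no noun-group or other linguistic analysis.

```python
import re

# A tiny, hypothetical stopword list; real systems use much longer lists.
STOPWORDS = {"a", "an", "and", "are", "by", "for", "in", "is", "of", "on", "the", "to"}

# Crude suffix stripping: an illustrative stand-in for a real stemmer.
SUFFIXES = ("ing", "ed", "es", "s")

def crude_stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def index_terms(text: str) -> list[str]:
    """Full text -> index terms: lowercase, tokenize, drop stopwords, stem."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOPWORDS]

print(index_terms("The users are searching for documents in the libraries"))
# → ['user', 'search', 'document', 'librari']
```

The output also shows the price of crude stemming: "libraries" becomes "librari", which still works as an index term as long as queries go through the same operations.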

Past, Present, and Future

Since clay tablets
o Alphabetical index (formal)
o Table of contents (by order)
o Classifications (by meaning)
Libraries
o Automation of classical techniques. Catalogs.
o Search by fields (author, title, keywords)
Web. Digital libraries: interactive
o Cheaper → huge amounts of data
o Networks → remote access, wider audience
o Free publishing → unprepared, heterogeneous data
Artificial intelligence and linguistic methods

Main concerns

Open audience
o Help people formulate their information need
o Improve retrieval quality: intelligent methods
Efficiency (speed)
o Development of fast techniques
Interaction
o Watch user behavior to improve quality
o Privacy!
Open content
o Legal issues: copyright, responsibility for information quality
o Intelligent methods

Retrieval process

Database
o Define the logical view: text operations, text model
Index (e.g., inverted file)
User query
o Query operations (users are not good at this!)
Retrieved documents
o Ranked by likelihood (relevance)
Feedback cycle
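
The index and ranking steps of this process can be sketched in a few lines. This is a minimal sketch under stated assumptions: a hypothetical three-document collection, text operations reduced to lowercasing and whitespace splitting, and a plain tf-idf sum as the ranking function (one common choice, assumed here; the slide does not name a model). The feedback cycle is not shown.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy collection; text operations are just lowercase + split.
DOCS = {
    "d1": "information retrieval ranks documents",
    "d2": "retrieval of data from a database",
    "d3": "users browse documents and retrieve information",
}

def build_inverted_file(docs: dict[str, str]) -> dict[str, dict[str, int]]:
    """Map each term to its postings: {doc_id: term frequency in that doc}."""
    index: dict[str, dict[str, int]] = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(text.lower().split()).items():
            index[term][doc_id] = tf
    return dict(index)

def search(index: dict[str, dict[str, int]], n_docs: int, query: str) -> list[tuple[str, float]]:
    """Rank documents by summing tf * idf over the query terms they contain."""
    scores: dict[str, float] = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, {})
        if not postings:
            continue  # term does not occur in the collection
        idf = math.log(n_docs / len(postings))  # rarer terms weigh more
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Highest score first; ties broken by document ID.
    return sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))

index = build_inverted_file(DOCS)
print([d for d, _ in search(index, len(DOCS), "information retrieval")])
# → ['d1', 'd2', 'd3']
```

The inverted file answers a query by reading only the postings of the query terms instead of scanning every document, which is what makes it the usual answer to the efficiency concern.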

The Book

Topics
o Text IR
o Interfaces
o Multimedia IR
o Applications
We will not consider:
o Parallel and Distributed IR
o Multimedia IR: Models and Languages
o Multimedia IR: Indexing and Searching

Chapters: Text IR

Models and Evaluation
o Modeling (basic concepts)
o Retrieval Evaluation
Improvements on Retrieval
o Query Languages
o Query Operations
o Text Languages and Properties
o Text Operations
Efficiency
o Indexing and Searching

Chapters: Interfaces, Applications

Interfaces
o User Interfaces and Visualization
Applications
o Searching the Web
o Libraries and Bibliographical Systems
o Digital Libraries

Book’s web page

sunsite.dcc.uchile.cl/irbook/
o Errata
o Test data
o Other courses, papers, and a lot more

Korean version is NOT recommended.

Read in English!

Conferences

General conferences on text processing
o ACL
o COLING
o CICLing
o DEXA (databases)
o NLDB
Conferences on IR
o ACM SIGIR
o TREC
o SPIRE

Conclusions

User information need
o Vague
o Semantic, not formal
Document relevance
o Rank documents, not just retrieve them
Huge amount of information
o Efficiency concerns
o Tradeoffs
Art more than science

Thank you!

Till September 25