Special Topics in Computer Science
The Art of Information Retrieval
Chapter 1: Introduction
Alexander Gelbukh
www.Gelbukh.com
Mar 27, 2015
Motivation
Information: representation, storage, organization, access
Search engines (IR systems)
User information need
o Plain-English description → query
First for libraries, but now for the WWW!
Modern IR:
o modeling
o classification, categorization, filtering
o system architecture
o user interfaces, visualization, query languages
Data vs. Information Retrieval
Data Retrieval
o Precise description
o Well-structured data
o Precise results
o Yes-or-no results
o Science
Information Retrieval
o Vague information need
o Natural language, images, ...
o Semantic interpretation
o Approximate results
o Relevance ranking
o Art!
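The contrast above can be illustrated with a minimal Python sketch. The toy records, the field names (`author`, `year`, `text`) and both function names are hypothetical. The IR side uses a deliberately crude score (the fraction of query words found in the text), not any particular model from the book; it only shows the shift from yes-or-no answers to approximate, ranked ones.

```python
# Hypothetical toy records: structured fields plus a free-text field.
records = [
    {"id": 1, "author": "smith", "year": 1999, "text": "retrieval of text documents"},
    {"id": 2, "author": "jones", "year": 2005, "text": "fast text indexing"},
    {"id": 3, "author": "smith", "year": 2010, "text": "image retrieval and ranking"},
]

def data_retrieval(records, **criteria):
    """Data retrieval: precise description, yes-or-no answer per record."""
    return [r["id"] for r in records
            if all(r.get(field) == value for field, value in criteria.items())]

def information_retrieval(records, query):
    """IR (crude stand-in): score each record by the fraction of query
    words in its text, drop zero scores, and rank best first."""
    words = query.lower().split()
    if not words:
        return []
    ranked = []
    for r in records:
        text_words = set(r["text"].lower().split())
        score = sum(w in text_words for w in words) / len(words)
        if score > 0:
            ranked.append((score, r["id"]))
    # Highest score first; ties broken by id for a stable order.
    ranked.sort(key=lambda pair: (-pair[0], pair[1]))
    return ranked

print(data_retrieval(records, author="smith"))           # [1, 3]
print(information_retrieval(records, "text retrieval"))  # [(1.0, 1), (0.5, 2), (0.5, 3)]
```

The first query either matches a record or it does not; the second returns partial matches, ordered by how well they fit.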
Basic Concepts
User task (search)
o Can formulate what they need: retrieval (classical)
o Cannot (or do not know how to): browsing (new to IR; still not very well integrated)
o Filtering (user passive, contents active)
Logical view of docs
o Full text with added linguistic information
o Full text
o Text operations: reduce complexity to index terms (keywords, stopwords, stemming, noun groups). Linguistic processing!
o Categories
o From top to bottom: from slow but good to fast but bad
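A minimal sketch of text operations that reduce a document to index terms, assuming simple lowercase tokenization. The stopword list, the suffix list and `naive_stem` are illustrative assumptions: the stemmer is a crude suffix stripper, not the Porter algorithm or any stemmer described in the book, and noun-group detection is not attempted.

```python
import re

# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to"}

# Illustrative suffixes, tried in this order (longer ones first).
SUFFIXES = ("ing", "ers", "er", "es", "s")

def naive_stem(word):
    """Strip the first matching suffix if at least 3 characters remain.
    A crude stand-in for a real stemmer such as Porter's."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def index_terms(text):
    """Lowercase, split into alphanumeric tokens, drop stopwords, stem."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOPWORDS]

print(index_terms("The users are searching for relevant documents"))
# ['user', 'search', 'relevant', 'document']
```

Both `searchers` and `searching` reduce to `search`, which is the point of stemming: fewer, more general index terms. The same crude rules also make mistakes (for example, `news` becomes `new`), which is why real systems rely on linguistically informed stemmers.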
Past, Present, and Future
Since clay tablets
o Alphabetical index (formal)
o Table of contents (by order)
o Classifications (by meaning)
Libraries
o Automation of classical techniques; catalogs
o Search by fields (author, title, keywords)
Web and digital libraries: interactive
o Cheaper → huge amount of data
o Networks → remote access, wider audience
o Free publishing → unprepared, heterogeneous data
Artificial intelligence and linguistic methods
Main concerns
Open audience
o Help people formulate their information need
o Improve retrieval quality: intelligent methods
Efficiency (speed)
o Development of fast techniques
Interaction
o Watch user behavior to improve quality
o Privacy!
Open content
o Legal issues: copyright, responsibility for information quality
o Intelligent methods
Retrieval process
Database
o Define the logical view: text operations, text model
Index (e.g., inverted file)
User query
o Query operations (users are not good at this!)
Retrieved docs
o Ranked by likelihood of relevance
Feedback cycle
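The index and ranking steps of this process can be sketched in Python, assuming whitespace tokenization (no text operations) and a plain tf-idf sum as the relevance score. `build_inverted_file` and `search` are hypothetical names, and tf-idf is just one common weighting scheme swapped in for illustration, not necessarily the model the book develops. Query operations and the feedback cycle are not shown.

```python
import math
from collections import Counter, defaultdict

def build_inverted_file(docs):
    """Map each term to its postings: {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(text.lower().split()).items():
            index[term][doc_id] = tf
    return index

def search(index, n_docs, query):
    """Score documents by summing tf * idf over query terms; best first."""
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, {})  # .get does not add missing keys
        if not postings:
            continue
        idf = math.log(n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Highest score first; ties broken by document id.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

docs = {
    "d1": "information retrieval ranks documents",
    "d2": "data retrieval returns exact matches",
    "d3": "users browse documents",
}
index = build_inverted_file(docs)
for doc_id, score in search(index, len(docs), "retrieval documents"):
    print(doc_id, round(score, 3))  # d1 ranks first: it matches both terms
```

Only the postings of the query terms are touched, which is what makes an inverted file fast. With idf = log(N/n_t), a term that occurs in every document gets weight 0 and does not affect the ranking.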
The Book
Topics
o Text IR
o Interfaces
o Multimedia IR
o Applications
We will not consider:
o Parallel and Distributed IR
o Multimedia IR: Models and Languages
o Multimedia IR: Indexing and Searching
Chapters: Text IR
Models and Evaluation
o Modeling (basic concepts)
o Retrieval Evaluation
Improvements on Retrieval
o Query Languages
o Query Operations
o Text Languages and Properties
o Text Operations
Efficiency
o Indexing and Searching
Chapters: Interfaces, Applications
Interfaces
o User Interfaces and Visualization
Applications
o Searching the Web
o Libraries and Bibliographical Systems
o Digital Libraries
Book’s web page
sunsite.dcc.uchile.cl/irbook/
o Errata
o Test data
o Other courses, papers, and a lot more
Korean version is NOT recommended.
Read in English!
Conferences
General conferences on text processing
o ACL
o COLING
o CICLing
o DEXA (databases)
o NLDB
Conferences on IR
o ACM SIGIR
o TREC
o SPIRE
Conclusions
User information need
o Vague
o Semantic, not formal
Document relevance
o Order documents, not just retrieve them
Huge amount of information
o Efficiency concerns
o Tradeoffs
Art more than science
Thank you!
Till September 25