
Post on 14-Dec-2015


How Do We Find Information?

Key Questions

What are we looking for? How do we find it? Why is it difficult?

“A prudent question is one-half of wisdom” Francis Bacon

Search Engines 2

What are we looking for?

Looking for X
– Q&A: population of China
– Known-item search: “Catcher in the Rye”

Looking for something like/about X
– General/background info: Taliban
– Collection development: IR literature
– Similar to (known) X: like “Catcher in the Rye”
– WhatyoumacallX: “the rye-boy story”

Looking for something
– Problem resolution: how can we fight terrorism?
– Knowledge development: what is IR?

Need something, but don’t know what
– what’s it all about?
– Serendipity: Web surfing


How do we find it?

Brute force search
– Easy to build, maintain, and use
– Searcher does all the work; hard to get satisfaction

Organize/structure the data (Information Organization)
– Intuitive to use
– Hard to build and maintain
– Knowledge of builder’s language & organization structure is crucial

Use a search tool (Information Retrieval)
– Easier to build and maintain: less manipulation of data
– Sometimes works, sometimes not (helps to know the language of the data)

Ask the experts (Expert System)
– Easy and satisfying to use (by definition)
– “Expert” knowledge is transitory, hard to encapsulate

Go with the crowd (User Ratings > Recommender System > PageRank)
– Relatively easy to build and maintain
– Limited utility: doesn’t work with “unpopular” X

Zen-Fusion search.
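As a rough illustration of the brute force option listed above, the sketch below scans every document for every query term. The function name `brute_force_search` and the three sample documents are made up for this example; they are not from the slides. It shows why this approach is easy to build but leaves all the work of choosing the right words to the searcher.

```python
def brute_force_search(documents, query):
    """Return ids of documents that contain every query term (linear scan)."""
    terms = query.lower().split()
    hits = []
    for doc_id, text in documents.items():
        words = set(text.lower().split())
        # No index is used: every document is read for every query.
        if all(term in words for term in terms):
            hits.append(doc_id)
    return hits


# Hypothetical sample collection, for illustration only.
documents = {
    "D1": "information retrieval builds a searchable index",
    "D2": "expert systems encode expert knowledge",
    "D3": "recommender systems rank items by user ratings",
}

result = brute_force_search(documents, "expert knowledge")
print(result)
```

Exact word matching like this also hints at the "language of the data" problem: a query phrased with different words than the documents use finds nothing.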


Information Seeking Process: Dynamic, Interactive, Iterative

User
– What am I looking for? Identification of info. need
– How do I find it? Query formulation

Intermediary
– What are we looking for? Discovery of user’s information need; query representation
– Where is it? Query-document matching

Information
– What is it? Collection; classification
– How is it found? Data structure; representation


IR vs. IO

– Information Organization: add structure & annotation
– Information Retrieval: create a searchable index
– Information Access: retrieve information
– Data Mining: discover knowledge


Information Retrieval

– Representation: indexing, term weighting (from Raw Data to Searchable Index)
– Query Formulation: “What is IR?”
– Search Results: (ranked) document list

Raw data (documents):
D1: wd1 wd2 wd3
D2: wd2 wd4 wd2 wd3
D3: wd1 wd4

Searchable index (term counts per document):
      D1  D2  D3
wd1   1   0   1
wd2   1   2   0
wd3   1   1   0
wd4   0   1   1

Search results (ranked):
1. D2
2. D1
3. D3
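The indexing step on this slide can be sketched in a few lines. The code below builds term counts for the slide's three documents (D1, D2, D3) and ranks them by summed term frequency. The scoring rule and the query `wd2 wd3` are assumptions chosen for illustration; the slide does not say which query or weighting produced its ranking.

```python
from collections import Counter

# The three documents from the slide, as lists of words.
docs = {
    "D1": ["wd1", "wd2", "wd3"],
    "D2": ["wd2", "wd4", "wd2", "wd3"],
    "D3": ["wd1", "wd4"],
}


def build_index(docs):
    """Map each term to {doc_id: term frequency}."""
    index = {}
    for doc_id, words in docs.items():
        for term, tf in Counter(words).items():
            index.setdefault(term, {})[doc_id] = tf
    return index


def rank(index, doc_ids, query_terms):
    """Rank documents by summed term frequency (an assumed, very simple score)."""
    scores = {doc_id: 0 for doc_id in doc_ids}
    for term in query_terms:
        for doc_id, tf in index.get(term, {}).items():
            scores[doc_id] += tf
    return sorted(doc_ids, key=lambda d: scores[d], reverse=True)


index = build_index(docs)
ranking = rank(index, list(docs), ["wd2", "wd3"])  # hypothetical query
print(index["wd2"])
print(ranking)
```

With this assumed query, D2 scores 3, D1 scores 2, and D3 scores 0, which happens to reproduce the order shown on the slide. Real systems replace raw counts with term weighting, as the slide notes.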


Information Organization

– Representation: NLP & Machine Learning (from Raw Data to Organized Data)
– Query Formulation: “What is IR?”
– Search Results: document groups


Natural Language Processing (NLP)

Research area, technique, tool for Knowledge Discovery, Data Mining

– Lexical Analysis using Part-of-Speech (POS) tagging
– Sentence Parsing


Machine Learning

Research area, technique, tool for Information Organization, Knowledge Discovery, Data Mining

Information Organization via
– Supervised Learning (Automatic Classification)
– Unsupervised Learning (Clustering)

[Figure: Classification (items assigned to Class 1 and Class 2) vs. Clustering]


Clustering

Document Clustering
– Cluster Hypothesis: documents having similar contents tend to be relevant to the same query
– Rank clusters by Query-Cluster Similarity: cluster documents based on vector similarity
– Post-retrieval clustering: Scatter-Gather

Keyword Clustering
– Automatic Thesaurus Construction
– Query Expansion

IO for IR
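Ranking clusters by query-cluster similarity can be sketched as follows, assuming each cluster is represented by the centroid of its document vectors and similarity is measured with cosine. Both choices, along with the helper names `cosine` and `centroid` and the toy clusters, are illustrative assumptions rather than something the slides specify.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(weight * b.get(term, 0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Average a list of sparse vectors term by term."""
    total = {}
    for vec in vectors:
        for term, weight in vec.items():
            total[term] = total.get(term, 0) + weight
    return {term: weight / len(vectors) for term, weight in total.items()}


# Hypothetical clusters of document vectors (term -> count).
clusters = {
    "C1": [{"search": 2, "index": 1}, {"search": 1, "query": 1}],
    "C2": [{"cluster": 2, "thesaurus": 1}, {"cluster": 1, "keyword": 1}],
}
query = {"search": 1, "query": 1}

ranked = sorted(
    clusters,
    key=lambda name: cosine(query, centroid(clusters[name])),
    reverse=True,
)
print(ranked)
```

Under the cluster hypothesis, the documents inside the top-ranked cluster are then the most promising candidates to show the user.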


Classification

Document Categorization
– classify documents into manually defined categories
– supports hierarchical browsing, query expansion via relevance feedback

Document Indexing
– assign keywords to documents
– automatic indexing with controlled vocabulary, metadata generation

Document Filtering
– e.g. news delivery, email spam filtering

Query Classification
– collection selection
– algorithm selection

IO for IR
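To make document filtering concrete, here is a minimal sketch of supervised classification into manually defined categories. It scores a new text by how much of each category's training vocabulary it matches. The scoring rule, the function names `train` and `classify`, the labels, and the training texts are all invented for illustration; this is a deliberately crude stand-in for a real classifier.

```python
from collections import Counter


def train(labeled_docs):
    """Build one word-count profile per manually defined category."""
    profiles = {}
    for text, label in labeled_docs:
        profiles.setdefault(label, Counter()).update(text.lower().split())
    return profiles


def classify(profiles, text):
    """Pick the category whose profile overlaps most with the text (assumed rule)."""
    words = text.lower().split()

    def score(label):
        profile = profiles[label]
        # Normalize by profile size so larger categories are not favored.
        return sum(profile[w] for w in words) / sum(profile.values())

    return max(profiles, key=score)


# Hypothetical labeled training data.
training = [
    ("win a free prize now", "spam"),
    ("free offer click now", "spam"),
    ("meeting agenda for the search engines class", "ham"),
    ("lecture notes on information retrieval", "ham"),
]

profiles = train(training)
label = classify(profiles, "claim your free prize")
print(label)
```

The same pattern applies to the other uses on the slide: swap the labels for subject categories, controlled-vocabulary keywords, or target collections.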
