How Do We Find Information?

How Do We Find Information?. Key Questions What are we looking for? How do we find it? Why is it difficult? “A prudent question is one-half of wisdom”

Dec 14, 2015


Letitia Terry
Transcript

How Do We Find Information?


Key Questions

- What are we looking for?
- How do we find it?
- Why is it difficult?

“A prudent question is one-half of wisdom” (Francis Bacon)

Search Engines 2


What are we looking for?

We are:

Looking for X
  - Q&A: population of China
  - Known-item Search: “Catcher in the Rye”

Looking for something like/about X
  - General/background info: Taliban
  - Collection Development: IR Literature
  - Similar to (known) X: like “Catcher in the Rye”
  - WhatyoumacallX: “the rye-boy story”

Looking for something
  - Problem Resolution: how can we fight terrorism?
  - Knowledge Development: what is IR?

Looking
  - Need something, but don’t know what: what’s it all about?
  - Serendipity: Web surfing



How do we find it?

Brute force search
  - Easy to build, maintain, and use
  - Searcher does all the work; hard to get satisfaction

Organize/structure the data (Information Organization)
  - Intuitive to use
  - Hard to build and maintain
  - Knowledge of builder’s language & organization structure is crucial

Use a search tool (Information Retrieval)
  - Easier to build and maintain: less manipulation of data
  - Sometimes works, sometimes not (helps to know the language of the data)

Ask the experts (Expert System)
  - Easy and satisfying to use (by definition)
  - “Expert” knowledge is transitory, hard to encapsulate

Go with the crowd (User Ratings > Recommender System > PageRank)
  - Relatively easy to build and maintain
  - Limited utility: doesn’t work with “unpopular” X

Zen-Fusion search.



Information Seeking Process: Dynamic, Interactive, Iterative

User
  What am I looking for?
  - Identification of info. need
  How do I find it?
  - Query formulation

Intermediary
  What are we looking for?
  - Discovery of user’s information need
  - Query representation
  Where is it?
  - Query-document matching

Information
  What is it?
  - Collection
  - Classification
  How is it found?
  - Data structure
  - Representation



IR vs. IO

Information Organization
  - Add structure & annotation

Information Retrieval
  - Create a searchable index

Information Access
  - Retrieve information

Data Mining
  - Discover knowledge



Information Retrieval

Representation
  - indexing, term weighting

Raw Data

  D1: wd1 wd2 wd3
  D2: wd2 wd4 wd2 wd3
  D3: wd1 wd4

Searchable Index

        D1  D2  D3
  wd1    1   0   1
  wd2    1   2   0
  wd3    1   1   0
  wd4    0   1   1

Query Formulation
  - “What is IR?”

Search Results
  - (ranked) document list

  Rank  Document
  1     D2
  2     D1
  3     D3
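The indexing and ranking steps on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method used in the lecture: the slide does not say which terms the query maps to or how terms are weighted, so the query `["wd2", "wd3"]`, the raw term-frequency scoring, and the function names `build_index` and `rank` are all hypothetical choices. With those choices, the sketch happens to give the same order as the slide's ranked list.

```python
from collections import Counter

# Raw data from the slide: three documents as lists of terms.
docs = {
    "D1": ["wd1", "wd2", "wd3"],
    "D2": ["wd2", "wd4", "wd2", "wd3"],
    "D3": ["wd1", "wd4"],
}

def build_index(docs):
    """Return {term: {doc_id: term_frequency}}, a sparse term-document matrix."""
    index = {}
    for doc_id, tokens in docs.items():
        for term, tf in Counter(tokens).items():
            index.setdefault(term, {})[doc_id] = tf
    return index

def rank(index, query_terms, doc_ids):
    """Score each document by summing raw term frequencies of the query terms."""
    scores = {doc_id: 0 for doc_id in doc_ids}
    for term in query_terms:
        for doc_id, tf in index.get(term, {}).items():
            scores[doc_id] += tf
    # Highest score first; ties broken by document id.
    return sorted(scores, key=lambda d: (-scores[d], d))

index = build_index(docs)
query = ["wd2", "wd3"]  # assumed query terms, not given on the slide
ranking = rank(index, query, docs)
print(ranking)
```

Real systems usually replace the raw counts with a weighting scheme such as tf-idf, which is what "term weighting" on the slide refers to.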



Information Organization

Representation
  - NLP & Machine Learning

Raw Data

Organized Data

Query Formulation
  - “What is IR?”

Search Results
  - document groups



Natural Language Processing (NLP)

Research Area, technique, tool for Knowledge Discovery, Data Mining

  - Lexical Analysis using Part-of-Speech (POS) tagging
  - Sentence Parsing



Machine Learning

Research Area, technique, tool for Information Organization, Knowledge Discovery, Data Mining

Information Organization via
  - Supervised Learning (Automatic Classification)
  - Unsupervised Learning (Clustering)

[Figure: two panels, Classification and Clustering, each with groups labeled Class 1 and Class 2]



IO for IR: Clustering

Document Clustering
  - Cluster Hypothesis: documents having similar contents tend to be relevant to the same query
  - Rank clusters by Query-Cluster Similarity: cluster documents based on vector similarity
  - Post-retrieval clustering: Scatter-Gather

Keyword Clustering
  - Automatic Thesaurus Construction: Query Expansion
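As a rough illustration of clustering documents by vector similarity and then ranking clusters by query-cluster similarity, the sketch below uses cosine similarity over term-count vectors and a simple single-pass clustering rule. The slide does not name a clustering algorithm, so the single-pass approach, the `threshold` value of 0.5, the function names, and the four toy documents are all assumptions made for this example.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def centroid(vectors):
    """Average of a list of sparse vectors."""
    total = Counter()
    for vec in vectors:
        total.update(vec)
    return {t: c / len(vectors) for t, c in total.items()}

def single_pass_cluster(doc_vectors, threshold=0.5):
    """Assign each document to the most similar existing cluster, or start a new one."""
    clusters = []  # each cluster is a list of document ids
    for doc_id, vec in doc_vectors.items():
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = cosine(vec, centroid([doc_vectors[d] for d in cluster]))
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append([doc_id])
        else:
            best.append(doc_id)
    return clusters

def rank_clusters(clusters, doc_vectors, query_vec):
    """Order clusters by similarity between the query and each cluster centroid."""
    scored = [
        (cosine(query_vec, centroid([doc_vectors[d] for d in c])), c)
        for c in clusters
    ]
    return [c for _, c in sorted(scored, key=lambda pair: -pair[0])]

# Hypothetical toy collection (not from the slides).
doc_vectors = {
    "a": Counter("search engine index query".split()),
    "b": Counter("search engine ranking query".split()),
    "c": Counter("rye novel story".split()),
    "d": Counter("rye story boy".split()),
}
clusters = single_pass_cluster(doc_vectors)
ranked = rank_clusters(clusters, doc_vectors, Counter("rye story".split()))
print(clusters)
print(ranked)
```

Single-pass clustering depends on document order and on the threshold; it is used here only because it is short, not because the slides recommend it.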



IO for IR: Classification

Document Categorization: classify documents into manually defined categories
  - supports hierarchical browsing, query expansion via relevance feedback

Document Indexing: assign keywords to documents
  - automatic indexing with controlled vocabulary, metadata generation

Document Filtering
  - e.g. news delivery, email spam filtering

Query Classification
  - collection selection
  - algorithm selection
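To make document filtering concrete, here is a small sketch of a spam filter trained on labeled examples. The slides do not specify a classifier; multinomial Naive Bayes with add-one smoothing is swapped in here because it is compact. The training messages, the labels `"spam"` and `"ham"`, and the function names `train` and `classify` are all hypothetical.

```python
import math
from collections import Counter, defaultdict

def train(labeled_docs):
    """labeled_docs: list of (tokens, label). Returns the counts the classifier needs."""
    class_docs = Counter()              # number of training documents per class
    term_counts = defaultdict(Counter)  # term frequencies per class
    vocab = set()
    for tokens, label in labeled_docs:
        class_docs[label] += 1
        term_counts[label].update(tokens)
        vocab.update(tokens)
    return class_docs, term_counts, vocab

def classify(model, tokens):
    """Return the class with the highest log-probability under Naive Bayes."""
    class_docs, term_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in class_docs:
        score = math.log(class_docs[label] / total_docs)  # class prior
        denom = sum(term_counts[label].values()) + len(vocab)
        for term in tokens:
            if term in vocab:  # terms never seen in training are ignored
                score += math.log((term_counts[label][term] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training data (not from the slides).
training = [
    ("win free prize now".split(), "spam"),
    ("free money offer".split(), "spam"),
    ("meeting agenda attached".split(), "ham"),
    ("project meeting tomorrow".split(), "ham"),
]
model = train(training)
print(classify(model, "free prize offer".split()))
print(classify(model, "meeting tomorrow".split()))
```

The same pattern applies to document categorization more broadly: the manually defined categories become the labels, and new documents are assigned to the most probable one.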

