Mar 28, 2015

Mark Levene, An Introduction to Search Engines and Web Navigation © Pearson Education Limited 2005

Slide 4.1

Chapter 4: Searching the Web

• The mechanics of a typical search.
• Search engines as information gatekeepers.
• The search engine wars.
• Statistics from search engine logs.
• The architecture of a search engine.
• The search index.
• The query engine.
• Crawling the web.

Slide 4.2

Mechanics of a Typical Search

Figure 4.1: Query submitted to Google

Slide 4.3

Mechanics of a Typical Search

Figure 4.2: Google results for the query

Slide 4.4

Mechanics of a Typical Search

Figure 4.3: Category of first result

Slide 4.5

Mechanics of a Typical Search

Figure 4.4: Result for the phrase query

Slide 4.6

Search Engines as Information Gatekeepers

• Search engines are becoming the primary entry point for discovering web pages.

• The ranking of web pages influences which pages users view.

• Excluding a site from search engines cuts the site off from its intended audience.

• A search engine's privacy policy is important, since its logs record what users search for.

Slide 4.7

Search Engine Wars

• The battle for domination of the web search space is heating up!

• The competition is good news for users!

• The way in which advertising is combined with search results is crucial!

• There are serious implications if one search engine manages to dominate the space!

Slide 4.8

Google

• The verb “google” has become synonymous with searching for information on the web.

• Has raised the bar on search quality.

• Has been the most popular search engine in the last few years.

• Had a very successful IPO in August 2004.

• Is innovative and dynamic.

Slide 4.9

Yahoo!

• Synonymous with the dot-com boom; probably the best-known brand on the web.

• Started off as a web directory service.

• Has very strong advertising and e-commerce partnerships.

• Acquired leading search engine technology in 2003.

Slide 4.10

MSN Search

• Synonymous with PC software through its parent company, Microsoft.

• Remember Microsoft's victory over Netscape in the browser wars.

• Developed its own search engine technology only recently; it was officially launched in February 2005.

• Microsoft may integrate web search into the next version of Windows.

Slide 4.11

Others

• Ask Jeeves
  – Specialises in natural language question answering.
  – Search driven by Teoma.

• Looksmart
  – Has its own directory service.
  – Search driven by Wisenut.

• …

Slide 4.12

Statistics from search engine logs

Statistic                            AltaVista (1998)   AlltheWeb (2002)   Excite (2001)
Average terms per query              2.35               2.30               2.60
Average queries per session          2.02               2.80               2.30
Average result pages viewed          1.39               1.55               1.70
Usage of advanced search features    20.4%              1.0%               10.0%

Slide 4.13

Experiment with search engine query syntax

• The default is AND: “computer chess” is normally interpreted as “computer AND chess”, i.e. both keywords must be present in every hit.

• “+chess” in a query means the user insists that “chess” be present in every hit.

• “computer OR chess” means that at least one of the keywords must be present in every hit.

• The quoted query "computer chess" (typed with the quotation marks) means that the exact phrase must be present in every hit.
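The four query forms above can be sketched as a small matching function. This is a minimal sketch under simplified assumptions: pages are tokenized into lower-case alphanumeric words, and the semantics are exactly those stated on the slide. Real engines differ in details such as stop-word handling. The names `tokenize`, `contains_phrase` and `matches` are hypothetical.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_phrase(tokens: list[str], phrase: list[str]) -> bool:
    """True if the phrase occurs as a run of consecutive tokens."""
    n = len(phrase)
    return n > 0 and any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


def matches(query: str, page: str) -> bool:
    """Decide whether a page is a hit for a query (simplified syntax)."""
    tokens = tokenize(page)
    query = query.strip()
    # Phrase query: the whole query is enclosed in double quotes.
    if len(query) >= 2 and query[0] == '"' and query[-1] == '"':
        return contains_phrase(tokens, tokenize(query[1:-1]))
    words = set(tokens)
    terms = query.split()
    if not terms:
        return False
    # OR query: at least one of the keywords must be present.
    if "OR" in terms:
        return any(t.lower() in words for t in terms if t != "OR")
    # Default AND query; "+term" insists the term is present, which AND already implies here.
    return all(t.lstrip("+").lower() in words for t in terms if t != "AND")


pages = {
    "p1": "Computer chess programs now play at grandmaster level.",
    "p2": "Chess on the computer is fun.",
    "p3": "A chess club for beginners.",
}
for q in ["computer chess", "computer OR chess", '"computer chess"']:
    print(q, [pid for pid, text in pages.items() if matches(q, text)])
```

Running it prints ['p1', 'p2'] for the AND query, all three pages for the OR query, and only ['p1'] for the phrase query, which shows that a phrase query is stricter than the default AND.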

Slide 4.14

The most popular search keywords

AltaVista (1998)   AlltheWeb (2002)   Excite (2001)
sex                free               free
applet             sex                sex
porno              download           pictures
mp3                software           new
chat               uk                 nude

Slide 4.15

Architecture of a Search Engine

Slide 4.16

Search Index - Inverted File

• Besides recording which web pages contain each word, the index also stores the position of each word in the page and information on the page's HTML structure.
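An inverted file maps each word to the pages that contain it. The sketch below is a minimal illustration with hypothetical names (`build_index`, `phrase_hits`). It also records word positions, which is what lets a phrase query be answered from the index alone. The HTML structure information mentioned above is omitted for brevity.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: dict[str, str]) -> dict[str, dict[str, list[int]]]:
    """Inverted file: word -> {page_id: [positions of the word in that page]}."""
    index = defaultdict(dict)
    for page_id, text in pages.items():
        for pos, word in enumerate(tokenize(text)):
            index[word].setdefault(page_id, []).append(pos)
    return dict(index)


def phrase_hits(index: dict[str, dict[str, list[int]]], phrase: str) -> set[str]:
    """Pages in which the words of the phrase occur at consecutive positions."""
    words = tokenize(phrase)
    if not words:
        return set()
    postings = [index.get(w, {}) for w in words]
    # Only pages that contain every word of the phrase can contain the phrase.
    candidates = set(postings[0]).intersection(*postings[1:])
    hits = set()
    for page in candidates:
        later = [set(p[page]) for p in postings[1:]]
        # The phrase starts at `start` if word k+1 appears at position start + k + 1.
        if any(all(start + k + 1 in pos for k, pos in enumerate(later))
               for start in postings[0][page]):
            hits.add(page)
    return hits


idx = build_index({"p1": "computer chess programs", "p2": "chess on the computer"})
print(idx["chess"])
print(phrase_hits(idx, "computer chess"))
```

Both pages contain "computer" and "chess", so they both survive the intersection step, but only in p1 are the two words adjacent and in the right order.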

Slide 4.17

The query engine

• The interface between the search index, the user and the web.

• Algorithmic details of commercial search engines are kept as trade secrets.

• The first step is retrieving potential results from the index.

• The second step is ranking the results by their “relevance” to the query.
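Because commercial ranking algorithms are trade secrets, the sketch below uses a toy relevance score, the total number of occurrences of the query terms in a page, purely as a stand-in. It shows the two steps: retrieving candidate pages from an inverted file with default AND semantics, then ranking them. All function names are hypothetical.

```python
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: dict[str, str]) -> dict[str, Counter]:
    """Inverted file: word -> Counter of {page_id: occurrences}."""
    index = defaultdict(Counter)
    for page_id, text in pages.items():
        for word in tokenize(text):
            index[word][page_id] += 1
    return dict(index)


def retrieve(index: dict[str, Counter], terms: list[str]) -> set[str]:
    """Step 1: candidate pages containing every query term (default AND)."""
    if not terms:
        return set()
    return set(index.get(terms[0], {})).intersection(*(index.get(t, {}) for t in terms[1:]))


def rank(index: dict[str, Counter], terms: list[str], candidates: set[str]) -> list[str]:
    """Step 2: order candidates by a toy score, the total count of query terms."""
    def score(page: str) -> int:
        return sum(index[t][page] for t in terms)
    # Highest score first; ties broken by page id so the order is deterministic.
    return sorted(candidates, key=lambda page: (-score(page), page))


def search(index: dict[str, Counter], query: str) -> list[str]:
    terms = tokenize(query)
    return rank(index, terms, retrieve(index, terms))


idx = build_index({"a": "chess chess computer", "b": "computer chess", "c": "computer games"})
print(search(idx, "computer chess"))
```

Keeping retrieval and ranking separate means the cheap set intersection discards most pages before any scoring is done; a better relevance function could replace `score` without touching `retrieve`.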

Slide 4.18

Portal User Interface (see also yahoo.com)

Slide 4.19

Crawling the Web
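A crawler repeatedly takes a URL from a frontier, visits the page, and adds newly discovered links to the frontier. The sketch below assumes breadth-first order, which is one common strategy. It replaces network access with a hypothetical `get_links` callback over an in-memory link graph (`WEB`, also hypothetical). A real crawler would also have to fetch pages over HTTP, parse the HTML and respect robots.txt.

```python
from collections import deque
from typing import Callable, Iterable


def crawl(seeds: Iterable[str],
          get_links: Callable[[str], Iterable[str]],
          max_pages: int = 100) -> list[str]:
    """Breadth-first crawl; returns URLs in the order they were visited."""
    frontier = deque(dict.fromkeys(seeds))  # de-duplicated seeds, order kept
    seen = set(frontier)
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)  # a real crawler would fetch and index the page here
        for link in get_links(url):
            if link not in seen:  # never queue the same URL twice
                seen.add(link)
                frontier.append(link)
    return visited


# Hypothetical link graph standing in for the web.
WEB = {"a": ["b", "c"], "b": ["c", "d"], "c": ["a"], "d": []}
print(crawl(["a"], lambda url: WEB.get(url, [])))  # ['a', 'b', 'c', 'd']
```

The `seen` set is what stops the crawler from looping forever on cycles such as a → c → a, and `max_pages` bounds the crawl when the graph is too large to cover completely.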

Slide 4.20

Delivering a global search service

• See: Web Search for a Planet: The Google Cluster Architecture (IEEE Micro, 2003).