Intelligent Meta-Search and Clustering Technology

http://tamas.nlm.nih.gov/metasearch/ http://toxseek.nlm.nih.gov

Tamas Doszkocs, Ph.D., Computer Scientist

National Library of Medicine [email protected]

Characteristics of Web Searching

Feb 22, 2016



Characteristics of Web Searching

• Content is created by diverse organizations and individuals

• Information on the Web is inherently heterogeneous

• Content is distributed across multiple servers and locations, in multiple formats and languages, and is aimed at diverse audiences and purposes

(In its April 2005 survey, Netcraft received responses from 62,286,451 Web sites)

• The “Open Web” of billions of static Web pages is indexed and searched via multiple search engines and directories

Problems in Web Searching

• Even the largest current search engines index only a fraction of all Web pages (the Internet Archive’s Wayback Machine has indexed 40 billion pages, Google about 8.1 billion and Yahoo about 20.8 billion, as of August 2005)

• The not-so-“Hidden Web” of content databases (e.g., PubMed, Web of Science) is estimated to be thousands of times larger than the Open Web.

• Both the Open Web and the Hidden Web suffer from problems of information coverage, quality, overload, relevance, currency and completeness, as well as inherent language ambiguity and incompatible user interfaces


Meta-Searching

• Meta-Search Engines may simultaneously search multiple Open Web and Hidden Web sites in order to increase content coverage, precision, relevance and/or search efficiency and effectiveness.
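The broadcast step of a meta-search engine can be sketched as a concurrent fan-out: the same query goes to every source at once, and a slow or failing source must not hold up the others. This is a minimal sketch under stated assumptions, not ToxSeek code: each source is assumed to be an async Python function returning a list of result URLs, and `meta_search` as well as the `fake_engine_a`, `fake_engine_b`, `slow_engine` and `broken_engine` sources are hypothetical stand-ins for real engine wrappers.

```python
import asyncio
from typing import Awaitable, Callable

# A source is any async function mapping a query to a list of result URLs.
Source = Callable[[str], Awaitable[list[str]]]


# Hypothetical stand-in sources; a real wrapper would call a search API or
# fetch and parse a results page.
async def fake_engine_a(query: str) -> list[str]:
    await asyncio.sleep(0.01)  # simulated network latency
    return [f"https://a.example/{query}/1", f"https://a.example/{query}/2"]


async def fake_engine_b(query: str) -> list[str]:
    await asyncio.sleep(0.02)
    return [f"https://b.example/{query}/1"]


async def slow_engine(query: str) -> list[str]:
    await asyncio.sleep(5)  # longer than any reasonable per-source timeout
    return [f"https://slow.example/{query}/1"]


async def broken_engine(query: str) -> list[str]:
    raise RuntimeError("source unavailable")


async def meta_search(query: str, sources: dict[str, Source],
                      timeout: float = 2.0) -> dict[str, list[str]]:
    """Send the query to every source concurrently.

    A source that raises an error or exceeds `timeout` seconds contributes an
    empty list instead of delaying or aborting the whole search.
    """
    async def guarded(source: Source) -> list[str]:
        try:
            return await asyncio.wait_for(source(query), timeout)
        except Exception:  # covers timeouts as well as source errors
            return []

    names = list(sources)
    results = await asyncio.gather(*(guarded(sources[name]) for name in names))
    return dict(zip(names, results))


if __name__ == "__main__":
    hits = asyncio.run(meta_search("benzene", {
        "a": fake_engine_a, "b": fake_engine_b,
        "slow": slow_engine, "broken": broken_engine,
    }))
    for name, urls in hits.items():
        print(name, urls)
```

With a per-source timeout, the overall response time is bounded by the timeout rather than by the slowest source, and a failing source simply contributes nothing.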

Overlap Among 3 Major Search Engines

http://missingpieces.dogpile.com/whitepaper.pdf

http://comparesearchengines.dogpile.com/OverlapAnalysis.pdf

Overlap Among Ask Jeeves, Google, MSN and Yahoo

Google Isn’t Everything!

http://www.forbes.com/business/free_forbes/2005/0815/056.html?partner=yahoomag
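Overlap figures like those cited above can be computed from the results each engine returns for the same query. This is a minimal sketch, assuming each engine's results are given as a list of URLs; the crude `normalize` step and the function names are illustrative assumptions, not the methodology of the Dogpile or Forbes analyses.

```python
from collections import Counter
from itertools import combinations


def normalize(url: str) -> str:
    """Crude normalization so trivial URL variants count as the same page.

    Lowercasing the whole URL is a simplification: real paths can be
    case-sensitive.
    """
    url = url.strip().lower()
    for prefix in ("https://", "http://"):
        if url.startswith(prefix):
            url = url[len(prefix):]
            break
    if url.startswith("www."):
        url = url[len("www."):]
    return url.rstrip("/")


def url_sets(results: dict[str, list[str]]) -> dict[str, set[str]]:
    return {engine: {normalize(u) for u in urls} for engine, urls in results.items()}


def overlap_stats(results: dict[str, list[str]]) -> dict[str, float]:
    """Fraction of distinct pages found by exactly one engine and by every engine."""
    sets = url_sets(results)
    counts = Counter(u for s in sets.values() for u in s)
    total = len(counts)
    if total == 0:
        return {"unique_to_one": 0.0, "shared_by_all": 0.0}
    return {
        "unique_to_one": sum(c == 1 for c in counts.values()) / total,
        "shared_by_all": sum(c == len(sets) for c in counts.values()) / total,
    }


def pairwise_jaccard(results: dict[str, list[str]]) -> dict[tuple[str, str], float]:
    """Jaccard overlap |A ∩ B| / |A ∪ B| for every pair of engines."""
    sets = url_sets(results)
    scores = {}
    for a, b in combinations(sorted(sets), 2):
        union = sets[a] | sets[b]
        scores[(a, b)] = len(sets[a] & sets[b]) / len(union) if union else 0.0
    return scores
```

A high `unique_to_one` share is the argument for meta-searching: each additional engine contributes pages the others miss.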

Generations of Meta-Search Engines

• First Generation: “Broadcast” or “Federated” search
– List of results

• Second Generation: Merging and Ranking
– Increased coverage

• Third Generation: Result Clustering
– Focused drill-down
– Dynamic Query Mods

• Next Generation: Semantic and Pragmatic Intelligence
– tamas.nlm.nih.gov/metasearch/
– toxseek.nlm.nih.gov
– http://bestmeta.com
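Second-generation merging and ranking needs a way to combine ranked lists whose scores are not comparable across engines. One well-known option, swapped in here for illustration, is reciprocal rank fusion, which uses only each result's rank position: score(d) = Σ 1 / (k + rank). The slides do not say which merging method ToxSeek uses, and `rrf_merge` is a hypothetical name; the constant `k = 60` is a commonly used default, assumed rather than tuned.

```python
from collections import defaultdict


def rrf_merge(ranked_lists: dict[str, list[str]], k: int = 60) -> list[str]:
    """Merge per-engine rankings with reciprocal rank fusion.

    Each result earns 1 / (k + rank) from every list it appears in (rank
    starts at 1); results are returned by total score, highest first.
    """
    scores: dict[str, float] = defaultdict(float)
    for urls in ranked_lists.values():
        seen: set[str] = set()
        for rank, url in enumerate(urls, start=1):
            if url in seen:  # count each result at most once per engine
                continue
            seen.add(url)
            scores[url] += 1.0 / (k + rank)
    # Sort by descending score; break ties alphabetically for stable output.
    return sorted(scores, key=lambda url: (-scores[url], url))
```

Because only ranks are used, engines that report scores on different scales (or none at all) can be merged without calibration, and a page found by several engines rises to the top. In practice URLs would be normalized first so that the same page from different engines is recognized as one result.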

Moving Targets: Nine Search Engines Compared

By Ben Patterson (May 9, 2005)

http://reviews.cnet.com/4520-10572_7-6219242-2.html?tag=txt

Moving Targets, and the need for Automatic Change Detection and Monitoring and for Integrating New Capabilities
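One simple way to detect that a source has changed its result-page layout (which typically breaks the wrapper a meta-search engine uses to parse that source) is to fingerprint the page's tag skeleton and compare it over time. The sketch below is an assumption for illustration, not a description of ToxSeek's monitoring; it uses only Python's standard `html.parser` and `hashlib` modules, and `layout_fingerprint` and `layout_changed` are hypothetical names.

```python
import hashlib
from html.parser import HTMLParser


class _TagSkeleton(HTMLParser):
    """Records the sequence of start tags, with their class attributes."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: list[str] = []

    def handle_starttag(self, tag: str, attrs) -> None:
        # attrs is a list of (name, value) pairs; value may be None.
        cls = dict(attrs).get("class") or ""
        self.tags.append(f"{tag}.{cls}")


def layout_fingerprint(html: str) -> str:
    """SHA-256 of the page's tag/class skeleton, ignoring text and links."""
    parser = _TagSkeleton()
    parser.feed(html)
    parser.close()
    return hashlib.sha256("|".join(parser.tags).encode("utf-8")).hexdigest()


def layout_changed(old_html: str, new_html: str) -> bool:
    """True if the markup structure differs, even when the results are similar."""
    return layout_fingerprint(old_html) != layout_fingerprint(new_html)
```

Because text and link targets are ignored, fresh results for the same query do not trigger an alert, while a redesign of the markup does. A monitor built this way would periodically fetch a fixed probe query from each source and compare against the last stored fingerprint.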

The ToxSeek Meta-Search and Clustering Project

• Goals:

– Integrate best-practice Information Retrieval and Natural Language Processing techniques with AI heuristics to create an advanced general-purpose meta-search, result clustering and knowledge discovery tool

– Apply ToxSeek to efficiently access diverse biomedical and environmental health information resources

– Create specialized applications for accessing quality information sources on HIV/AIDS, consumer health, homeland security, public health law, library research and other topics

ToxSeek Features

• Integrates multiple spellcheckers and sophisticated lexical, morphologic, syntactic and semantic resources

• Merges and ranks the results from heterogeneous information sources

• Employs an efficient Natural Language Phrase Parser and AI heuristics to automatically identify Key Concepts and their Associations in queries and retrieved documents

• Uses the automatically identified Key Concepts and Associations to create topical Result Clusters

• Supports focused multi-concept drill-down, dynamic query refinement, multi-media and limited question answering
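Concept-based result clustering and drill-down can be sketched with a deliberately simple stand-in for ToxSeek's phrase parser and AI heuristics. The sketch treats non-stopword words and adjacent word pairs in result snippets as candidate key concepts, keeps those shared by several results, and uses each as a cluster label. All function names, the tiny stopword list and the thresholds are assumptions for illustration, not ToxSeek's actual method.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption); real systems use larger ones.
STOPWORDS = {"a", "an", "and", "are", "as", "at", "by", "for", "from", "in",
             "is", "of", "on", "or", "the", "to", "with"}


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def candidate_concepts(text: str) -> set[str]:
    """Non-stopword words and adjacent word pairs as candidate key concepts."""
    words = tokenize(text)
    concepts = {w for w in words if w not in STOPWORDS}
    for w1, w2 in zip(words, words[1:]):
        if w1 not in STOPWORDS and w2 not in STOPWORDS:
            concepts.add(f"{w1} {w2}")
    return concepts


def cluster_results(query: str, snippets: list[str], max_clusters: int = 5,
                    min_docs: int = 2) -> dict[str, list[int]]:
    """Map each chosen concept label to the indices of the results containing it."""
    query_terms = set(tokenize(query))
    per_doc = [candidate_concepts(s) for s in snippets]
    doc_freq = Counter(c for concepts in per_doc for c in concepts)
    candidates = [
        c for c, n in doc_freq.items()
        # Shared by enough results, and not just a restatement of the query.
        if n >= min_docs and not set(c.split()) <= query_terms
    ]
    # Prefer concepts shared by more results, then multi-word phrases.
    candidates.sort(key=lambda c: (-doc_freq[c], -len(c.split()), c))

    clusters: dict[str, list[int]] = {}
    for concept in candidates:
        if len(clusters) >= max_clusters:
            break
        members = [i for i, concepts in enumerate(per_doc) if concept in concepts]
        if members in clusters.values():  # same results as a better-ranked label
            continue
        clusters[concept] = members
    return clusters


def drill_down(query: str, concept: str) -> str:
    """Dynamic query refinement: add the chosen cluster label as a phrase."""
    return f'{query} "{concept}"'
```

Selecting a cluster then becomes a query refinement: `drill_down("benzene", "leukemia")` returns `benzene "leukemia"`, which can be resubmitted to the sources. A candidate concept that covers exactly the same results as a higher-ranked one (for example, a single word inside an already chosen phrase) is skipped, so the cluster labels stay distinct.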

ToxSeek Implementation

• Production applications and research prototypes have been implemented for meta-searching diverse content on:
– Toxicology and Environmental Health
– Consumer Health
– Library Catalogs and Proprietary Databases
– HIV/AIDS
– BioDefense
– Homeland Security

• “Shift Happens…”
– http://library.nps.navy.mil/home/staff/gmarlatt/HSDL%20ALI%20April%202005%20%20final%20rev%207%20april.ppt


ToxSeek Web Search Query: “terrorism”


ToxSeek Query: “police state”


Win the Search Engine Wars with Intelligent Meta-Search and Clustering Technology

http://tamas.nlm.nih.gov/metasearch/ http://toxseek.nlm.nih.gov

Tamas Doszkocs, Ph.D., Computer Scientist

National Library of Medicine [email protected]