Uncovering the Invisible Web

Post on 29-Jan-2016


Why Can’t I Just Use Google?

Uncovering the Invisible Web

Back in the day…

Students used to research using resources hand-picked by librarians and teachers. These materials were selected because:

• They were written by expert authors

• They presented information without bias

• They were written at your reading level

Now, with access to the Web, you have to learn to find and evaluate resources too!

Today you will learn:

• How much information can search engines find on the Web?

• What kind of information can you not typically find through a search engine?

• How does information in a library database compare to information found through a Google search?

Can your favorite search engine find all there is to find on the Web?

It finds all we need to find, right guys?

Think again.

Search engines access a relatively small part of the Web, known as “The Free Web”.

The large part of the Web that search engines can’t access is known as “The Invisible Web” or “The Deep Web”.

• Subscription databases

• Archives

• E-Books

Estimated to be 400-550 times the size of the visible Web!

What else is not on the Free Web?

Print books still under the protection of copyright. This means around 90% of the books on our library’s shelves!

Full-text, searchable archives of journal, magazine, and newspaper articles

The “Invisible Web” is information you cannot retrieve from search results.

Information from this part of the Web is not “crawled” by a search engine and is thus “invisible” to a searcher who does not know it exists.

What’s the Invisible Web?

Search engines “crawl” the web like a spider, indexing some of the information found on a web page.

Indexing is the process of recording that information along with its location so that people can search for it and find it in a database.

A database is really just a collection of information that is indexed or organized for people to search.
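The indexing idea described above can be sketched in a few lines of Python. This is only a toy illustration, not how any real search engine works: the page addresses, the page text, and the names `PAGES`, `build_index`, and `search` are all made up for this example.

```python
from collections import defaultdict

# Hypothetical pages: address -> the text a crawler collected from that page.
PAGES = {
    "http://example.org/spiders": "spiders crawl the web",
    "http://example.org/library": "library databases hold journal articles",
    "http://example.org/search": "search engines crawl and index the web",
}


def build_index(pages):
    """Record each word together with the locations (addresses) where it appears."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, word):
    """Look a word up in the index and return the matching addresses, sorted."""
    return sorted(index.get(word.lower(), set()))


index = build_index(PAGES)
print(search(index, "crawl"))   # the two pages that contain the word "crawl"
print(search(index, "ebooks"))  # an empty list: no crawled page contains this word
```

A word that never appeared on a crawled page gives no results, even if the information exists somewhere the crawler could not reach.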

Spiders crawl the web?

How do search engines find web pages?

Search engines find some sites through links on existing web pages, and other sites are submitted directly to the search engine by the person who created the website.

It can take 2-6 months for a search engine to crawl a new web site.

Each search engine crawls the web and sorts the information found in a different way.
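Here is a rough sketch of how following links works, assuming a tiny made-up web stored in a Python dictionary. The page names, `LINKS`, and `crawl` are invented for illustration; a real crawler downloads pages over the Internet and is far more complicated.

```python
from collections import deque

# Hypothetical link map: each page lists the pages it links to.
LINKS = {
    "home": ["news", "about"],
    "news": ["home", "archive"],
    "about": [],
    "archive": [],
    "unlinked-report": ["home"],  # no other page links to this one
}


def crawl(links, seeds):
    """Start from the seed pages and keep following links to pages not seen yet."""
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


found = crawl(LINKS, ["home"])
print(sorted(found))               # pages reachable by following links from "home"
print("unlinked-report" in found)  # False: nothing links to it, so it stays invisible

# Submitting the page directly (adding it as a seed) makes it findable.
print("unlinked-report" in crawl(LINKS, ["home", "unlinked-report"]))
```

The page that nobody links to is found only when its address is handed to the crawler directly, which mirrors a site owner submitting a site to a search engine.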

Why search engines can’t access every page on the Web:

• The material is in a database that you have to pay to access (library databases)

• The page is available only after registration

• A security firewall prevents access to the page

• The information is private and owned by a company or organization

Some more reasons…

• The page is indexed by some search engines but not others. No two search engines are the same.

• The search engine crawler cannot read a particular file format or non-text interface (for example, Flash files)

• The site may be new and hasn’t been crawled yet.

• Some pages just aren’t linked from anywhere. You must know the URL.

Smarter search engines

The larger search engines like Google already index information contained within media such as PDF files, word processor documents, spreadsheets, etc.

But some information will never be available for free on the Internet.

Why can’t we get this stuff for free?

Publishing is a business!

Professional authors are in the business of earning money for their hard work.

Publishers also expect to earn money from publishing and selling an author’s work.

Copyright laws protect the work of authors, artists, and anyone who creates information or other media.

Your teachers expect quality!

Because material from library books and subscription databases is written by experts, your teachers may prefer that you use these resources instead of searching the Free Web.

You may even have to defend some of the sources you find on the Free Web!

Let’s try searching for information in Google and in a library database.

You be the judge!
