
1

Information retrieval

Paul.Nieuwenhuysen@vub.ac.be

• Vrije Universiteit Brussel

• Information and Library Science, University of Antwerp(en), Belgium

Lectures presented in universities in China, March 2003.

These slides are available from http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/


2

Contents / summary of this presentation

1. About “information”

2. Databases and computerized information retrieval

3. Classifications, and thesaurus systems

4. Internet

5. World-Wide Web

6. Online-access information sources and services


3

About “information”

Information concepts

****


4

Our world: future trends

Future trends in our world

• Complexity

• Dynamics and evolution; speed and acceleration

• Internationalization; globalization

• Economic products based less on natural resources and more on “knowledge”

Answers / Requirements / Solutions / Reactions

• Knowledge and skills

• Adaptability; flexibility

• Global co-operation; mobility

• Education, research, and exploitation of knowledge are important

***-


!? Question !?

Compare “information” with, for instance, “bananas”.


***- 5


6

Information versus other products = bits versus atoms

• The essential difference between information and other economic or natural products is that information on computers (such as databases) consists of bits (and bytes), while other economic or natural products (such as bananas) consist of atoms.

• This has many interrelated consequences.

***-

01010101101011010010


7

Information: some strange properties (Part 1)

• Information is not consumed by use and does not deteriorate. Nevertheless, information does become obsolete, so speed of delivery can be crucial. The context is important too.

• There is no agreed measure of a unit of information.

• The price of an information item is only loosely linked to its value in a particular situation. Moreover, the benefit or value of information is hard to quantify.

***-


8

Information: some strange properties (Part 2)

• The same information item can be available to different people at the same time. Information can be reproduced easily and faithfully, which makes it cheap for wide consumption. However, copyright can keep the price high.

• Most digital information items (documents) can be changed, modified, falsified, or manipulated more easily than physical items. “Is this document real, authentic, original?”

***-


9

***-

Information sources: people and documents

• Information sources come essentially in two formats:

» less formal: people communicating by

—telephone

—electronic mail,…

»more formal: documents such as

—hard copy documents

—electronic, digital documents; computer-based files

• Here we focus mainly on information that is stored in documents.


10

The flow of documentary information with primary and secondary sources

• Author / Creator / Sender

• Primary sources / systems: mainly journal articles / books / electronic mail / online sources / ...

• Secondary sources / systems: mainly reference works (printed, CD-ROM, online); library catalogues, including OPACs, ...

• Reader / User / Receiver

****


11

The role of secondary information sources

• The secondary information flow is derived from the primary flow, mainly because the sheer volume of primary information lowers the chance of retrieving and using the appropriate information item.

• Secondary information tries to bring some order into the great chaos.

****


12

Various categorisations of documentary information sources

Information sources can be categorised in various ways. For instance:

****

• Primary / Secondary

• Hard copy (not digital) / Digital

• Offline / Online

• Text / Image / Sound / Animation, video / Software / Data / Interactive

• Books / Serials


13

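Retrospective searching versus current awareness: scheme

****

Past | Now | Future

• Retrospective searching: looking back over information published in the past, up to now

• Current awareness: keeping up with new information as it appears, from now into the future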
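The difference between the two modes can be sketched in Python. This is a minimal sketch under simple assumptions: the record layout (a hypothetical `Record` with a title and a publication date), the word-matching rule, and the sample titles are all illustrative; a current-awareness service is modelled as a stored search profile that is re-run only against items published since the previous run.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Record:
    title: str
    published: date


def matches(record: Record, terms: set) -> bool:
    """True if any search term occurs as a word in the title (case-insensitive)."""
    return bool(set(record.title.lower().split()) & terms)


def retrospective_search(archive: list, terms: set) -> list:
    """Look back: search everything published up to now."""
    return [r for r in archive if matches(r, terms)]


def current_awareness(archive: list, terms: set, last_run: date) -> list:
    """Look forward: run a stored profile only against items published since the last run."""
    return [r for r in archive if r.published > last_run and matches(r, terms)]


# Hypothetical sample archive and stored search profile.
archive = [
    Record("Thesaurus construction", date(1998, 5, 1)),
    Record("Web search engines", date(2002, 11, 3)),
    Record("Search engine evaluation", date(2003, 2, 20)),
]
profile = {"search"}

print([r.title for r in retrospective_search(archive, profile)])
print([r.title for r in current_awareness(archive, profile, last_run=date(2003, 1, 1))])
```

The retrospective search returns both matching titles; the current-awareness run returns only the item published after the last run date.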


14

Information retrieval: evolution of storage and distribution media

****

• 1450: printing with reusable characters (movable type)

• 1975+: online access to databases; from the 1970s, the growing Internet

• 1985+: CD-ROM

• 1990+: World-Wide Web (based on the Internet)


15

Information retrieval: end user or information intermediaries

End-user

Information intermediary (broker or library or ...)

Information

****


16

End user versus information intermediary

• People can retrieve information themselves, directly as so-called “end-users”.

• However,

»the information landscape is complex,

»it may take a lot of time to find the right information,

»it may be costly to search for information.

• Therefore it may be wise to obtain the assistance of an expert information intermediary, such as a reference librarian or an information broker.

****


17

About “information”

Evaluating information sources

****


18

Documentary information sources: evaluate their quality

• We should always be critical when using information sources, in view of

»the widely varying degrees of quality of information sources, and of

»the costs associated with searching for, finding, and using information.

****


19

Documentary information sources: criteria to evaluate their quality (1)

• Is the information valid, reliable, trustworthy, genuine, authentic? Is the author honest? Is the source objective rather than subjective, free of cultural, political, ideological, or commercial bias? Is the origin an individual, a company, or an organisation? Is the publication sponsored by some company or organisation?

****


20

Documentary information sources: criteria to evaluate their quality (2)

• Is the information accurate and correct? Who is the author or producer? Does the source have an author or producer with high expertise, a good reputation, and good qualifications? Can the author be contacted for clarification or discussion? Was the information reviewed, edited, improved, corrected, censored, approved, or verified before publication? Do experts agree on the information provided?

****


21

Documentary information sources: criteria to evaluate their quality (3)

• Is the information source unique? Does it offer a great amount of primary information that is not obtainable from other sources?

• Is the information complete? Is the work available in its entirety?

• Does the source offer wide coverage? Is the source comprehensive and substantive?

• Is the information current enough, up to date? Is a publication date provided? Is an expiration date provided?

****


22

Documentary information sources: criteria to evaluate their quality (4)

• Does the document provide suitable references, so that you can verify statements and find older suitable information sources?

• Are the format and layout of the information good and clear? Is the information system user-friendly? Is it easy for users to orient themselves within the resource and to find their way around it?

• Is there good user support / customer support?

• Is the type of distribution medium appropriate (print, e-mail, online, ...)?

****


23

Documentary information sources: criteria to evaluate their quality (5)

• Is the information what you want? If not, reassess your needs and consider other types of information as well.

****


24

Documentary information sources: criteria to evaluate their quality (6)

• Is the information suitable for your level of understanding of the subject? Is the document popular, suitable for the general public, for students, for professionals, for scholarly / academic use…? Does it report new, primary research (survey, experiment, observation, measurement, invention), or is it a review of sources published earlier?

• Does the information repeat or confirm what you already know, or is it complementary, contradictory, new?

****


25

About “information”

Computer- and network-based information

****


26

Information: from bits to meaningful information

Digital computer data = bits: 0 or 1

Program code: meaningful for, and to be interpreted / executed by, a suitable / compatible computer

Information = “documents”: meaningful for, and to be interpreted by, human beings

****
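The step from bits to meaningful information can be illustrated with a minimal Python sketch in which the same four bytes are read once as text and once as a number. The byte values, and the choice of ASCII text and a big-endian unsigned integer as the two interpretations, are illustrative assumptions.

```python
import struct

# Four bytes (32 bits) as they might be stored in a file; the values are an arbitrary example.
raw = bytes([0x42, 0x41, 0x4E, 0x41])

# The raw bits, written out as 0s and 1s, one group of 8 per byte.
bits = " ".join(f"{b:08b}" for b in raw)

# Interpretation 1: ASCII text, meaningful for a human reader.
as_text = raw.decode("ascii")

# Interpretation 2: a big-endian unsigned 32-bit integer, meaningful to a program.
(as_int,) = struct.unpack(">I", raw)

print(bits)
print(as_text)
print(as_int)
```

The text interpretation gives BANA and the integer interpretation gives 1111576129: the bits only become information once a suitable interpreter, a program or a human reader, is applied to them.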


27

Information: digitally stored and managed information

Categories of digital, computer-readable information / data, forming electronic “documents” understandable by human beings:

Bits (0 1) → text / numbers / images / video / sounds; combined = multimedia

****


28

Information: types of digital information

Digital information (0 1):

• Linear text / Hypertext

• Static images / Video

• Sound

• Programs for computers

Combined: Multimedia / Hypermedia

****


29

****

Some publication media compared

Printed, CD-ROM, and online / networked media, compared on two axes: update speed and volume.


30

Publications on CD-ROM or online: advantages compared with hard copy

***-

• Can be cheaper to produce, to transport and to store.

• Can offer better search features.

• Can offer various output formats.

• Can offer fast and efficient “copy and paste” by the reader/user of information to other documents.

Taken together, these features allow more efficient access to large, high-volume documents or databases.


31

Scientific publishing in Utopia: an ideal scheme

• Many authors

• Many editors / publishers

• Online remote access multimedia database server

• One global, international computer data communication network

• Many database search clients and user interfaces

• Many readers / users

In science: author = reader

****


32

!? Question !?

Indicate the differences between reality and that simplified, ideal scheme of the information flow.


****


33

!? Question !?

Which basic problems / difficulties hinder people from finding / accessing / using information?


****


34

Information retrieval: basic difficulties (Part 1)

****

• In many cases it is not completely clear to the user of an information retrieval system which information is actually needed.

• In many cases the need for information cannot be expressed completely in the form of a query.

One reason is that, ideally, the complete context of the information need would have to be expressed, including the knowledge and background of the searcher.


35

Information retrieval: basic difficulties (Part 2)

****

• Computer systems are artificial, but most of them nevertheless use human language in their interface with human users, for instance in database search systems. This may cause difficulties, related in particular to language and vocabulary. Some examples:

• People use different languages and different terms (vocabularies) to describe a similar concept.

• Concepts, vocabularies and meanings of words and terms may change over time.

• Meanings of words / terms may depend on their context.


36

Information retrieval: basic difficulties (Part 3)

****

• Many different and imperfect retrieval systems have to be used.

»To retrieve and access the information that is in principle available, many different retrieval systems must be available and be mastered.

»Furthermore, perfect information retrieval software does not (yet) exist; scientific and technological evolution in the domain of information retrieval software has been fast since about 1970.


37

Information retrieval: basic difficulties (Part 4)

****

• Information overload

Users are often overwhelmed by the amount of available information and by the large influx of new information.


38

Information retrieval: basic difficulties (Part 5)

****

• The price (or inaccessibility) of particular information

A lot of information cannot be obtained at all, or at least not free of charge.


39

Information retrieval: browsing and searching as methods

• To make information available, the producer of an information system can offer the user basically two different ways of retrieving the right information from the system:

»by browsing or

»by searching.

***-


40

Information retrieval: browsing versus searching

Browsing a logically ordered list of terms

• Logical order / sorted by subject

• Table of contents

• Classification

• Hypertext / hypermedia: jump from a page to a linked page

Searching by submitting a search term to the system

• Alphabetical order / not sorted by subject

• Alphabetical index

• Thesaurus

• Hypertext / hypermedia: search function built into a page

***-


41

Information retrieval: browsing systems

• In browsing systems, the user can follow some of the paths offered by the system.

• The information is ordered, according to subject for instance.

• The user does not have to use his own words to indicate his needs.

• In many cases some type of classification is applied to support the organising and browsing of information items.

***-
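A browsing system can be sketched as a tree of subject headings in which the user only chooses among the options the system offers. In this minimal Python sketch the classification (`CLASSIFICATION`) and its headings are hypothetical, not taken from any real classification scheme.

```python
# A small, hypothetical subject classification: each heading maps to its
# narrower headings; an empty dict marks a heading with no subdivisions.
CLASSIFICATION = {
    "Science": {
        "Computer science": {"Information retrieval": {}, "Databases": {}},
        "Physics": {},
    },
    "Arts": {"Music": {}, "Painting": {}},
}


def children(tree: dict, path: list) -> list:
    """Follow a path of headings from the top and list the options offered there."""
    node = tree
    for heading in path:
        # Raises KeyError if the path uses a heading the system does not offer.
        node = node[heading]
    return sorted(node)


# A browsing session: the user only ever picks from the listed headings.
print(children(CLASSIFICATION, []))
print(children(CLASSIFICATION, ["Science"]))
print(children(CLASSIFICATION, ["Science", "Computer science"]))
```

The user never has to type a word of their own; the price is that the tree must be designed in advance, and every path reflects the designer's view of the subject.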


42

Information retrieval: examples of browsing systems

• Examples of browsing systems are

»a table of contents at the front of a book,

»a set of books placed on shelves according to some classification system,

»a hypertext hierarchical directory on the WWW, or more generally all hypermedia systems.

***-


43

Information retrieval: search systems

• In search systems, the user has to express his need for information by formulating a query, normally in a natural language or in a more formal language.

• In this case the information is normally not ordered according to some logic; in most cases it is a well-structured compilation of items of a similar form, such as the records of a database when a computer system is used.

***-


44

Information retrieval: examples of search systems

• Examples of search systems are

»the index (the register) at the back of a book,

»a library or museum catalogue with a search interface,

»a search form on a web page.

***-


45

Information retrieval: pro and contra of browse systems

Advantages:

»Browsing is relatively easy for the user.

Difficulties for the user:

»The user can explore the information space only along roads constructed from the system designers' view of the world, not from his own view.

Difficulties for the producer:

»It is relatively costly to construct an information system based on browsing.

***-


46

Information retrieval: pro and contra of search systems

Advantages:

»Creation of keyword indexes for fast searching is relatively simple and cheap, and can be automated.

Difficulties for the user:

»Searching is hindered by vocabulary / language problems.

»The users cannot always fully articulate their needs.

***-
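To illustrate why keyword indexes are cheap to create automatically, here is a minimal Python sketch of an inverted index, which maps each word to the records that contain it. The tokenization rule (lower-case runs of letters and digits) and the sample records are illustrative assumptions; real systems often add stemming, stop words, field information, and more.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list:
    """Lower-case the text and split it into runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(records: dict) -> dict:
    """Map every word to the set of record numbers in which it occurs."""
    index = defaultdict(set)
    for record_id, text in records.items():
        for word in tokenize(text):
            index[word].add(record_id)
    return dict(index)


def search(index: dict, word: str) -> set:
    """Look up one search term; an unknown word simply gives an empty result."""
    return index.get(word.lower(), set())


# Hypothetical sample records.
records = {
    1: "Information retrieval on the Internet",
    2: "Library catalogues and OPACs",
    3: "Retrieval of images",
}
index = build_inverted_index(records)

print(search(index, "retrieval"))
print(search(index, "OPACs"))
print(search(index, "catalogue"))
```

The last lookup finds nothing because the records contain "catalogues", not "catalogue": building the index is easy, but the vocabulary problem stays with the user.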


47

The information industry and the information market

The components of the information industry

****


48

The components of the information industry

• Authors

• Publishers

• Distributors

• Users

• Related organizations

****


49

The information industry and the information market


Overview and evolution

****


50

Increase in the number of scientific and technical serial publications

Figure: number of scientific and technical serial publications, 1650 to 2000, on a logarithmic scale from 1 to 1,000,000.

****


51

The information market: growth in the database industry

Figure: number of living databases, number of database producers, and number of vendors, 1975 to 1995 (vertical axis from 0 to 10,000).

****

Source: Williams, in: Gale Directory of Databases, 1998.


52

The information industry / market: future trends (Part 1)

• Growth in the production of databases.

• Less analogue / hard-copy production = more digital production, storage, and distribution of information.

• More integration of information types into multimedia and hypermedia.

****


53

The information industry / market: future trends (Part 2)

• Growth in the number of

»producers and distributors,

»end-users searching databases due to easier use and lower costs of information technology

****


54

Databases and computerized information retrieval

Introduction

****


55

What is a database?

A database is a collection of similar data records stored in a common file (or collection of files).

****


56

Types of databases: examples

Examples: The databases that form the basis for

»catalogues of books or other types of documents

»computerized bibliographies

»address directories

»a full-text newspaper, newsletter, magazine, or journal, and collections of these

»WWW and Internet search engines

» intranet search engines

» ...

****


57

Information retrieval and related activities: figure

Information management

Information retrieval: text retrieval, image retrieval

Presentation of information

***-


58

Information retrieval: via a database to the user

***-

Information content → Database (linear file + inverted file) → Search engine → Search interface → User


59

Information retrieval: the basic processes in search systems

• Information problem → representation → query

• Text documents → representation → indexed documents

• Query and indexed documents → comparison → retrieved, sorted documents

• Evaluation and feedback

***-
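The basic processes can be sketched in a few lines of Python: query and documents are both turned into a representation (here simply word counts), compared, and returned sorted by score. The scoring rule, a plain sum of how often the query words occur, is an illustrative assumption and much simpler than the weighting used in real search engines; the documents are hypothetical.

```python
import re
from collections import Counter


def represent(text: str) -> Counter:
    """Representation: reduce a text to a bag of lower-case words with their counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def compare(query: Counter, doc: Counter) -> int:
    """Comparison: add up how often each distinct query word occurs in the document."""
    return sum(doc[word] for word in query)


def retrieve(query_text: str, documents: dict) -> list:
    """Return (document id, score) pairs with a score above 0, best first."""
    query = represent(query_text)
    scored = [(doc_id, compare(query, represent(text))) for doc_id, text in documents.items()]
    hits = [pair for pair in scored if pair[1] > 0]
    # Sort by descending score; ties are broken by document id for a stable order.
    return sorted(hits, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical document collection.
documents = {
    "d1": "Search engines index the Web",
    "d2": "A thesaurus helps search: search with broader and narrower terms",
    "d3": "Bananas consist of atoms",
}
print(retrieve("search terms", documents))
```

The evaluation-and-feedback step stays with the user: after inspecting the ranked list, the searcher can reformulate the query and call `retrieve` again.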


60

Information retrieval systems: many components make up a system

• Any retrieval system is built up of many, more or less independent, components.

• These components can be modified, more or less independently, to increase the quality of the results.

***-


61

Information retrieval systems: important components

***-

• the information content

• system to describe formal aspects of information items

• system to describe the subjects of information items

• concrete descriptions of information items (= application of the information description systems used)

• information storage and retrieval computer program(s)

• computer system used for retrieval

• type of medium or information carrier used for distribution


62

What determines the results of a search in a retrieval system?

• the information retrieval system (= contents + system)

• the user of the retrieval system and the search strategy applied to the system

***-



63

Databases and computerized information retrieval

Text retrieval and language

***-


64

Text retrieval and language: a word is not a concept (a)


Problem: A word or phrase or term is not the same as a concept or subject or topic.

***-



65

Text retrieval and language: a word is not a concept (a’)

So, to ‘cover’ a concept in a search, and thereby increase the recall of the search, the user of a retrieval system should consider expanding the query; that is, the user should also include other words in the query that “cover” the concept.

***-


66

Text retrieval and language: a word is not a concept (a’’)


»synonyms!

»narrower terms, more specific terms (such as particular brand names); including terms with prefixes (for instance: viruses, retroviruses, rotaviruses, ...)

»spelling variations (such as UK English versus US English); possible variations after transliteration

***-
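Query expansion as described above can be sketched as building a Boolean query in which each concept becomes an OR-group of its synonyms, narrower terms, and spelling variants. In this minimal Python sketch the expansion table (`EXPANSIONS`) is hypothetical, and the OR / AND syntax is generic; each retrieval system has its own query language.

```python
# Hypothetical expansion table: for each term, the synonyms, narrower terms and
# spelling variants a searcher might add to "cover" the concept.
EXPANSIONS = {
    "virus": ["viruses", "retrovirus", "retroviruses", "rotavirus", "rotaviruses"],
    "colour": ["color"],
    "car": ["cars", "automobile"],
}


def expand_query(terms: list) -> str:
    """Build a Boolean query: each concept becomes an OR-group, and the concepts are ANDed."""
    groups = []
    for term in terms:
        variants = [term] + EXPANSIONS.get(term, [])
        groups.append("(" + " OR ".join(variants) + ")")
    return " AND ".join(groups)


print(expand_query(["virus", "colour"]))
```

Each added variant can only add matching documents, which is why expansion tends to raise recall; if a variant is ambiguous, precision may drop.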


!? Question !?

Which problems in text retrieval are illustrated by the following sentences?


***- 67


68

Time flies like an arrow.

Fruit flies like a banana.

?

***-

Examples


69

Time flies like an arrow.

Fruit flies like a banana.

***-Examples

70

Time flies like an arrow.

Fruit flies like a banana.

OK!

***-Examples

71

Text retrieval and language: ambiguity of meaning (a)

• Problem: A word or phrase can have more than one meaning. This ambiguity of meaning is a problem for retrieval: it decreases the precision of many searches. The meaning can depend on the context, and it may also depend on the region where the term is used.

***-

72

Text retrieval and language: ambiguity of meaning (a’)

»Example:

—Pascal the philosopher

—Pascal the computer language
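A small worked example shows how such ambiguity lowers precision (the fraction of retrieved records that are relevant). The four records and the user's relevance judgements are hypothetical:

```python
# Hypothetical mini-collection: record id -> text.
RECORDS = {
    1: "Pascal and the philosophy of religion",
    2: "Blaise Pascal the philosopher and his wager",
    3: "Programming in Pascal for beginners",
    4: "Pascal compilers and structured programming",
}
# Records that are relevant for a user who wants Pascal the philosopher.
RELEVANT = {1, 2}

def search(*required: str) -> set[int]:
    """Ids of records that contain every required word (Boolean AND)."""
    return {rid for rid, text in RECORDS.items()
            if all(w.lower() in text.lower().split() for w in required)}

def precision(retrieved: set[int], relevant: set[int]) -> float:
    """Fraction of the retrieved records that are relevant."""
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0

hits = search("pascal")
print(sorted(hits), precision(hits, RELEVANT))  # [1, 2, 3, 4] 0.5
hits = search("pascal", "philosophy")
print(sorted(hits), precision(hits, RELEVANT))  # [1] 1.0
```

Adding a context word raises precision here, but record 2 is now missed because it says “philosopher” rather than “philosophy”: a word is not a concept.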

***-

73

Text retrieval and language: ambiguity of meaning (a’’)

Problem: Ambiguity of meaning may be the cause of low precision.

***-

[Diagram: one word, two concepts]

74

A word is not a concept; a concept is not a word

One word or term cannot “cover” a concept; in other words, a concept cannot be “covered” by only one word or term. This may be the cause of low recall.
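The effect on recall (the fraction of relevant records that are retrieved) can be sketched with a hypothetical four-record collection, using a few of the “sea” words that appear later in this section:

```python
# Hypothetical mini-collection: record id -> text.
RECORDS = {
    1: "pollution of the sea near the coast",
    2: "marine pollution monitoring",
    3: "effluents discharged into the ocean",
    4: "air pollution in cities",
}
# Records 1 to 3 all deal with the concept "sea", whatever word they use.
RELEVANT = {1, 2, 3}

def search_any(words: list[str]) -> set[int]:
    """Ids of records that contain at least one of the words (Boolean OR)."""
    return {rid for rid, text in RECORDS.items()
            if any(w in text.split() for w in words)}

def recall(retrieved: set[int], relevant: set[int]) -> float:
    """Fraction of the relevant records that were retrieved."""
    return len(retrieved & relevant) / len(relevant)

print(recall(search_any(["sea"]), RELEVANT))                     # 0.3333333333333333
print(recall(search_any(["sea", "marine", "ocean"]), RELEVANT))  # 1.0
```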

[Diagram: two words, one concept]

****

75

A word is not a concept; a concept is not a word

Ambiguity of meaning may be the cause of low precision.

****

[Diagram: one word, two concepts]

76

Text retrieval and language: conclusions

• The use of terms and language to retrieve information from databases/collections/corpora causes many problems.

• These problems are not recognized, or are underestimated, by many users of search/retrieval systems; in other words, many users overestimate the power of retrieval systems.

• Much research and development is still needed to enhance text retrieval.

***-

77

Databases and computerized information retrieval

Hints on how to use information sources

****

78

Hints on how to use information sources: overview (Part 1)

• Know the purpose and motivation for each search.

• Do not be lazy: search on your own, before bothering experts with requests for advice.

• Plan your search in advance.

• Choose the best source(s) for each search.

• Use the right tools for each job (for instance, a suitable communication program in the case of online searches).

• Do not focus on a single source.

****

79

Hints on how to use information sources: overview (Part 2)

• Besides subject-oriented databases, consider citation indexes as useful secondary information sources.

• Use the available tools for subject searching well.

• Try to cope with the language problems.

• Match your search strategy with the type of source.

• In computer-based retrieval systems, combine search terms when appropriate, using

»Boolean operators

»proximity operators (for instance “near”,...)
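A proximity operator checks the positions of words, not only their presence. A minimal sketch of a hypothetical NEAR operator with a maximum distance of three words; real systems differ in syntax and in how they count the distance:

```python
import re

def positions(text: str, word: str) -> list[int]:
    """0-based word positions at which `word` occurs in `text`."""
    tokens = re.findall(r"\w+", text.lower())
    return [i for i, t in enumerate(tokens) if t == word.lower()]

def near(text: str, a: str, b: str, distance: int = 3) -> bool:
    """True if `a` and `b` occur within `distance` words of each other, in either order."""
    return any(abs(i - j) <= distance
               for i in positions(text, a)
               for j in positions(text, b))

doc1 = "Pollution of the sea by industrial effluents"
doc2 = "Sea levels rise; separately, air pollution grows in many large cities"
print(near(doc1, "sea", "pollution"))  # True
print(near(doc2, "sea", "pollution"))  # False
```

Both records would match `sea AND pollution`, but only the first one passes the proximity test, so proximity operators can raise precision.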

****

80

Hints on how to use information sources: overview (Part 3)

• Work cost-effectively.

• Use special care when searching for names.

• Work iteratively.

• Keep a record of your work.

• Be critical: not all information is correct or useful.

• Stop searching when “enough is enough”.

• Give up if necessary... (Not all questions have an answer.)

• ...

****

81

Hints on how to use information sources: subject searching

• When you search for information on a particular topic/subject, investigate whether the database producer offers

»a subject classification scheme, and/or

»a list of controlled/approved/accepted subject terms, and/or

»a subject thesaurus.

• Exploit these, if they are available.

• In most cases you should find and use synonyms and narrower terms.

• Use broader and/or related terms, if appropriate.

****

82

Hints on how to use information sources: Boolean combinations (1)

Most text search systems understand the basic Boolean operators:

AND = obtain records that contain both search terms

OR = obtain records that contain one or both search terms

NOT = exclude records that contain a search term
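Text search systems typically keep an inverted index (for every word, the set of records that contain it), so the Boolean operators become set operations. A minimal sketch with a hypothetical three-record collection; NOT is used here in its usual binary form, “A NOT B”:

```python
from collections import defaultdict

# Hypothetical mini-collection: record id -> text.
DOCS = {
    1: "seawater pollution monitoring",
    2: "seawater desalination",
    3: "river pollution monitoring",
}

# Inverted index: word -> set of ids of the records that contain the word.
index = defaultdict(set)
for rid, text in DOCS.items():
    for word in text.split():
        index[word].add(rid)

print(sorted(index["seawater"] & index["pollution"]))  # AND -> [1]
print(sorted(index["seawater"] | index["pollution"]))  # OR  -> [1, 2, 3]
print(sorted(index["pollution"] - index["river"]))     # NOT -> [1]
```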

****

83

Hints on how to use information sources: Boolean combinations (2)

Most text search systems understand the basic Boolean operators typed in capital characters:

OR

AND

****

84

Hints on how to use information sources: Boolean combinations (3)

In the case of computer-based information sources, use Boolean combinations of search terms when appropriate and when possible.

****

(term x1 OR term x2 OR term x3)

AND

(term y1 OR term y2 OR term y3)

AND

(term z1 OR term z2 OR term z3)

AND ...
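This pattern (OR within each concept, AND between the concepts) is easy to generate automatically. A sketch, assuming a search system that accepts parentheses and double quotes for phrases (conventions differ between systems); the function `build_query` and the pollution and effluent word lists are hypothetical, while the first group uses words from the “sea” example in this section:

```python
def build_query(concepts: list[list[str]]) -> str:
    """OR the words inside each concept, then AND the concepts together."""
    groups = []
    for words in concepts:
        # Quote multi-word phrases so that they are searched as one unit.
        quoted = [f'"{w}"' if " " in w else w for w in words]
        groups.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(groups)

query = build_query([
    ["sea", "seas", "seawater", "marine", "ocean"],
    ["pollution", "contamination"],
    ["effluent", "effluents", "wastewater", "waste water"],
])
print(query)
```

This prints `(sea OR seas OR seawater OR marine OR ocean) AND (pollution OR contamination) AND (effluent OR effluents OR wastewater OR "waste water")`.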

85

!? Question !? Task !? Problem !?

How many (and which) concepts do you see in a search for “general reviews about monitoring seawater pollution that is due to effluents”?

****

86

!? Exercise !? Task !? Problem !?

Prepare off-line, on paper, a suitable search query in a generic format to find “general reviews about monitoring seawater pollution that is due to effluents”, as the basis for later, concrete searches in databases.

(Limit yourself to one of the concepts.)

****

87

Hints on how to use information sources: example of a search query

Example: a search for the concept “sea” can, or should, involve for instance the following words in a Boolean OR combination: baltic OR bay OR bays OR coast OR coastal OR coastline OR coasts OR cove OR coves OR gulf OR mangrove OR mangroves OR marine OR mediterranean OR noordzee OR noordzeekust OR noordzeekusten OR ocean OR oceanic OR oceans OR reef OR reefs OR “saline-freshwater interface” OR sea OR seas OR seashore OR seawater OR seawaters OR shore OR shores

***-Example

88

!? Question !? Task !? Problem !?

What did you learn from the exercise on the formulation of a query?

****

89

Hints on how to use information sources: work iteratively

Work iteratively = search, investigate your results, refine your search, search again, and so on; do not try to find everything in one step, with one search.

****

[Diagram: cycle of Query, Searching, Results and Feedback]

90

“The ability to ask the right question is more than half the battle of finding the answer.”

Thomas J. Watson

****

?

91

Hints on how to use information sources: when to stop searching?

Develop a feel for the “curve of diminishing returns”:

If you spend too much time, effort, and/or money with too few benefits, you should stop.

****

[Graph: payoff against time / effort / money, with the question “Time to stop?”]
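The “curve of diminishing returns” can be turned into a simple stopping rule. A minimal sketch, assuming that you note how many new useful records each search round yields and that every round costs about the same; the function name, the threshold and the numbers in the example are hypothetical:

```python
def rounds_worth_doing(new_hits_per_round: list[int], cost_per_round: float,
                       min_hits_per_cost: float) -> int:
    """Count the search rounds to carry out before the payoff becomes too small.

    Stops at the first round whose number of new, useful hits per unit of
    cost falls below `min_hits_per_cost`.
    """
    done = 0
    for new_hits in new_hits_per_round:
        if new_hits / cost_per_round < min_hits_per_cost:
            break  # diminishing returns: time to stop
        done += 1
    return done

# Hypothetical log: new relevant records found in each successive round.
print(rounds_worth_doing([12, 7, 4, 1, 1, 0], cost_per_round=1.0,
                         min_hits_per_cost=2.0))  # 3
```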

92

Knowledge organisation: classifications, and thesaurus systems

Introduction

****

93

• To organise knowledge / documents / books / reports / information / data / records / things / items / materials for more efficient storage and retrieval, several related, similar tools / systems / methods / approaches are used.

• Often, but not yet always, this process is assisted by a computer system.

• Good systems are expanded and updated when the need arises.

• The organisation system applied should ideally be clearly and immediately visible to the user of the materials, or even searchable by computer.

Knowledge organisation: introduction

****

94

• Various tools / systems / methods / approaches are available:

»Classification

»Taxonomy

»Thesaurus

»Ontology

»…

Knowledge organisation: some tools

***-

95

Knowledge organisation: classifications, and thesaurus systems

Classifications

****

96

Classification systems: introduction

• Classification systems present the subjects in a logical order, usually going from the more general to the more specific.
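In a decimal classification, a longer notation usually denotes a more specific subject, so the broader classes of a subject can be read from the successive prefixes of its notation. A minimal sketch; the class numbers and captions below are hypothetical and are not taken from UDC or DDC:

```python
# Hypothetical decimal notation: each extra digit makes the class more specific.
CLASSES = {
    "1": "Sciences",
    "12": "Earth sciences",
    "125": "Water sciences",
    "125.3": "Oceanography",
    "2": "Technology",
}

def path_to(code: str) -> list[str]:
    """Classes from the most general down to `code` (a prefix = a broader class)."""
    return [f"{c} {CLASSES[c]}" for c in sorted(CLASSES, key=len)
            if code.startswith(c)]

for line in path_to("125.3"):
    print(line)
# 1 Sciences
# 12 Earth sciences
# 125 Water sciences
# 125.3 Oceanography
```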

***-Examples

97

• Universal means here: covering all subjects

• Not just one, but several competing systems exist. Examples:

»Universal Decimal Classification = UDC

used mainly outside U.S.A.

»Dewey Decimal Classification = DDC

used mainly in U.S.A.

»Library of Congress Classification

used mainly in U.S.A.

» ...

Classification systems: examples of universal systems

****Examples

98

Knowledge organisation: classifications, and thesaurus systems

Thesaurus systems

****

99

Thesaurus: description

• Thesaurus (contents) =

»system to control a vocabulary (= words and phrases + their relations)

»the contents of this vocabulary

• Thesaurus program =

program to create, manage, modify and/or search a thesaurus using a computer

****

100

Thesaurus relations

Relations between a term and other terms:

• BT (= Broader Term): term(s) with broader meaning

• NT (= Narrower Term): term(s) with narrower meaning

• RT (= Related Term): other, related term(s)

• UF (= Use(d) For): synonym(s) that are not used as the preferred term
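These relations map naturally onto a small data structure. A minimal sketch with hypothetical entries (reusing the virus example from the slides on query expansion): `preferred` follows a UF relation to the preferred term, and `expand_down` follows NT relations to build an OR-combination. It assumes that the NT hierarchy contains no cycles.

```python
from dataclasses import dataclass, field

@dataclass
class Entry:
    bt: list[str] = field(default_factory=list)  # broader terms
    nt: list[str] = field(default_factory=list)  # narrower terms
    rt: list[str] = field(default_factory=list)  # related terms
    uf: list[str] = field(default_factory=list)  # non-preferred synonyms ("used for")

# Hypothetical entries, for illustration only.
THESAURUS = {
    "viruses": Entry(bt=["microorganisms"], nt=["retroviruses", "rotaviruses"],
                     rt=["virology"], uf=["virus"]),
    "retroviruses": Entry(bt=["viruses"]),
    "rotaviruses": Entry(bt=["viruses"]),
}

def preferred(term: str) -> str:
    """Map a non-preferred term (listed under UF) to its preferred term."""
    for pref, entry in THESAURUS.items():
        if term == pref or term in entry.uf:
            return pref
    return term

def expand_down(term: str) -> list[str]:
    """The preferred term plus all of its narrower terms, recursively."""
    term = preferred(term)
    result = [term]
    for narrower in THESAURUS.get(term, Entry()).nt:
        result.extend(expand_down(narrower))
    return result

print(" OR ".join(expand_down("virus")))  # viruses OR retroviruses OR rotaviruses
```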

****

101

Thesaurus systems that cover all subjects

• General systems

• Universal systems

• Covering all subjects

• Broad and shallow systems

• Horizontal systems

***-

102

Thesaurus systems that cover all subjects: examples

• thesaurus system built into word processing software

• Library of Congress Subject Headings (LCSH)

• thesaurus system that runs on a PC; see for instance http://www.wordweb.co.uk/free/

• thesaurus systems that can be used free of charge through the WWW

»http://education.yahoo.com/reference/thesaurus/index.html

»http://thesaurus.plumbdesign.com/

***-Examples

103

!? Exercise !? Task !? Problem !?

Practice using a general thesaurus system that is built into your word-processing program.

***-

!? Exercise !? Task !? Problem !?

Have a look at various global, general, universal thesaurus systems.

Consider which ones may be useful for your future online information searches.

**-- 104

105

Computer networks, data communication and Internet

Introduction

****

106

Data communication: a definition

• Interpersonal communication

» Telecommunication

—Broadcast

—Telephone

—Data communication

–Remote login

–File transfer

–Hypertext transfer

–Electronic mail

–...

****

107

Data communication: which types of ‘data’?

Digital information (bits: 0 and 1); multimedia / hypermedia:

• Linear text, hypertext

• Static images, video

• Sound

• Programs for computers

****

108

Data communication: which types of ‘data’?

• The same types of data (information) that can be stored and managed on a computer can be transferred over computer networks to one or several other computers.

• So the networks form an important extension of the stand-alone computers.

• “The network is the computer”

****

109

Data communication: applications

• Hard-copy transfer (Fax)

• Online use of the processing power of a remote computer

• Online access to information sources !

»library catalogues,

»bookshop catalogues,

»publishers’ catalogues,

»campus-wide and community information systems,

»(text or multimedia) databases,

»network-based journals, ...

****

110

Data communication: problems, difficulties, limitations

• Low transfer speed

• Technical complexity

***-

111

Computer network protocols: definition

• When two computer systems communicate via a network, they do so by exchanging messages.

• The structure of network messages varies from network to network.

• Thus the message structure in a particular network is agreed upon a priori and described in a set of rules; such a set of rules is called a protocol.
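To make the idea of an agreed message structure concrete, here is a sketch of a tiny, hypothetical message format (1-byte type, 2-byte length, text payload), encoded and decoded with Python's standard `struct` module. It is not the format of any real protocol; it only shows that both sides must apply the same rules for the bytes to make sense.

```python
import struct

# Hypothetical message layout, agreed on in advance by both sides:
#   1 byte   message type
#   2 bytes  payload length (big-endian, "network byte order")
#   n bytes  payload (UTF-8 text)
HEADER = struct.Struct("!BH")

def encode(msg_type: int, text: str) -> bytes:
    payload = text.encode("utf-8")
    return HEADER.pack(msg_type, len(payload)) + payload

def decode(message: bytes) -> tuple[int, str]:
    msg_type, length = HEADER.unpack_from(message)
    start = HEADER.size
    return msg_type, message[start:start + length].decode("utf-8")

wire = encode(1, "hello")
print(wire)          # b'\x01\x00\x05hello'
print(decode(wire))  # (1, 'hello')
```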

****

112

Computer networks, data communication and Internet

National Wide Area Networks

****

113

National Wide Area Networks

• Public access national packet switching networks

• Research computer networks

• Public access made available by Internet Service Providers

• ...

****

114

Computer networks, data communication and Internet

International computer networks

****

115

International computer networks: examples

• National public data communication networks linked together

• Internet

• FidoNet

• Bitnet / EARN

• Usenet

• ...

****Examples

116

Computer networks, data communication and Internet

The Internet data communication network

****

117

@

The Internet data communications network (Part 1)

• “Internet” is not well-defined.

• A network of smaller networks: the global collection of interconnected local area, regional and wide-area (national backbone) networks that use the TCP/IP suite of data communication protocols.

****

118

****

The Internet data communications network (Part 2)

• Links computers of various types.

• Is constantly growing.

• The analogy of a superhighway has been used to describe the emerging system of networked computers.

• The Internet has no owner, and is not managed by one organization. @

119

The Internet: access from your Local Area Network

Your microcomputer

Local Area Network (LAN)

One of the national networks

The global Internet

****

120

Host computers in the Internet: definition

• A host (computer) is a computer identified by a domain name that has a unique IP address record associated with it.

• It could be any computer connected to the Internet by any means.

• For instance: www.vub.ac.be

****

@

121

Transmission Control Protocol / Internet Protocol (TCP/IP)

• the main suite of transport protocols used on the Internet for connectivity and transmission of data across heterogeneous systems

• “glue that holds the Internet together”

• an open standard

• available on most Unix systems, VMS and other minicomputer systems, many mainframe and supercomputing systems and some microcomputer and PC systems

****

122

Internet: growth in number of hosts worldwide: linear plot

[Plot: number of hosts, on a scale from 0 to 20 000 000, in January of each year from 1993 to 1998]

****

123

Internet Service Provider = ISP

****

Internet Service Providers provide their clients with access to the Internet and, in many cases, with

»an email address / server

»space for a web site

»software tools to get started

»training

»technical support

»an accessible location for a WWW site of the client

»assistance with WWW site design and promotion

124

World-Wide Web = WWW

Introduction

****

125

The WWW: example of a welcome page

****Example

126

URL = Uniform Resource Locator

• = draft standard for specifying an object on the Internet

• the structure is, in most cases: protocol://computer_address[/path_name/file_name]

• examples:

» telnet://biblio.vub.ac.be

»ftp://ftp.vub.ac.be/

»gopher://gopher.vub.ac.be/

»http://www.vub.ac.be/BIBLIO/index.html

»news://news.server.edu/comp.infosystems.www

****

127

URL: format / structure

1. The first part of a URL, before the colon “:”, specifies the access method = protocol.

2. The second part of the URL, after the colon “:”, is interpreted in a way that is specific to the access method. In general, two slashes after the colon indicate a machine / computer name.
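Python's standard `urllib.parse.urlsplit` function splits a URL along these lines: `scheme` is the access method before the colon, and `netloc` is the computer name that follows the two slashes. A short sketch using the example URLs listed earlier; the helper name `describe` is hypothetical:

```python
from urllib.parse import urlsplit

def describe(url: str) -> tuple[str, str, str]:
    """Return (access method, computer name, path) for a URL."""
    parts = urlsplit(url)
    # scheme: the part before ":"; netloc: the machine name after "//"
    return parts.scheme, parts.netloc, parts.path

for url in ["telnet://biblio.vub.ac.be",
            "ftp://ftp.vub.ac.be/",
            "gopher://gopher.vub.ac.be/",
            "http://www.vub.ac.be/BIBLIO/index.html",
            "news://news.server.edu/comp.infosystems.www"]:
    print(describe(url))
```

For instance, `describe("http://www.vub.ac.be/BIBLIO/index.html")` returns `('http', 'www.vub.ac.be', '/BIBLIO/index.html')`, and the telnet URL has an empty path.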

****

128

!? Question !? Task !? Problem !?

What is the difference between Internet and the World-Wide Web?

****

129

The WWW is an application of Internet

****

• The World-Wide Web (WWW) is a service, an application of Internet.

• It is based on the Internet infrastructure.

• So the WWW is newer than the Internet. The concept of the WWW was created at the end of the 1980s when the Internet was already well established.

130

The WWW is an application of Internet: scheme

****

Data communication

Internet

WWW

131

The WWW: the essential elements

• Information delivery and access using hypertext/hypermedia documents/objects

»html documents

»http protocol: http clients, http servers

• Integration of protocols in the Internet:

»http servers offering html documents including links to other http servers, telnet servers, ftp servers, nntp servers, gopher servers, ...

****

132

World-Wide Web = WWW

WWW client programs

****

133

WWW: client / browser programs

• To access the WWW, you run a browser program.

• The browser reads documents, and can fetch documents from other sources. Information providers set up hypermedia servers which browsers can get documents from.

• The browser can display hypertext documents. Hypertext is text with pointers to other text. The browsers let you deal with the pointers in a transparent way: select the pointer, and you are presented with the text that is pointed to.
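Before a browser can follow the pointers in a hypertext document, it must find them. A minimal sketch using Python's standard `html.parser` module collects the targets of `<a href="...">` links; the sample page is made up for illustration, and a real browser does much more (fetching, rendering, resolving relative links):

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the targets of <a href="..."> pointers in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        # The parser reports tag and attribute names in lower case.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

page = """<html><body>
<p>See the <a href="http://www.vub.ac.be/BIBLIO/index.html">library</a>
or the <a href="ftp://ftp.vub.ac.be/">file server</a>.</p>
</body></html>"""

collector = LinkCollector()
collector.feed(page)
print(collector.links)
```

The second link points to an ftp server, which illustrates how http servers integrate other Internet protocols through links.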

****

134

WWW: examples of browsers for your own computer

Browsers are available for many computer platforms; in particular: browsers for Windows + Winsock:

»Netscape

»Microsoft Internet Explorer

»...

****

135

!? Question !? Task !? Problem !?

Browse the WWW, using an available browser client program.

****

136

!? Question !? Task !? Problem !?

What came first: Internet or WWW? Explain.

***-

137

World-Wide Web = WWW

Saving information from the Web

****

138

WWW: How to save information from the Web?

Information displayed by your web browser/client program can be saved:

• by select, copy, paste in another document (and save)

• by saving a complete page to your disk

» in separate files (for instance 1 HTML file + some image files)

» in 1 file, using Microsoft Internet Explorer 5 or a later version

• by copying the information into an e-mail message that you send to your own e-mail account

****

139

!? Exercise !? Task !? Problem !?

Copy a text fragment from the WWW and paste it into another document on your computer.

****

140

!? Exercise !? Task !? Problem !?

Save a text from the WWW to disk, as HTML, using a browser program.

****

141

!? Exercise !? Task !? Problem !?

Display an HTML file that you have saved from the WWW to your disk in a word-processing program.

Is the file displayed properly?

****

142

World-Wide Web = WWW

The success of WWW

****

143

WWW: growing number of WWW servers

[Plot: number of WWW servers, on a scale from 0 to 7 000 000, from 1993 to 2000]

****

144

WWW as popular method to access information from computers

****

• The WWW has quickly become the most popular medium to access information that resides on various computers that are connected to a computer network.

145

Online access information sources and services

Introduction

****

146

Internet based information sources: problems / difficulties (Part 1)

• Redundancy and overlap: on the one hand, there is too much information on some topics; in other words, redundancy and overlap are high in many cases.

• Too few information sources: on the other hand, there are too few information sources on some topics.

****

147

Internet based information sources: problems / difficulties (Part 2)

• No order is imposed on most sources.

• Quality checks / quality controls are not performed. Related to this: new information does not have to be registered before it is offered.

• Is the information that you find real, honest, authentic?

****

148

Internet based information sources: how many? how much information?

In 2001:

• More than 10 terabytes (= 10 000 gigabytes) of text data

In 2002:

• More than 2000 million (= 2 billion) unique URLs in the total Internet

****

149

Online access information sources and services

Types of online access information systems

***-

150

Types of online access information systems: “free” versus “fee”

• A lot of the information on the Internet is available free of charge, but another part is only accessible when a fee is paid to the producer and / or the distributor.

• Some organisations pay these fees for some sources and then organise access, so that their members can retrieve and exploit the information as if it were free of charge.

****

151

Types of online access information systems: “free” versus “fee”

****

Public access information sources free of charge

Fee-based online information services (NOT free of charge)

152

Types of online access information systems: “free” for members only

****

Public access information sources free of charge

Fee-based online information services (NOT free of charge)

Fee-based online information services, made accessible “free of charge” by an institute to its members

153

Online access information sources and services

Dictionaries and encyclopaedias accessible through the WWW

****

154

Dictionaries and encyclopedias through the WWW: introduction

• Dictionaries and encyclopedias are the first choice among many types of information sources,

»when we do not need detailed information on a common topic

»when we want to prepare a more detailed search on an unfamiliar topic, by searching for the right spelling, synonyms, context,…

• Some dictionaries and encyclopedias are available through the WWW free of charge.

****

155

Dictionaries accessible through Internet and the WWW: example

• The American Heritage® Dictionary of the English Language

»Over 200,000 entries, 70,000 audio word pronunciations, 900 full-page color illustrations

»Available free of charge from http://education.yahoo.com/reference/dictionary/

****Example

156

Dictionaries accessible through Internet and the WWW: compilation

• A compilation/collection of dictionaries can be searched simultaneously and free of charge: http://www.onelook.com/

****Example

157

Encyclopedias accessible through Internet and the WWW: examples

• Encarta Concise Free Encyclopedia 

»http://encarta.msn.com/

»Available in English and in some other languages

****Example

158

Encyclopedias accessible through Internet and the WWW: examples

• Encyclopædia Britannica: only a small part is available free of charge, plus links to selected WWW sites

»http://www.britannica.com/

• Encyclopædia Britannica Concise

»http://education.yahoo.com/reference/encyclopedia/

****Example

159

Encyclopedias accessible through Internet and the WWW: examples

• The Canadian Encyclopedia (in English and in French):

»http://thecanadianencyclopedia.com/

****Example

160

Encyclopedias accessible through Internet and the WWW: examples

• Several encyclopedias and dictionaries have been integrated and are searchable simultaneously and free of charge through http://xrefer.com/

****Example

161

Encyclopedias accessible through Internet and the WWW: overviews

• A list / overview of encyclopedias on the Internet: http://www.internetoracle.com/encyclop.htm

• Other lists of encyclopedias on the Internet can be found as part of more general directories of Internet-based information sources.

****Example

162

Online access information sources and services

Internet directories and indexes

****

163

Internet: meta-information about Internet information sources

• in printed manuals and guides:

- it is not always possible to get a copy fast

- it costs money to get a copy

- they are soon out of date

• offered on the WWW!:

+ directly available when we want to use the Internet

+ many systems are accessible free of charge

+ most systems are regularly updated

• (“intelligent agent” software on client PC)

****

164

Internet: subject-oriented meta-information offered via WWW

Information about information sources: in the form of

»subject guides = texts with references

»subject hypertext directories = subject guides

»key word indexes, generated automatically, for searching

»collections of links or forms to the above

»(multi-threaded search systems)

****
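The “multi-threaded search systems” listed above send one query to several search systems at the same time and merge the answers. The minimal Python sketch below illustrates the idea. The engine functions are hypothetical stand-ins for real search services. The round-robin merge, which takes the first hit of each engine, then the second hits, and so on, is only one simple merging strategy and is assumed here for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import zip_longest
from typing import Callable, Dict, List

# A search "engine" here is any function mapping a query to a ranked list of URLs.
Engine = Callable[[str], List[str]]


def metasearch(query: str, engines: Dict[str, Engine]) -> List[str]:
    """Query all engines in parallel threads, then interleave their ranked lists."""
    with ThreadPoolExecutor(max_workers=max(1, len(engines))) as pool:
        futures = [pool.submit(engine, query) for engine in engines.values()]
        result_lists = [future.result() for future in futures]

    # Round-robin merge: all first hits, then all second hits, and so on.
    merged: List[str] = []
    seen = set()
    for same_rank in zip_longest(*result_lists):
        for url in same_rank:
            if url is not None and url not in seen:  # skip gaps and duplicates
                seen.add(url)
                merged.append(url)
    return merged


# Hypothetical engines that return fixed result lists.
engines = {
    "engine_a": lambda q: ["http://a.example/1", "http://shared.example/", "http://a.example/2"],
    "engine_b": lambda q: ["http://shared.example/", "http://b.example/1"],
}
print(metasearch("pediatrics", engines))
# ['http://a.example/1', 'http://shared.example/', 'http://b.example/1', 'http://a.example/2']
```

Removing duplicates during the merge matters because the same page is often found by several engines.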

165

Internet global subject directories: introduction

• They are virtual libraries with open shelves, for browsing.

• They are compiled manually, by many people.

• They can be browsed following a tree structure or a more complicated variation.

• The most famous of these systems are among the most popular and most visited sites on the WWW: e.g. Yahoo!

****
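The tree structure along which such a directory is browsed can be sketched as a small Python data structure. This is a hypothetical miniature. The category path comes from the Yahoo! pediatrics examples in this presentation, but the class, the functions and the example.org URLs are invented for illustration.

```python
from typing import Dict, List


class Category:
    """One node of a hypothetical subject directory tree."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.children: Dict[str, "Category"] = {}
        self.links: List[str] = []

    def child(self, name: str) -> "Category":
        # Return the named sub-category, creating it on first use.
        if name not in self.children:
            self.children[name] = Category(name)
        return self.children[name]


def add_link(root: Category, path: str, url: str) -> None:
    """Attach a link to the category named by a path such as 'A > B > C'."""
    node = root
    for part in path.split(">"):
        node = node.child(part.strip())
    node.links.append(url)


def browse(root: Category, path: str) -> Category:
    """Follow a path down the tree, as a user clicking through categories would."""
    node = root
    for part in path.split(">"):
        node = node.children[part.strip()]  # KeyError if the category does not exist
    return node


root = Category("Top")
add_link(root, "Health > Medicine > Pediatrics", "http://example.org/pedbase")
add_link(root, "Health > Medicine > Pediatrics > Organizations", "http://example.org/aap")

pediatrics = browse(root, "Health > Medicine > Pediatrics")
print(pediatrics.links)             # ['http://example.org/pedbase']
print(sorted(pediatrics.children))  # ['Organizations']
```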

166

Internet global subject directories: structure

The structure corresponds to a classification that is, in most cases, specific to the particular directory. In other words: most Internet directories do not use the well-known, classical universal classification systems.

****

167

Internet global subject directories: limitations

• They cover only a small number of selected WWW sites, in comparison with the total number of sites that are accessible.

• They are suitable mainly for broad searches that can be difficult to formulate in words, but NOT for more specific searches that require combinations of several concepts.

****

168

****

Internet global subject directories: searching directories with a query

• Many of the Internet directories include an index to search their contents with a query.

• However, searching in this way does not exploit the supporting classification structure well, and the user should be aware of the problems and difficulties of information retrieval with natural-language queries.

• Furthermore, being able to use the system in this way may be confusing, because these directories are not real full-text Internet indexes like those provided by other search tools.

169

Internet global subject directories: Yahoo!

• A hypertext global subject directory can be found at http://www.yahoo.com/

and at many other sites, including http://www.yahoo.co.uk/

• Entries are NOT rated.

• Accessible free of charge.

****Example

170

Internet global subject directories: Yahoo! links in pediatrics

• Health > Medicine > Pediatrics:

• International Pediatric Chat - for professionals to share information and education regarding children's health care.

• National Med/Peds Residents' Association - organization for residents, practitioners and medical students interested in combined internal medicine and pediatrics.

• Neonatology Network - information and communication platform for neonatologists and pediatricians.

• Pediatria OnLine - (in Italian) here we talk about children, among pediatricians and with families.

• Pediatric Critical Care

• Pediatric Database (PEDBASE) - containing descriptions of over 500 childhood illnesses.

• Pediatric Endocrinology Conference - LWPES/ESPE joint meeting occurring July 6-10 2001.

• Pediatric Endoscopic Photos - illustrating intestinal problems in children.

***-Example

171

Internet global subject directories: Yahoo! for pediatrics

• Health > Medicine > Pediatrics: link to a digital library (health sciences) for young patients

***-Example

172

Internet global subject directories: Yahoo! to pediatrics organisations

• Health > Medicine > Pediatrics > Organizations: link to the American Academy of Pediatrics

***-Example

173

Internet global subject directories: Yahoo! links to pediatrics schools

• Health > Medicine > Pediatrics > Schools, Departments, and Programs

• University of Rochester - partnership between pediatric residents and community-based agencies that serve children and their families.

• Michigan State University@

• Royal College of Paediatrics and Child Health - responsible for training, examinations, professional standards, and organisation of child health services for the UK.

• Tohoku University

• University of Alabama at Birmingham - programs and training opportunities in pediatrics. Also contains faculty information and sub-specialty descriptions.

• …

***-Example

174

Internet global subject directories: searching with a query in Yahoo! (1)

• The directory of Yahoo! can not only be browsed, but can also be searched with a query.

• However, in this way the hierarchical structure is not well exploited.

• When you formulate a search query, Yahoo! can provide automatic assistance with spelling and word variations. For instance, after a search for “Capetown”, Yahoo! answers: “Other Spellings: Try searching for cape town instead.”

***-Example
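Yahoo! does not document how it produces such suggestions. The Python sketch below only illustrates two plausible checks, under stated assumptions. The first check asks whether an unknown word is two known words with a missing space, as with “Capetown”. The second looks for known words that are very similar to the unknown one. The small vocabulary is hypothetical. The similarity check uses the standard-library `difflib.get_close_matches`.

```python
import difflib
from typing import List, Set

# Hypothetical vocabulary; a real system would derive it from its index or query logs.
VOCABULARY: Set[str] = {"cape", "town", "health", "medicine", "pediatrics"}


def suggest(term: str, vocabulary: Set[str]) -> List[str]:
    """Suggest alternative spellings for a query term that is not in the vocabulary."""
    term = term.lower()
    if term in vocabulary:
        return []  # known word: no suggestion needed
    suggestions: List[str] = []
    # 1. A missing space: can the term be split into two known words?
    for i in range(1, len(term)):
        left, right = term[:i], term[i:]
        if left in vocabulary and right in vocabulary:
            suggestions.append(f"{left} {right}")
    # 2. A typing error: known words with a high similarity ratio.
    suggestions += difflib.get_close_matches(term, sorted(vocabulary), n=3, cutoff=0.8)
    return suggestions


print(suggest("Capetown", VOCABULARY))   # ['cape town']
print(suggest("pediatrcs", VOCABULARY))  # ['pediatrics']
```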

175

Internet global subject directories: searching with a query in Yahoo! (2)

• When such a query yields no results, Yahoo! executes the query, based on its textual search statements, in a much larger external Internet index (not produced by Yahoo!). The external index used has varied over time.

• This mechanism is not made very clear and may confuse the user.

***-Example
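The fallback described above can be sketched in a few lines of Python, assuming two hypothetical search functions: one for the directory and one for the external index. One design choice answers the confusion mentioned on the slide: the function returns the name of the system that produced the results together with the results, so that a user interface could state clearly where they come from.

```python
from typing import Callable, List, Tuple

Search = Callable[[str], List[str]]


def search_with_fallback(query: str, directory: Search,
                         external_index: Search) -> Tuple[str, List[str]]:
    """Search the directory first; fall back to an external index if nothing is found.

    The name of the system that answered is returned with the hits, so that the
    user interface can state clearly where the results come from.
    """
    hits = directory(query)
    if hits:
        return "directory", hits
    return "external index", external_index(query)


# Hypothetical stand-ins for the two systems.
def directory_search(query: str) -> List[str]:
    return ["Health > Medicine > Pediatrics"] if "pediatrics" in query.lower() else []


def external_search(query: str) -> List[str]:
    return ["http://example.org/page-about-" + query.replace(" ", "-")]


print(search_with_fallback("Pediatrics", directory_search, external_search))
# ('directory', ['Health > Medicine > Pediatrics'])
print(search_with_fallback("rare topic", directory_search, external_search))
# ('external index', ['http://example.org/page-about-rare-topic'])
```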

176

Internet global subject directories: BUBL link

• A hypertext global subject directory leading to more than 10 000 WWW sites for the higher education community can be found at http://bubl.ac.uk/link/

• Accessible free of charge.

***-Example

177

Internet global subject directories: Google directory

• A hypertext global subject directory can be found at http://directory.google.com/

• Accessible free of charge.

• Very similar to the Open Directory Project.

***-Example

178

Internet global subject directories: Open Directory Project

• A hypertext global subject directory can be found at http://www.dmoz.org/

• Its contents are also used in the Google Directory system.

• Accessible free of charge.

***-Example

179

Internet global subject directories: Resource Discovery Network

• A collection of hypertext subject directories that focus on academic information sources can be found at http://www.rdn.ac.uk/

• Together these lead to more than 30 000 selected WWW sites.

• Accessible free of charge.

***-Example

180

!? Exercise !? Task !? Problem !?

Try to find Internet sources that are relevant for you, by using an Internet-based global subject directory.

****

181

Internet global subject directories: evaluation criteria (Part 1)

• Is usage free of charge?

• Wide coverage?

• Up to date? Frequent updates? Only few dead / broken links?

• Good coverage of the sources in that part of the world in which you are interested?

• Does the manager of the directory refuse to give priority to sites that want to pay to get a prominent place in the directory?

***-

182

Internet global subject directories: evaluation criteria (Part 2)

• Easy user interface?

• Short response times?

• Are mirror sites available closer to you for faster response?

• Good presentation, description of each site?

• Is a rating, appreciation, review offered for each listed site?

• Is translation of documents offered free of charge?

***-

183

Internet global subject directories: evaluation criteria (Part 3)

• Good documentation and online help?

• Good help desk available?

• High stability and reliability?

***-

184

Internet global subject directories: evaluation criteria (Part 4)

• Are other services offered from the same site or with the same interface? Is the subject directory integrated with other services? Additional services can be

»an Internet index or a WWW index or a gateway to such an index for searching with a query

»travel guides, flight and hotel reservations, maps,...

»WWW-based e-mail and e-mail address directories

»auctions through WWW

***-

185

***-

Internet subject directories: non-global, more specific systems

[Diagram relating four levels of scope (“can lead to”): a directory limited to sources in/of a country or region; a directory restricted to a specific subject domain (“portal”); a global subject directory; the complete WWW]

186

Internet subject directories focusing on a specific subject domain

• Computer science & engineering: http://www.ub.lu.se/eel/

• Marine science and oceanography: http://oceanportal.org/

***-Examples

187

Internet indexes: automated search tools

• Several systems allow users to search for and locate many items (addressable resources) on the Internet in a more systematic, direct way than by browsing/navigating alone.

• These systems do NOT search the contents of the computers on the Internet completely and in real time when a user submits a query. Searching in that way would be far too slow, given the limitations of the technology.

****

188

Internet indexes: scheme of the mechanism

****

[Diagram components: the user searching for Internet-based information; Internet client hardware and software; the user interface to a search engine (an Internet information source); the Internet index search engine; the Internet crawler and indexing system; the database of Internet files, including an index]

189

Internet indexes: description of the mechanism

Each of these search systems is based on:

• a database of links to pages / URLs, which can be retrieved by searching with queries in a big index that is built automatically from the contents (the texts) of these pages (to build this database and to keep it up to date, a “robot” software system continuously collects pages from the Internet)

• a search system with a user interface in a WWW form, to allow the user to search through that database

****
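The “big index” built from the texts of collected pages is usually an inverted index, which maps each word to the pages that contain it. The minimal Python sketch below shows the principle. The small dictionary of pages stands in for what a robot would have collected. The implicit-AND search and the simple tokenizer are assumptions made for illustration. Real Internet indexes add ranking, compression and much more.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: Dict[str, str]) -> Dict[str, Set[str]]:
    """Inverted index: word -> set of URLs of the pages containing that word."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index: Dict[str, Set[str]], query: str) -> List[str]:
    """Return the URLs of pages that contain every query word (implicit AND)."""
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return sorted(result)


# Stand-in for pages that a robot has already collected from the Internet.
crawled = {
    "http://example.org/a": "Pediatric critical care",
    "http://example.org/b": "Critical reviews of encyclopedias",
    "http://example.org/c": "Pediatric endocrinology conference",
}
index = build_index(crawled)
print(search(index, "pediatric"))       # ['http://example.org/a', 'http://example.org/c']
print(search(index, "pediatric care"))  # ['http://example.org/a']
```

Because the index is built in advance, answering a query means looking up a few words, not visiting any computer on the Internet. This is why these systems can answer quickly.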

190

Internet indexes: AltaVista

***-Example

The primary search interface can be found in the US:

http://www.altavista.com/

http://www.av.com/

(Both addresses lead to the same information.)

Mirror site in the UK:

http://www.altavista.co.uk/

191

Internet indexes: AltaVista: features

• Allows full text searching of the WWW

• Allows advanced Boolean searching (in “Advanced” mode)

• Offers relevance ranking of search results

• Offers a link to an Internet subject directory (Looksmart)

• Offers links to systems to find images, sounds,… (multimedia) in the Internet

***-Example
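AltaVista's actual ranking formula is not described here. The sketch below shows one classic way to rank results by presumed relevance: TF-IDF. Under this scheme, a page scores higher when a query term occurs often in it relative to its length, and rare terms weigh more than common ones. The smoothed IDF, `log(1 + N/df)`, and the example pages are assumptions for illustration.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def rank(pages: Dict[str, str], query: str) -> List[Tuple[str, float]]:
    """Score pages by a simple TF-IDF sum over the query terms."""
    counts = {url: Counter(tokenize(text)) for url, text in pages.items()}
    n_docs = len(counts)
    terms = tokenize(query)
    # Document frequency: in how many pages each query term occurs.
    df = {t: sum(1 for c in counts.values() if t in c) for t in terms}

    scores: Dict[str, float] = {}
    for url, c in counts.items():
        length = sum(c.values()) or 1
        score = 0.0
        for t in terms:
            if c[t] and df[t]:
                tf = c[t] / length                  # term frequency, length-normalised
                idf = math.log(1 + n_docs / df[t])  # rarer terms weigh more
                score += tf * idf
        if score > 0:
            scores[url] = score
    # Highest score first; ties broken by URL for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


pages = {
    "http://example.org/a": "pediatrics pediatrics medicine",
    "http://example.org/b": "medicine and health",
    "http://example.org/c": "pediatrics conference",
}
print([url for url, _ in rank(pages, "pediatrics")])
# ['http://example.org/a', 'http://example.org/c']
```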

192

Internet indexes: Fast = All the Web

***-Example

• The search interface can be found at: http://www.alltheweb.com/

• You can search the WWW and FTP servers.

• The database is one of the biggest.

• Not only HTML and plain text files, but also the full text of many Adobe PDF files is indexed.

193

Internet indexes: Google (Part 1)

• http://www.google.com/

• One of the most popular systems in 2001 and 2002.

• To rank retrieved pages, an algorithm is used that takes into account the links between WWW pages. A retrieved page is ranked higher when

»many sites/pages point to it

»“important” sites/pages point to it

****Example
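The two criteria above can be made concrete with a simplified version of the published PageRank idea. Each page passes its own score on to the pages it links to, so a link from an “important” page counts for more than a link from an obscure one. The sketch below is a minimal power-iteration version. The damping factor 0.85, the fixed number of iterations and the toy link graph are assumptions. Google's actual ranking combines this with many other signals.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Simplified PageRank by power iteration over a small link graph."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page keeps a small base score (the "random jump" term).
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split evenly over the pages it links to,
                # so links from highly ranked pages are worth more.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its rank over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph: page -> pages it links to.
links = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["A", "C"]}
ranks = pagerank(links)
for page in sorted(ranks):
    print(page, round(ranks[page], 3))
```

In this graph, C is linked from three pages and ends up highest. A is linked from fewer pages, but one of them is the highly ranked C, so A comes a close second.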

194

Internet indexes: Google (Part 2)

• Full-text searching is possible for many files that are available through the WWW.

• Not only HTML and plain text pages are covered: for many files in other formats, such as Adobe PDF, Microsoft Word, Microsoft Excel, Microsoft PowerPoint and Rich Text Format, the first part is indexed as well.

****Example

195

!? Question !? Task !? Problem !?

In spite of the popularity of the Google Internet index, there are limitations in the search features.

Which limitations?

***-

196

Internet indexes: Google limitations

• Google does NOT offer/allow

»manual or automatic stemming, manual or automatic truncation

»automatic classification of WWW pages

***-Example

197

Internet indexes: Google additional features

• Besides a system to search for WWW pages, Google also offers

»a subject directory

»searching for images on the WWW

»searching an archive of Usenet messages + posting to Usenet groups

• Thus Google has become a great integrator / aggregator.

****Example

198

!? Exercise !? Task !? Problem !?

Read the manual and make a search with Google.

***-

199

Internet indexes: MSN Web Search

• Offered free of charge by Microsoft.

• You can search for WWW content.

• Since 1998.

• Well known, because its search interface can be reached through the search functions built into one of the most widespread Internet browsers, Microsoft Internet Explorer, and because it is offered at http://search.msn.com/

• Is based on an Internet index created by another company.

***-Example

200

Internet indexes: Scirus

• Allows you to search for manually selected scientific information (only) on the WWW, including access-controlled sites, such as the peer-reviewed articles in the journals that are published in ScienceDirect by Elsevier.

• Offered free of charge by Elsevier.

• Is partly based on the Fast WWW search system that is also used by All the Web.

• The search interface: http://www.scirus.com

***-Example

201

Internet indexes: Scirus features

• Offers access to information ordered according to some classification system / taxonomy.

• Offers access not only to files in HTML format, but also to files in PDF, PostScript and other formats.

***-Example

202

Internet indexes: coverage / size of each index

The indexes grow, and their “size ranking” changes over time.

Biggest systems in 2002:

• Google !

• AltaVista

• (Fast =) All the Web (serving also Lycos)

• Systems based on the INKTOMI database of WWW pages, such as Hotbot, MSN Web search,…

****

203

!? Exercise !? Task !? Problem !?

Try to find Internet sources that are relevant for you, by using an Internet index.

****

204

Internet indexes: variations among various systems

• Despite their common aims and characteristics, the searchable Internet index systems show differences and variations.

• To illustrate these variations and to assist Internet users to make a decision on which search system to use, the following list of some features and evaluation criteria can be useful.

***-

205

Internet indexes: evaluation criteria (Part 1)

• Is usage free of charge?

• How complete is the coverage?

• Is the coverage good (or poor) for a particular geographic region?

• Is the coverage good (or poor) for a particular type of documents?

• Is the searchable database up to date? Is the database updated frequently? Do the search results contain only few dead (broken) links?

***-

206

Internet indexes: evaluation criteria (Part 2)

• Is spamming filtered out, to give other pages a better chance of turning up in the result set? Can the system cluster presumed duplicate documents in the results? Or does the system simply eliminate presumed duplicate documents from its database?

• Does the database system apply full-text indexing to each ASCII and HTML document in the database, so that full-text searching is possible?

***-
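One common way to detect presumed duplicates, such as mirror copies of the same page, is to compare the sets of overlapping word sequences (“shingles”) in two texts. The minimal Python sketch below does this. The shingle length of 3 words, the similarity threshold of 0.5 and the greedy clustering against each cluster's first page are all assumptions. It is not the method of any particular search system.

```python
import re
from typing import Dict, List, Set, Tuple

Shingle = Tuple[str, ...]


def shingles(text: str, k: int = 3) -> Set[Shingle]:
    """The set of all runs of k consecutive words in a text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: Set[Shingle], b: Set[Shingle]) -> float:
    """Overlap of two shingle sets: 1.0 means identical, 0.0 means disjoint."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_duplicates(pages: Dict[str, str], threshold: float = 0.5) -> List[List[str]]:
    """Greedily group pages whose shingle overlap with a cluster's first page is high."""
    clusters: List[Tuple[Set[Shingle], List[str]]] = []
    for url, text in pages.items():
        sh = shingles(text)
        for representative, members in clusters:
            if jaccard(sh, representative) >= threshold:
                members.append(url)
                break
        else:  # no similar cluster found: start a new one
            clusters.append((sh, [url]))
    return [members for _, members in clusters]


pages = {
    "http://example.org/www.html": "The World-Wide Web has quickly become the most popular medium.",
    "http://mirror.example.net/www.html": "the world wide web has quickly become the most popular medium!",
    "http://example.org/indexes.html": "Internet indexes do not search the Internet in real time.",
}
print(cluster_duplicates(pages))
```

A system can then either show only one page per cluster or drop the extra copies from its database, which are the two options named in the criterion above.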

207

Internet indexes: evaluation criteria (Part 3)

• Are the contents of meta-fields also indexed to make them searchable?

• Does the system also index the text in files on the web that are not plain ASCII text, to make these searchable and retrievable as well? For instance, files in the formats of the various versions of

»Microsoft Word

»Microsoft PowerPoint

»Adobe Acrobat (Portable Document Format)

***-

208

Internet indexes: evaluation criteria (Part 4)

• Is there field indexing, so that you can search the contents of a particular field? For instance:

the HTML title, HTML keywords, URL, date, link, Java applet, text, image file, sound file, video file,...

***-

209

Internet indexes: evaluation criteria (Part 5)

• Does the system offer powerful search options like

»truncation?

»word stemming?

»Boolean search combinations?

»proximity searching?

»automatic translation of your search terms in several other languages?

»spelling check of your search terms?

***-
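Three of these search options, truncation, Boolean combinations and proximity searching, can be illustrated with a positional index that records where each word occurs in a document. In the minimal Python sketch below, the `*` truncation syntax and the default window of 3 words are assumptions chosen for illustration, since each search system defines its own syntax. Stemming and translation are not covered.

```python
import re
from typing import Dict, List, Sequence


def index_positions(text: str) -> Dict[str, List[int]]:
    """Map each word of a document to the positions where it occurs."""
    positions: Dict[str, List[int]] = {}
    for i, word in enumerate(re.findall(r"[a-z0-9]+", text.lower())):
        positions.setdefault(word, []).append(i)
    return positions


def term_positions(positions: Dict[str, List[int]], term: str) -> List[int]:
    """Positions of a term; a trailing '*' means right truncation (prefix match)."""
    term = term.lower()
    if term.endswith("*"):
        prefix = term[:-1]
        return sorted(p for word, ps in positions.items() if word.startswith(prefix) for p in ps)
    return positions.get(term, [])


def near(positions: Dict[str, List[int]], a: str, b: str, window: int = 3) -> bool:
    """Proximity: do terms a and b occur at most `window` words apart?"""
    return any(abs(i - j) <= window
               for i in term_positions(positions, a)
               for j in term_positions(positions, b))


def matches(positions: Dict[str, List[int]], all_of: Sequence[str],
            none_of: Sequence[str] = ()) -> bool:
    """Boolean AND over all_of, combined with NOT for every term in none_of."""
    return (all(term_positions(positions, t) for t in all_of)
            and not any(term_positions(positions, t) for t in none_of))


doc = index_positions("Encyclopedias and dictionaries accessible through the WWW")
print(term_positions(doc, "encyclop*"))                    # [0]
print(near(doc, "encyclop*", "dictionaries", window=2))    # True
print(near(doc, "encyclop*", "www", window=2))             # False
print(matches(doc, ["dictionar*", "www"], ["thesaurus"]))  # True
```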

210

Internet indexes: evaluation criteria (Part 6)

• Can the results be limited to a certain time period? For instance based on the date

»of the file as noted by the server computer, or

»of the most recent indexing of the file

• Is the user interface easy to understand and efficient to use?

• Is a user interface offered in your own language?

• Does the system rank the items in the result set according to their presumed relevance?

***-

211

Internet indexes: evaluation criteria (Part 7)

• Can Boolean retrieval be combined with relevance ranking of the results?

• Can the results be ordered according to date

»of the file as noted by the server computer, or

»of the most recent indexing of the file

• Can the results be ordered according to size?

• Can all the results (documents) from the same site be grouped together (clustered)?

***-
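The options on this slide can be combined on a single result list: first a Boolean selection, then ordering by relevance, date or size, and finally grouping by site. In this Python sketch, the `Result` fields (server date, size, relevance score) are assumptions about what a search system knows about each document, and the Boolean step simply matches whole words.

```python
from dataclasses import dataclass
from datetime import date
from itertools import groupby
from urllib.parse import urlparse


@dataclass
class Result:
    url: str
    text: str
    modified: date  # date of the file as reported by the server (assumed to be known)
    size: int       # file size in bytes
    score: float    # relevance score from some ranking function


def boolean_filter(results, must=(), must_not=()):
    """Boolean step: keep results containing all `must` words and none of the `must_not` words."""
    excluded = set(must_not)
    kept = []
    for r in results:
        words = set(r.text.lower().split())
        if all(w in words for w in must) and not (words & excluded):
            kept.append(r)
    return kept


def group_by_site(results):
    """Cluster results from the same host name (site)."""
    def host(r):
        return urlparse(r.url).netloc
    return {site: list(items) for site, items in groupby(sorted(results, key=host), key=host)}


if __name__ == "__main__":
    results = [
        Result("http://a.example.org/x", "boolean search tips", date(2002, 5, 1), 1200, 0.9),
        Result("http://b.example.org/y", "boolean logic in search engines", date(2003, 1, 10), 5400, 0.7),
        Result("http://a.example.org/z", "search engine ranking", date(2001, 3, 3), 800, 0.8),
    ]
    hits = boolean_filter(results, must=["search"], must_not=["logic"])
    print([r.url for r in sorted(hits, key=lambda r: r.score, reverse=True)])      # ranked
    print([r.url for r in sorted(results, key=lambda r: r.modified, reverse=True)])  # newest first
    print([r.url for r in sorted(results, key=lambda r: r.size)])                  # smallest first
    print({site: len(items) for site, items in group_by_site(results).items()})
```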


212

Internet indexes: evaluation criteria (Part 8)

• Can the system rank the results (documents) on the basis of the number of WWW hyperlinks pointing to each document?

• Does the system refrain from placing/ranking some results (documents) higher in the results list on the basis of payments by the producers of those documents to the search system company?

• Are advertisements / sponsored links / sponsored results clearly distinguished from normal (not sponsored) search results?

***-
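Link-based ranking can be sketched in two steps. The simplest signal counts the pages that link to each document. PageRank refines this so that a link from an important page counts more. This Python sketch uses a tiny hypothetical link graph, the commonly cited damping factor 0.85, and a fixed number of iterations. Real search engines combine such scores with many other signals.

```python
from collections import Counter


def inlink_counts(links):
    """links maps each page to the pages it links to; count distinct linking pages."""
    counts = Counter({page: 0 for page in links})
    for source, targets in links.items():
        for target in set(targets):
            if target != source:  # ignore self-links
                counts[target] += 1
    return counts


def pagerank(links, damping=0.85, iterations=50):
    """Simple power iteration; a page without outgoing links spreads its rank over all pages."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1 - damping) / n for p in pages}
        for source in pages:
            targets = [t for t in set(links[source]) if t in rank and t != source]
            if targets:
                share = damping * rank[source] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                for t in pages:
                    new[t] += damping * rank[source] / n
        rank = new
    return rank


if __name__ == "__main__":
    web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
    print(inlink_counts(web).most_common())
    print(sorted(pagerank(web).items(), key=lambda kv: kv[1], reverse=True))
```

Page C, which receives links from three pages, ends up with both the highest in-link count and the highest PageRank in this example.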


213

Internet indexes: evaluation criteria (Part 9)

• Short response times?

• Are mirror sites available closer to you for faster response?

• Does the system offer a good presentation format for each result (document/page/item)? For instance: are search terms indicated / highlighted in the results?

• Is a good and detailed summary of each result available?

• Does the system offer an analysis of the words occurring in the results, which can help you to refine a search?

***-
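Highlighting search terms and analysing the words in the results are both simple text operations. In this Python sketch, `highlight` marks whole-word matches, and `refinement_words` counts the other words in the result snippets. Frequent words (here "cars" for the ambiguous query "jaguar") suggest terms you could add to refine the search. The stopword list and the `**` marker are assumptions.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "and", "or", "in", "on", "for", "to", "is", "with"}


def highlight(text, terms, marker="**"):
    """Wrap each whole-word, case-insensitive occurrence of a search term in markers."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, terms)) + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: f"{marker}{m.group(1)}{marker}", text)


def refinement_words(snippets, query_terms, top=5):
    """Most frequent non-query, non-stopword words in the result snippets."""
    query = {t.lower() for t in query_terms}
    counts = Counter(
        w for s in snippets for w in re.findall(r"\w+", s.lower())
        if w not in query and w not in STOPWORDS
    )
    return counts.most_common(top)


if __name__ == "__main__":
    print(highlight("Jaguar cars and the jaguar as an animal", ["jaguar"]))
    snippets = ["jaguar cars for sale", "the jaguar is a big cat", "used jaguar cars"]
    print(refinement_words(snippets, ["jaguar"]))
```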


214

Internet indexes: evaluation criteria (Part 10)

• Is translation of documents offered free of charge?

• High stability and reliability? That is, no large variations/fluctuations in the results of identical searches carried out at different times?

• Good documentation and online help?

• Good help desk available?

• Can the search system provide updated results through electronic mail, as a current awareness tool?

***-
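Stability can be measured by running an identical search at two different times and comparing the result lists. The same measures can also compare two different search engines. This Python sketch uses the Jaccard overlap of the two result sets and the share of common results in the top k. The result lists are hypothetical.

```python
def jaccard(results_1, results_2):
    """Overlap of two result sets: |intersection| / |union| (1.0 = identical sets)."""
    a, b = set(results_1), set(results_2)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def top_k_overlap(results_1, results_2, k=10):
    """Fraction of the top-k results of the first list that also appear in the top k of the second."""
    top_1, top_2 = results_1[:k], set(results_2[:k])
    return sum(1 for url in top_1 if url in top_2) / max(len(top_1), 1)


if __name__ == "__main__":
    monday = ["u1", "u2", "u3", "u4"]  # hypothetical results of one search on Monday
    friday = ["u2", "u1", "u5", "u6"]  # the identical search repeated on Friday
    print(jaccard(monday, friday), top_k_overlap(monday, friday, k=4))
```

Here only two of the six distinct URLs are common to both runs, so the Jaccard overlap is 1/3, although the top 2 results are the same.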


215

Internet indexes: evaluation criteria (Part 11)

• Are other services available besides the normal WWW index? For instance:

»an index to news sources that is updated more frequently?

»anonymous ftp file index?

»gopher index?

»searchable Usenet newsgroups archive?

»Internet subject directory?

»White pages = people finder = addresses = ...

»WWW-based e-mail and e-mail address directories

»auctions through WWW

***-


216

Internet indexes: evaluation criteria (Part 12)

• Is the search/query also submitted to another database to obtain more results? For instance, to a book database, to obtain book descriptions besides WWW documents?

***-


217

Internet indexes: evaluation criteria (Part 13)

• Does the search system group / classify / cluster the results (retrieved documents) on the basis of their subjects, and present these groups / clusters / classes to the user? This helps the user to cope with problems caused, for instance, by the multiple meanings of words used in a search query.

***-
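Clustering results by subject can be sketched with a very simple single-pass method. Each result joins the first cluster whose vocabulary is similar enough; otherwise it starts a new cluster. This Python sketch uses the Jaccard similarity of word sets and an assumed threshold of 0.2. Real clustering search systems use more elaborate methods.

```python
import re

STOPWORDS = {"the", "a", "an", "of", "and", "in", "for", "to", "is"}


def terms(text):
    """Set of content words in a snippet."""
    return {w for w in re.findall(r"\w+", text.lower()) if w not in STOPWORDS}


def cluster(snippets, threshold=0.2):
    """Single-pass clustering: join the first similar-enough cluster, else start a new one."""
    clusters = []  # list of (set of cluster terms, list of member snippets)
    for s in snippets:
        t = terms(s)
        for cluster_terms, members in clusters:
            union = cluster_terms | t
            if union and len(cluster_terms & t) / len(union) >= threshold:
                members.append(s)
                cluster_terms |= t  # in-place update: the cluster's vocabulary grows
                break
        else:
            clusters.append((set(t), [s]))
    return [members for _, members in clusters]


if __name__ == "__main__":
    results = [
        "jaguar car dealer prices",
        "used jaguar car prices",
        "jaguar big cat rainforest",
        "the jaguar cat hunts in the rainforest",
    ]
    for group in cluster(results):
        print(group)
```

For the ambiguous query "jaguar", the car pages and the animal pages end up in separate clusters.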


218

!? Question !? Task !? Problem !?

Why do different Internet search engines (in most cases)

give different results for an identical search?


***-


219

Internet information sources

Coverage of Internet directories and Internet indexes

****

Figure: a global Internet index and a global Internet directory

220

****

Global Internet search tools: a comparison

Global Internet directories

• Only a limited selection of Internet sources

• Browsing information sources is easy

• Good for broad searches

Global Internet indexes

• An index covers only about 1/3 of the Internet

• Searching requires some skills and knowledge

• Good for specific, narrow searches

Multi-threaded search systems (meta-search systems)

• These get information from directories and indexes

• Searching requires some skills and knowledge

• Useful when no single index yields the information you need
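A multi-threaded (meta-)search system has to merge the ranked lists it receives from several indexes and directories. One simple strategy, assumed here, is to interleave the lists round-robin and drop duplicate URLs. This Python sketch does not normalise URLs, which a real system would need to do.

```python
from itertools import zip_longest


def merge_results(*result_lists):
    """Interleave ranked lists from several engines (round robin), dropping duplicates."""
    seen, merged = set(), []
    for tier in zip_longest(*result_lists):  # first results of all engines, then second, ...
        for url in tier:
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


if __name__ == "__main__":
    engine_a = ["u1", "u2", "u3"]  # hypothetical ranked results per engine
    engine_b = ["u2", "u4"]
    engine_c = ["u5"]
    print(merge_results(engine_a, engine_b, engine_c))
```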


221

!? Question !? Task !? Problem !?

Which information on the Internet is not covered

by many searchable Internet indexes?


***-


222

Internet indexes cover only a part of the Internet: introduction (1)

***-

The “visible” part of the Internet

The “hidden, invisible” part of the Internet and the WWW (not searchable using a global index such as AltaVista, Google,...)


223

Internet indexes cover only a part of the Internet: introduction (2)

***-

Why can Internet indexes find only a part of what is in fact available through the Internet?

1. Quantitative technical limitations: Each Internet search system has indexed only a part of the static WWW pages that are available for indexing.

2. Qualitative technical limitations: Besides the static WWW pages that Internet search engines try to cover, many other, quite different sources are available through the Internet, but these are not incorporated in those search engines.
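Coverage figures such as "about 1/3" are usually estimated rather than measured directly. One way is a capture-recapture (Lincoln-Petersen) estimate from the overlap of two indexes: if index A holds n_A pages, index B holds n_B pages and m pages are in both, the total is roughly n_A × n_B / m. This Python sketch assumes that the two indexes sample the Web independently. That is not really true, because both tend to favour popular pages, so the estimate tends to be too low. The sample sets are hypothetical.

```python
def estimate_total(index_a, index_b):
    """Lincoln-Petersen estimate of the total number of pages from two samples and their overlap."""
    overlap = len(index_a & index_b)
    if overlap == 0:
        raise ValueError("no overlap: the estimate is undefined")
    return len(index_a) * len(index_b) / overlap


def coverage(index, estimated_total):
    """Estimated fraction of all pages that an index contains."""
    return len(index) / estimated_total


if __name__ == "__main__":
    # Hypothetical samples: page ids known to index A and to index B.
    index_a = set(range(300))
    index_b = set(range(240, 440))  # 200 pages, 60 of them also in A
    total = estimate_total(index_a, index_b)
    print(total, coverage(index_a, total), coverage(index_b, total))
```

With 300 and 200 pages and an overlap of 60, the estimated total is 1000 pages, so index A covers about 30 % and index B about 20 %.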


224

Internet indexes cover only a part of the Internet: scheme

***-

Scheme of the Internet, with the following parts:

»static indexable texts in the WWW (= on HTTP server computers), covered partly by Internet indexes

»Word files and PDF files

»databases and file archives accessible through the Internet (via telnet, ftp,... or via CGI, ASP,...)

»rapidly changing information, such as news

»information accessible only when passwords are used


225

Internet indexes cover only a part of the Internet: conclusion for users

When you want to retrieve information about a particular subject from the Internet, use not only WWW indexes but also other sources accessible through the Internet:

»databases! (book and journal bibliographies, library catalogues, archives of group messages, directories, atlases,…)

»rapidly changing information, such as news

» information accessible only when passwords are used

»anonymous ftp file archives

»e-mail based interest groups; Usenet newsgroups

***-

226

****

Finding multimedia files on the Internet

Several public access search systems are available free of charge to search the Internet for multimedia files:

»images / pictures (artwork, photos, or both)

»sound / audio files (music, speeches,...)

»video

227

****

Finding images on the Internet: introduction

• Several public access search systems are available free of charge to search the Internet for images / pictures (artwork, photos, or both).

• When you search for images, the results from such a system offer not only links to the image files on the Internet but also small versions of the images themselves (so-called “thumbnails”).

228

**** Examples

Finding images on the Internet: examples of search engines

• http://alltheweb.com !!!

• http://gallery.yahoo.com/ !

• http://images.google.com/ !!!! or through http://www.google.com/

• http://multimedia.lycos.com/

• http://www.altavista.com/ !! (also audio and video; in the user interface, choose IMAGES instead of the normal text search)

• http://www.ditto.com/ !

229

**** Examples

Finding images on the Internet: screenshot of a Google image search


230

!? Exercise !? Task !? Problem !?

Use a specialised search engine to find images

about a particular subject on the Internet.

***-

231

Online access information sources and services

Public access book databases

****

232

Public access book databases: introduction

• Even in this age of Internet-based information sources, a lot of information is still distributed in the form of printed books.

• The contents of most books are (still) not available on the Internet.

• Most Internet search tools do NOT allow you to find out about the existence of books that may be interesting for you.

• So, specific search tools to find books can be useful.

****

233

Public access book databases: an overview

• (Databases by publishers.)

• Databases by book distributors / bookshops!

• Online public access library catalogues

• (Databases of computer-based versions of books.)

****

234

Public access book databases provided by bookshops

• To find currently available books, the bibliographic databases assembled by big bookshops are interesting.

• Several offer good coverage and are accessible free of charge.

****

235

Book databases accessible free of charge: examples in U.S.A.

• Amazon.com (US): http://www.amazon.com/ and (UK) http://www.amazon.co.uk/ (note: amazon, NOT amazone). Subject description is poor.

• Barnes and Noble (US): http://www.bn.com/

****Examples

236

Book databases accessible free of charge: examples in Europe

• Blackwell’s on the Internet (international, academic books): http://www.blackwell.co.uk/

• VLB, for books in German: http://www.buchhandel.de/

• For books in French: http://www.chapitre.com

• Boeknet - De Nederlandse Internet Boekhandel (Dutch): http://www.boeknet.nl/

***-Examples

237

Book databases accessible free of charge: for old books

To find used, secondhand, rare, hard-to-find and out-of-print books around the world:

• abebooks http://www.abebooks.com/

• Virtual Book Shop: http://www.bookshop.com/

***-Examples

238

Free public access bibliographic book database + price comparisons

• Even comparisons of the catalogues of bookshops (as well as of shops for music, movies and many other goods) are available free of charge.

• See for instance

»http://www.bookfinder.com/

»http://www.dealtime.com/

****
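A price comparison has to recognise the same book in different shop catalogues, and the ISBN is the usual key. This Python sketch validates ISBN-10 and ISBN-13 check digits, converts an ISBN-10 to ISBN-13 (prefix 978 plus a new check digit), and then picks the cheapest offer. The shop names and prices are hypothetical.

```python
def isbn13_check_digit(first12):
    """Check digit for the first 12 digits of an ISBN-13 (weights 1, 3, 1, 3, ...)."""
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(first12))
    return str((10 - total % 10) % 10)


def isbn10_valid(isbn10):
    """ISBN-10: sum of digit x weight (10 down to 1) must be divisible by 11; 'X' stands for 10."""
    if len(isbn10) != 10 or not isbn10[:9].isdigit() or not (isbn10[9].isdigit() or isbn10[9] == "X"):
        return False
    total = sum((10 - i) * (10 if c == "X" else int(c)) for i, c in enumerate(isbn10))
    return total % 11 == 0


def normalize_isbn(raw):
    """Return a valid ISBN-13 without hyphens, converting an ISBN-10 if needed."""
    code = raw.replace("-", "").replace(" ", "").upper()
    if isbn10_valid(code):
        core = "978" + code[:9]
        return core + isbn13_check_digit(core)
    if len(code) == 13 and code.isdigit() and isbn13_check_digit(code[:12]) == code[12]:
        return code
    raise ValueError(f"invalid ISBN: {raw}")


def cheapest(raw_isbn, offers):
    """offers maps shop name -> {ISBN-13: price}; return (price, shop), or None if no shop has it."""
    isbn = normalize_isbn(raw_isbn)
    prices = [(catalogue[isbn], shop) for shop, catalogue in offers.items() if isbn in catalogue]
    return min(prices) if prices else None


if __name__ == "__main__":
    offers = {  # hypothetical shops and prices
        "shop-a": {"9780306406157": 31.50},
        "shop-b": {"9780306406157": 27.95},
        "shop-c": {},
    }
    print(cheapest("0-306-40615-2", offers))
```

The ISBN-10 0-306-40615-2 and the ISBN-13 978-0-306-40615-7 denote the same book, so both forms find the same offers.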

239

Example of an international public access dissertation database

• The dissertation database of UMI is available from: http://wwwlib.umi.com/dissertations/

• The two most recent years are available free of charge.

***-Examples

240

!? Question !? Task !? Problem !?

Search for titles of books which are relevant for you,

using an online database provided by a book publisher or bookshop.

****

241

Public access book databases: evaluation criteria (Part 1)

• Is usage free of charge?

• Wide coverage? Also for books in your preferred language?

• Specialized coverage for particular subjects?

• Up to date? Frequent updates?

• Abstracts, summaries, descriptions, tables of contents included?

• Full text indexing of each item in the database, so that full text searching is possible?

***-

242

Public access book databases: evaluation criteria (Part 2)

• Field indexing, so that searching for the contents of a particular field is possible? For instance:

»the title

»the date of publication

»the author

»the publisher

»the language

***-
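Field searching in a book database, combined with a limit on the year of publication, can be sketched as follows. The `Book` records and their contents are hypothetical, and matching is a simple case-insensitive substring test per field. Results are ordered newest first.

```python
from dataclasses import dataclass


@dataclass
class Book:
    title: str
    author: str
    publisher: str
    language: str
    year: int


def search_books(books, *, title=None, author=None, publisher=None,
                 language=None, year_from=None, year_to=None):
    """Return books meeting every given field condition, newest first."""
    def matches(value, wanted):
        # case-insensitive substring match; None means "no condition on this field"
        return wanted is None or wanted.lower() in value.lower()

    hits = [
        b for b in books
        if matches(b.title, title) and matches(b.author, author)
        and matches(b.publisher, publisher)
        and (language is None or b.language == language)
        and (year_from is None or b.year >= year_from)
        and (year_to is None or b.year <= year_to)
    ]
    return sorted(hits, key=lambda b: b.year, reverse=True)


if __name__ == "__main__":
    books = [  # hypothetical records
        Book("Searching online databases", "A. Smith", "Example Press", "en", 1998),
        Book("Library catalogues today", "B. Jones", "Sample Publishers", "en", 2001),
        Book("Recherche documentaire", "C. Martin", "Éditions Exemple", "fr", 2002),
    ]
    print([b.title for b in search_books(books, language="en", year_from=2000)])
```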

243

Public access book databases: evaluation criteria (Part 3)

• Does the database producer improve retrieval by

»adding subject terms, or

»classifying the books in categories?

• Powerful search options:

» truncation? stemming?

»Boolean search combinations? proximity searching,…?

»spelling check of your search terms?

»translation of your search terms into several other languages?

***-

244

Public access book databases: evaluation criteria (Part 4)

• Easy user interface?

• Is a user interface offered in your own language?

• Relevance ranking of results?

• Can Boolean retrieval be combined with relevance ranking of the results?

• Can results be limited to a certain time period?

• Can the results be ordered according to date, size, origin,...?

***-

245

Public access book databases: evaluation criteria (Part 5)

• Good presentation of each result?

• Does the system offer a current awareness service, sending information on new titles that may be of interest to you?

• Short response times?

***-
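A current awareness service essentially re-runs a saved search and reports only the items that were not reported before. This Python sketch assumes that `search` is any function returning item identifiers (for instance ISBNs of new titles). A real service would also store the set of already reported items between runs and send the new ones by e-mail.

```python
def run_alert(search, query, seen):
    """Run a saved search and return only items not reported in earlier runs.

    `search` is any function mapping a query to a list of item identifiers;
    `seen` is the set of identifiers already reported, updated in place.
    """
    fresh = []
    for item in search(query):
        if item not in seen:
            seen.add(item)
            fresh.append(item)
    return fresh


if __name__ == "__main__":
    # Hypothetical catalogue snapshots on two successive days.
    snapshots = iter([["isbn-1", "isbn-2"], ["isbn-2", "isbn-3"]])
    fake_search = lambda query: next(snapshots)
    reported = set()
    print(run_alert(fake_search, "information retrieval", reported))  # first run: both items
    print(run_alert(fake_search, "information retrieval", reported))  # second run: only the new one
```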

246

Public access book databases: evaluation criteria (Part 6)

• Are other services offered from the same site or with the same interface? Is the system integrated with other services? Additional services can include:

»searchable databases of videos, music CDs, CD-ROMs and DVDs, all also for sale

»a subject directory for browsing, besides the database with index for searching

»WWW-based e-mail and e-mail address directories

»auctions through WWW

***-

247

Online access information sources and services

Library Online Public Access Catalogues

= OPACs

****

248

Online Public Access Catalogues of libraries

****

• Mainly to find older books, the catalogues of libraries can be useful.

• Most are accessible online and free of charge.

249

Online Public Access Catalogues = OPACs: definition

***-

Online Public Access Catalogue:

a term used to describe any type of computerized library catalog offered to the public by online login

250

Online access library catalogues: The British Library

• Accessible online via the WWW since 2000: http://blpc.bl.uk/

• Access free of charge

***-Example

251

Online access library catalogues: The British Library: screenshot

***-Example

252

Online access information sources and services

Fee-based online public access information services

****


253

Types of online access information systems: “free” versus “fee”

• A lot of the information on the Internet is available free of charge, but another part is only accessible when a fee is paid to the producer and / or the distributor.

• Some organisations pay these fees for some sources and then organise access, so that their members can retrieve and exploit the information as if it were free of charge.

• The first commercial computer systems that made information available online appeared around 1975.

• Most of them are now also available through the Internet.

****

254

Fee-based online access services: examples (Part 1)

Name: location of the computer(s)

• America On Line: U.S.A.

• OCLC: U.S.A.

• Ovid Technologies: U.S.A.

• CompuServe: U.S.A.

• Cambridge: U.S.A., Taiwan, UK

• Data-Star: Switzerland

• Dialog: U.S.A.

• EBSCO: U.S.A.

***-Examples

255

Fee-based online access services: examples (Part 2)

Name

• Elsevier ScienceDirect

• Factiva

• ISI (Web of Science, JCR,…)

• LexisNexis

• MSN (Microsoft)

• Prodigy

• Silver Platter

• STN

• Swets (e-journals)

• ...

Locations of the computer(s), as listed on the slide: U.S.A.; U.S.A.; U.S.A.; U.S.A., The Netherlands,...; Germany - U.S.A. - Japan; The Netherlands;...

***-Examples

256

Online information services: various names for similar systems

• (fee-based) online (access) information service

• (fee-based) online (access) computer service

• databank

• database vendor

• host computer

• aggregator

• ...

***-

257

Online information services: total size of their databases

In 1999:

The big host systems and the public access WWW pages offer a comparable quantity of information:

• The WWW offered about 8 terabytes (= 8 000 gigabytes) of text data

(according to Lawrence and Giles, Nature, 1999, Vol. 400, pp. 107-109)

• Dialog offered about 9 terabytes (= 9 000 gigabytes) (in 1998)

»6 billion pages of text

»3 million images

****
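A rough check on what these figures mean follows. It assumes decimal prefixes (1 terabyte = 10^12 bytes, consistent with the slide's 8 000 gigabytes) and that Dialog's 9 terabytes include the images as well as the text:

```latex
8\ \text{TB} = 8 \times 10^{12}\ \text{bytes} = 8\,000\ \text{GB}
\qquad
\frac{9 \times 10^{12}\ \text{bytes}}{6 \times 10^{9}\ \text{pages}} = 1.5 \times 10^{3}\ \text{bytes per page (upper bound)}
\qquad
\frac{9\ \text{TB}}{8\ \text{TB}} \approx 1.13
```

So the average amount of text per Dialog page is at most about 1.5 kilobytes, and the two collections differ in total size by only about one eighth.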

258

Database hosts / distributors: evaluation criteria (Part 1)

• Contract required?

• A priori payment required?

• Stability / history / evolution / future of host?

• Low costs of data communication?

• Many databases available?

• Whole records available (or only parts)?

• Frequent updates?

• Whole database available? As one file or fragmented?

***-

259

Database hosts / distributors: evaluation criteria (Part 2)

• Price of access? Price of information?

• Powerful search options: truncation, Boolean combinations, proximity searching,…?

• Can the indexes of more than one database be searched simultaneously?

• Speed of retrieval?

• Relevance ranking of results?

• Fast response? Accuracy of data communication?

• Clear output format?

***-

260

Database hosts / distributors: evaluation criteria (Part 3)

• Online indication of costs?

• Easy user interface?

• Practice free of charge?

• Good manuals, documentation and online help?

• Training courses available? Quality?

• Good help desk available?

• Gateway service offered?

• ...

***-
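The criterion "online indication of costs" can be illustrated with a simple cost model. Hosts have used various pricing schemes. This Python sketch assumes a hypothetical tariff that charges per hour of connect time plus per record displayed, depending on the output format. All prices are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Tariff:
    """Hypothetical price list of a host; real hosts use their own, often more complex, models."""
    per_connect_hour: float
    per_record: dict  # output format -> price per record


def session_cost(tariff, minutes, records):
    """Cost of a session: connect time plus records displayed, per output format."""
    cost = tariff.per_connect_hour * minutes / 60
    for fmt, count in records.items():
        cost += tariff.per_record[fmt] * count
    return round(cost, 2)


if __name__ == "__main__":
    tariff = Tariff(per_connect_hour=60.0, per_record={"short": 0.5, "full": 2.0})
    # 12 minutes online, 20 short records and 5 full records displayed
    print(session_cost(tariff, 12, {"short": 20, "full": 5}))
```

A host showing such a running total during the session would let you decide, for instance, to display short records first and full records only for the relevant ones.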

261

Databases of online public access databases

• Example

»Gale directory of databases !

• Their coverage:

»online access databases

»(databases accessible on CD-ROM)

»...

***-

262

Databases of databases: Gale

• Produced in U.S.A.

• Not free of charge

• Available in various formats:

»printed

»on CD-ROM

»online via the host systems Data-Star, Dialog, with a payment required for each use

»online through the Internet via various hosts, for a fixed price per year, to be paid in advance

***-

263

Online access information sources and services

Online access databases about journal articles

****

264

Online access databases about journal articles: overview

• Thousands of fee-based online access databases offer bibliographies or full-texts of journal articles in particular subject domains and published by many publishers.

• Many publishers offer searchable bibliographies, but only of their own publications (for instance Emerald, Elsevier).

• Only a few large databases offer free access to bibliographies of articles published in journals from many publishers.

****

265

Online access databases about journal articles: Ingenta (1)

• Ingenta Journals allows you to search a bibliographic database of millions of journal articles, with titles, authors and, in many cases, abstracts.

• Searching is free of charge.

***-Example

266

Online access databases about journal articles: Ingenta (2)

• Payment is required to receive the full text of an article.

• Ingenta acquired Uncover in 2000.

• Available from

»http://www.ingenta.co.uk/

»http://www.ingenta.com/

***-Example

267

Online access databases about journal articles: Article@INIST

• Article@INIST allows you to search a bibliographic database (NOT full-text) of journal articles, journal issues, books, reports, conferences and doctoral dissertations at the Institut de l'Information Scientifique et Technique, France.

• Searching is free of charge.

• Available from http://form.inist.fr/public/eng/conslt.htm

• Payment is required to receive the full text of an article.

****Example

268

Online access databases about journal articles: Infotrieve

• Infotrieve allows you to search free of charge in a bibliographic database of the articles of more than 20 000 journal titles and conference proceedings, NOT full-text.

• Available from http://www3.infotrieve.com/

• Payment is required to receive the full text of a document.

• Current awareness services are also offered free of charge: the tables of contents of new issues of the journals that you have selected are sent to you by e-mail.

***-Example

269

!? Question !? Task !? Problem !?

Search for titles of journal articles which are relevant for you,

in a database provided free of charge.

***-

270

Online access information sources and services

Electronic newsletters and journals

***-

271

Electronic newsletters and journals: introduction

***-

• Since the end of the 1990s, electronic journals have become a new communication medium that cannot be neglected.

Figure labels: Author / Sender, Editor, Reader / Receiver

272

Online access information sources and services

Conclusion

***-


273

Online access information: future trends

• An increasing amount of information becomes available online.

• A growing amount of this online information becomes available free of charge.

• The quality of server and client software is improving.

A consequence is:

• An increasing number of end-users searching for information online.

****


274

Online access information: conclusion

• In the case of simple information needs, the WWW and the search tools can work like “magic”.

• However, in the case of more complicated information needs, there is still no “magic button” that brings you immediately to all the required information.

****