Page 1:

Text information storage and retrieval and the CDS/ISIS program

***

Paul NIEUWENHUYSEN
pnieuwen@vub.ac.be

University Library, Vrije Universiteit Brussel,
Pleinlaan 2, B-1050 Brussel, Belgium

Page 2:

What is a database?

• A database is a collection of similar data records stored in a common file (or collection of files).

***

Page 3:

Software type = information retrieval software

• Software for information storage and retrieval (ISR software)

• Text(-oriented) database management systems (Text-DBMS)

• Text information management systems (TIMS)

• Document retrieval systems

• Document management systems

***

Page 4:

Information retrieval: via a database to the user

[Diagram: Information content → Database (linear file, inverted file) → Search engine → Search interface → User]

***

Page 5:

Information retrieval: the basic processes in search systems

[Diagram: Information problem → Representation → Query; Text documents → Representation → Indexed documents; Query and Indexed documents → Comparison → Retrieved documents → Evaluation and feedback]

***

Page 6:

Information retrieval systems: many components make up a system

• Any retrieval system is built up of many more or less independent components.

• These components can be modified more or less independently to increase the quality of the results.

***

Page 7:

Information retrieval systems: important components

• the information content

• system to describe formal aspects of information items

• system to describe the subjects of information items

• concrete descriptions of information items = application of the information description systems used

• information storage and retrieval computer program(s)

• computer system used for retrieval

• type of medium or information carrier used for distribution

***

Page 8:

Information retrieval systems: the information content

• The information content is the information that is created or gathered by the producer.

• The information content is independent of software and of distribution media.

• The information content is input into the retrieval system using

» a system (rules) to describe the formal aspects

» a system (rules) to describe the contents (classification, thesaurus,...)

***

Page 9:

Information retrieval systems: media used for distribution

• Hard copy (for information retrieval systems only in the broad sense)

» Print

» Microfiche

• For computers (for information retrieval systems stricto sensu)

» Magnetic tape

» Floppy disk; optical disk (CD-ROM, CD-i, Photo-CD,...)

» Online

***

Page 10:

Information retrieval systems: the computer program

The information retrieval program consists of several modules, including:

• The module that allows the creation of the inverted file(s) = index file(s) = dictionary file(s).

• The search engine, which provides the search features and power that allow the inverted file(s) to be searched.

• The interface between the system and the user, which determines how they (can) interact to search the database (using menus and/or icons and/or templates and/or commands).

***

Page 11:

What determines the results of a search in a retrieval system?

• the information retrieval system ( = contents + system)

• the user of the retrieval system and the search strategy applied to the system

→ Result of a search

***

Page 12:

Characteristics / definition of structured text-information

• The text information is structured (files, records, fields, sub-fields, links/relations among records,...).

• The length of records and fields can be “long”.

• Some fields are multi-valued, i.e. they occur more than once.

***

Page 13:

Layered structure of a database

Database

File

Records

Fields

Characters

+ in many systems: relations / links between records

***

Page 14:

Structure of a bibliographic file

Record No. 1
  Title
  Author 1: name + first name (sub-fields)
  Author 2: ... (repeated field)
  Source
  Descriptor 1
  Descriptor 2 (repeated field)
  ...

Record No. 2

***
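The record layout on the slide above can be sketched in code. This is only a minimal illustration in Python, not the storage format of CDS/ISIS or any other program; the field names (`title`, `author`, `source`, `descriptor`), the helper functions and the sample values are all hypothetical.

```python
# Hypothetical bibliographic record: each field maps to a list of
# occurrences (repeatable fields); an author occurrence is a dict of
# sub-fields (name + first name).
record_1 = {
    "title": ["An example title"],
    "author": [
        {"name": "Smith", "first_name": "Anna"},
        {"name": "Jones", "first_name": "Ben"},
    ],
    "source": ["Example Journal, 1 (1), 1-10"],
    "descriptor": ["databases", "information retrieval"],
}


def occurrences(record, field):
    """Return all occurrences of a field (an empty list if the field is absent)."""
    return record.get(field, [])


def author_names(record):
    """Format each author occurrence as 'Name, First name' from its sub-fields."""
    return [f"{a['name']}, {a['first_name']}" for a in occurrences(record, "author")]


print(author_names(record_1))
print(len(occurrences(record_1, "descriptor")))
```

Storing every field as a list, even when it has a single value, keeps the handling of repeated and non-repeated fields uniform.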

Page 15:

Thesaurus: description

• Thesaurus =

» system to control a vocabulary +

» the contents of this vocabulary

• Thesaurus program =

program to create, manage, modify and/or search a thesaurus using a computer

***

Page 16:

Thesaurus relations

[Diagram: a Term at the centre, linked to:
BT (= Broader Term): term(s) with broader meaning
NT (= Narrower Term): term(s) with narrower meaning
RT (= Related Term): other term(s)
UF (= Use For): synonym(s)]

***
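The four relations above can be sketched as a small data structure. The following is a minimal illustration in Python under stated assumptions: the terms, the `thesaurus` dictionary and the `expand` helper are hypothetical and do not reproduce any particular thesaurus program.

```python
# Hypothetical mini-thesaurus: each term lists its BT, NT, RT and UF relations.
thesaurus = {
    "dogs": {"BT": ["mammals"], "NT": ["terriers"], "RT": ["pets"], "UF": ["canines"]},
    "mammals": {"BT": ["animals"], "NT": ["dogs"], "RT": [], "UF": []},
}


def expand(term, relations=("NT", "UF")):
    """Return the term plus all terms linked to it by the chosen relations, sorted."""
    entry = thesaurus.get(term, {})
    expanded = {term}
    for relation in relations:
        expanded.update(entry.get(relation, []))
    return sorted(expanded)


print(expand("dogs"))           # the term, its narrower terms and its synonyms
print(expand("dogs", ("BT",)))  # the term and its broader term(s)
```

Expanding a search term with its narrower terms and synonyms is one way a thesaurus can help to retrieve more relevant records.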

Page 17:

Thesaurus applications

• To find/choose index terms to add to items, when terms are taken from a controlled vocabulary

• To find more and/or better terms to search a database (to increase recall and precision)

• To find more and/or better terms during writing

• To understand the meaning of a term, by inspecting

» the scope note of the term and/or

» the relations with other terms

***

Page 18:

Database systems: why study this subject briefly?

• To achieve a better understanding of the inner workings of the external information retrieval systems that you use, so that you can exploit these more efficiently

• To be able to evaluate the quality of database systems you are confronted with, so that you can

» make better choices among available systems,

» offer constructive suggestions to the manager,

» ...

***

Page 19:

Database systems: why study this subject in detail?

To acquire the knowledge and skills to create / set up / manage your own local database system on a computer

**-

Page 20:

Database systems: definition

A database (management) system is a program or set of programs, providing a means by which a user can easily store and retrieve data in the form of “databases”.

***

Page 21:

Information retrieval software: related terms

• Software for information storage and retrieval (ISR software)

• Text(-oriented) database management systems (Text-DBMS)

• Text information management systems (TIMS)

• Document retrieval systems

• Document management systems

**-

Page 22:

Information retrieval software: applications (Part 1)

Application area: items described in the database

• Documentation centres: Documents

• Archives: Archived documents

• Libraries: Books / Documents

• Museums: Objects / Books / ...

• Medical files: Patient histories

• Marketing departments: Clients / Potential clients

• Schools: Courses / Teachers

• Bibliographic databases: Publications / ...

**-

Page 23:

Information retrieval software: applications (Part 2)

Application area: items described in the database

• Meeting calendars: Meetings = conferences

• Product information: Product descriptions

• Laboratories: Recipes

• Personal documentation: Documents

• Patent office: Patents

• Co-operating information networks: Documents / Persons / Institutes / Events / ...

• ...

**-

Page 24:

Cataloguing: hard copy versus computer-based

• Hard copy

» “Input”, i.e. cataloguing, on cards directly determines the “output”, i.e. the format of the data on the card as presented to the user

» Summarized: INPUT = OUTPUT

• Computer-based

» Input in the database in fields allows later output in various formats for presentation

» Summarized: 1. INPUT, 2. various OUTPUTs

**-
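The "1. INPUT, 2. various OUTPUTs" principle can be sketched as follows. This is a minimal illustration in Python, assuming a hypothetical record with `author`, `title` and `year` fields and three made-up output formats; it is not the formatting language of any specific program.

```python
# One record, entered once in separate fields (hypothetical sample data).
record = {"author": "Smith, A.", "title": "An example title", "year": "1995"}


def catalogue_card(rec):
    """Card-like layout: author on the first line, title and year indented below."""
    return f"{rec['author']}\n  {rec['title']}. {rec['year']}."


def short_citation(rec):
    """One-line citation: author (year) title."""
    return f"{rec['author']} ({rec['year']}) {rec['title']}"


def tagged(rec):
    """Tagged layout: one 'FIELD: value' line per field, e.g. for export."""
    return "\n".join(f"{field.upper()}: {value}" for field, value in rec.items())


for layout in (catalogue_card, short_citation, tagged):
    print(layout(record))
    print()
```

Because the data are stored per field, a new presentation format only requires a new output function; the input does not have to be repeated.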

Page 25:

Text-information management systems: characteristics and definition

The information in the database is text oriented. Therefore, several features are required:

» ability to store relatively long blocks of text

» ability to retrieve items in which specific words or terms occur anywhere

***

Page 26:

Text-information management: from free-form to structure

Free-form text information without structure

Text database with information structured in files, records, fields, sub-fields, with links/relations among records,...

(Ideally, each field is repeatable = can be multi-valued = can occur more than once in each record.)

**-

Page 27:

Text-information management: types of software

Software type: Features

• Word processing software: Must be learnt anyway. Slow sequential searching.

• Free-form or structured text information database software: Additional software to be purchased and learnt. Fast searching via index(es).

***

Page 28:

Advantages of structured text-retrieval versus X-base systems

Feature: Text-retrieval / X-base systems

• Many long fields, forming long records: Yes / No

• Repeatable fields: Yes / No

• Subfields: Yes / No

• Variable field lengths: Yes / No

• Fast searching any word in all fields: Yes / No

• Thesaurus to help searching: Yes / No

**-

Page 29:

Hierarchy in the use of a database

Database structure

Input / Editing

Searching / Output

***

Page 30:

Functions of database management software

• Input / edit using keyboard or batch input

• Indexing of the database(s)

• Browse / Search / Select / Retrieve data from database

• Output (Sort / Display / Print to file / Print to paper)

+

• Export / Import

***

Page 31:

!? Question !? Task !? Problem !?

Which advantages does a computer-based document management system offer?

***

Page 32:

Advantages of a document system on computer, for the user(s)

Access to information is easier.

Access to information is faster.

Online access is possible even when the centre is closed.

Online access is possible from a distance.

Integration in search module with data on loan status.

More elements of the records can serve as search terms.

Combinations of search terms can be used.

Results / selections can be stored as computer files.

***
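Combining search terms, one of the advantages listed above, can be sketched with set operations. This is a minimal illustration in Python, assuming a hypothetical index that maps each search term to the set of record numbers in which it occurs; it does not use the query syntax of any particular retrieval program.

```python
# Hypothetical index: search term -> set of record numbers containing it.
inverted_file = {
    "database": {1, 2, 4},
    "thesaurus": {2, 3},
    "unesco": {4},
}


def postings(term):
    """Return the record numbers for a term (an empty set if the term is unknown)."""
    return inverted_file.get(term, set())


both = postings("database") & postings("thesaurus")     # AND: records with both terms
either = postings("database") | postings("thesaurus")   # OR: records with at least one term
without = postings("database") - postings("thesaurus")  # NOT: first term but not the second

print(sorted(both), sorted(either), sorted(without))
```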

Page 33:

The CDS/ISIS text database management program

• Software to create and manage local, in-house databases with primarily structured text as contents (NOT numbers, graphics, sound,...)

• Versions available for

» Mainframes (IBM)

» Minicomputers (Digital VAX)

» Microcomputers (DOS)

**-

Page 34:

Micro-CDS/ISIS: original main menu on the display

*--

Page 35:

CDS/ISIS database definition services: display menu

*--

Page 36:

CDS/ISIS database definition table: display of an example

*--

Page 37:

CDS/ISIS manual data entry, editing / input services: display menu

*--

Page 38:

Batch input / Import

• Is batch input possible?

• Is a format conversion program included or available?

• ...

**-

Page 39:

Activities related to indexing

Activity: Who does it? / Concrete action

• Intellectual, human indexing: Database producer / Thesaurus producer / Attribute subject terms to records

• Develop an automatic indexing method: Database producer / Software features / Making an index method file

• Automatic indexing: Computer with program / Making inverted file(s)

**-

Page 40:

Indexes in books and databases: a comparison

Book (printed index):

Index_term_1 page x1, y1, z1,...
Index_term_2 page x2, y2, z2,...
...

Database (invisible index):

Index_term_1
  record nr. x1 / field type nr. x1 / field occurrence x1 / position x1
  record nr. y1 / field type nr. y1 / field occurrence y1 / position y1
  ...
Index_term_2
  record nr. x2 / field type nr. x2 / field occurrence x2 / position x2
  record nr. y2 / field type nr. y2 / field occurrence y2 / position y2
  ...
...

**-
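The database index described above can be sketched as a dictionary that maps each index term to a list of (record nr., field type nr., field occurrence, position) entries. The following is a minimal illustration in Python; the sample records, the field numbers 24 and 70, and the function `build_inverted_file` are hypothetical, and the sketch does not reproduce the actual file layout of CDS/ISIS.

```python
from collections import defaultdict

# Hypothetical records: record number -> {field type number: [occurrences]}.
records = {
    1: {24: ["Text retrieval systems"], 70: ["Smith, A.", "Jones, B."]},
    2: {24: ["Retrieval of text"]},
}


def build_inverted_file(recs):
    """Map each word to (record nr, field type nr, field occurrence, position) entries."""
    index = defaultdict(list)
    for record_nr, fields in sorted(recs.items()):
        for field_nr, field_occurrences in sorted(fields.items()):
            for occurrence_nr, text in enumerate(field_occurrences, start=1):
                for position, word in enumerate(text.split(), start=1):
                    term = word.strip(".,").upper()  # normalise: drop punctuation, upper case
                    if term:
                        index[term].append((record_nr, field_nr, occurrence_nr, position))
    return dict(index)


inverted = build_inverted_file(records)
print(inverted["RETRIEVAL"])
print(inverted["JONES"])
```

Every word occurrence produces its own entry in the index, in addition to the text stored in the record itself.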

Page 41:

Index in a text retrieval system (such as CDS/ISIS)

Terminology: Index = Inverted file = Dictionary

database dictionary on display

database complete inverted file

**-

Page 42:

Methods of inverted file creation

• Word indexing

» Simple / automatic / no indication required

» Loss of word context

» A field structure is not required

• Phrase indexing

» Indication of phrases during input is required

» Richer than separate words

» A field structure is not required

• Field indexing

» Simple / automatic / no indication required

» Context is better preserved

» A field structure is required

**-
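The difference between the three methods can be sketched as three small functions that derive index terms from the same kind of input. This is a minimal illustration in Python; the function names are hypothetical, and marking phrases with angle brackets `<...>` during input is an assumed convention chosen for this sketch.

```python
import re


def word_index_terms(text):
    """Word indexing: every word becomes a separate index term (context is lost)."""
    words = (w.strip(".,;:") for w in text.split())
    return [w.upper() for w in words if w]


def phrase_index_terms(text):
    """Phrase indexing: only phrases marked as <phrase> during input are indexed."""
    return [phrase.upper() for phrase in re.findall(r"<([^<>]+)>", text)]


def field_index_terms(field_value):
    """Field indexing: the complete field content becomes one index term."""
    return [field_value.strip().upper()]


print(word_index_terms("Text retrieval, and indexing."))
print(phrase_index_terms("A study of <information retrieval> in <libraries>"))
print(field_index_terms(" Information retrieval "))
```

Word and phrase indexing work on any text, whereas field indexing presupposes that the text has already been split into fields.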

Page 43:

CDS/ISIS inverted file services: display menu

*--

Page 44:

Automatic indexing (file inversion)

Possible? Obligatory?

• Word indexing? with proximity indexing?

• Field indexing?

• Sub-field indexing?

• Phrase indexing?

Maximum length of index entry?

List of stopwords available?

Immediately after input or in batch? (Slow down...?)

Indexing speed?

Adding prefixes/tags possible?

Modification of indexing possible?

**-

Page 45:

!? Question !? Task !? Problem !?

Why can the index of a database be so large in comparison with the size of the database?

**-

Page 46: Text information storage and retrieval and the CDS/ISIS program *** Paul NIEUWENHUYSEN pnieuwen@vub.ac.be University Library, Vrije Universiteit Brussel,

CDS/ISIS information retrieval services: display menu

*--

Page 47: Text information storage and retrieval and the CDS/ISIS program *** Paul NIEUWENHUYSEN pnieuwen@vub.ac.be University Library, Vrije Universiteit Brussel,

CDS/ISIS information retrieval: example of a dictionary on the display

*--

Page 48: Text information storage and retrieval and the CDS/ISIS program *** Paul NIEUWENHUYSEN pnieuwen@vub.ac.be University Library, Vrije Universiteit Brussel,

Output from a database to various “devices”

• to video display

• to printer

• to computer file (“printing” to a file)

**-

Page 49: Text information storage and retrieval and the CDS/ISIS program *** Paul NIEUWENHUYSEN pnieuwen@vub.ac.be University Library, Vrije Universiteit Brussel,

CDS/ISIS output (sorting and printing) services: display menu

*--

Page 50: Text information storage and retrieval and the CDS/ISIS program *** Paul NIEUWENHUYSEN pnieuwen@vub.ac.be University Library, Vrije Universiteit Brussel,

Formatting of data within each record in output

• Independent of output device:

» Determine the sequence of the fields in each record.

» Omit specific fields from each record.

» Add field names or tags to the fields in each record.

» Indicate the search term(s) in each record.

• Dependent on output device:

» Specify character formats in each (sub)field: typeface + size + bold/italic/underline

**-

Page 51: Text information storage and retrieval and the CDS/ISIS program *** Paul NIEUWENHUYSEN pnieuwen@vub.ac.be University Library, Vrije Universiteit Brussel,

Sorting / arranging of records in the whole output

• Can the user determine the sequence of the records?

• Which elements can be used as a basis for sorting?

• Can stopwords be omitted as a basis for sorting?

• What is the maximum number of sort levels?

• Can the user choose between ascending or descending order?

• Can duplicate records be eliminated? (If yes: Can the user determine the meaning of duplicate?)

• Can output formats (styles) be stored?
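Several of these questions (sort levels, stopwords, ascending or descending order, duplicates) can be made concrete with a small Python sketch. The records, the stopword list and the definition of "duplicate" used here are assumptions chosen for illustration, not features of a particular program.

```python
# Hypothetical records and stopword list, invented for illustration.
STOPWORDS = {"a", "an", "the"}

records = [
    {"title": "The use of thesauri", "year": 1995},
    {"title": "An index primer", "year": 1993},
    {"title": "An index primer", "year": 1993},  # duplicate record
    {"title": "Index design", "year": 1995},
]

def title_key(title):
    """Sort key for a title: lower case, leading stopwords dropped."""
    words = title.lower().split()
    while words and words[0] in STOPWORDS:
        words.pop(0)
    return " ".join(words)

def sort_records(recs, descending_year=True):
    """Two sort levels: year first, then title without leading stopwords."""
    # Assumption: two records are duplicates when title and year are equal.
    unique = {(r["title"], r["year"]): r for r in recs}.values()
    sign = -1 if descending_year else 1
    return sorted(unique, key=lambda r: (sign * r["year"], title_key(r["title"])))

result = sort_records(records)
titles = [r["title"] for r in result]
print(titles)
```

Here "The use of thesauri" is filed under "use", not under "the", and the repeated record appears only once in the output.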

**-

Page 52: Text information storage and retrieval and the CDS/ISIS program *** Paul NIEUWENHUYSEN pnieuwen@vub.ac.be University Library, Vrije Universiteit Brussel,

Thesaurus program module: purpose

• Does the database management program offer a thesaurus module which allows the user to create, modify, store, and delete relations between terms used in the database?

• This is mainly used to establish relations among controlled subject indexing terms.

• If more than one controlled vocabulary is used, these should be managed separately.

**-

Page 53: Text information storage and retrieval and the CDS/ISIS program *** Paul NIEUWENHUYSEN pnieuwen@vub.ac.be University Library, Vrije Universiteit Brussel,

Structure of a thesaurus database record (Fields for “good” terms)

• “Good” term

• Controlled vocabulary to which the term belongs (if more than 1 is used in the same database)

• Scope note (= definition of the controlled term)

• Date of creation or modification of the term

• Notes

**-

Page 54: Text information storage and retrieval and the CDS/ISIS program *** Paul NIEUWENHUYSEN pnieuwen@vub.ac.be University Library, Vrije Universiteit Brussel,

Structure of a thesaurus database record (Fields for relations)

• BT (= broader term): term(s) with broader meaning

• TT (= top term): term highest in the hierarchy

• NT (= narrower term): term(s) with narrower meaning

• RT (= related term): other term(s) related to this one

• UF (= use for): synonym(s)

**-

Page 55: Text information storage and retrieval and the CDS/ISIS program *** Paul NIEUWENHUYSEN pnieuwen@vub.ac.be University Library, Vrije Universiteit Brussel,

Structure of a thesaurus database record (Fields for forbidden terms)

• Forbidden term

• US (= use instead): “good” term in the controlled vocabulary

**-

Page 56: Text information storage and retrieval and the CDS/ISIS program *** Paul NIEUWENHUYSEN pnieuwen@vub.ac.be University Library, Vrije Universiteit Brussel,

Structure of a thesaurus database record (Fields for candidate terms)

• Candidate “good” term in the controlled vocabulary

• (Other fields as in the case of “good” terms)

**-

Page 57: Text information storage and retrieval and the CDS/ISIS program *** Paul NIEUWENHUYSEN pnieuwen@vub.ac.be University Library, Vrije Universiteit Brussel,

Structure of a multilingual thesaurus database record

Each type of field in a thesaurus record occurs for each language.

**-

Page 58: Text information storage and retrieval and the CDS/ISIS program *** Paul NIEUWENHUYSEN pnieuwen@vub.ac.be University Library, Vrije Universiteit Brussel,

Thesaurus program: desirable properties (Part 1)

• Multilingual user interface = menus and messages in more than 1 language

• Multilingual contents = terms in more than 1 language

• When a term in the thesaurus database is added, changed or deleted, the program automatically makes the corresponding changes throughout the whole thesaurus database, wherever that term occurs

• The program controls the creation of impossible (= forbidden) or undesirable relations

**-

Page 59: Text information storage and retrieval and the CDS/ISIS program *** Paul NIEUWENHUYSEN pnieuwen@vub.ac.be University Library, Vrije Universiteit Brussel,

Thesaurus program: desirable properties (Part 2)

• Can the thesaurus contents be formatted and printed or sent to file?

• Can more than 1 thesaurus be managed, linked to the same database?

• Can a thesaurus database be used with more than 1 primary database?

• Can the program signal the presence of orphan terms (= terms without relation)?

**-

Page 60: Text information storage and retrieval and the CDS/ISIS program *** Paul NIEUWENHUYSEN pnieuwen@vub.ac.be University Library, Vrije Universiteit Brussel,

Thesaurus program: integration with input/editing of the primary database

How simply and quickly can the user

» search the thesaurus during manual input/editing? (for instance to use it as an authority list)

» copy a term from a thesaurus and paste into a database record?

» copy a term from the database and paste into a thesaurus?

» ......

**-

Page 61: Text information storage and retrieval and the CDS/ISIS program *** Paul NIEUWENHUYSEN pnieuwen@vub.ac.be University Library, Vrije Universiteit Brussel,

Thesaurus program: integration with searching of the primary database

• Can the user browse the thesaurus during a search in the database?

• Can the program automatically formulate a query, when the user selects terms in the thesaurus module?

• Does the program allow synonyms, narrower terms and broader terms to be included in a query easily and quickly?

• ......
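Automatic query formulation from thesaurus terms can be pictured as follows. This is a minimal Python sketch under assumptions: the thesaurus fragment is invented, and the query uses a generic `OR` operator, not the syntax of any specific retrieval program.

```python
# Hypothetical thesaurus fragment: term -> relation type -> list of terms.
THESAURUS = {
    "dogs": {"UF": ["hounds"], "NT": ["terriers", "poodles"], "BT": ["mammals"]},
}

def expand_query(term, thesaurus, relations=("UF", "NT")):
    """Build an OR query from a term plus the related terms of the chosen types."""
    entry = thesaurus.get(term, {})
    terms = [term]
    for relation in relations:
        terms.extend(entry.get(relation, []))
    # Generic Boolean OR; a real system would use its own operator here.
    return " OR ".join(terms)

print(expand_query("dogs", THESAURUS))
print(expand_query("dogs", THESAURUS, relations=("UF", "NT", "BT")))
```

Including broader terms widens the search considerably, which is why the sketch leaves BT out of the default selection.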

**-

Page 62: Text information storage and retrieval and the CDS/ISIS program *** Paul NIEUWENHUYSEN pnieuwen@vub.ac.be University Library, Vrije Universiteit Brussel,

Automatic creation, deletion or adaptation of the reciprocal relation

Does a change by the user of a relation in one record cause an automatic change by the thesaurus program of the reciprocal relation in the corresponding record of the thesaurus database? Examples:

» change of BT changes NT in the corresponding record

» change of NT changes BT in the corresponding record

» change of RT changes RT in the corresponding record

» change of UF changes US in the corresponding record

» change of US changes UF in the corresponding record
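The reciprocal pairs listed above can be kept in a small table, so that every change is applied to both records at once. The Python sketch below assumes a simple in-memory thesaurus; the class and its method names are hypothetical and do not describe the CDS/ISIS thesaurus module.

```python
from collections import defaultdict

# Reciprocal of each relation type, as listed in the examples above.
RECIPROCAL = {"BT": "NT", "NT": "BT", "RT": "RT", "UF": "US", "US": "UF"}

class Thesaurus:
    """Hypothetical in-memory thesaurus: term -> relation type -> set of terms."""

    def __init__(self):
        self.terms = defaultdict(lambda: defaultdict(set))

    def add_relation(self, term, relation, target):
        """Store a relation and its reciprocal in the target's record."""
        self.terms[term][relation].add(target)
        self.terms[target][RECIPROCAL[relation]].add(term)

    def remove_relation(self, term, relation, target):
        """Delete a relation and its reciprocal from the target's record."""
        self.terms[term][relation].discard(target)
        self.terms[target][RECIPROCAL[relation]].discard(term)

thesaurus = Thesaurus()
thesaurus.add_relation("dogs", "BT", "mammals")   # also stores: mammals NT dogs
thesaurus.add_relation("hounds", "US", "dogs")    # also stores: dogs UF hounds
print(sorted(thesaurus.terms["mammals"]["NT"]))
```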

**-

Page 63: Text information storage and retrieval and the CDS/ISIS program *** Paul NIEUWENHUYSEN pnieuwen@vub.ac.be University Library, Vrije Universiteit Brussel,

Automatic control of the creation of impossible or undesirable relations

Does the thesaurus program avoid the creation of impossible or undesirable relations, or does it warn the user? Examples of this kind of relations:

» circular hierarchy (a NT b, b NT c, c NT a, or longer)

» circular synonym relation (a UF b, b UF a)

» iterative synonym relations (a US b, b US c, or longer)

» incomplete relations (a RT b, while b does not exist)

» term related to itself (for instance: a NT a)

» ......
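Some of these checks can be sketched for the NT relation. The Python functions below are an illustration under assumptions: NT relations are held in a plain dictionary, and the function names and messages are invented for this example.

```python
def narrower_closure(nt, term):
    """All terms reachable from `term` by following NT relations."""
    seen, stack = set(), [term]
    while stack:
        for child in nt.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def check_new_nt(nt, known_terms, broader, narrower):
    """Return a reason to refuse 'broader NT narrower', or None if acceptable."""
    if narrower not in known_terms:
        return "incomplete relation: target term does not exist"
    if broader == narrower:
        return "term related to itself"
    if broader in narrower_closure(nt, narrower):
        return "circular hierarchy"
    return None

# Hypothetical data: a NT b, b NT c.
nt = {"a": {"b"}, "b": {"c"}}
terms = {"a", "b", "c"}
print(check_new_nt(nt, terms, "c", "a"))
```

Adding "c NT a" would close the loop a, b, c, a, so the check reports a circular hierarchy; the same walk through the NT relations also catches longer loops.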

**-

Page 64: Text information storage and retrieval and the CDS/ISIS program *** Paul NIEUWENHUYSEN pnieuwen@vub.ac.be University Library, Vrije Universiteit Brussel,

Trilingual thesaurus program module for CDS/ISIS: properties

• It is an additional program in CDS/ISIS Pascal language

• Usage is free of charge, as in the case of CDS/ISIS

• Thesaurus database management is based on CDS/ISIS

• The thesaurus program, as well as CDS/ISIS, offers a user interface in English, French, and Spanish

• The contents of a thesaurus database are trilingual: each term in English, French, and Spanish (each one replaceable by another language)

*--

Page 65: Text information storage and retrieval and the CDS/ISIS program *** Paul NIEUWENHUYSEN pnieuwen@vub.ac.be University Library, Vrije Universiteit Brussel,

Trilingual thesaurus program for CDS/ISIS: the relations among terms

• The available relations are: US, UF, NT, BT, TT, RT

• Unlimited number of occurrences for each type of relation in each record

• After a change of a relation, the program automatically adapts the corresponding relation in the corresponding thesaurus term records

*--

Page 66: Text information storage and retrieval and the CDS/ISIS program *** Paul NIEUWENHUYSEN pnieuwen@vub.ac.be University Library, Vrije Universiteit Brussel,

Trilingual thesaurus program for CDS/ISIS: control of relations

The program avoids the creation of some impossible or undesirable relations:

» circular synonym relation (a UF b, b UF a)

» iterative synonym relations (a US b, b US c, or longer)

» incomplete relations (a RT b, while b does not exist)

*--

Page 67: Text information storage and retrieval and the CDS/ISIS program *** Paul NIEUWENHUYSEN pnieuwen@vub.ac.be University Library, Vrije Universiteit Brussel,

Trilingual thesaurus for CDS/ISIS: integration with searching

• The user can browse the thesaurus during a search in the primary database.

• The program automatically formulates a query in the primary database, when the user selects terms in the thesaurus module.

• The program allows synonyms, narrower terms and broader terms to be included in a query easily and quickly.

• The thesaurus database can be used for searching with more than 1 primary database.

*--

Page 68: Text information storage and retrieval and the CDS/ISIS program *** Paul NIEUWENHUYSEN pnieuwen@vub.ac.be University Library, Vrije Universiteit Brussel,

Trilingual thesaurus program module for CDS/ISIS: further properties

• In each record describing a term, a field for a scope note is present.

• A field for date of term creation is present.

• Several printout formats are included.

*--

Page 69: Text information storage and retrieval and the CDS/ISIS program *** Paul NIEUWENHUYSEN pnieuwen@vub.ac.be University Library, Vrije Universiteit Brussel,

How to obtain the trilingual thesaurus program for CDS/ISIS?

• the national distributor in your country

• UNESCO Headquarters, General Information Programme, 1 rue Miollis, Paris, France

• ......

*--

Page 70: Text information storage and retrieval and the CDS/ISIS program *** Paul NIEUWENHUYSEN pnieuwen@vub.ac.be University Library, Vrije Universiteit Brussel,

Trilingual thesaurus program module for CDS/ISIS: conclusions

- Negative: Not well integrated with the input/editing module of CDS/ISIS

+ Positive: Exceptionally favourable price/quality ratio

*--

Page 71: Text information storage and retrieval and the CDS/ISIS program *** Paul NIEUWENHUYSEN pnieuwen@vub.ac.be University Library, Vrije Universiteit Brussel,

Security / privacy / protection of databases

• Password for searching specific database(s) and / or fields and / or records

• Password for editing specific database(s) and / or fields and / or records

• Password for changing

» database structure

» input and modification work sheets

» sort and print formats of data in records

» sort and print formats of records in a selection

**-

Page 72: Text information storage and retrieval and the CDS/ISIS program *** Paul NIEUWENHUYSEN pnieuwen@vub.ac.be University Library, Vrije Universiteit Brussel,

Security / privacy / protection provided by DOS

DOS can make files

» read-only

» hidden

*--

Page 73: Text information storage and retrieval and the CDS/ISIS program *** Paul NIEUWENHUYSEN pnieuwen@vub.ac.be University Library, Vrije Universiteit Brussel,

Security / privacy / protection in CDS/ISIS

• SYSPAR.PAR file (entry 0) asks for a password, which can limit access to a particular

» database

» set of worksheets

» set of menus

» set of additional CDS/ISIS programs

• Using the read-only version, named ISISCD.EXE, prevents modifications.

• Menus can be changed or removed to prevent access.

*--

Page 74: Text information storage and retrieval and the CDS/ISIS program *** Paul NIEUWENHUYSEN pnieuwen@vub.ac.be University Library, Vrije Universiteit Brussel,

Passwords and usage tracking

• Does the use of passwords linked to users or user groups allow usage tracking by a systems manager?

“Usage” = for instance, number and types of search and/or edit actions.

• This can be useful for studies and system management.

**-

Page 75: Text information storage and retrieval and the CDS/ISIS program *** Paul NIEUWENHUYSEN pnieuwen@vub.ac.be University Library, Vrije Universiteit Brussel,

Data export in the case of CDS/ISIS

[Diagram. Source: CDS/ISIS database (contents and database structure). Transfer methods shown: “Export” of data; “Print” data to file; copy of all database files. Destinations shown: other CDS/ISIS user with the same database structure; other CDS/ISIS user without the database; other database management system.]

*--

Page 76: Text information storage and retrieval and the CDS/ISIS program *** Paul NIEUWENHUYSEN pnieuwen@vub.ac.be University Library, Vrije Universiteit Brussel,

Manual versus batch import of data in a database

[Diagram: information items enter the database either by manual input or by batch input.]

**-

Page 77: Text information storage and retrieval and the CDS/ISIS program *** Paul NIEUWENHUYSEN pnieuwen@vub.ac.be University Library, Vrije Universiteit Brussel,

Conversion and batch input in the case of a CDS/ISIS database

File with database records in ASCII with field tags

Fangorn program + Conversion specification file

File with records in format of the CDS/ISIS database

Import module in CDS/ISIS

Records in the CDS/ISIS database
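The conversion step in this chain can be pictured with a short Python sketch. It is not Fangorn and does not use Fangorn's specification language: the `TAG: value` input layout, the tag map and the numeric target tags are all assumptions invented for illustration.

```python
def parse_tagged_records(text):
    """Parse records made of 'TAG: value' lines, separated by blank lines."""
    records, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            if current:
                records.append(current)
                current = {}
            continue
        tag, _, value = line.partition(":")
        # A repeated tag becomes a repeatable field (a list of occurrences).
        current.setdefault(tag.strip(), []).append(value.strip())
    if current:
        records.append(current)
    return records

def convert(record, tag_map):
    """Rename source tags to target tags; tags missing from the map are dropped."""
    return {tag_map[t]: v for t, v in record.items() if t in tag_map}

SOURCE = "AU: Smith, J.\nAU: Jones, A.\nTI: Text retrieval\n\nAU: Brown, B.\nTI: Indexing\n"
TAG_MAP = {"AU": 70, "TI": 24}  # hypothetical conversion specification
converted = [convert(r, TAG_MAP) for r in parse_tagged_records(SOURCE)]
print(converted)
```

The tag map plays the role of the conversion specification file: the same parsing code can serve another source format by supplying another map.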

*--

Page 78: Text information storage and retrieval and the CDS/ISIS program *** Paul NIEUWENHUYSEN pnieuwen@vub.ac.be University Library, Vrije Universiteit Brussel,

Format conversion program Fangorn

• Authors: Besemer and Nieuwenhuysen

• Available via anonymous ftp from

» PCWS1.SCI.SNS.IT

» ftp.vub.ac.be in the directory \pub\projects\Docinfo\paul\cursus\isis\

» ……

*--

Page 79: Text information storage and retrieval and the CDS/ISIS program *** Paul NIEUWENHUYSEN pnieuwen@vub.ac.be University Library, Vrije Universiteit Brussel,

Specification of a format conversion in the case of Fangorn for CDS/ISIS

*--

Page 80: Text information storage and retrieval and the CDS/ISIS program *** Paul NIEUWENHUYSEN pnieuwen@vub.ac.be University Library, Vrije Universiteit Brussel,

!? Question !? Task !? Problem !?

Which software packages for storage and retrieval of structured text do YOU know?

**-

Page 81: Text information storage and retrieval and the CDS/ISIS program *** Paul NIEUWENHUYSEN pnieuwen@vub.ac.be University Library, Vrije Universiteit Brussel,

Microcomputer software packages for structured text retrieval: examples

**-

• askSam

• Bib-Search

• CAIRS

• Cardbox-Plus

• CDS / ISIS

• Headfast

• IdeaList

• Inmagic

• Notes (Lotus / IBM)

• Personal Librarian

• Pro-Cite

• Reference Manager

• Strix

• STATUS

• Topic (Verity)

• ......

Page 82: Text information storage and retrieval and the CDS/ISIS program *** Paul NIEUWENHUYSEN pnieuwen@vub.ac.be University Library, Vrije Universiteit Brussel,

!? Question !? Task !? Problem !?

How can you use a word processing program together with a text retrieval system?

**-

Page 83: Text information storage and retrieval and the CDS/ISIS program *** Paul NIEUWENHUYSEN pnieuwen@vub.ac.be University Library, Vrije Universiteit Brussel,

Word processing program to assist a retrieval program

To polish text data before import in the database managed by the retrieval program

To inspect output to printer before real printing

To accept output from the retrieval program for further and better formatting, followed by printing

**-

Page 84: Text information storage and retrieval and the CDS/ISIS program *** Paul NIEUWENHUYSEN pnieuwen@vub.ac.be University Library, Vrije Universiteit Brussel,

!? Question !? Task !? Problem !?

Which benefits does a field structure offer to databases?

**-

Page 85: Text information storage and retrieval and the CDS/ISIS program *** Paul NIEUWENHUYSEN pnieuwen@vub.ac.be University Library, Vrije Universiteit Brussel,

Field structure in records: benefits concerning input

• The indication of fields in input worksheets guides the input.

• Default values can be assigned to fields, which can avoid errors and make input faster.

• The existence of fields allows control of the format of the contents of each specific field during input.

• ......
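Default values and format control per field can be sketched as follows. The field definitions and patterns in this Python example are hypothetical and are not taken from a CDS/ISIS worksheet.

```python
import re

# Hypothetical field definitions: a default value and a pattern for the contents.
FIELDS = {
    "language": {"default": "eng", "pattern": r"[a-z]{3}"},
    "year": {"default": None, "pattern": r"\d{4}"},
    "title": {"default": None, "pattern": r".+"},
}

def enter_record(values):
    """Apply defaults, then check each field's contents against its pattern."""
    record, errors = {}, []
    for name, spec in FIELDS.items():
        value = values.get(name, spec["default"])
        if value is None:
            errors.append(f"{name}: missing")
        elif not re.fullmatch(spec["pattern"], value):
            errors.append(f"{name}: invalid format")
        else:
            record[name] = value
    return record, errors

print(enter_record({"year": "1995", "title": "Indexing"}))
print(enter_record({"year": "95", "title": "Indexing"}))
```

The first call fills in the language from its default; the second is reported as an error because the year does not have four digits. Neither check would be possible on unstructured text.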

**-

Page 86: Text information storage and retrieval and the CDS/ISIS program *** Paul NIEUWENHUYSEN pnieuwen@vub.ac.be University Library, Vrije Universiteit Brussel,

Field structure in records: benefits concerning searching

• User can limit search to specific fields.

• Field type adds information to contents.

• Field-indexing keeps data together in index.

• ......

**-

Page 87: Text information storage and retrieval and the CDS/ISIS program *** Paul NIEUWENHUYSEN pnieuwen@vub.ac.be University Library, Vrije Universiteit Brussel,

Field structure in records: benefits concerning output

• Field structure makes output easier to understand.

• In output, each field can be indicated with tag/prefix.

• Records can be sorted based on contents of a field.

• In output, the fields can be sorted in each record.

• In output, some fields can be omitted.

• ...
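The output benefits listed above can be sketched in a few lines of Python. The records, the two-letter prefixes and the function name are assumptions made for illustration only.

```python
# Hypothetical records and field prefixes, for illustration only.
RECORDS = [
    {"author": "Green", "title": "Brown bears of Europe", "year": "1994", "note": "internal"},
    {"author": "Brown", "title": "Indexing text databases", "year": "1991", "note": "internal"},
]
LABELS = {"author": "AU", "title": "TI", "year": "PY"}


def format_records(records, sort_field, field_order):
    """Sort records on one field; print the chosen fields in a fixed order."""
    lines = []
    for record in sorted(records, key=lambda r: r.get(sort_field, "")):
        for field in field_order:      # fields that are not listed are omitted
            if field in record:
                lines.append(f"{LABELS[field]}: {record[field]}")
        lines.append("")               # blank line between records
    return "\n".join(lines).rstrip()


report = format_records(RECORDS, sort_field="year", field_order=["title", "author", "year"])
print(report)
```

The records come out sorted by year, each field carries its prefix, the fields appear in the requested order, and the note field is left out because it is not in the list.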

**-

Page 88

!? Question !? Task !? Problem !?

Besides all the benefits offered by a field structure in a database,

which problems does this cause?

**-

Page 89

Field structure in records: problems (Part 1)

• In the short term, it is more expensive and time-consuming than handling less structured data.

• Initially, the database manager who wants to create a new database has to make decisions:

» which fields to create to subdivide the database records,

» which field tags or names to use for the internal housekeeping of the database by the chosen database management software package.

**-

Page 90

Field structure in records: problems (Part 2)

• The exchange of data, i.e. importing into a database data that have been exported from another database, is hindered when the database structures are not identical or compatible.

• ...
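A small Python sketch can show why differing structures hinder exchange. The tag numbers of both databases and the mapping between them are hypothetical; the point is only that a field without a counterpart in the target structure cannot be imported as it stands.

```python
# Hypothetical tags of a source and a target database; the mapping is invented.
SOURCE_TO_TARGET = {"10": "100", "20": "200", "30": "300"}


def convert(record, mapping):
    """Re-tag a record; return the converted record and the tags left over."""
    converted, unmapped = {}, []
    for tag, contents in record.items():
        if tag in mapping:
            converted[mapping[tag]] = contents
        else:
            unmapped.append(tag)   # no counterpart in the target structure
    return converted, unmapped


exported = {"10": "Brown", "20": "Indexing text databases", "40": "Shelf B-12"}
imported, lost = convert(exported, SOURCE_TO_TARGET)
```

In this example the field with tag 40 has no place in the target database, so it is reported in `lost` rather than imported.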

**-

Page 91

Exchange formats and standards for text database systems

• Usage and aims:

» to allow efficient exchange of information among databases without loss of structural information

» to guide database managers in the creation of a database structure (records divided into fields and subfields)

• Examples: (MARC = machine-readable cataloguing)

» LC-MARC (= Library of Congress MARC); UNIMARC

» Common Communication Format (of UNESCO)

» SGML

***

Page 92

Common Communication Format (CCF): description

• Developed by the Unesco - General Information Programme for international application

• Includes a system of numeric tags indicating

» the location of fields and subfields in the records

» the meaning of the fields and subfields
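The idea of numeric tags that identify fields and subfields can be sketched as follows in Python. The tag numbers and their meanings are invented for illustration; the actual assignments are given in the CCF manual. The sketch also assumes that a caret followed by a one-letter code marks a subfield, as is the convention in CDS/ISIS.

```python
# Hypothetical tags and meanings, chosen for illustration only.
TAG_MEANINGS = {"901": "title", "902": "author"}


def parse_field(contents):
    """Split 'text^aXXX^bYYY' into a dict of subfield code -> contents."""
    head, *parts = contents.split("^")
    subfields = {"": head} if head else {}   # text before the first subfield
    for part in parts:
        if part:
            subfields[part[0]] = part[1:]
    return subfields


def describe(record):
    """Pair each numeric tag with its meaning and its parsed subfields."""
    return {
        tag: (TAG_MEANINGS.get(tag, "unknown"), parse_field(contents))
        for tag, contents in record.items()
    }


record = {"901": "^aIndexing text databases^ba primer", "902": "^aBrown^bJ."}
described = describe(record)
```

A program that knows the tag table can then interpret any record that follows the same format, without knowing which database produced it.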

**-

Page 93

Common Communication Format (CCF): availability

Published and made available free of charge by the Unesco - General Information Programme

» Printed manuals

» Printed implementation notes

» Example CDS/ISIS database structured according to the Common Communication Format

**-

Page 94

Exchange of data among systems: requirements

• Subject thesaurus (relation-structure + contents)

• Subject classification scheme + level of usage

• Contents of fields (and subfields) in the records (in the case of bibliographic databases: cataloguing input rules)

• Database structure: records, fields, subfields,... as seen by the database manager

• Version of the program for database management

• Type of program for database management

• Alphabet used for the data
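The last requirement, the alphabet used for the data, can be made concrete with a short Python sketch. It assumes IBM code page 437 as one possible 8-bit extension of ASCII and ISO 8859-1 as another; which character sets two real systems use has to be established case by case.

```python
# One stored byte, read under two different character sets.
stored = bytes([0x82])

as_cp437 = stored.decode("cp437")      # IBM PC code page 437: e with acute accent
as_latin1 = stored.decode("latin-1")   # ISO 8859-1: a control character

# A safe transfer decodes with the source alphabet and re-encodes for the target.
word = "caf\u00e9".encode("cp437")
transferred = word.decode("cp437").encode("utf-8")
```

The same byte yields a different character in each alphabet, so data exchanged without agreement on the character set can arrive with corrupted accented letters.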

**-

Page 95

Compatibility among databases: an example

• Library of Congress Subject Headings (LCSH) (a thesaurus)

• Universal Decimal Classification (UDC)

• Anglo-American Cataloguing Rules (AACR)

• Common Communication Format (CCF)

• Version 3.0

• CDS/ISIS program

• Extension of ASCII by IBM

ISO standard for record storage!

**-Example

Page 96