November 2009 INIS Training Seminar International Atomic Energy Agency
INIS Training Seminar
Subject Analysis, Thesaurus and Computer Assisted Indexing
23 – 27 November 2009, Vienna, Austria
Alexander Nevyjel, Head, Content Management Group
Introduction to Subject Analysis
• Subject Analysis should be carried out whenever possible by subject specialists with a good knowledge of the subject matter and a familiarity with the subject analysis tools of the respective database (subject categories, thesaurus, subject analysis rules)
• Steps of Subject Analysis
  • subject classification
  • abstracting
  • subject indexing
Subject Classification
• The main topic of the document determines the primary subject category
• If there are other significant topics, one or more secondary subject categories can be assigned in addition
Abstracting
• Each input item should contain an English abstract (exception: short communications)
• Abstracts in other languages are optional
• If an author abstract is available, it should be checked by the subject specialist, and edited, if necessary
• An abstract should be as informative as possible
• Emphasize what is novel about the information in the original document
Thesaurus
“A thesaurus is a terminological control device used in translating from the natural language of documents, indexers or users into a more constrained system language. It is a controlled and dynamic vocabulary of semantically and generically related terms which covers a specific domain of knowledge”
This definition has been adopted by UNESCO: “Guidelines for the establishment and development of monolingual thesauri”, UNESCO, SC/W/255, Paris, September 1973
The Thesaurus and its Structure
Relationship    Symbol   Cross reference
hierarchical    BT       broader term (level 1, 2, ...)
hierarchical    NT       narrower term (level 1, 2, ...)
affinitive      RT       related term
preferential    UF       used for (reciprocally USE ...)
preferential    UF+      used for multiple (reciprocally USE ... AND ...)
preferential    SF       seen for (reciprocally SEE ... OR ...)
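The cross references above can be pictured as a small data structure. The following is a minimal sketch, not the INIS implementation: the class names, the sample word blocks and the forbidden term MAGNESIUM DIBORIDE are assumptions made for illustration, and SF/SEE relations are left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Descriptor:
    """One word block: a descriptor and its cross references."""
    term: str
    bt: list[str] = field(default_factory=list)  # broader terms (level 1)
    nt: list[str] = field(default_factory=list)  # narrower terms (level 1)
    rt: list[str] = field(default_factory=list)  # related terms
    uf: list[str] = field(default_factory=list)  # forbidden terms ("used for")


class Thesaurus:
    def __init__(self):
        self.descriptors: dict[str, Descriptor] = {}
        self.use: dict[str, list[str]] = {}  # forbidden term -> descriptor(s)

    def add(self, d: Descriptor) -> None:
        self.descriptors[d.term] = d
        for forbidden in d.uf:
            # Reciprocal USE entry; several descriptors per forbidden term
            # corresponds to UF+ (USE ... AND ...).
            self.use.setdefault(forbidden, []).append(d.term)

    def resolve(self, term: str) -> list[str]:
        """Return the descriptor(s) to index with for a given term."""
        if term in self.descriptors:
            return [term]
        return list(self.use.get(term, []))

    def broader(self, term: str) -> list[str]:
        """All broader terms: level 1 first, then level 2, and so on."""
        seen: list[str] = []
        frontier = [term]
        while frontier:
            next_level = []
            for t in frontier:
                d = self.descriptors.get(t)
                for b in (d.bt if d else []):
                    if b not in seen:
                        seen.append(b)
                        next_level.append(b)
            frontier = next_level
        return seen


# Hypothetical word blocks, for illustration only.
thesaurus = Thesaurus()
thesaurus.add(Descriptor("BORON COMPOUNDS"))
thesaurus.add(Descriptor("MAGNESIUM COMPOUNDS"))
thesaurus.add(Descriptor("BORIDES", bt=["BORON COMPOUNDS"]))
thesaurus.add(Descriptor("MAGNESIUM BORIDES",
                         bt=["BORIDES", "MAGNESIUM COMPOUNDS"],
                         uf=["MAGNESIUM DIBORIDE"]))
```

With these sample entries, `thesaurus.resolve("MAGNESIUM DIBORIDE")` follows the USE relation to the descriptor, and `thesaurus.broader("MAGNESIUM BORIDES")` lists the level 1 broader terms before the level 2 ones.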
Subject Indexing
Subject indexing means analysing the information content of a piece of literature and expressing the meaningful information content in the language of the database, using the controlled vocabulary of the Thesaurus
• Understanding of the content --> subject specialist
• Familiarity with Thesaurus and indexing rules
• Select a set of descriptors that describes the subject content of the piece of literature
Procedures for Indexing
• Carefully read the title and abstract and scan the body of the piece of literature
• Scan the full text (introduction, table of contents, tables, graphs, figures, conclusion) to find information items missing from the abstract or requiring more precision
• Identify the concept(s) about which the piece of literature contains useful information
• Translate the concepts into descriptors
• Avoid overindexing
If no suitable descriptor exists in the Thesaurus for the retrieval of a useful concept, make a proposal for a new one, containing the following:
• Proposed term
• Proposed word block of the term (in particular proposed BTs)
• Potential forbidden terms pointing to this proposed descriptor
• Scope note when appropriate
• Explanation and justification for the proposal
• One or more sample records
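The elements of a proposal listed above could be captured in a simple record with a completeness check. This is only a sketch under assumptions: the class name, field names and the choice of which parts count as mandatory are illustrative and do not describe an actual INIS submission format.

```python
from dataclasses import dataclass, field


@dataclass
class DescriptorProposal:
    """Hypothetical container for a new-descriptor proposal."""
    term: str                   # proposed term
    broader_terms: list[str]    # proposed word block, in particular BTs
    justification: str          # explanation and justification
    sample_records: list[str]   # one or more sample records
    forbidden_terms: list[str] = field(default_factory=list)  # optional
    scope_note: str = ""        # only when appropriate

    def missing_parts(self) -> list[str]:
        """Name the parts treated as mandatory here that are still empty."""
        missing = []
        if not self.term.strip():
            missing.append("proposed term")
        if not self.broader_terms:
            missing.append("word block (proposed BTs)")
        if not self.justification.strip():
            missing.append("explanation and justification")
        if not self.sample_records:
            missing.append("sample records")
        return missing


# Illustrative values only; the term and record identifier are placeholders.
draft = DescriptorProposal(
    term="EXAMPLE TERM",
    broader_terms=[],
    justification="",
    sample_records=["RECORD-0001"],
)
```

Calling `draft.missing_parts()` on this draft reports the empty word block and the missing justification, which is the kind of check an indexer might run before sending a proposal.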
The purpose of subject indexing is to enable useful retrieval
Computer-assisted Indexing - CAI
• Kick-off Meeting Jan 2004
• Implementation and Customisation Jun 2004
• Production Indexing from Jun 2004 ongoing
• CAI version 1.0 final acceptance Aug 2004
• Tuning of the system from Aug 2004 ongoing
• CAI batch processing for Member States Dec 2004
• CAI online remote access for Member States (MS) Nov 2007
CAI Thesaurus extension
“Hidden terms” are character patterns representing the different appearances of a concept in the free text, which is indexed by one or more descriptors.
• handled similarly to “forbidden terms”, with one or more USE relations
• CAI internal only
• not exported to INIS production system
• not exported to FIBRE
• not printed in any appearance of the thesaurus
• support identification of descriptors in the free text
Hidden Terms: Compounds
Descriptor              Hidden term        Free text
MAGNESIUM BORIDES       MgB_2              MgB2
MAGNESIUM CARBONATES    MgCO_3             MgCO3
MAGNESIUM HYDRIDES      MgH_2              MgH2
IRON BROMIDES           iron dibromide
IRON BROMIDES           iron tribromide
ARSENIC IONS            As"3"-             As3-
ACETYLENE               C_2H_2             C2H2
ACETALDEHYDE            C_2H_4O            C2H4O
ACETIC ACID             C_2H_4O_2          C2H4O2
approx. 1400 hidden terms (expected 3000)
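How hidden terms might support descriptor identification can be sketched as follows. The slides do not describe the actual CAI matching logic, so everything here is an assumption: the function names are hypothetical, the free-text form is derived by dropping the `_` and `"` markers (which reproduces the free text column above), and matching is a plain case-sensitive search bounded by non-alphanumeric characters.

```python
import re

# Hidden term -> descriptor(s), taken from the table above.
HIDDEN_TERMS = {
    "MgB_2": ["MAGNESIUM BORIDES"],
    "MgCO_3": ["MAGNESIUM CARBONATES"],
    "MgH_2": ["MAGNESIUM HYDRIDES"],
    "iron dibromide": ["IRON BROMIDES"],
    "iron tribromide": ["IRON BROMIDES"],
    'As"3"-': ["ARSENIC IONS"],
    "C_2H_2": ["ACETYLENE"],
    "C_2H_4O": ["ACETALDEHYDE"],
    "C_2H_4O_2": ["ACETIC ACID"],
}


def free_text_form(hidden_term: str) -> str:
    """Assumed convention: drop subscript (_) and superscript (") markers."""
    return hidden_term.replace("_", "").replace('"', "")


def suggest_descriptors(text: str) -> list[str]:
    """Return descriptors whose hidden terms appear in the free text."""
    found: list[str] = []
    for hidden, descriptors in HIDDEN_TERMS.items():
        # Require non-alphanumeric neighbours so that C2H4O does not
        # match inside C2H4O2.
        pattern = (r"(?<![A-Za-z0-9])"
                   + re.escape(free_text_form(hidden))
                   + r"(?![A-Za-z0-9])")
        if re.search(pattern, text):
            for d in descriptors:
                if d not in found:
                    found.append(d)
    return found
```

For instance, `suggest_descriptors("Superconductivity in MgB2 films; acetic acid (C2H4O2) was used as solvent.")` suggests MAGNESIUM BORIDES and ACETIC ACID but not ACETALDEHYDE. Matching is kept case-sensitive because chemical formulas depend on case.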