Basics of Information Retrieval and Query Formulation
Bekele Negeri Duresa, Nuclear Information Specialist
Jan 20, 2016
Outline
- Information Retrieval
- The INIS Collection
- Planning searches
- Formulating queries
- Measure of search effectiveness
- Using search results and queries
Information Retrieval
Information Retrieval (IR) is finding material (usually documents) that satisfies an information need from within large collections (usually stored on computers).
Today we often think first of “web search”, but IR also includes:
- E-mail search
- Searching your laptop
- Corporate knowledge bases
- Structured bibliographic databases
The INIS Collection
The INIS Collection comprises:
- The INIS bibliographic database (3,842,516 records)
- The INIS Nonconventional Literature (NCL) collection (510,458 total and 369,140 publicly available)
A bibliographic database consists of data (records) whose attributes are described in fields, and a means by which to search these fields: a search engine.
Databases may look different on screen but the underlying principles for searching and formulating search strategies are common to all.
Bibliographic Databases (Types)
Bibliographic databases: these provide only a citation or reference (author(s), title, subject(s), publisher, etc.); with this information you should be able to locate the item in the library.
Bibliographic with some full text content: these enhanced databases include keywords and abstracts and often, but not always, include the full text of a set of records.
Bibliographic databases with full text content: these databases include the entire full text for all articles and other documents indexed.
Planning your search
“Understanding the Problem is Half of the Solution”
Define precisely the information you are seeking
Identify the concepts that represent the problem
Planning a search: example
Risk of medical radiation exposure?
- To whom: patients, or doctors and medical technicians?
- If patients: are you concerned about exposure due to radiodiagnostics (CT, X-ray, etc.), radiation therapy, or both?
- If radiotherapy: are you concerned about radionuclide therapy or external irradiation therapy?
- If personnel: are you interested in the safety policies of medical establishments?
Planning your search (Cont’d)
Topic: Risk of radiation exposure of medical staff in a radiotherapy department
Concepts:
- Exposure to radiation
- Medical staff
- Radiotherapy
Are we looking for:
- Documents in a certain language or from a certain country?
- The latest publications?
- Only records with full-text documents, or journal citations?
INIS Database Fields
Numerical (exact or range search)
- Year of publication (PY)
- Reference Number (RN)
Free text
- Title (TI)
- Authors (AU)
- Source (SO)
- Abstract (AB)
Controlled vocabulary
- Language (LA)
- Country of Input (CO)
- Descriptors (DE)
  - Indexer-assigned descriptors (DEI)
  - Computer-upposted descriptors (DEC)
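The three field types are searched differently. The following is a minimal sketch only, assuming a hypothetical in-memory record layout: the sample records and the helper names (`numeric_range`, `free_text`, `controlled`) are invented for illustration and do not describe how the INIS search engine is implemented. Numeric fields are matched by range, free-text fields by word, and controlled-vocabulary fields by exact term.

```python
# Hypothetical sample records; only the field tags (RN, PY, TI, LA, DE) come from the slide.
RECORDS = [
    {"RN": 1, "PY": 2012, "TI": "Occupational exposure in radiotherapy departments",
     "LA": "English", "DE": {"RADIOTHERAPY", "OCCUPATIONAL EXPOSURE"}},
    {"RN": 2, "PY": 2015, "TI": "Dose limits for medical personnel",
     "LA": "French", "DE": {"MEDICAL PERSONNEL", "DOSE LIMITS"}},
    {"RN": 3, "PY": 2009, "TI": "Reactor safety review",
     "LA": "English", "DE": {"REACTOR SAFETY"}},
]


def numeric_range(records, field, low, high):
    """Numerical field: keep records whose value lies in [low, high]."""
    return [r for r in records if low <= r[field] <= high]


def free_text(records, field, word):
    """Free-text field: keep records whose text contains the word (case-insensitive)."""
    word = word.lower()
    return [r for r in records if word in r[field].lower().split()]


def controlled(records, field, term):
    """Controlled-vocabulary field: keep records carrying exactly this term."""
    hits = []
    for r in records:
        value = r[field]
        terms = value if isinstance(value, set) else {value}
        if term in terms:
            hits.append(r)
    return hits


print([r["RN"] for r in numeric_range(RECORDS, "PY", 2010, 2015)])
print([r["RN"] for r in free_text(RECORDS, "TI", "radiotherapy")])
print([r["RN"] for r in controlled(RECORDS, "DE", "DOSE LIMITS")])
```

An exact numeric search is the special case where `low` and `high` are equal.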
Search Strategy
Simple search
- Single search term or phrase: “Oncology”, “nuclear safety”
Advanced search (combining concepts)
- Boolean operators: OR, AND, NOT
- Text operators: any (includes any), all (includes all), exact phrase
- Numeric operators: equal, more, less, more or equal, less or equal
- Truncation, wildcard
- Multilingual search
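Boolean operators can be read as set operations on the records that each term retrieves. The sketch below assumes a tiny hypothetical index (`INDEX`, mapping terms to record numbers) and uses a trailing `*` as the truncation symbol; both are assumptions for illustration, since the actual symbol and behaviour depend on the database being searched.

```python
# Hypothetical inverted index: search term -> set of record numbers.
INDEX = {
    "radiotherapy": {1, 2, 5},
    "radiology": {3},
    "radiation": {1, 3, 4},
    "oncology": {2, 6},
}


def lookup(term):
    """Return the records for a term; a trailing '*' requests truncation (prefix match)."""
    if term.endswith("*"):
        prefix = term[:-1]
        hits = set()
        for indexed_term, records in INDEX.items():
            if indexed_term.startswith(prefix):
                hits |= records
        return hits
    return set(INDEX.get(term, set()))


a = lookup("radiotherapy")
b = lookup("oncology")

print(sorted(a | b))              # OR: records matching either term
print(sorted(a & b))              # AND: records matching both terms
print(sorted(a - b))              # NOT: records matching the first term only
print(sorted(lookup("radio*")))   # truncation: radiotherapy and radiology
```

OR widens the result set, while AND and NOT narrow it, which is why OR is the usual way to combine synonyms and AND the usual way to combine distinct concepts.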
Query Syntax (Google Search Appliance)
Query Formulation
Translating your search concepts into proper search syntax
For the topic: Risk of radiation exposure of medical staff in a radiotherapy department
Simple to complex:
- MEDICAL PERSONNEL (a broader term, BT, for RADIOLOGICAL PERSONNEL)
- OCCUPATIONAL EXPOSURE or OCCUPATIONAL SAFETY or RADIATION PROTECTION
- RADIOTHERAPY
Try searching for individual terms and explore the database; you may identify other key concepts such as radiation doses, ALARA, dose limits…
Then combine them using Boolean operators (AND, OR, NOT).
Some databases allow you to combine searches, while others allow you to combine your results during selection of records.
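One way to picture the translation from concepts to a query is to join the alternative terms within a concept with OR, then join the concepts with AND. The helper below (`build_query`, a hypothetical name) is a sketch under that assumption; the quoting and parentheses follow a generic Boolean style and may need adapting to the syntax of the database being searched.

```python
def build_query(concepts):
    """OR the alternative terms within each concept, then AND the concepts together."""
    groups = []
    for terms in concepts:
        quoted = ['"{}"'.format(term) for term in terms]
        group = " OR ".join(quoted)
        if len(quoted) > 1:
            # Parenthesise multi-term groups so the OR is evaluated before the AND.
            group = "(" + group + ")"
        groups.append(group)
    return " AND ".join(groups)


# Concepts taken from the example topic on this slide.
concepts = [
    ["MEDICAL PERSONNEL"],
    ["OCCUPATIONAL EXPOSURE", "OCCUPATIONAL SAFETY", "RADIATION PROTECTION"],
    ["RADIOTHERAPY"],
]

query = build_query(concepts)
print(query)
```

Keeping each concept as its own list makes it easy to start simple (one term per concept) and add synonyms or related terms as the database is explored.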
Measuring Search Effectiveness
Precision & Recall
Recall: the ratio of the number of relevant records retrieved to the total number of relevant records in the database.
Precision: the ratio of the number of relevant records retrieved to the total number of records retrieved (relevant and irrelevant).
Precision and recall tend to be inversely related:
- High recall = comprehensive retrieval, but high noise
- High precision = only relevant records, but some good records are missed
Source: http://www.creighton.edu/fileadmin/user/HSL/docs/ref/Searching_-_Recall_Precision.pdf
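The two definitions can be checked with a small worked example. The record numbers below are invented for illustration; in practice the full set of relevant records in a database is rarely known, so recall can usually only be estimated.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved records."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant records that were actually retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data: five relevant records exist in the database.
relevant = {1, 2, 3, 4, 5}
narrow_search = {1, 2}                    # few records, all relevant
broad_search = {1, 2, 3, 4, 6, 7, 8, 9}   # more relevant records, but also noise

print(precision_recall(narrow_search, relevant))  # (1.0, 0.4)
print(precision_recall(broad_search, relevant))   # (0.5, 0.8)
```

The narrow search returns only relevant records but misses three of the five; the broad search finds four of the five but half of what it returns is noise.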
Optimize your search strategy
Precision
- Search in a particular field
- Search in DEI (indexer-assigned descriptors)
- Use exact phrase
- Combine using “AND”
Recall
- Search across fields
- Combine synonyms, related terms, broad or general terms
- Use “any” or “all” words
Optimize
- Use your best judgment
- From simple to complex
Using your Query and search results
Selecting relevant records
- Select format (PDF, HTML, Excel, etc.)
- Printing/saving
- Email search results
Storing a query
- Save and run query
- Subscribe to feeds
Thank you!