Page 1: Bioinformatics: Information Retrieval - II

Information Retrieval - II

Information retrieval (IR) is the science of searching for documents, for information within documents and for metadata about documents, as well as that of searching relational databases and the World Wide Web.

IR is interdisciplinary, based on computer science, mathematics, library science, information science, information architecture, cognitive psychology, linguistics, statistics and physics.

Page 2: Bioinformatics: Information Retrieval - II

Information Storage and Retrieval (ISAR):

Operations performed by the hardware and software used in indexing and storing a file of machine-readable records whenever a user queries the system for information relevant to a specific topic. For records to be retrieved, the search statement must be expressed in syntax executable by the computer.
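
As a minimal sketch of the indexing and querying described above, the following Python builds an inverted index over a few made-up records and answers a simple AND query. The record texts, the function names `build_index` and `search_all`, and the whitespace tokenization are illustrative assumptions, not part of any particular ISAR system.

```python
from collections import defaultdict

def build_index(records):
    """Map each lowercase term to the set of record ids that contain it."""
    index = defaultdict(set)
    for rec_id, text in records.items():
        for term in text.lower().split():
            index[term].add(rec_id)
    return index

def search_all(index, query):
    """Return ids of records containing every term in the query (boolean AND)."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Hypothetical records, for illustration only.
records = {
    1: "protein sequence database",
    2: "gene expression database",
    3: "protein structure prediction",
}
index = build_index(records)
print(sorted(search_all(index, "protein database")))  # [1]
```

The query string plays the role of the search statement: it has to be broken into terms the index understands before any record can be retrieved.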

Page 3: Bioinformatics: Information Retrieval - II

Information Storage and Retrieval (ISAR):

An information system (IS) is a computer hardware and software system designed to accept, store, manipulate, and analyze data and to report results, usually on a regular, ongoing basis. An IS usually consists of a data input subsystem, a data storage and retrieval subsystem, a data analysis and manipulation subsystem, and a reporting subsystem.

Page 4: Bioinformatics: Information Retrieval - II

Information Storage and Retrieval (ISAR):

Information systems are widely used in scientific research, business management, medicine and health, resource management, and other fields that require statistical reporting.

Page 5: Bioinformatics: Information Retrieval - II

Information Retrieval Process:

An information retrieval process begins when a user enters a query into the system. Queries are formal statements of information needs, for example, search strings in web search engines. In information retrieval, a query does not uniquely identify a single object in the collection. Instead, several objects may match the query, perhaps with different degrees of relevancy.

Page 6: Bioinformatics: Information Retrieval - II

Information Retrieval Process:

An object is an entity which keeps or stores information in a database. User queries are matched to objects stored in the database. Depending on the application the data objects may be, for example, text documents, images or videos. Often the documents themselves are not kept or stored directly in the IR system, but are instead represented in the system by document surrogates.
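
To illustrate the idea of a document surrogate, here is a small Python sketch in which the system stores only a compact record (identifier, title, keywords, and a pointer to the full document) and matches queries against that record. The class name `DocumentSurrogate`, its fields, and the sample entries are hypothetical choices made for this example.

```python
from dataclasses import dataclass, field

@dataclass
class DocumentSurrogate:
    """A compact stand-in for a document that lives outside the IR system."""
    doc_id: str
    title: str
    keywords: set = field(default_factory=set)
    location: str = ""  # hypothetical pointer to where the full document is kept

def matches(surrogate, query_terms):
    """True if any query term appears among the surrogate's keywords."""
    return bool(surrogate.keywords & {t.lower() for t in query_terms})

# Hypothetical surrogates for an image and a text document.
surrogates = [
    DocumentSurrogate("d1", "Chest X-ray atlas", {"image", "radiology"}, "archive/d1"),
    DocumentSurrogate("d2", "Genome annotation notes", {"genome", "annotation"}, "archive/d2"),
]
hits = [s.doc_id for s in surrogates if matches(s, ["Genome"])]
print(hits)  # ['d2']
```

Matching against keywords rather than the full content is what lets the same system handle text, images, or video alike.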

Page 7: Bioinformatics: Information Retrieval - II

Information Retrieval Process:

Most IR systems compute a numeric score for how well each object in the database matches the query, and rank the objects according to this value. The top-ranking objects are then shown to the user. The process may then be iterated if the user wishes to refine the query.
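
The score-then-rank step can be sketched in Python as follows. The scoring rule used here (the fraction of distinct query terms found in the document) is a deliberately simple assumption chosen for illustration; real systems typically use more elaborate weighting. The function names and sample documents are also hypothetical.

```python
def score(query, document):
    """Fraction of distinct query terms that occur in the document."""
    query_terms = set(query.lower().split())
    doc_terms = set(document.lower().split())
    if not query_terms:
        return 0.0
    return len(query_terms & doc_terms) / len(query_terms)

def rank(query, documents, top_k=2):
    """Return the top_k (doc_id, score) pairs, best score first."""
    scored = [(doc_id, score(query, text)) for doc_id, text in documents.items()]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))  # ties broken by doc_id
    return scored[:top_k]

# Hypothetical collection, for illustration only.
documents = {
    "d1": "health information system for district hospitals",
    "d2": "information retrieval in digital libraries",
    "d3": "music and video search",
}
print(rank("information retrieval", documents))
# [('d2', 1.0), ('d1', 0.5)]
```

Refining the query corresponds to calling `rank` again with a different query string and comparing the new ordering.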

Page 8: Bioinformatics: Information Retrieval - II

Performance measures:

Many different measures for evaluating the performance of information retrieval systems have been proposed. The measures require a collection of documents and a query. All common measures described here assume a ground truth notion of relevancy: every document is known to be either relevant or non-relevant to a particular query. In practice queries may be ill-posed and there may be different shades of relevancy. Precision and Recall are two widely used measures for evaluating the quality of results in domains such as Information Retrieval and statistical classification.

Page 9: Bioinformatics: Information Retrieval - II

Performance measures:

Precision can be seen as a measure of exactness or fidelity, whereas Recall is a measure of completeness.

In an Information Retrieval scenario, Precision is defined as the number of relevant documents retrieved by a search divided by the total number of documents retrieved by that search, and Recall is defined as the number of relevant documents retrieved by a search divided by the total number of existing relevant documents (which should have been retrieved).

Page 10: Bioinformatics: Information Retrieval - II

Performance measures

Precision

Precision is the fraction of the documents retrieved that are relevant to the user's information need.

Page 11: Bioinformatics: Information Retrieval - II

Performance measures

Recall

Recall is the fraction of the documents that are relevant to the query that are successfully retrieved.

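Page 12: Bioinformatics: Information Retrieval - II

Recall = A/(A+D)

Proportion of documents relevant to a search question that are retrieved by a given search formulation.

Precision = A/(A+B)

Proportion of documents retrieved by a given search formulation that are relevant to the search question.

Here A is the number of relevant documents retrieved, B is the number of non-relevant documents retrieved, and D is the number of relevant documents not retrieved.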
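
The two formulas can be checked with a short Python sketch. It assumes, consistently with the definitions above, that A counts relevant documents retrieved, B counts retrieved documents that are not relevant, and D counts relevant documents that were not retrieved. The function name, the sample document ids, and the choice to return 0.0 when a denominator is zero are assumptions made for this example.

```python
def precision_recall(retrieved, relevant):
    """Compute (precision, recall) from sets of document ids."""
    a = len(retrieved & relevant)   # A: relevant and retrieved
    b = len(retrieved - relevant)   # B: retrieved but not relevant
    d = len(relevant - retrieved)   # D: relevant but not retrieved
    precision = a / (a + b) if (a + b) else 0.0  # assumed convention for empty results
    recall = a / (a + d) if (a + d) else 0.0     # assumed convention for no relevant docs
    return precision, recall

# Hypothetical search result and ground-truth relevance judgments.
retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d2", "d4", "d5"}
p, r = precision_recall(retrieved, relevant)
print(p, round(r, 3))  # 0.5 0.667
```

In this example A = 2, B = 2 and D = 1, so half of what was retrieved is relevant (precision 2/4) while two of the three relevant documents were found (recall 2/3).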

Page 13: Bioinformatics: Information Retrieval - II

General applications of information retrieval

* Digital libraries
* Information filtering
* Media search
  o Blog search
  o Image retrieval
  o Music retrieval
  o News search
  o Speech retrieval
  o Video retrieval
* Search engines
  o Desktop search
  o Enterprise search
  o Federated search
  o Mobile search
  o Social search
  o Web search

Page 14: Bioinformatics: Information Retrieval - II

Domain specific applications of information retrieval

* Expert search finding
* Genomic information retrieval
* Geographic information retrieval
* Information retrieval for chemical structures
* Information retrieval in software engineering
* Legal information retrieval
* Vertical search

Page 15: Bioinformatics: Information Retrieval - II

District Health Information System (DHIS)

The District Health Information System (DHIS) is a highly flexible, open-source health management information system and data warehouse. It is developed by the Health Information Systems Programme (HISP) project.

Page 16: Bioinformatics: Information Retrieval - II

District Health Information System (DHIS)

The solution covers aggregated routine data, semi-permanent data (staffing, equipment, infrastructure, population estimates), survey/audit data, and certain types of case-based or patient-based data (for instance, disease notification or patient satisfaction surveys). The system supports the capture of data linked to any level in an organizational hierarchy, any data collection frequency, and a high degree of customization on both the input and output sides. It has been translated into a number of languages.

Page 17: Bioinformatics: Information Retrieval - II
Page 18: Bioinformatics: Information Retrieval - II
Page 19: Bioinformatics: Information Retrieval - II
Page 20: Bioinformatics: Information Retrieval - II

Health Information Systems Programme (HISP)

Page 21: Bioinformatics: Information Retrieval - II
Page 22: Bioinformatics: Information Retrieval - II
Page 23: Bioinformatics: Information Retrieval - II
Page 24: Bioinformatics: Information Retrieval - II
Page 25: Bioinformatics: Information Retrieval - II

Practicals

* EHR (Google Health and MS HealthVault)
* MRS (OpenMRS)
* HIS (DHIS v 2.0)
* CPOE
* Bioinformatics Portal (PU)

Page 26: Bioinformatics: Information Retrieval - II
Page 27: Bioinformatics: Information Retrieval - II
Page 28: Bioinformatics: Information Retrieval - II
Page 29: Bioinformatics: Information Retrieval - II
Page 30: Bioinformatics: Information Retrieval - II
Page 31: Bioinformatics: Information Retrieval - II
Page 32: Bioinformatics: Information Retrieval - II
Page 33: Bioinformatics: Information Retrieval - II
Page 34: Bioinformatics: Information Retrieval - II

Thank you...