Evidence-Based Information Retrieval in Bioinformatics Timothy B. Patrick, PhD Healthcare Administration and Informatics, University of Wisconsin-Milwaukee

Dec 19, 2015

Page 1:

Evidence-Based Information Retrieval in Bioinformatics

Timothy B. Patrick, PhD Healthcare Administration and Informatics,

University of Wisconsin-Milwaukee

Page 2:

Goal of the Project

• The overall, long-term goal of this research project is to contribute to evidence-based information retrieval in post-genomic medicine: proof of the effectiveness of the way particular information resources are used and combined in order to retrieve that information.

Page 3:

Aims

• Specific Aim 1: Determine existing pitfalls in accessing literature on gene function

• Specific Aim 2: Based on user warrant, determine the current state of evidence-based functional genomic retrieval

• Specific Aim 3: Based on literary warrant, determine the current state of evidence-based functional genomic retrieval

Page 4:

“Determine existing pitfalls in accessing literature on gene function”

• That is the topic of my talk later today.

• “Asymmetries in Retrieval of Gene Function Information”

Page 5:

The Study

• Investigated an example of different paths to the literature that might look to a user to be equivalent but which are not equivalent due to various features of the resources involved.

• Knowledge that they are not equivalent requires knowledge of metadata about the resources.

Page 6:

Three Paths

[Diagram: three paths from an Affymetrix Genbank Accession number to Pubmed IDs, via Entrez Pubmed, Entrez Nucleotide, and Entrez Gene. The Nucleotide and Gene paths go through Pubmed links.]

Page 7:

http://www.affymetrix.com/corporate/media/genechip_essentials/gene_expression/Features_and_probes.affx

Page 8:

Three Paths

[Diagram: three paths from an Affymetrix Genbank Accession number to Pubmed IDs, via Entrez Pubmed, Entrez Nucleotide, and Entrez Gene. The Nucleotide and Gene paths go through Pubmed links.]

Page 9:

Methods

• We first collected representative DNA Accession numbers associated with genes expressed in a microarray experiment designed to identify changes in gene expression associated with skeletal muscle recovery from immobilization-induced sarcopenia.

Page 10:

Methods

• Next, we retrieved the Unique Identifiers (UI’s) of Entrez Pubmed citations that were associated with the Accession numbers by each of the three Entrez resources.

  – Directly in the case of Entrez Pubmed

  – Indirectly, via Pubmed links, in the case of Entrez Nucleotide and Entrez Gene

• Next, we compared the number of Pubmed ID's retrieved by the three resources for each of the Accession numbers.

Page 11:

Summary of Pubmed ID’s by Accession Number

# of Pubmed ID’s    # of Accession numbers
                    Pubmed    Nucleotide    Gene
0                   198       132           216
1                   36        112           34
2                   10        5             0
3                   4         2             1
4                   1         0             0
5                   2         0             0
Total               251       251           251
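The summary counts on this slide can be checked and condensed with a few lines of Python. This is only an illustrative sketch: it assumes the three tables belong to Pubmed, Nucleotide, and Gene in that order (the reading suggested by the slide labels), and the names `COUNTS` and `summarize` are invented here.

```python
# Distribution of Pubmed ID counts per accession number, as read from the
# slide (assumption: the three tables are Pubmed, Nucleotide, Gene in order).
COUNTS = {
    "Pubmed":     {0: 198, 1: 36, 2: 10, 3: 4, 4: 1, 5: 2},
    "Nucleotide": {0: 132, 1: 112, 2: 5, 3: 2, 4: 0, 5: 0},
    "Gene":       {0: 216, 1: 34, 2: 0, 3: 1, 4: 0, 5: 0},
}


def summarize(dist):
    """Return (accessions, accessions with >= 1 ID, summed ID count)."""
    total = sum(dist.values())
    with_ids = total - dist.get(0, 0)
    # Sum of per-accession counts; the same ID may be counted more than once.
    summed_ids = sum(k * n for k, n in dist.items())
    return total, with_ids, summed_ids


for path, dist in COUNTS.items():
    total, with_ids, summed_ids = summarize(dist)
    print(f"{path:<10} accessions={total} with_ids={with_ids} "
          f"({with_ids / total:.1%}) summed_ids={summed_ids}")
```

Under that reading, each path covers the same 251 accession numbers, but the share of accession numbers with at least one Pubmed ID differs noticeably between paths.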

Page 12:

Methods

• Compared number of Pubmed ID’s produced for each Accession number by each path.

• Applied non-parametric test: Kendall’s W

  – Pubmed versus Nucleotide versus Gene

  – p < .05
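For readers unfamiliar with the statistic, the following is a minimal sketch of a tie-corrected Kendall's W in plain Python. It assumes the usual related-samples layout, with one row per Accession number and one column per path (Pubmed, Nucleotide, Gene). The function names and the example rows are hypothetical; the slides do not give the per-accession data or say which software was used.

```python
def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their ranks.

    Also returns the tie term: the sum of (t^3 - t) over tie groups.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_term = 0
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        t = end - start + 1
        mean_rank = (start + 1 + end + 1) / 2
        for i in order[start:end + 1]:
            ranks[i] = mean_rank
        tie_term += t ** 3 - t
        start = end + 1
    return ranks, tie_term


def kendalls_w(rows):
    """Tie-corrected Kendall's W; each row is a case, each column a path."""
    m, n = len(rows), len(rows[0])
    rank_sums = [0.0] * n
    ties = 0
    for row in rows:
        ranks, tie_term = average_ranks(row)
        ties += tie_term
        for j, r in enumerate(ranks):
            rank_sums[j] += r
    mean_sum = m * (n + 1) / 2
    s = sum((r - mean_sum) ** 2 for r in rank_sums)
    denom = m ** 2 * (n ** 3 - n) - m * ties
    # If every row is fully tied, W is undefined; 0.0 is returned here.
    return 12 * s / denom if denom else 0.0


# Hypothetical counts of Pubmed IDs per accession: (Pubmed, Nucleotide, Gene).
example_rows = [(0, 1, 0), (1, 1, 0), (0, 1, 1), (2, 1, 0), (0, 0, 0)]
print(kendalls_w(example_rows))
```

W ranges from 0 to 1. In this layout, the Friedman chi-square statistic equals m(n - 1)W for m cases and n paths, which is how a p-value is typically obtained; that step is not shown here.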

Page 13:

The Three Paths Are Not Equivalent

[Diagram: the three paths from an Affymetrix Genbank Accession number to Pubmed IDs, marked as unequal: Pubmed ≠ Nucleotide ≠ Gene.]

Page 14:

The SI field identifies secondary source databanks and accession numbers of outside resources discussed in MEDLINE articles. The field is composed of the source followed by a slash followed by an accession number and can be searched with one or both components, e.g., genbank [si], AF001892 [si], genbank/AF001892 [si].

The SI field and the Entrez sequence database links are not linked. The PubMed links to these databases are created from the reference field of the GenBank or GenPept flat file. These references include citations that discuss the specific sequence presented in these flat files.

http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=helppubmed.box.pubmedhelp.Box_1_Search_Field_D#pubmedhelp.Secondary_Source_ID_
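As an illustration of the SI search syntax quoted above, the sketch below builds the three query forms for a given source and accession number. The URL helper assumes the NCBI E-utilities ESearch endpoint, which the slides do not mention; it only constructs a URL and does not send a request. The function names are invented for this example.

```python
from urllib.parse import urlencode

# Assumption: NCBI E-utilities ESearch endpoint; not named in the slides.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def si_queries(source, accession):
    """Build the three SI-field query forms quoted in the help text."""
    return [
        f"{source} [si]",
        f"{accession} [si]",
        f"{source}/{accession} [si]",
    ]


def esearch_url(term, db="pubmed"):
    """Return an ESearch URL for the term (built only, never fetched)."""
    return ESEARCH + "?" + urlencode({"db": db, "term": term})


for query in si_queries("genbank", "AF001892"):
    print(query, "->", esearch_url(query))
```

Because the SI field and the Entrez sequence database links are maintained separately, a search built this way should not be expected to return the same citations as following Pubmed links from a sequence record.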

Page 15:

“Based on user warrant, determine the current state of evidence-based functional genomic retrieval”

• Interviews with biologists who use microarrays to study gene expression levels

• Questions concern what methods for IR are used, why they consider the methods effective, what are criteria of success and failure, and how they see the role of biomedical librarians in the process

Page 16:

Interviews in Progress

• Five interviews currently scheduled at the University of Missouri-Columbia

• Interviews being scheduled at University of Wisconsin-Milwaukee

• In March we interviewed two subjects at NIG in Japan

Page 17:

“Based on literary warrant, determine the current state of evidence-based functional genomic retrieval”

• We wanted to investigate how and to what extent biological science researchers reported their information retrieval methods, including details of why they used the methods they did.

Page 18:

Methods

• We searched OVID Medline on October 1, 2004 for the period 1966 to September Week 4 2004 with the query “Oligonucleotide Array Sequence Analysis/”, producing 10746 results.

• We then limited the results to English (10374), excluded “review articles” (9049), and limited to the years 2003 – 2004 (4798). We next ranked journals in the results by number of articles, and selected a population of all of the articles from the 13 top journals (n=1373). We randomly sampled 150 articles from that population.
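The filtering and sampling steps above can be written down in a reproducible form. The sketch below records the reported counts and draws a simple random sample of 150 from a population of 1373. The seed value and the function name are hypothetical; the slides do not say how the random sample was actually drawn.

```python
import random

# Record counts after each step, as reported on the slide.
STEPS = [
    ("Oligonucleotide Array Sequence Analysis/", 10746),
    ("limit to English", 10374),
    ("exclude review articles", 9049),
    ("limit to 2003-2004", 4798),
    ("articles in the 13 top journals", 1373),
]


def draw_sample(population_size, sample_size, seed):
    """Draw a reproducible simple random sample of 0-based indices."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(population_size), sample_size))


sample = draw_sample(1373, 150, seed=2004)  # the seed value is hypothetical
for label, count in STEPS:
    print(f"{count:>6}  {label}")
print(f"sampled {len(sample)} of 1373 ({len(sample) / 1373:.1%})")
```

Recording the seed alongside the query and the limits lets another researcher regenerate the same sample from the same ordered list of articles.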

Page 19:

Methods

• If the authors of the paper did report gene function, we wanted to know which information sources and retrieval methods they used, as well as the reasons they had for using them.

  – Functional Attribution Reported

  – Sources of Information Reported

  – Retrieval Strategy Reported

  – Grounds for Choice of Sources Reported

  – Grounds for Retrieval Strategy Reported

Page 20:

Methods

• How were details of sources and retrieval methods reported?

  – Methods or Procedures

  – Results

  – Discussion

Page 21:

Results

• Typical evidence for attribution of gene function consists of literature citations.

• When a literature search (e.g. Pubmed search), or a search of other knowledge sources (e.g. NCBI databases), is cited as the source of evidence to support attribution of function, rarely are details of the search reported.

• Reasons for using particular sources and retrieval methods are not reported.

Page 22:

Results

• When information retrieval methods are described in the paper, they are typically mentioned only in the “Results” or “Discussion” sections of the paper, and not in the “Methods” section.

• Wet bench methods are reported in more detail than dry bench methods.

Page 23:

Implications for Information Practice

Page 24:

Implications for Information Practice

• There is a need to embrace a workflow concept

• There is a need to develop standards for documentation in e-science

• There is a need to use multidisciplinary teams to develop workflows

Page 25:

“There is a need to embrace a workflow concept”

• Call a scenario in which multiple information resources, databases, and analysis tools are used in combination a workflow

• Workflows are increasingly important for information retrieval and processing in the Life Sciences
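One way to picture documenting a workflow step is as a small structured record. The sketch below is hypothetical: the class `RetrievalStep` and its field names are invented for illustration and do not correspond to any existing standard. The example values come from the OVID Medline search described in the Methods of this talk.

```python
from dataclasses import dataclass, asdict
import json


@dataclass
class RetrievalStep:
    """One documented step of an information retrieval workflow."""
    resource: str        # database or tool that was queried
    query: str           # exact query or filter as entered
    run_date: str        # ISO date on which the step was run
    result_count: int    # number of records returned
    rationale: str = ""  # why this resource or strategy was chosen


step = RetrievalStep(
    resource="OVID Medline",
    query="Oligonucleotide Array Sequence Analysis/",
    run_date="2004-10-01",
    result_count=10746,
)
print(json.dumps(asdict(step), indent=2))
```

A record like this captures the source, the strategy, and, through the `rationale` field, the grounds for choosing them, which are the details the literature review found were rarely reported.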

Page 26:

Traditional Science

Computer-based information retrieval and processing

The Digitization of Science or E-science

“There is a need to develop standards for documentation in e-science”

Page 27:

Life Science Information Retrieval and Processing Workflows

Page 28:

documentation

Life Science Information Retrieval and Processing Workflows

Page 29:

technology to facilitate documentation

documentation

Life Science Information Retrieval and Processing Workflows

Page 30:

technology to facilitate documentation

editorial policy drivers

documentation

Life Science Information Retrieval and Processing Workflows

Page 31:

INFORMATION ITEMS

METADATA

KNOWLEDGE-ENABLED WORKFLOWS

TOOLS

“There is a need to use multidisciplinary teams to develop workflows”

Page 32:

INFORMATION ITEMS

METADATA

KNOWLEDGE-ENABLED WORKFLOWS

TOOLS

domain expert (scientist)

Page 33:

INFORMATION ITEMS

METADATA

KNOWLEDGE-ENABLED WORKFLOWS

TOOLS

domain metadata expert (information specialist)

domain expert (scientist)

Page 34:

INFORMATION ITEMS

METADATA

KNOWLEDGE-ENABLED WORKFLOWS

TOOLS

domain metadata expert (information specialist)

domain expert (scientist)