Applications of Text and Data Mining of biomedical databases
Miguel Andrade
Computational Biology & Data Mining group, Max Delbrück Center for Molecular Medicine
[email protected]

Feb 27, 2022

Page 1: Applications of Text and Data Mining of biomedical databases

Applications of Text and Data Mining of biomedical databases

Miguel Andrade Computational Biology & Data Mining group Max Delbrück Center for Molecular Medicine

[email protected]

Page 2: Applications of Text and Data Mining of biomedical databases

Gene structures

Gene expression

Protein sequences

Protein databases

Literature databases

Biological predictions

Human disease

001001001000100101011110110110

AGCTGGTACGAAGATGTCTCGCA

MLVPIEKAEVPRYILKTEFRKAILTS

In a phosphorylation dependent ma

Page 3: Applications of Text and Data Mining of biomedical databases

Molecular Biology databases

Protein and nucleotide sequences (UniProt, Entrez)

Protein domains (PFAM, SMART)

Structures (PDB)

Diseases (OMIM)

Gene expression (GEO)

Bibliography (records: MEDLINE; full text: PubMed Central)

Page 4: Applications of Text and Data Mining of biomedical databases

Molecular Biology databases

Bibliography (records, MEDLINE): compressed PubMed in XML, 17GB, 23M items (exhaustive back to 1966, oldest from 1809)

Bibliography (full text, PubMed Central): open access subset, 26GB of raw XML files (text only), 8GB compressed, 2.6M items

Page 5: Applications of Text and Data Mining of biomedical databases

Molecular Biology databases

Bibliography (records, MEDLINE): compressed PubMed in XML, 17GB, 23M items (exhaustive back to 1966, oldest from 1809)

Bibliography (full text, PubMed Central): open access subset, 26GB of raw XML files (text only), 8GB compressed, 2.6M items

1 Human Genome: 320GB
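At these sizes the PubMed XML is easier to stream than to load whole. The following is a minimal sketch, assuming the standard PubMed element names (`PubmedArticle`, `PMID`, `ArticleTitle`, `AbstractText`) and a gzip-compressed input; the sample record is made up and the function name is hypothetical.

```python
import gzip
import io
import xml.etree.ElementTree as ET


def iter_pubmed_records(fileobj):
    """Yield (pmid, title, abstract) from a PubMed XML stream, one record at a time."""
    for _event, elem in ET.iterparse(fileobj, events=("end",)):
        if elem.tag != "PubmedArticle":
            continue
        pmid = elem.findtext("MedlineCitation/PMID", default="")
        title = elem.findtext("MedlineCitation/Article/ArticleTitle", default="")
        abstract = " ".join(
            "".join(node.itertext())
            for node in elem.iterfind("MedlineCitation/Article/Abstract/AbstractText")
        )
        yield pmid, title, abstract
        elem.clear()  # release the memory held by the finished record


# Tiny made-up sample standing in for one compressed PubMed file.
SAMPLE = b"""<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>1</PMID>
      <Article>
        <ArticleTitle>Example title</ArticleTitle>
        <Abstract><AbstractText>Example abstract.</AbstractText></Abstract>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""

compressed = io.BytesIO(gzip.compress(SAMPLE))
with gzip.open(compressed, "rb") as handle:
    records = list(iter_pubmed_records(handle))
```

Clearing each element after it is yielded keeps memory use roughly constant, which matters more than parsing speed for a multi-gigabyte collection.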

Page 6: Applications of Text and Data Mining of biomedical databases

K2

Mapping systems

K1

ENTRY

Page 7: Applications of Text and Data Mining of biomedical databases

Mapping systems

Transmembrane

PROTEIN

GPCR Receptor

GO

KW

SwissProt

Page 8: Applications of Text and Data Mining of biomedical databases

ENTRY

Mapping systems

K2

K1 ENTRY

Page 9: Applications of Text and Data Mining of biomedical databases

PAPER

Mapping systems

K2

K1 PROTEIN

GO

MeSH SwissProt MEDLINE
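One way to read this diagram: papers carry MeSH terms, proteins carry GO terms, and protein entries cite papers, so terms of one system can be related to terms of the other through the linked entries. The sketch below counts such links on made-up toy data; the dictionaries, function names and the share-based score are assumptions for illustration, not the method used in the talk.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: papers with MeSH terms, proteins with GO terms,
# and protein entries that cite papers.
paper_mesh = {
    "paper1": {"Receptors, G-Protein-Coupled", "Signal Transduction"},
    "paper2": {"Receptors, G-Protein-Coupled"},
    "paper3": {"Glycolysis"},
}
protein_go = {
    "protA": {"G protein-coupled receptor activity"},
    "protB": {"catalytic activity"},
}
protein_papers = {
    "protA": ["paper1", "paper2"],
    "protB": ["paper3"],
}


def map_terms(paper_mesh, protein_go, protein_papers):
    """Count how often each MeSH term is linked to each GO term via paper-protein links."""
    pair_counts = defaultdict(Counter)
    for protein, papers in protein_papers.items():
        for paper in papers:
            for mesh in paper_mesh.get(paper, ()):
                for go in protein_go.get(protein, ()):
                    pair_counts[mesh][go] += 1
    return pair_counts


def best_mapping(pair_counts, mesh):
    """Return GO terms for a MeSH term with the share of links supporting each."""
    counts = pair_counts.get(mesh, Counter())
    total = sum(counts.values())
    return [(go, n / total) for go, n in counts.most_common()]


mapping = map_terms(paper_mesh, protein_go, protein_papers)
gpcr = best_mapping(mapping, "Receptors, G-Protein-Coupled")
```

The same counting step can be repeated across further links (for example protein to structure, or protein to gene) to chain mappings between more distant annotation systems.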

Page 10: Applications of Text and Data Mining of biomedical databases

PAPER

Mapping systems

K2

K1 PROTEIN STRUCTURE

SCOP

MeSH PDB MEDLINE

TIM barrel

Enzyme

Page 11: Applications of Text and Data Mining of biomedical databases

PAPER

Iterate!

GENE K2

K1 PROTEIN

KW

MeSH SwissProt MEDLINE

EntrezGene

Page 12: Applications of Text and Data Mining of biomedical databases

[Diagram: network of databases (MEDLINE, Entrez Gene, UniProt, NetAffx, UniGene, ProDom, PDB, OMIM, GEO) connected through shared annotations (KW, authors, words, MeSH, GO, fold)]

Page 13: Applications of Text and Data Mining of biomedical databases

PubMed

Page 14: Applications of Text and Data Mining of biomedical databases

PubMed

Page 15: Applications of Text and Data Mining of biomedical databases

PubMed

Page 16: Applications of Text and Data Mining of biomedical databases

Rank MEDLINE according to a topic

Fontaine et al. (2009) Nucleic Acids Research

Jean-Fred Fontaine

http://cbdm.mdc-berlin.de/tools/medlineranker/

MedlineRanker
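Ranking abstracts against a topic can be illustrated with a simple word-weighting scheme: words that are more frequent in topic abstracts than in background abstracts get positive weights, and each candidate is scored by the sum of its word weights. This is a minimal sketch under that assumption, with made-up abstracts and hypothetical function names; it is not presented as the tool's actual implementation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Set of lowercase alphabetic words in a text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def train_weights(topic_abstracts, background_abstracts):
    """Log-odds weight per word: document frequency in topic vs background."""
    topic_df = Counter(w for a in topic_abstracts for w in tokenize(a))
    back_df = Counter(w for a in background_abstracts for w in tokenize(a))
    n_topic, n_back = len(topic_abstracts), len(background_abstracts)
    weights = {}
    for word in set(topic_df) | set(back_df):
        # Add-one smoothing avoids division by zero and log(0).
        p_topic = (topic_df[word] + 1) / (n_topic + 2)
        p_back = (back_df[word] + 1) / (n_back + 2)
        weights[word] = math.log(p_topic / p_back)
    return weights


def rank(abstracts, weights):
    """Sort abstracts by summed word weights, highest first."""
    scored = [(sum(weights.get(w, 0.0) for w in tokenize(a)), a) for a in abstracts]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Made-up training and candidate abstracts.
topic = ["kinase phosphorylation signalling", "protein kinase activity"]
background = ["patient cohort study", "clinical trial patient"]
weights = train_weights(topic, background)
ranked = rank(["a patient trial", "a novel kinase"], weights)
```

Words never seen in training contribute zero, so a candidate is judged only on vocabulary the training sets can speak to.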

Page 17: Applications of Text and Data Mining of biomedical databases

Jean-Fred Fontaine

http://cbdm.mdc-berlin.de/tools/medlineranker/

MedlineRanker

Page 18: Applications of Text and Data Mining of biomedical databases

Génie

http://cbdm.mdc-berlin.de/tools/genie/

Ranks a set of genes from a whole genome according to a topic

Fontaine et al. (2011) Nucleic Acids Research

Human
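Ranking genes by topic can be sketched as a second step on top of abstract ranking: each gene is linked to a set of abstracts, and genes whose abstracts are unusually often topic-relevant rise to the top. The example below uses a hypergeometric tail probability as the enrichment score; the choice of test, the toy identifiers and the function names are assumptions for illustration.

```python
from math import comb


def hypergeom_tail(k, n, K, N):
    """P(X >= k) when drawing n abstracts from N, of which K are relevant."""
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def rank_genes(gene_to_pmids, relevant_pmids, all_pmids):
    """Return (gene, p_value) pairs sorted from most to least enriched."""
    N = len(all_pmids)
    K = len(relevant_pmids & all_pmids)
    scored = []
    for gene, pmids in gene_to_pmids.items():
        n = len(pmids)
        k = len(pmids & relevant_pmids)
        scored.append((gene, hypergeom_tail(k, n, K, N)))
    return sorted(scored, key=lambda pair: pair[1])


# Made-up identifiers: 20 abstracts, 4 of them judged relevant to the topic.
all_pmids = set(range(1, 21))
relevant = {1, 2, 3, 4}
genes = {"geneA": {1, 2, 3}, "geneB": {5, 6, 7}, "geneC": {4, 8}}
ranking = rank_genes(genes, relevant, all_pmids)
```

A gene with many linked abstracts needs proportionally more relevant ones to score well, which keeps heavily studied genes from dominating the list by volume alone.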

Page 19: Applications of Text and Data Mining of biomedical databases

Génie

http://cbdm.mdc-berlin.de/tools/genie/

Page 20: Applications of Text and Data Mining of biomedical databases

PESCADOR

http://cbdm.mdc-berlin.de/tools/pescador/

Extract interactions and filter by concepts

Barbosa-Silva et al. (2010) BMC Bioinformatics

Adriano Barbosa

Barbosa-Silva et al. (2011) BMC Bioinformatics

Page 21: Applications of Text and Data Mining of biomedical databases

PESCADOR

Page 22: Applications of Text and Data Mining of biomedical databases

Co-occurrence types (PESCADOR)

Type 1: Term + [Biointeraction] + Term

Type 2: [Biointeraction] + Term + Term + [Biointeraction]

Type 3: Term + Term

Type 4: co-occurrence in abstract
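The four co-occurrence types can be illustrated with a small classifier. This is a sketch under simplifying assumptions: types 1 to 3 are evaluated within a sentence, only the first occurrence of each term is considered, and the interaction vocabulary is a tiny hypothetical list rather than the one used by the tool.

```python
import re

# Hypothetical, tiny vocabulary of interaction words; a real list would be much longer.
INTERACTION_WORDS = {"binds", "phosphorylates", "activates", "inhibits", "interaction"}


def split_sentences(abstract):
    """Naive sentence splitter on ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", abstract) if s.strip()]


def tokens(text):
    return re.findall(r"[A-Za-z0-9-]+", text.lower())


def classify_sentence(sentence, term_a, term_b):
    """Return 1, 2 or 3 for a sentence holding both terms, else None."""
    words = tokens(sentence)
    a, b = term_a.lower(), term_b.lower()
    if a not in words or b not in words:
        return None
    first, second = sorted([words.index(a), words.index(b)])
    pos_inter = [i for i, w in enumerate(words) if w in INTERACTION_WORDS]
    if any(first < i < second for i in pos_inter):
        return 1  # Term + [Biointeraction] + Term
    if pos_inter:
        return 2  # interaction word before or after the two terms
    return 3  # Term + Term


def classify_abstract(abstract, term_a, term_b):
    """Lowest type found in any sentence; 4 if the terms only share the abstract."""
    found = [classify_sentence(s, term_a, term_b) for s in split_sentences(abstract)]
    found = [t for t in found if t is not None]
    if found:
        return min(found)
    words = tokens(abstract)
    if term_a.lower() in words and term_b.lower() in words:
        return 4
    return None
```

For example, `classify_abstract("CDK1 phosphorylates RB1.", "CDK1", "RB1")` falls under type 1, while two terms mentioned in separate sentences of the same abstract fall under type 4.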

Page 23: Applications of Text and Data Mining of biomedical databases
Page 24: Applications of Text and Data Mining of biomedical databases

Country-specific variations of English

Netzel et al. (2003) EMBO Reports

Page 25: Applications of Text and Data Mining of biomedical databases

Country-specific variations of English

Netzel et al. (2003) EMBO Reports

Page 26: Applications of Text and Data Mining of biomedical databases

Worldwide scientific publishing activity

Perez-Iratxeta and Andrade (2002) Science

Approximate number of publications for the years 1996–2001 per million inhabitants, by country.

[Map legend: 10,000; 1,000; 100; 10; 1]
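The per-capita normalization behind this map is simple arithmetic: publications divided by population in millions. The legend values suggest a power-of-ten scale, so the sketch below also assigns each rate to a power-of-ten bin; the binning rule and all numbers are assumptions for illustration, not data from the study.

```python
import math

# Hypothetical numbers for illustration only; not data from the study.
countries = {
    "Country A": {"publications": 120_000, "population": 60_000_000},
    "Country B": {"publications": 900, "population": 30_000_000},
}


def per_million(publications, population):
    """Publications per million inhabitants."""
    return publications / (population / 1_000_000)


def legend_bin(rate):
    """Power-of-ten bin (1, 10, 100, ...) that the rate falls into; 0 below 1."""
    if rate < 1:
        return 0
    return 10 ** math.floor(math.log10(rate))


rates = {
    name: per_million(d["publications"], d["population"])
    for name, d in countries.items()
}
bins = {name: legend_bin(rate) for name, rate in rates.items()}
```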

Page 27: Applications of Text and Data Mining of biomedical databases

Worldwide scientific publishing activity

Perez-Iratxeta and Andrade (2002) Science

Ratio of publications for 1996–2001 / 1989–95

[Map legend: +++, ++, +, =, -, --, ---]

Page 28: Applications of Text and Data Mining of biomedical databases

Find referees

peer2ref

Andrade-Navarro et al (2012) BioData Mining

Carolina Perez-Iratxeta (OHRI-Ottawa)

http://www.ogic.ca/peer2ref/

Page 29: Applications of Text and Data Mining of biomedical databases

Andrade-Navarro et al (2012) BioData Mining

Find referees

peer2ref

Carolina Perez-Iratxeta (OHRI-Ottawa)

http://www.ogic.ca/peer2ref/

Page 30: Applications of Text and Data Mining of biomedical databases

Gareth Palidwor (OHRI-Ottawa)

http://www.ogic.ca/mltrends/

Graph historical term usage in MEDLINE

MLTrends

Palidwor and Andrade-Navarro (2010) Journal of Biomedical Discovery and Collaboration
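Graphing historical term usage comes down to counting, per year, how many records mention a term and dividing by the number of records for that year. The following is a minimal sketch on made-up records; the substring matching and the function name are assumptions, and a real system would work from an index rather than a scan.

```python
from collections import Counter

# Hypothetical (year, text) records.
records = [
    (2000, "Microarray analysis of gene expression"),
    (2000, "A clinical trial of aspirin"),
    (2001, "Microarray data normalization"),
    (2001, "Microarray platforms compared"),
]


def yearly_usage(records, term):
    """Fraction of records per year whose text contains the term (case-insensitive)."""
    totals, hits = Counter(), Counter()
    needle = term.lower()
    for year, text in records:
        totals[year] += 1
        if needle in text.lower():
            hits[year] += 1
    return {year: hits[year] / totals[year] for year in sorted(totals)}


usage = yearly_usage(records, "microarray")
```

Normalizing by the yearly total matters because the number of indexed records grows over time, so raw counts would rise for almost any term.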

Page 31: Applications of Text and Data Mining of biomedical databases

Graph historical term usage in MEDLINE

MLTrends

Gareth Palidwor (OHRI-Ottawa)

http://www.ogic.ca/mltrends/

Palidwor and Andrade-Navarro (2010) Journal of Biomedical Discovery and Collaboration

Page 32: Applications of Text and Data Mining of biomedical databases

http://cbdm.mdc-berlin.de/

Enrique Muro

Martin Schaefer

Arvind Mer

David Fournier

Nancy Mah

Marie Gebhardt

Jean-Fred Fontaine

Computational Biology and Data Mining group