Integration of biomedical literature and databases Lars Juhl Jensen EMBL Heidelberg

May 10, 2015

Nordic Conference for Scholarly Communication 2008, Scandic Star Hotel, Lund, Sweden, April 21-23, 2008
Transcript

Integration of biomedical literature and databases

Lars Juhl Jensen, EMBL Heidelberg

why integration?

why biomedicine?

why literature?

why databases?

open access databases

a lot of them

Duncan Hull, nodalpoint.org

PubChem

19.2 million compounds

GenBank

85 million sequences

89 billion nucleotides

UniProt

5.6 million sequences

PDB

50,000 protein structures

BIND: Biomolecular Interaction Network Database

DIP: Database of Interacting Proteins

MINT: Molecular Interactions Database

IntAct

BioGRID

204,000 interactions

too many

incomplete

literature mining

MEDLINE

17.9 million citations

too much to read

information retrieval

finding the papers

user-specified query

“yeast AND cell cycle”
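
A user-specified Boolean query of this kind can be sketched with a tiny inverted index. This is only an illustration under stated assumptions: the three toy abstracts are invented, the helper names `build_index` and `search_and` are hypothetical, and the phrase "cell cycle" is simplified to two separate words.

```python
from collections import defaultdict

def build_index(docs):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search_and(index, terms):
    """Return ids of documents that contain every query term."""
    sets = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*sets) if sets else set()

# Toy abstracts, invented for illustration only.
docs = {
    1: "yeast cell cycle regulation",
    2: "human cell cycle checkpoints",
    3: "yeast metabolism",
}
index = build_index(docs)
hits = search_and(index, ["yeast", "cell", "cycle"])
print(hits)  # {1}
```

Only document 1 contains all three words, so it is the only hit.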

stemming

yeast / yeasts
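
Stemming lets "yeast" and "yeasts" match the same index entry. The sketch below is a deliberately crude plural stripper, not a real stemming algorithm; the function name `naive_stem` is hypothetical, and a production system would more likely use an established stemmer.

```python
def naive_stem(word):
    """Lowercase a word and strip a single trailing plural 's'.

    A crude stand-in for a real stemmer: it will mangle words such as
    'mitosis', so treat it as an illustration only.
    """
    w = word.lower()
    if len(w) > 3 and w.endswith("s") and not w.endswith("ss"):
        return w[:-1]
    return w

print(naive_stem("yeasts"))  # yeast
print(naive_stem("Yeast"))   # yeast
```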

dynamic query expansion

yeast / S. cerevisiae
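
Query expansion rewrites the user's term into a disjunction of known equivalents. A minimal sketch, assuming a hand-written synonym table (the `SYNONYMS` dictionary and the `expand_query` helper are hypothetical and not taken from any particular search engine):

```python
# Hypothetical synonym table; a real system would draw on a curated resource.
SYNONYMS = {
    "yeast": ["S. cerevisiae", "Saccharomyces cerevisiae"],
}

def expand_query(term, synonyms=SYNONYMS):
    """Return an OR-query covering the term and its listed synonyms."""
    variants = [term] + synonyms.get(term.lower(), [])
    return " OR ".join(f'"{v}"' for v in variants)

print(expand_query("yeast"))
# "yeast" OR "S. cerevisiae" OR "Saccharomyces cerevisiae"
```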

ranking

Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation

no tool will find it

entity recognition

identifying the substance(s)

Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation

Cdc28 yeast

Cdc28 cell cycle

synonyms list

orthographic variation

CDC28

Cdc28p
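
A synonyms list combined with tolerance for orthographic variation can be sketched as a dictionary lookup on normalized tokens. The rules here are assumptions for illustration: case is ignored and a trailing "p" after a digit (the yeast protein suffix, as in Cdc28p) is dropped. The `SYNONYMS` table and the function names are hypothetical.

```python
import re

# Hypothetical synonym list mapping normalized names to one identifier.
SYNONYMS = {"cdc28": "CDC28", "cdk1": "CDC28"}

def normalize(name):
    """Lowercase a name and drop a trailing 'p' that follows a digit."""
    return re.sub(r"(?<=\d)p$", "", name.lower())

def tag(text):
    """Return (surface form, identifier) pairs for recognized names."""
    found = []
    for token in re.findall(r"[A-Za-z0-9]+", text):
        key = normalize(token)
        if key in SYNONYMS:
            found.append((token, SYNONYMS[key]))
    return found

print(tag("Cdc28p and CDC28 bind Clb2"))
# [('Cdc28p', 'CDC28'), ('CDC28', 'CDC28')]
```

Both spellings resolve to the same identifier, while Clb2 is ignored because it is not in the toy list.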

disambiguation

Cdc2

SDS

still too much to read

information extraction

formalizing the facts

co-mentioning

Page 70: Integration of biomedical literature and databases

statistical methods
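
Co-mentioning starts from a simple count: how many documents mention two entities together. The sketch below assumes entity recognition has already produced one list of names per abstract; the toy data and the `comention_counts` helper are invented for illustration. A statistical method would then score these raw counts against what chance alone would give.

```python
from collections import Counter
from itertools import combinations

def comention_counts(tagged_docs):
    """Count how many documents mention each unordered pair of entities."""
    pairs = Counter()
    for entities in tagged_docs:
        # set() so a pair counts once per document; sorted() for stable keys.
        for a, b in combinations(sorted(set(entities)), 2):
            pairs[(a, b)] += 1
    return pairs

# Toy output of an entity recognizer, one list per abstract (invented).
tagged_docs = [
    ["Cdc28", "Clb2", "Swe1"],
    ["Cdc28", "Swe1"],
    ["Cdc5", "Swe1"],
]
counts = comention_counts(tagged_docs)
print(counts[("Cdc28", "Swe1")])  # 2
```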

Page 71: Integration of biomedical literature and databases

NLP: Natural Language Processing

Gene and protein names

Cue words for entity recognition

Verbs for relation extraction

[nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]
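
The bracketed example combines recognized names with a verb phrase to yield a relation. As a much simpler stand-in for that grammar-based analysis, a single regular expression can pull the same facts out of this one sentence pattern. This is a sketch only: the `GENE` pattern (three capitals plus digits) and the `extract_regulation` helper are assumptions, and a regex is not the chunk-based NLP approach the slide depicts.

```python
import re

GENE = r"[A-Z]{3}\d+"  # assumed shape of names such as CYC1 or HAP1

def extract_regulation(sentence):
    """Return (regulator, 'regulates', target) triples for 'X is controlled by Y'."""
    match = re.search(rf"(.+?) is controlled by ({GENE})", sentence)
    if not match:
        return []
    targets = re.findall(GENE, match.group(1))
    return [(match.group(2), "regulates", target) for target in targets]

sentence = ("The expression of the cytochrome genes CYC1 and CYC7 "
            "is controlled by HAP1")
print(extract_regulation(sentence))
# [('HAP1', 'regulates', 'CYC1'), ('HAP1', 'regulates', 'CYC7')]
```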

Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation

yet another database

integration

augmented browsing

semantic tagging

association networks

curated knowledge

genomic context

phylogenetic profiles
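
A phylogenetic profile records in which genomes a gene is present; genes with similar profiles are candidates for a functional link. One simple way to compare two profiles is Jaccard similarity over the sets of genomes. This is an assumed, illustrative measure and not necessarily the one behind the slides; the genome labels are placeholders.

```python
def profile_similarity(a, b):
    """Jaccard similarity between two sets of genomes in which a gene occurs."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Placeholder presence sets: which genomes contain each gene (invented).
profiles = {
    "geneA": {"genome1", "genome2", "genome3"},
    "geneB": {"genome1", "genome2", "genome3"},
    "geneC": {"genome4"},
}
print(profile_similarity(profiles["geneA"], profiles["geneB"]))  # 1.0
print(profile_similarity(profiles["geneA"], profiles["geneC"]))  # 0.0
```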

gene neighborhood

experimental data

physical interactions

genetic interactions

literature mining

restricted access

Bayesian framework
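
The slide does not spell out the framework, so the following is only one common way to merge evidence: if each source (genomic context, experiments, literature mining) yields a probability that an association is real, and the sources are assumed independent, the combined probability is one minus the chance that every source is wrong. The function name `combine_scores` and the example numbers are hypothetical.

```python
from math import prod

def combine_scores(probabilities):
    """Combine independent evidence: 1 - product of (1 - p_i)."""
    return 1.0 - prod(1.0 - p for p in probabilities)

# Invented per-source probabilities for one protein pair.
evidence = {"gene neighborhood": 0.5, "literature mining": 0.5}
print(combine_scores(evidence.values()))  # 0.75
```

Two sources that are each only 50% convincing combine to 75% under this assumption, which is the intuition behind integrating many weak signals.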

summary

literature mining is good

data integration is better

open access

Acknowledgments

STRING & STITCH

– Christian von Mering

– Michael Kuhn

– Manuel Stark

– Samuel Chaffron

– Philippe Julien

– Tobias Doerks

– Jan Korbel

– Berend Snel

– Martijn Huynen

– Peer Bork

Reflect

– Evangelos Pafilis

– Michael Kuhn

– Sean O’Donoghue

– Reinhard Schneider

Natural Language Processing

– Jasmin Saric

– Rossitza Ouzounova

– Isabel Rojas

– Peer Bork
