
Lars Juhl Jensen

Turning big data and text collections into web resources

three parts

data integration

text mining

interface design

data integration

association networks

guilt by association

STRING

Szklarczyk, Franceschini et al., Nucleic Acids Research, 2011

computational predictions

gene fusion

Korbel et al., Nature Biotechnology, 2004

experimental data

physical interactions

Jensen & Bork, Science, 2008

curated knowledge

metabolic pathways

Letunic & Bork, Trends in Biochemical Sciences, 2008

many databases

different formats

different identifiers

variable quality

not comparable

hard work

quality scores

von Mering et al., Nucleic Acids Research, 2005

calibrate vs. gold standard
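One plausible way to make raw scores from different sources comparable is to bin them and measure, per bin, what fraction of the scored pairs appear in a gold standard. The sketch below is an illustration of that idea only, not the scoring scheme described in the cited paper; the function name `calibrate`, the bin edges, and the toy pairs are all assumptions.

```python
from bisect import bisect_right

def calibrate(scored_pairs, gold_positive, bin_edges):
    """Return, for each raw-score bin, the fraction of pairs found in the gold standard.

    scored_pairs  -- iterable of ((a, b), raw_score)
    gold_positive -- set of frozenset({a, b}) pairs treated as true associations
    bin_edges     -- sorted list of thresholds separating the bins
    """
    totals = [0] * (len(bin_edges) + 1)
    hits = [0] * (len(bin_edges) + 1)
    for pair, raw in scored_pairs:
        b = bisect_right(bin_edges, raw)  # index of the bin this raw score falls into
        totals[b] += 1
        if frozenset(pair) in gold_positive:
            hits[b] += 1
    # Bins without any pairs get None rather than a made-up value.
    return [h / t if t else None for h, t in zip(hits, totals)]

# Toy data (hypothetical): four scored pairs, one of them in the gold standard.
pairs = [(("A", "B"), 0.9), (("A", "C"), 0.8), (("B", "D"), 0.2), (("C", "D"), 0.1)]
gold = {frozenset(("A", "B"))}
calibrated = calibrate(pairs, gold, bin_edges=[0.5])
```

The per-bin fractions can then stand in for the raw scores, which puts evidence from different sources on one scale.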

missing most of the data

text mining

>10 km

too much to read

computer

as smart as a dog

teach it specific tricks

named entity recognition

comprehensive lexicon

cyclin dependent kinase 1

CDC2

expansion rules

flexible matching

cyclin dependent kinase 1

cyclin-dependent kinase 1

CDC2

hCdc2

“black list”

SDS

proteins

small molecules

compartments

tissues

diseases
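The lexicon, expansion rules, flexible matching and black list from the preceding slides can be sketched as a small dictionary-based tagger. This is a minimal illustration under stated assumptions, not the tagger used in the talk: the identifiers, the "h" prefix expansion rule, and the function names are placeholders chosen to reproduce the examples shown (cyclin-dependent kinase 1, hCdc2, SDS).

```python
import re

# Hypothetical lexicon: name -> identifier (identifiers are placeholders).
LEXICON = {"cyclin dependent kinase 1": "CDK1", "CDC2": "CDK1", "SDS": "SDS"}
# Names that are too ambiguous to tag, e.g. SDS as in SDS gel.
BLACKLIST = {"sds"}

def normalize(name):
    """Lowercase and treat hyphens and whitespace as a single space."""
    return re.sub(r"[-\s]+", " ", name.lower()).strip()

def build_index(lexicon, blacklist):
    index = {}
    for name, ident in lexicon.items():
        key = normalize(name)
        if key in blacklist:
            continue
        index[key] = ident
        if " " not in key:
            # Assumed expansion rule: an "h" prefix (human) on single-word symbols.
            index["h" + key] = ident
    return index

def tag(text, index, max_words=4):
    """Greedy longest-match lookup over alphanumeric tokens."""
    tokens = list(re.finditer(r"[A-Za-z0-9]+", text))
    hits, i = [], 0
    while i < len(tokens):
        for n in range(min(max_words, len(tokens) - i), 0, -1):
            key = " ".join(t.group().lower() for t in tokens[i:i + n])
            if key in index:
                start, end = tokens[i].start(), tokens[i + n - 1].end()
                hits.append((text[start:end], index[key]))
                i += n
                break
        else:
            i += 1  # no name starts at this token
    return hits

index = build_index(LEXICON, BLACKLIST)
found = tag("hCdc2, also called cyclin-dependent kinase 1, was run on an SDS gel.", index)
```

Tokenizing on alphanumeric runs is what makes the hyphenated and unhyphenated spellings match the same lexicon entry; the black list simply keeps the ambiguous name out of the index.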

information extraction

count co-mentioning

within documents

within paragraphs

within sentences
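Counting co-mentions at the three levels can be sketched as follows, assuming the text has already been tagged so that each sentence is a set of entity identifiers. The nested-list input layout, the function name, and the example identifiers are assumptions made for illustration; how the counts are turned into scores is not covered here.

```python
from collections import Counter
from itertools import combinations

def count_comentions(documents):
    """Count entity pairs co-mentioned within sentences, paragraphs and documents.

    documents -- list of documents; each document is a list of paragraphs;
                 each paragraph is a list of sentences; each sentence is a
                 collection of entity identifiers found in it.
    """
    counts = {"document": Counter(), "paragraph": Counter(), "sentence": Counter()}
    for doc in documents:
        doc_entities = set()
        for paragraph in doc:
            par_entities = set()
            for sentence in paragraph:
                entities = set(sentence)
                # Sorting gives each unordered pair one canonical key.
                counts["sentence"].update(combinations(sorted(entities), 2))
                par_entities |= entities
            counts["paragraph"].update(combinations(sorted(par_entities), 2))
            doc_entities |= par_entities
        counts["document"].update(combinations(sorted(doc_entities), 2))
    return counts

# Toy corpus with illustrative identifiers: two documents.
docs = [
    [[{"CDK1", "CCNB1"}, {"CDK1"}], [{"WEE1"}]],
    [[{"CDK1"}, {"WEE1"}]],
]
counts = count_comentions(docs)
```

Each counter records in how many sentences, paragraphs or documents a pair appears together, so a pair mentioned in the same sentence also counts at the paragraph and document levels.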

corpora

~22 million abstracts

no access

~4 million full-text articles

interface design

ease of use

web resources

simple search interface

complex relational database

attractiveness

data visualization

STRING

Szklarczyk, Franceschini et al., Nucleic Acids Research, 2011

payload

compartments.jensenlab.org

COMPARTMENTS

compartments.jensenlab.org

TISSUES

tissues.jensenlab.org

provenance

evidence viewers

DISEASES

reusability

web services

download files

open licenses

Acknowledgments

Protein networks

Christian von Mering, Damian Szklarczyk

Michael Kuhn, Manuel Stark

Samuel Chaffron, Chris Creevey

Jean Muller, Tobias Doerks, Philippe Julien

Alexander Roth, Milan Simonovic

Jan Korbel, Berend Snel

Martijn Huynen, Peer Bork

Literature mining

Sune Frankild, Evangelos Pafilis, Janos Binder, Kalliopi Tsafou, Alberto Santos, Heiko Horn, Michael Kuhn, Nigel Brown, Reinhardt Schneider, Sean O’Donoghue
