The STRING database Lars Juhl Jensen EMBL Heidelberg
May 10, 2015
data integration
Jensen et al., Drug Discovery Today: Targets, 2004
functional interactions
Bork et al., Current Opinion in Structural Biology, 2005
373 proteomes
Genome Reviews
RefSeq
Ensembl
model organism databases
genomic context methods
gene fusion
gene neighborhood
phylogenetic profiles
Cell
Cellulosomes
Cellulose
automation
scoring scheme
correct interactions
wrong associations
gene fusion
sequence similarity
gene neighborhood
sum of intergenic distances
phylogenetic profiles
SVD (Singular Value Decomposition)
Euclidean distance
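Two of the raw scores named on this slide can be sketched in a few lines. This is a minimal illustration, not the STRING implementation: the slides do not define how intergenic distances are summed, so the sketch assumes genes sorted by start coordinate on one strand and counts overlapping neighbors as a gap of 0. The SVD step is omitted, and the distance is taken on raw presence/absence profiles. The function names and the toy coordinates are hypothetical.

```python
import math


def sum_intergenic_distances(genes, i, j):
    """Sum the gaps between consecutive genes from index i to index j.

    genes: list of (start, end) coordinates sorted by start, assumed to lie
    on the same strand of one chromosome (an assumption of this sketch).
    Overlapping neighbors contribute a gap of 0.
    """
    lo, hi = sorted((i, j))
    total = 0
    for k in range(lo, hi):
        gap = genes[k + 1][0] - genes[k][1]
        total += max(gap, 0)
    return total


def profile_distance(profile_a, profile_b):
    """Euclidean distance between two phylogenetic profiles of equal length."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same set of genomes")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(profile_a, profile_b)))


# Toy data (hypothetical): four genes in a run, and two presence/absence
# profiles over four genomes.
genes = [(100, 400), (450, 900), (880, 1500), (1700, 2100)]
print(sum_intergenic_distances(genes, 0, 3))
print(profile_distance([1, 1, 0, 1], [1, 0, 0, 1]))
```

In both cases a smaller value suggests a stronger association, which is one reason these raw scores cannot be compared directly with a similarity score where larger is better.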
raw quality scores
not comparable
sequence similarity
sum of intergenic distances
Euclidean distance
benchmarking
calibrate vs. gold standard
raw quality scores
probabilistic scores
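The calibration step can be illustrated with a simple binning approach. The slides do not say how the calibration curve is fitted, so this is only a sketch under stated assumptions: pairs are sorted by raw score, split into equal-count bins, and each bin is assigned the fraction of its pairs that appear in the gold standard. The function name `calibrate` and the toy data are hypothetical.

```python
def calibrate(raw_scores, in_gold_standard, n_bins=4):
    """Map raw-score bins to the fraction of pairs confirmed by a gold standard.

    raw_scores: raw scores for protein pairs (higher = stronger, by assumption).
    in_gold_standard: parallel list of booleans.
    Returns a list of (lowest_score, highest_score, probability), one per bin.
    Bins hold roughly equal numbers of pairs; a remainder forms an extra bin.
    """
    pairs = sorted(zip(raw_scores, in_gold_standard))
    size = max(1, len(pairs) // n_bins)
    table = []
    for start in range(0, len(pairs), size):
        chunk = pairs[start:start + size]
        hits = sum(1 for _, ok in chunk if ok)
        table.append((chunk[0][0], chunk[-1][0], hits / len(chunk)))
    return table


# Toy data (hypothetical): eight scored pairs and their gold-standard status.
raw = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
gold = [False, False, False, True, True, False, True, True]
for low, high, prob in calibrate(raw, gold):
    print(f"raw {low}-{high}: P(correct) ~ {prob}")
```

Once every evidence type has such a table, a raw score from any method can be looked up and replaced by a probability, which makes the different methods comparable.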
curated knowledge
KEGG (Kyoto Encyclopedia of Genes and Genomes)
Reactome
MIPS (Munich Information Center for Protein Sequences)
STKE (Signal Transduction Knowledge Environment)
primary experimental data
many sources
many parsers
physical protein interactions
BIND (Biomolecular Interaction Network Database)
GRID (General Repository for Interaction Datasets)
MINT (Molecular Interactions Database)
DIP (Database of Interacting Proteins)
HPRD (Human Protein Reference Database)
merge data by publication
topology-based scores
von Mering et al., Nucleic Acids Research, 2005
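One reading of "merge data by publication" is that records imported from several interaction databases are grouped by the paper that reported them, so that the same experiment is not counted once per database. The sketch below assumes that reading; the record layout, the function name, and the identifiers are all hypothetical.

```python
from collections import defaultdict


def merge_by_publication(records):
    """Group interaction records by publication and drop duplicate pairs.

    records: iterable of (source_db, publication_id, protein_a, protein_b).
    Returns a dict mapping publication_id to a set of unordered protein pairs.
    """
    merged = defaultdict(set)
    for _source, publication_id, a, b in records:
        # frozenset makes (a, b) and (b, a) the same interaction.
        merged[publication_id].add(frozenset((a, b)))
    return merged


# Toy records (hypothetical identifiers): two databases report the same
# interaction from the same paper, once in each orientation.
records = [
    ("db1", "pub1", "P1", "P2"),
    ("db2", "pub1", "P2", "P1"),
    ("db2", "pub1", "P1", "P3"),
    ("db1", "pub2", "P1", "P2"),
]
merged = merge_by_publication(records)
print({pub: len(pairs) for pub, pairs in merged.items()})
```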
co-expression
GEO (Gene Expression Omnibus)
correlation coefficient
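The slide does not say which correlation coefficient is used. As a sketch, assuming the Pearson coefficient over expression values measured across the same set of conditions, the raw co-expression score for a gene pair could be computed as follows. The function name and the toy values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two profiles of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)


# Toy expression values (hypothetical) across four conditions.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [2.0, 4.0, 6.0, 8.0]
gene_c = [8.0, 6.0, 4.0, 2.0]
print(pearson(gene_a, gene_b))
print(pearson(gene_a, gene_c))
```

Like the other raw scores, the coefficient would still need to be calibrated against a gold standard before it can be compared with other evidence types.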
literature mining
different gene identifiers
synonyms lists
MEDLINE
SGD (Saccharomyces Genome Database)
The Interactive Fly
OMIM (Online Mendelian Inheritance in Man)
co-mentioning
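Co-mentioning can be sketched as counting how often two genes are named in the same abstract, after mapping every synonym to one identifier. This is a minimal illustration that only matches single-token names. The synonym list and the abstracts are toy data made up for the example, and the function name is hypothetical.

```python
import re
from collections import Counter
from itertools import combinations


def count_comentions(abstracts, synonyms):
    """Count abstracts in which each pair of genes is mentioned together.

    synonyms: dict mapping a canonical gene identifier to a list of names.
    Returns a Counter keyed by alphabetically sorted identifier pairs.
    """
    name_to_id = {
        name.lower(): gene_id
        for gene_id, names in synonyms.items()
        for name in names
    }
    counts = Counter()
    for text in abstracts:
        tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
        mentioned = sorted({name_to_id[t] for t in tokens if t in name_to_id})
        for pair in combinations(mentioned, 2):
            counts[pair] += 1
    return counts


# Toy synonym list and abstracts (illustrative only).
synonyms = {"HAP1": ["HAP1", "CYP1"], "CYC1": ["CYC1"], "CYC7": ["CYC7"]}
abstracts = [
    "HAP1 controls the expression of CYC1 and CYC7.",
    "CYP1 was studied together with CYC1.",
    "This abstract mentions only CYC7.",
]
counts = count_comentions(abstracts, synonyms)
print(counts)
```

The synonym mapping is what lets the second abstract count toward the HAP1 and CYC1 pair even though it uses a different gene name.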
NLP (Natural Language Processing)
Gene and protein names
Cue words for entity recognition
Verbs for relation extraction
[nxgene The GAL4 gene]
[nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]
calibrate vs. gold standard
combine all evidence
spread over many species
transfer by orthology
von Mering et al., Nucleic Acids Research, 2005
two modes
orthologous groups
von Mering et al., Nucleic Acids Research, 2005
fuzzy orthology
von Mering et al., Nucleic Acids Research, 2005
Bayesian scoring scheme
Bork et al., Current Opinion in Structural Biology, 2005
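The slides name a Bayesian scoring scheme without giving its formula. One common way to combine calibrated probabilistic scores, assuming the evidence types are independent, is to compute the probability that at least one of them is correct. The sketch below shows that combination only; it is not presented as the exact STRING formula, and the function name is hypothetical.

```python
def combine_scores(scores):
    """Combine independent probabilistic scores as 1 - prod(1 - s)."""
    remaining = 1.0  # probability that every evidence type is wrong
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError("each score must be a probability in [0, 1]")
        remaining *= 1.0 - s
    return 1.0 - remaining


# Two weak, independent pieces of evidence reinforce each other.
print(combine_scores([0.5, 0.5]))
```

Under this assumption the combined score is never lower than the strongest individual score, so adding evidence can only raise confidence in an association.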
Acknowledgments
The STRING team (EMBL)
– Christian von Mering
– Berend Snel
– Martijn Huynen
– Sean Hooper
– Samuel Chaffron
– Julien Lagarde
– Mathilde Foglierini
– Peer Bork
Literature mining project (EML Research)
– Jasmin Saric
– Rossitza Ouzounova
– Isabel Rojas