Data integration: The STITCH database of protein–small molecule interactions
Lars Juhl Jensen
Jun 27, 2015
Kuhn et al., Nucleic Acids Research, 2010
functional associations
protein–small molecule
protein–protein
parts lists
>2.5 million proteins
630 genomes
many databases
different formats
model organism databases
Ensembl
RefSeq
PubChem compounds
>74,000 small molecules
curated knowledge
complexes
pathways
Letunic & Bork, Trends in Biochemical Sciences, 2008
high confidence
many databases
MIPS (Munich Information Center for Protein Sequences)
Gene Ontology
KEGG (Kyoto Encyclopedia of Genes and Genomes)
MetaCyc
PID (NCI-Nature Pathway Interaction Database)
Reactome
different formats
different identifiers
partially redundant
interaction data
protein–small molecule
in vitro binding assays
protein–protein
yeast two-hybrid
affinity purification
fragment complementation
Jensen & Bork, Science, 2008
genetic interactions
Beyer et al., Nature Reviews Genetics, 2007
gene coexpression
many databases
BindingDB
CTD (Comparative Toxicogenomics Database)
DrugBank
GLIDA (GPCR-Ligand Database)
PDSP Ki (Psychoactive Drug Screening Program)
PharmGKB (Pharmacogenomics Knowledge Base)
BIND (Biomolecular Interaction Network Database)
BioGRID (General Repository for Interaction Datasets)
DIP (Database of Interacting Proteins)
IntAct
MINT (Molecular Interactions Database)
HPRD (Human Protein Reference Database)
PDB (Protein Data Bank)
GEO (Gene Expression Omnibus)
different formats
different identifiers
partially redundant
literature mining
>10 km
human readable
not computer readable
different names
text corpus
MEDLINE
SGD (Saccharomyces Genome Database)
The Interactive Fly
OMIM (Online Mendelian Inheritance in Man)
dictionary
co-mentioning
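Dictionary-based co-mentioning can be illustrated with a minimal sketch. The dictionary entries, the example abstracts, and the function names `tag_entities` and `count_comentions` are all hypothetical, and plain substring matching stands in for whatever name recognition the actual pipeline uses.

```python
from collections import Counter
from itertools import combinations

# Hypothetical dictionary mapping lowercase synonyms to canonical identifiers.
DICTIONARY = {
    "aspirin": "CID:2244",
    "acetylsalicylic acid": "CID:2244",
    "cox-1": "PTGS1",
    "ptgs1": "PTGS1",
    "cox-2": "PTGS2",
}

def tag_entities(text, dictionary):
    """Return the set of canonical identifiers whose names occur in the text."""
    lowered = text.lower()
    return {ident for name, ident in dictionary.items() if name in lowered}

def count_comentions(abstracts, dictionary):
    """Count, for each pair of entities, how many abstracts name both."""
    counts = Counter()
    for text in abstracts:
        entities = sorted(tag_entities(text, dictionary))
        counts.update(combinations(entities, 2))
    return counts

# Made-up abstracts for illustration only.
abstracts = [
    "Aspirin irreversibly inhibits COX-1.",
    "Acetylsalicylic acid acts on PTGS1 and COX-2.",
    "COX-2 expression is induced by inflammation.",
]
comentions = count_comentions(abstracts, DICTIONARY)
```

Mapping synonyms to one identifier before counting is what handles the "different names" problem: the first two abstracts use different names yet both count toward the same compound and protein pair.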
NLP (Natural Language Processing)
restricted access
genomic context
gene fusion
Korbel et al., Nature Biotechnology, 2004
conserved neighborhood
operons
Korbel et al., Nature Biotechnology, 2004
bidirectional promoters
Korbel et al., Nature Biotechnology, 2004
phylogenetic profiles
Korbel et al., Nature Biotechnology, 2004
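As a rough illustration of the phylogenetic-profile idea, genes whose presence and absence patterns agree across genomes can be flagged as candidate functional partners. The slides do not say which similarity measure is used; the simple agreement fraction below, the function name `profile_agreement`, and the example profiles are assumptions.

```python
def profile_agreement(profile_a, profile_b):
    """Fraction of genomes in which two genes are both present or both absent."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same genomes")
    if not profile_a:
        return 0.0
    matches = sum(1 for a, b in zip(profile_a, profile_b) if a == b)
    return matches / len(profile_a)

# Hypothetical presence (1) / absence (0) calls across five genomes.
profiles = {
    "geneA": (1, 1, 1, 0, 0),
    "geneB": (1, 1, 1, 1, 0),
    "geneC": (0, 0, 0, 1, 1),
}
ab = profile_agreement(profiles["geneA"], profiles["geneB"])
ac = profile_agreement(profiles["geneA"], profiles["geneC"])
```

Here `geneA` and `geneB` agree in four of five genomes, while `geneA` and `geneC` never agree. A production measure would likely also account for genome relatedness, which this sketch ignores.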
integration
many data types
not comparable
variable quality
spread over 630 genomes
quality scores
reproducibility
von Mering et al., Nucleic Acids Research, 2005
intergenic distances
Korbel et al., Nature Biotechnology, 2004
benchmarking
calibrate vs. gold standard
von Mering et al., Nucleic Acids Research, 2005
raw quality scores
probabilistic scores
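Calibrating raw quality scores against a gold standard can be sketched as follows: pairs are binned by raw score, and the fraction of pairs in each bin that appear in the gold standard serves as an estimate of the probability for that score range. The function name `calibrate_by_bins`, the bin edges, and the example data are hypothetical.

```python
def calibrate_by_bins(scored_pairs, gold_standard, bin_edges):
    """For each raw-score bin, return the fraction of pairs in the gold standard.

    Bins are half-open intervals [edge_i, edge_i+1); empty bins yield None.
    """
    n_bins = len(bin_edges) - 1
    totals = [0] * n_bins
    hits = [0] * n_bins
    for pair, raw in scored_pairs:
        for i in range(n_bins):
            if bin_edges[i] <= raw < bin_edges[i + 1]:
                totals[i] += 1
                if pair in gold_standard:
                    hits[i] += 1
                break
    return [h / t if t else None for h, t in zip(hits, totals)]

# Made-up raw scores and a made-up gold standard of known interactions.
scored_pairs = [
    (("a", "b"), 0.10), (("a", "c"), 0.20),
    (("b", "c"), 0.60), (("c", "d"), 0.70),
    (("d", "e"), 0.90), (("a", "e"), 0.95),
]
gold_standard = {("b", "c"), ("d", "e"), ("a", "e")}
fractions = calibrate_by_bins(scored_pairs, gold_standard, [0.0, 0.5, 0.8, 1.01])
```

Because every evidence type is benchmarked against the same gold standard, the resulting fractions are on a common scale, which is what makes otherwise incomparable data types comparable.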
orthology transfer
von Mering et al., Nucleic Acids Research, 2005
combine all evidence
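One common way to combine independent probabilistic scores is a noisy-OR: the combined score is the probability that at least one evidence channel is correct. The slides do not give the exact formula, so the sketch below assumes independence between channels and omits any correction for prior probability; `combine_scores` is a hypothetical name.

```python
def combine_scores(scores):
    """Combine per-channel probabilities as 1 - prod(1 - s_i), assuming independence."""
    remaining = 1.0  # probability that every channel is wrong
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError("scores must be probabilities in [0, 1]")
        remaining *= 1.0 - s
    return 1.0 - remaining

# Hypothetical scores from three evidence channels for one interaction.
combined = combine_scores([0.6, 0.4, 0.2])
```

The combined score is never lower than the strongest single channel, so several weak but independent lines of evidence can add up to a confident association.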
Acknowledgments
Michael Kuhn
Monica Campillos
Christian von Mering
Manuel Stark
Samuel Chaffron
Philippe Julien
Tobias Doerks
Jan Korbel
Berend Snel
Martijn Huynen
Peer Bork
Predicting novel targets for existing drugs using side effect information
Lars Juhl Jensen
the problem
new uses for old drugs
drug–drug network
shared target(s)
chemical similarity
Campillos & Kuhn et al., Science, 2008
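Chemical similarity between two drugs is often measured with the Tanimoto coefficient over structural fingerprints. The sketch below assumes fingerprints are given as sets of "on" bit positions; the fingerprint values are made up, and the slides do not state which fingerprint type is used.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bit positions."""
    shared = len(fp_a & fp_b)
    total = len(fp_a) + len(fp_b) - shared
    return shared / total if total else 0.0

# Hypothetical fingerprints for two drugs.
drug_x = {1, 4, 7, 9}
drug_y = {1, 4, 8, 9, 12}
similarity = tanimoto(drug_x, drug_y)
```

The two example drugs share three of six distinct bits, giving a coefficient of 0.5.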
similar drugs share targets
only trivial predictions
the idea
chemical perturbations
phenotypic readouts
drug treatment
side effects
the implementation
information on side effects
package inserts
Campillos & Kuhn et al., Science, 2008
text mining
side-effect ontology
backtracking
Campillos & Kuhn et al., Science, 2008
side-effect correlations
Campillos & Kuhn et al., Science, 2008
GSC weighting
side-effect frequencies
Campillos & Kuhn et al., Science, 2008
raw similarity score
Campillos & Kuhn et al., Science, 2008
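A raw side-effect similarity score that takes side-effect frequencies into account can be sketched as a weighted overlap, in which rare side effects count more than common ones. This is a simplified illustration, not the published scoring scheme: the negative-log frequency weight and the function names are assumptions, the drug data are made up, and the GSC weighting for correlated side effects is not implemented here.

```python
import math

def rarity_weights(drug_side_effects):
    """Weight each side effect by -log of the fraction of drugs that list it."""
    n_drugs = len(drug_side_effects)
    counts = {}
    for effects in drug_side_effects.values():
        for effect in effects:
            counts[effect] = counts.get(effect, 0) + 1
    return {effect: -math.log(c / n_drugs) for effect, c in counts.items()}

def weighted_similarity(effects_a, effects_b, weights):
    """Weighted overlap: summed weight of shared effects over that of all effects."""
    shared = sum(weights[e] for e in effects_a & effects_b)
    union = sum(weights[e] for e in effects_a | effects_b)
    return shared / union if union else 0.0

# Made-up side-effect annotations for four hypothetical drugs.
drugs = {
    "drugA": {"nausea", "headache", "priapism"},
    "drugB": {"nausea", "priapism"},
    "drugC": {"nausea", "headache"},
    "drugD": {"nausea", "rash"},
}
weights = rarity_weights(drugs)
sim_ab = weighted_similarity(drugs["drugA"], drugs["drugB"], weights)
sim_cd = weighted_similarity(drugs["drugC"], drugs["drugD"], weights)
```

In this toy data every drug lists nausea, so it receives zero weight: `drugC` and `drugD` share only nausea and score 0, whereas an unweighted overlap would give them 1/3.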
p-values
Campillos & Kuhn et al., Science, 2008
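Converting a raw similarity score into a p-value can be sketched empirically: the observed score is compared with a background distribution of scores. The background here is assumed to come from random drug pairs; the values and the function name `empirical_p_value` are hypothetical.

```python
def empirical_p_value(observed, background_scores):
    """Fraction of background scores at least as high as the observed score."""
    at_least = sum(1 for s in background_scores if s >= observed)
    # Add-one correction so the estimate is never exactly zero.
    return (at_least + 1) / (len(background_scores) + 1)

# Made-up raw similarity scores for random drug pairs.
background = [0.05, 0.10, 0.12, 0.20, 0.22, 0.30, 0.35, 0.41, 0.60]
p_value = empirical_p_value(0.40, background)
```

With only nine background scores the estimate is coarse; in practice the background would need to be far larger than this toy list.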
side-effect similarity
chemical similarity
Campillos & Kuhn et al., Science, 2008
reference set
drug–target pairs
Campillos & Kuhn et al., Science, 2008
drug–drug pairs
score bins
benchmark
Campillos & Kuhn et al., Science, 2008
fit calibration function
Campillos & Kuhn et al., Science, 2008
probabilistic scores
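Fitting a calibration function to the benchmark bins can be sketched as below. The sigmoid form is an assumption, since the slides do not specify the function, and a coarse grid search stands in for a proper least-squares optimizer. The benchmark points and all names are hypothetical.

```python
import math

def sigmoid(x, midpoint, steepness):
    """Logistic curve mapping a score to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-steepness * (x - midpoint)))

def fit_sigmoid(points, midpoints, steepnesses):
    """Grid search for the (midpoint, steepness) with the smallest squared error."""
    best = None
    for m in midpoints:
        for k in steepnesses:
            error = sum((sigmoid(x, m, k) - y) ** 2 for x, y in points)
            if best is None or error < best[0]:
                best = (error, m, k)
    return best[1], best[2]

# Made-up (bin-center score, fraction of drug pairs sharing a target) points.
points = [(0.1, 0.02), (0.3, 0.12), (0.5, 0.50), (0.7, 0.88), (0.9, 0.98)]
midpoints = [i / 20 for i in range(21)]
steepnesses = [float(k) for k in range(1, 31)]
midpoint, steepness = fit_sigmoid(points, midpoints, steepnesses)

# The fitted curve turns any score into an estimated probability.
probability = sigmoid(0.8, midpoint, steepness)
```

A smooth fitted function, unlike the raw bin fractions, gives a probability for every score and is guaranteed to increase monotonically with the score.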
the results
drug–drug network
ATC codes
Campillos & Kuhn et al., Science, 2008
categorization
Campillos & Kuhn et al., Science, 2008
map onto score space
Campillos & Kuhn et al., Science, 2008
the experiments
20 drug–drug relations
in vitro binding assays
Campillos & Kuhn et al., Science, 2008
Ki<10 µM for 11 of 20
cell assays
Campillos & Kuhn et al., Science, 2008
9 of 9 showed activity
the future
SIDER
integration with STITCH
Acknowledgments
Monica Campillos
Michael Kuhn
Anne-Claude Gavin
Peer Bork