Lars Juhl Jensen Systems biology Bioinformatics on complete biological systems
May 10, 2015
can a biologist fix a radio?
Lazebnik, Biochemistry, 2004
one gene
one postdoc
knockout phenotype
name the gene
Lazebnik, Biochemistry, 2004
all aspects
one gene
high-throughput biology
one technology
one lab
all genes
one aspect
systems biology
complete systems
all aspects
all genes
systems-level properties
two subfields
mathematical modeling
small systems
data integration
large systems
mathematical modeling
small systems
Chen, Mol. Biol. Cell, 2004
many equations
Chen, Mol. Biol. Cell, 2004
simulation
Chen, Mol. Biol. Cell, 2004
many parameters
Chen, Mol. Biol. Cell, 2004
requires detailed knowledge
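To make "many equations, simulation, many parameters" concrete, here is a minimal sketch of simulating a small system of differential equations. The two-variable model, its rate constants, and the function name `simulate` are hypothetical illustrations; this is not the Chen et al. cell-cycle model, which is far larger. Forward Euler integration is used only to keep the sketch dependency-free.

```python
# Toy two-variable model (hypothetical; NOT the Chen et al. model).
#   dX/dt = k_syn - k_deg * X * Y      X is synthesized and degraded via Y
#   dY/dt = k_act * X - k_inact * Y    Y is activated by X and inactivated

def simulate(params, x0=0.0, y0=0.0, dt=0.01, steps=5000):
    """Integrate the toy model with the forward Euler method."""
    x, y = x0, y0
    trajectory = [(0.0, x, y)]
    for i in range(1, steps + 1):
        dx = params["k_syn"] - params["k_deg"] * x * y
        dy = params["k_act"] * x - params["k_inact"] * y
        x += dx * dt
        y += dy * dt
        trajectory.append((i * dt, x, y))
    return trajectory

# Every rate constant is a parameter that must be known or fitted,
# which is why this approach requires detailed knowledge of the system.
params = {"k_syn": 1.0, "k_deg": 1.0, "k_act": 1.0, "k_inact": 1.0}
trajectory = simulate(params)
t_end, x_end, y_end = trajectory[-1]
print(f"t={t_end:.1f}  X={x_end:.3f}  Y={y_end:.3f}")
```

Even this toy needs four parameters for two equations; a realistic model multiplies both, which is why the approach is limited to small, well-studied systems.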
data integration
association networks
guilt by association
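Guilt by association can be sketched as a neighbor vote: an uncharacterized protein is assigned the annotation most common among the proteins it is associated with. The network, the annotations, and the function name `predict_function` below are hypothetical toy data, not taken from STRING.

```python
from collections import Counter

# Hypothetical toy association network: protein -> associated proteins.
network = {
    "A": {"B", "C", "X"},
    "B": {"A", "C", "X"},
    "C": {"A", "B"},
    "D": {"X"},
    "X": {"A", "B", "D"},
}

# Hypothetical annotations; protein "X" is uncharacterized.
annotations = {
    "A": "cell cycle",
    "B": "cell cycle",
    "C": "cell cycle",
    "D": "transport",
}

def predict_function(protein, network, annotations):
    """Return the most common annotation among annotated neighbors, or None."""
    votes = Counter(
        annotations[n] for n in network.get(protein, ()) if n in annotations
    )
    if not votes:
        return None
    return votes.most_common(1)[0][0]

print(predict_function("X", network, annotations))
```

A real analysis would weight each vote by the confidence of the association rather than counting all neighbors equally.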
STRING
~2.6 million proteins
Szklarczyk, Franceschini et al., Nucleic Acids Research, 2011
genomic context
gene fusion
Korbel et al., Nature Biotechnology, 2004
operons
Korbel et al., Nature Biotechnology, 2004
bidirectional promoters
Korbel et al., Nature Biotechnology, 2004
phylogenetic profiles
Korbel et al., Nature Biotechnology, 2004
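The idea behind phylogenetic profiles is that genes present and absent in the same set of genomes are likely to be functionally associated. A minimal sketch, assuming hypothetical presence/absence data and using Jaccard similarity as one simple choice of measure (other measures are possible; this is not necessarily the one used in the cited work):

```python
# Hypothetical presence/absence profiles: 1 = gene present in that genome.
profiles = {
    "geneA": (1, 0, 1, 1, 0, 1),
    "geneB": (1, 0, 1, 1, 0, 1),
    "geneC": (0, 1, 0, 1, 1, 0),
}

def jaccard(p, q):
    """Genomes containing both genes divided by genomes containing either."""
    both = sum(1 for a, b in zip(p, q) if a and b)
    either = sum(1 for a, b in zip(p, q) if a or b)
    return both / either if either else 0.0

for g1, g2 in [("geneA", "geneB"), ("geneA", "geneC")]:
    print(g1, g2, round(jaccard(profiles[g1], profiles[g2]), 3))
```

Here geneA and geneB share an identical profile and would be linked, whereas geneA and geneC co-occur in only one genome.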
a real example
[figure labels: cell, cellulosomes, cellulose]
experimental data
gene coexpression
protein interactions
Jensen & Bork, Science, 2008
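Gene coexpression is commonly quantified as the correlation between two genes' expression profiles across conditions. A minimal sketch using the Pearson correlation coefficient; the expression values and gene names are hypothetical, and the choice of Pearson correlation is an assumption rather than something stated on the slides.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equally long expression profiles."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no coexpression signal
    return cov / sqrt(var_x * var_y)

# Hypothetical expression levels measured across five conditions.
expression = {
    "gene1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "gene2": [2.0, 4.0, 6.0, 8.0, 10.0],
    "gene3": [5.0, 4.0, 3.0, 2.0, 1.0],
}

print(pearson(expression["gene1"], expression["gene2"]))
print(pearson(expression["gene1"], expression["gene3"]))
```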
curated knowledge
complexes
pathways
Letunic & Bork, Trends in Biochemical Sciences, 2008
many databases
different formats
different identifiers
variable quality
not comparable
hard work
quality scores
von Mering et al., Nucleic Acids Research, 2005
calibrate vs. gold standard
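Calibration against a gold standard can be sketched as follows: bin the raw scores from one evidence source and, for each bin, measure the fraction of pairs that the gold standard confirms. That fraction then serves as a comparable quality score across sources. The pairs, scores, bins, and the function name `calibrate` are hypothetical; the actual procedure in the cited work is more elaborate.

```python
def calibrate(scored_pairs, gold_standard, bins):
    """For each [low, high) raw-score bin, return (low, high, n, fraction confirmed)."""
    table = []
    for low, high in bins:
        in_bin = [pair for pair, score in scored_pairs if low <= score < high]
        confirmed = sum(1 for pair in in_bin if pair in gold_standard)
        fraction = confirmed / len(in_bin) if in_bin else None
        table.append((low, high, len(in_bin), fraction))
    return table

# Hypothetical raw scores from one evidence source.
scored_pairs = [
    (("a", "b"), 0.9),
    (("a", "c"), 0.8),
    (("a", "d"), 0.7),
    (("b", "c"), 0.3),
    (("c", "d"), 0.2),
    (("b", "d"), 0.1),
]
# Hypothetical gold standard of known true associations.
gold_standard = {("a", "b"), ("a", "c"), ("b", "d")}

table = calibrate(scored_pairs, gold_standard, bins=[(0.0, 0.5), (0.5, 1.01)])
for low, high, n, fraction in table:
    print(f"raw score {low}-{high}: {n} pairs, fraction confirmed = {fraction}")
```

Because every source is calibrated against the same gold standard, scores from otherwise incomparable databases end up on a common scale.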
missing most of the data
text mining
>10 km
too much to read
computer
as smart as a dog
teach it specific tricks
named entity recognition
comprehensive lexicon
cyclin dependent kinase 1
CDK1
CDC2
flexible matching
spaces and hyphens
cyclin dependent kinase 1
cyclin-dependent kinase 1
orthographic variation
CDC2
hCdc2
“black list”
SDS
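The ingredients above (lexicon, flexible matching of spaces and hyphens, orthographic variation, and a black list) can be sketched with regular expressions. This is an illustrative toy, not the tagger used in the talk: the lexicon entries, the rule that an optional leading "h" marks a human protein, and the function names `name_pattern` and `tag` are all assumptions.

```python
import re

# Hypothetical lexicon: name or synonym -> canonical identifier.
lexicon = {
    "cyclin dependent kinase 1": "CDK1",
    "CDK1": "CDK1",
    "CDC2": "CDK1",
    "SDS": "SDS",
}
# Names that are usually something else (SDS is mostly the detergent).
blacklist = {"sds"}

def name_pattern(name):
    """Build a regex that tolerates spaces/hyphens, case, and an 'h' prefix."""
    tokens = re.split(r"[\s-]+", name)
    body = r"[\s-]?".join(re.escape(token) for token in tokens)
    # The lookarounds keep a match from starting or ending inside a longer word.
    return re.compile(
        r"(?<![A-Za-z0-9])h?" + body + r"(?![A-Za-z0-9])", re.IGNORECASE
    )

def tag(text, lexicon, blacklist):
    """Return sorted (start, matched text, identifier) for non-blacklisted names."""
    hits = []
    for name, identifier in lexicon.items():
        if name.lower() in blacklist:
            continue
        for match in name_pattern(name).finditer(text):
            hits.append((match.start(), match.group(0), identifier))
    return sorted(hits)

text = "hCdc2 (cyclin-dependent kinase 1) was resolved by SDS-PAGE."
for hit in tag(text, lexicon, blacklist):
    print(hit)
```

Without the black list, the same code would tag "SDS" in "SDS-PAGE" as a gene, which is the kind of false positive the list exists to prevent.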
information extraction
count co-mentioning
within documents
within paragraphs
within sentences
scoring scheme
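Co-mention counting at the three levels can be sketched as a weighted sum: each document in which two entities co-occur contributes a base weight, plus extra weight if they share a paragraph, plus more if they share a sentence. The weights, the entity names, and the function names are hypothetical; the slides do not give the actual scoring scheme, and the input is assumed to be already tagged text.

```python
from collections import Counter
from itertools import combinations

# Hypothetical weights; closer co-mentions count for more.
WEIGHTS = {"document": 1.0, "paragraph": 2.0, "sentence": 4.0}

def pairs(entities):
    """All unordered entity pairs, each as an alphabetically sorted tuple."""
    return set(combinations(sorted(entities), 2))

def comention_scores(documents, weights=WEIGHTS):
    """documents: list of documents; document: list of paragraphs;
    paragraph: list of sentences; sentence: set of tagged entity names."""
    scores = Counter()
    for document in documents:
        doc_entities, par_pairs, sent_pairs = set(), set(), set()
        for paragraph in document:
            par_entities = set()
            for sentence in paragraph:
                sent_pairs |= pairs(sentence)
                par_entities |= sentence
            par_pairs |= pairs(par_entities)
            doc_entities |= par_entities
        for pair in pairs(doc_entities):
            scores[pair] += weights["document"]
            if pair in par_pairs:
                scores[pair] += weights["paragraph"]
            if pair in sent_pairs:
                scores[pair] += weights["sentence"]
    return scores

# Hypothetical tagged corpus of two documents.
documents = [
    [[{"CDK1", "CCNB1"}, {"TP53"}], [{"MDM2"}]],
    [[{"CDK1"}, {"CCNB1"}]],
]
scores = comention_scores(documents)
for pair, score in scores.most_common():
    print(pair, score)
```

A full scoring scheme would also normalize for how often each entity is mentioned overall, so that frequently mentioned entities are not linked to everything.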
corpora
~22 million abstracts
no access
~4 million full-text articles
augmented browsing
Reflect
browser add-on
real-time text mining
Pafilis, O’Donoghue, Jensen et al., Nature Biotechnology, 2009
O’Donoghue et al., Journal of Web Semantics, 2010
localization and disease
small molecules
proteins
compartments
tissues
diseases
organisms
environments
suite of web resources
common backend database
jensenlab.org
text mining
curated knowledge
experimental data
computational predictions
quality scores
web-centric databases
DISEASES
visualization
COMPARTMENTS
compartments.jensenlab.org
TISSUES
tissues.jensenlab.org
project onto networks
Szklarczyk, Franceschini et al., Nucleic Acids Research, 2011
compartments.jensenlab.org
tissues.jensenlab.org
diseases.jensenlab.org
summary
bioinformatics
more than alignment
data/text mining
can save you a lot of time
Acknowledgments
Protein networks
Christian von Mering, Damian Szklarczyk
Michael Kuhn, Manuel Stark
Samuel Chaffron, Chris Creevey
Jean Muller, Tobias Doerks, Philippe Julien
Alexander Roth, Milan Simonovic
Jan Korbel, Berend Snel
Martijn Huynen, Peer Bork
Literature mining
Sune Frankild, Evangelos Pafilis
Janos Binder, Kalliopi Tsafou
Alberto Santos, Heiko Horn
Michael Kuhn, Nigel Brown
Reinhardt Schneider, Sean O’Donoghue