Lars Juhl Jensen, Systems biology: Bioinformatics on complete biological systems
May 10, 2015
can a biologist fix a radio?
Lazebnik, Biochemistry, 2004
one gene
one postdoc
knockout phenotype
name the gene
Lazebnik, Biochemistry, 2004
all aspects
one gene
high-throughput biology
one technology
one lab
all genes
one aspect
systems biology
complete systems
all aspects
all genes
systems-level properties
two subfields
mathematical modeling
small systems
data integration
large systems
mathematical modeling
small systems
Chen, Mol. Biol. Cell, 2004
many equations
Chen, Mol. Biol. Cell, 2004
simulation
Chen, Mol. Biol. Cell, 2004
many parameters
Chen, Mol. Biol. Cell, 2004
requires detailed knowledge
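To make "equations, simulation, parameters" concrete, here is a minimal sketch of simulating a single differential equation. It is a toy production/degradation model, not the cell-cycle model of Chen et al.; the names `k_syn` and `k_deg` and their values are assumptions chosen only for illustration.

```python
def simulate(k_syn, k_deg, x0=0.0, dt=0.01, t_end=50.0):
    """Integrate dx/dt = k_syn - k_deg * x with the forward Euler method."""
    steps = int(round(t_end / dt))
    x = x0
    trajectory = [x]
    for _ in range(steps):
        # One Euler step: rate of change times the step size.
        x += dt * (k_syn - k_deg * x)
        trajectory.append(x)
    return trajectory


# Hypothetical parameter values; a real model needs measured or fitted ones.
traj = simulate(k_syn=2.0, k_deg=0.5)
steady_state = 2.0 / 0.5  # analytic steady state k_syn / k_deg
print(f"final x = {traj[-1]:.3f}, analytic steady state = {steady_state:.3f}")
```

Even this one-equation toy needs two parameters and an initial value, which hints at why larger models require detailed knowledge of the system.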
network biology
association networks
guilt by association
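Guilt by association can be sketched as a majority vote among annotated network neighbours. The network, protein names, and annotations below are invented placeholders, and the voting rule is one simple assumption among many possible schemes.

```python
from collections import Counter

# Hypothetical association network: protein -> set of neighbours.
network = {
    "P1": {"P2", "P3", "P4"},
    "P2": {"P1", "P3"},
    "P3": {"P1", "P2"},
    "P4": {"P1"},
}

# Hypothetical known functions; P1 is the unannotated protein.
annotations = {"P2": "cell cycle", "P3": "cell cycle", "P4": "DNA repair"}


def predict_function(protein, network, annotations):
    """Return (most common neighbour function, fraction of votes) or None."""
    neighbours = sorted(network.get(protein, ()))
    votes = Counter(annotations[n] for n in neighbours if n in annotations)
    if not votes:
        return None
    label, count = votes.most_common(1)[0]
    return label, count / sum(votes.values())


prediction = predict_function("P1", network, annotations)
print(prediction)
```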
protein networks
STRING
>1100 organisms
~2.6 million proteins
Szklarczyk, Franceschini et al., Nucleic Acids Research, 2011
Exercise 1
Go to http://string-db.org
Query for CDC28 in budding yeast
Try different evidence views
Show only high-confidence links
Show only experimental evidence
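The two filtering steps of the exercise can also be sketched offline. The partner names and scores below are invented, not real STRING output, and the field names `combined` and `experimental` are assumptions; the 0.7 cutoff mirrors the "high confidence" setting of the web interface.

```python
# Hypothetical links for one query protein; scores are made up (0 to 1).
links = [
    {"partner": "partner_A", "combined": 0.99, "experimental": 0.95},
    {"partner": "partner_B", "combined": 0.85, "experimental": 0.10},
    {"partner": "partner_C", "combined": 0.45, "experimental": 0.40},
]


def filter_links(links, channel="combined", min_score=0.7):
    """Keep links whose score in the given evidence channel meets the cutoff."""
    return [link for link in links if link.get(channel, 0.0) >= min_score]


high_confidence = filter_links(links)
experimental_only = filter_links(links, channel="experimental")

print([link["partner"] for link in high_confidence])
print([link["partner"] for link in experimental_only])
```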
genomic context
gene fusion
Korbel et al., Nature Biotechnology, 2004
operons
Korbel et al., Nature Biotechnology, 2004
bidirectional promoters
Korbel et al., Nature Biotechnology, 2004
phylogenetic profiles
Korbel et al., Nature Biotechnology, 2004
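Phylogenetic profiles can be compared with a simple similarity measure. The sketch below uses invented presence/absence vectors over six hypothetical genomes and the Jaccard index; that choice of measure is an assumption, and real pipelines may use others.

```python
# Hypothetical profiles: 1 = gene present in that genome, 0 = absent.
profiles = {
    "geneA": (1, 1, 0, 1, 0, 1),
    "geneB": (1, 1, 0, 1, 0, 1),
    "geneC": (0, 1, 1, 0, 1, 0),
}


def jaccard(p, q):
    """Genomes containing both genes divided by genomes containing either."""
    both = sum(1 for a, b in zip(p, q) if a and b)
    either = sum(1 for a, b in zip(p, q) if a or b)
    return both / either if either else 0.0


sim_ab = jaccard(profiles["geneA"], profiles["geneB"])
sim_ac = jaccard(profiles["geneA"], profiles["geneC"])
print(f"A vs B: {sim_ab:.2f}, A vs C: {sim_ac:.2f}")
```

Genes with near-identical profiles, like the hypothetical geneA and geneB, are candidates for a functional association.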
a real example
Cell
Cellulosomes
Cellulose
experimental data
gene coexpression
protein interactions
Jensen & Bork, Science, 2008
genetic interactions
Beyer et al., Nature Reviews Genetics, 2007
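Gene coexpression, listed above among the experimental data, is often quantified as a correlation between expression profiles. This is a minimal sketch with invented expression values over four hypothetical conditions; Pearson correlation is assumed here as the measure.

```python
from math import sqrt

# Hypothetical expression levels of three genes across four conditions.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.1, 3.9, 6.2, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
}


def pearson(x, y):
    """Pearson correlation coefficient of two equally long profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


r_ab = pearson(expression["geneA"], expression["geneB"])
r_ac = pearson(expression["geneA"], expression["geneC"])
print(f"A vs B: {r_ab:.3f}, A vs C: {r_ac:.3f}")
```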
curated knowledge
complexes
pathways
Letunic & Bork, Trends in Biochemical Sciences, 2008
chemical networks
STITCH
STRING + 300k chemicals
drugs
metabolites
known drug targets
high-throughput assays
metabolic pathways
Exercise 2
Go to http://stitch-db.org
Query for TYMS in human
What is the role of thymidylate?
What is the role of dUMP?
What is the role of Pemetrexed?
many databases
different formats
different identifiers
variable quality
not comparable
hard work
quality scores
von Mering et al., Nucleic Acids Research, 2005
calibrate vs. gold standard
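Calibration against a gold standard can be sketched by binning raw scores and asking what fraction of the pairs in each bin the gold standard confirms. The pairs, scores, bins, and gold-standard set below are all invented, and this simple binning is an assumption rather than the exact procedure used by any particular database.

```python
# Hypothetical scored pairs: (unordered protein pair, raw score).
scored_pairs = [
    (frozenset({"A", "B"}), 0.9),
    (frozenset({"A", "C"}), 0.8),
    (frozenset({"B", "D"}), 0.3),
    (frozenset({"C", "D"}), 0.2),
    (frozenset({"A", "D"}), 0.1),
]

# Hypothetical gold standard of trusted associations.
gold_standard = {
    frozenset({"A", "B"}),
    frozenset({"A", "C"}),
    frozenset({"C", "D"}),
}


def calibrate(scored_pairs, gold_standard, bin_edges):
    """Per non-empty score bin, the fraction of pairs in the gold standard."""
    result = []
    for low, high in zip(bin_edges, bin_edges[1:]):
        in_bin = [pair for pair, score in scored_pairs if low <= score < high]
        if in_bin:
            hits = sum(1 for pair in in_bin if pair in gold_standard)
            result.append(((low, high), hits / len(in_bin)))
    return result


calibration = calibrate(scored_pairs, gold_standard, [0.0, 0.5, 1.0])
for (low, high), precision in calibration:
    print(f"raw score {low:.1f} to {high:.1f}: {precision:.2f} confirmed")
```

The resulting fractions put raw scores from different sources on one comparable scale.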
missing most of the data
text mining
>10 km
too much to read
computer
as smart as a dog
teach it specific tricks
named entity recognition
comprehensive lexicon
cyclin dependent kinase 1
CDK1
CDC2
flexible matching
spaces and hyphens
cyclin dependent kinase 1
cyclin-dependent kinase 1
orthographic variation
CDC2
hCdc2
“black list”
SDS
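The matching tricks above can be sketched with regular expressions. The tiny lexicon, the optional `h` prefix rule, and the sample sentence are assumptions for illustration only, not the actual tagger behind these resources.

```python
import re

# Toy lexicon: name -> identifier (illustrative only).
lexicon = {
    "cyclin dependent kinase 1": "CDK1",
    "CDK1": "CDK1",
    "CDC2": "CDK1",
    "SDS": "SDS",
}

# Names that usually mean something else in text (e.g. the detergent SDS).
blacklist = {"SDS"}


def name_pattern(name):
    """Regex that treats spaces and hyphens as interchangeable and optional,
    ignores case, and allows an optional leading 'h' (as in hCdc2)."""
    tokens = re.split(r"[\s-]+", name)
    body = r"[\s-]?".join(re.escape(token) for token in tokens)
    return re.compile(
        r"(?<![A-Za-z0-9])h?" + body + r"(?![A-Za-z0-9])", re.IGNORECASE
    )


def tag(text):
    """Return (matched text, identifier) for every non-blacklisted name."""
    hits = []
    for name, identifier in lexicon.items():
        if name in blacklist:
            continue
        for match in name_pattern(name).finditer(text):
            hits.append((match.group(0), identifier))
    return hits


sentence = "hCdc2, also called cyclin-dependent kinase 1, was run on an SDS gel."
tags = tag(sentence)
print(tags)
```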
information extraction
count co-mentioning
within documents
within paragraphs
within sentences
scoring scheme
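Counting co-mentions at the three levels can be sketched as a weighted count. The toy corpus and the weights below are invented for illustration; the real scoring scheme is not given on the slides, so this is only one assumed way to combine the levels.

```python
from collections import Counter
from itertools import combinations

# Hypothetical weights for each level at which a pair is co-mentioned.
WEIGHTS = {"document": 1.0, "paragraph": 2.0, "sentence": 4.0}

# Toy corpus: document -> paragraphs -> sentences -> recognised entity names.
corpus = [
    [
        [["CDK1", "CCNB1"], ["TP53"]],
        [["TYMS"]],
    ],
    [
        [["CDK1"], ["CCNB1"]],
    ],
]


def comention_scores(corpus):
    """Sum, over documents, the weights of every level where a pair co-occurs."""
    scores = Counter()
    for document in corpus:
        sentence_sets = [[set(s) for s in paragraph] for paragraph in document]
        paragraph_sets = [set().union(*sents) for sents in sentence_sets]
        document_set = set().union(*paragraph_sets)
        for a, b in combinations(sorted(document_set), 2):
            score = WEIGHTS["document"]
            if any(a in p and b in p for p in paragraph_sets):
                score += WEIGHTS["paragraph"]
            if any(a in s and b in s for sents in sentence_sets for s in sents):
                score += WEIGHTS["sentence"]
            scores[(a, b)] += score
    return scores


scores = comention_scores(corpus)
for pair, score in scores.most_common(3):
    print(pair, score)
```

Pairs mentioned in the same sentence accumulate the most weight, while pairs that only share a document score lowest.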
proteins
small molecules
compartments
tissues
phenotypes
diseases
adverse drug reactions
organisms
environments
Exercise 3
Go to http://diseases.jensenlab.org
Find TYMS disease associations
Inspect the text-mining evidence
Find genes linked to colorectal cancer
Explore the gene network
text corpus
~22 million abstracts
no access
~4 million full-text articles
augmented browsing
Reflect
browser add-on
real-time text mining
Pafilis, O’Donoghue, Jensen et al., Nature Biotechnology, 2009
O’Donoghue et al., Journal of Web Semantics, 2010
localization and disease
suite of web resources
common backend database
curated knowledge
experimental data
text mining
computational predictions
unified identifiers
quality scores
visualization
COMPARTMENTS
compartments.jensenlab.org
TISSUES
tissues.jensenlab.org
more to come
summary
bioinformatics
more than alignment
data/text mining
save you much time
questions?