SBML (the Systems Biology Markup Language), model databases, and other resources
Michael Hucka, Ph.D.
Department of Computing + Mathematical Sciences
California Institute of Technology, Pasadena, CA, USA
CCB 2012, August 2012, Cold Spring Harbor Laboratory, NY, USA
Email: [email protected]
Twitter: @mhucka
Outline
General background and motivations
Brief summary of SBML features
A selection of resources for the SBML-oriented modeler
Annotations, connections and semantics
Current and upcoming developments in community standards
Closing
Research today: experimentation, computation, cogitation
The many roles of computation in biological research
Instrument/device control, data management, data processing, database applications, statistical analysis, pattern matching, image processing, text mining, chemical structure prediction, genomic sequence analysis, proteomics, other *omics, molecular modeling, molecular dynamics, kinetic simulation, simulated evolution, phylogenetics, ... (to name only a subset)!
Focus here: modeling and simulation
Usually, there are at least two scientific outcomes:
• One or more models (+ associated claims about their behaviors)
• Publication of the results (in some form)
What are the outcomes of modeling and simulation?
Models come in many forms
Models are results
Models serve as statements of our current understanding of the phenomena being studied*
• A computational model documents your theory in a concrete form
Models can:
• Reduce ambiguity in communication
• Offer a concrete framework for adding new data and theories
• Support direct evaluation of relationships between theories
* Bower & Bolouri, Computational Modeling of Genetic and Biochemical Networks, MIT Press, 2001
But only if the modeling results are reproducible
Many models have traditionally been published this way
Problems:
• Errors in printing
• Missing information
• Dependencies on implementation
• Outright errors
• Can be a huge effort to recreate
Is it enough to describe the model & equations in a paper?
Is it enough to make your (software X) script available?
It’s vital for good science:
• Someone with access to the same software can try to run it, understand it, verify the computational results, build on them, etc.
• Opinion: you should always do this in any case
But it’s still not ideal for communication of scientific results:
• What if they don’t have access to that software?
• And anyway, how will people find the model?
• And how will people be able to relate the model to other work?
Different tools ⇒ different interfaces & languages
Communication is better with interoperable data formats
SBML: a lingua franca for software
Format for representing computational models of biological processes
• Data structures + usage principles + serialization to XML
Neutral with respect to modeling framework
• E.g., ODE, stochastic systems, etc.
Development started in 2000, with first specification distributed in 2001
SBML = Systems Biology Markup Language
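To make "data structures + serialization to XML" concrete, here is a minimal sketch that hand-builds an SBML-shaped document with Python's standard library. The model content (`toy_model`, compartment `cell`, species `A` and `B`, reaction `A_to_B`) is hypothetical, and the output is not validated against the SBML specification; real tools would normally go through a dedicated SBML library instead.

```python
import xml.etree.ElementTree as ET

# Namespace of SBML Level 3 Version 1 Core.
SBML_NS = "http://www.sbml.org/sbml/level3/version1/core"


def build_toy_model() -> ET.Element:
    """Build a tiny SBML-shaped tree: one compartment, two species, one reaction."""
    ET.register_namespace("", SBML_NS)  # serialize with a default xmlns

    def q(tag: str) -> str:
        return "{%s}%s" % (SBML_NS, tag)

    sbml = ET.Element(q("sbml"), {"level": "3", "version": "1"})
    model = ET.SubElement(sbml, q("model"), {"id": "toy_model"})

    compartments = ET.SubElement(model, q("listOfCompartments"))
    ET.SubElement(compartments, q("compartment"),
                  {"id": "cell", "size": "1", "constant": "true"})

    species = ET.SubElement(model, q("listOfSpecies"))
    for sid, amount in (("A", "10"), ("B", "0")):
        ET.SubElement(species, q("species"), {
            "id": sid, "compartment": "cell", "initialAmount": amount,
            "hasOnlySubstanceUnits": "true",
            "boundaryCondition": "false", "constant": "false",
        })

    reactions = ET.SubElement(model, q("listOfReactions"))
    rxn = ET.SubElement(reactions, q("reaction"),
                        {"id": "A_to_B", "reversible": "false", "fast": "false"})
    reactants = ET.SubElement(rxn, q("listOfReactants"))
    ET.SubElement(reactants, q("speciesReference"),
                  {"species": "A", "stoichiometry": "1", "constant": "true"})
    products = ET.SubElement(rxn, q("listOfProducts"))
    ET.SubElement(products, q("speciesReference"),
                  {"species": "B", "stoichiometry": "1", "constant": "true"})
    return sbml


xml_text = ET.tostring(build_toy_model(), encoding="unicode")
print(xml_text)
```

The point of the sketch is only that the model is plain, tool-neutral data: nothing in it says whether it will be simulated with ODEs or stochastically.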
Basic SBML concepts are fairly simple
The process is central
• Called a “reaction” in SBML
• Participants are pools of entities (species)
Models can further include:
• Other constants & variables
• Compartments
• Explicit math
• Discontinuous events
• Unit definitions
• Annotations
Well-stirred compartments
[Figure: two compartments labeled c and n]
Species pools are located in compartments
[Figure: compartments c and n containing the species pools gene, mRNAn, mRNAc, protein A, and protein B]
Reactions can involve any species anywhere
[Figure: compartments c and n with the species pools gene, mRNAn, mRNAc, protein A, and protein B]
Reactions can cross compartment boundaries
[Figure: compartments c and n with the species pools gene, mRNAn, mRNAc, protein A, and protein B]
Reaction/process rates can be (almost) arbitrary formulas
[Figure: compartments c and n with the species pools gene, mRNAn, mRNAc, protein A, and protein B, plus formulas f1(x) to f5(x)]
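As a rough illustration of reactions whose rates are arbitrary formulas, the sketch below stores each reaction as a stoichiometry map plus a Python function of the current state and advances the system with forward Euler. The reaction list, the rate constants, and the choice of Euler integration are all assumptions made for this example; compartment volumes are ignored.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, float]
Reaction = Tuple[Dict[str, int], Callable[[State], float]]

# Hypothetical reactions loosely named after the species in the diagram.
REACTIONS: List[Reaction] = [
    # transcription: gene acts as a template and is not consumed
    ({"mRNAn": +1}, lambda s: 2.0 * s["gene"]),
    # transport across the compartment boundary (n -> c)
    ({"mRNAn": -1, "mRNAc": +1}, lambda s: 0.5 * s["mRNAn"]),
    # degradation of mRNA in compartment c
    ({"mRNAc": -1}, lambda s: 0.1 * s["mRNAc"]),
    # translation with a saturating, non-mass-action rate formula
    ({"proteinA": +1}, lambda s: s["mRNAc"] / (1.0 + s["proteinA"])),
]


def euler_step(state: State, dt: float) -> State:
    """Evaluate every rate at the current state, then apply the stoichiometry."""
    rates = [rate(state) for _, rate in REACTIONS]
    new_state = dict(state)
    for (stoichiometry, _), v in zip(REACTIONS, rates):
        for species, n in stoichiometry.items():
            new_state[species] += n * v * dt
    return new_state


def simulate(state: State, dt: float, steps: int) -> State:
    for _ in range(steps):
        state = euler_step(state, dt)
    return state


initial = {"gene": 1.0, "mRNAn": 0.0, "mRNAc": 0.0, "proteinA": 0.0}
final = simulate(initial, dt=0.01, steps=1000)
print(final)
```

Because the rate is just a function of the state, swapping mass-action kinetics for any other formula changes one line and leaves the reaction structure untouched.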
“Rules”: equations expressing relationships in addition to the reaction system
[Figure: compartments c and n with the species pools gene, mRNAn, mRNAc, protein A, and protein B, formulas f1(x) to f5(x), and additional equations g1(x), g2(x), ...]
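A small sketch of how rules might sit alongside reactions, assuming two of SBML's rule kinds: an assignment rule (a variable always equals a formula) and a rate rule (a variable's time derivative equals a formula). The variable names `mRNA_total` and `volume_c` and the growth constant are hypothetical.

```python
from typing import Callable, Dict

State = Dict[str, float]

# Assignment rules: variable = formula(state) at every time point.
ASSIGNMENT_RULES: Dict[str, Callable[[State], float]] = {
    "mRNA_total": lambda s: s["mRNAn"] + s["mRNAc"],
}

# Rate rules: d(variable)/dt = formula(state), independent of any reaction.
RATE_RULES: Dict[str, Callable[[State], float]] = {
    "volume_c": lambda s: 0.01 * s["volume_c"],  # hypothetical slow growth
}


def apply_assignment_rules(state: State) -> State:
    """Recompute every assigned variable from the current state."""
    out = dict(state)
    for variable, formula in ASSIGNMENT_RULES.items():
        out[variable] = formula(out)
    return out


def step_rate_rules(state: State, dt: float) -> State:
    """Advance rate-rule variables by one Euler step, then refresh assignments."""
    derivatives = {v: f(state) for v, f in RATE_RULES.items()}
    out = dict(state)
    for variable, d in derivatives.items():
        out[variable] += d * dt
    return apply_assignment_rules(out)


state = apply_assignment_rules({"mRNAn": 3.0, "mRNAc": 2.0, "volume_c": 1.0})
next_state = step_rate_rules(state, dt=0.5)
print(state["mRNA_total"], next_state["volume_c"])
```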
“Events”: discontinuous actions triggered by system conditions
[Figure: compartments c and n with the species pools gene, mRNAn, mRNAc, protein A, and protein B, formulas f1(x) to f5(x), equations g1(x), g2(x), ..., and a list of events]
Event1: when (...condition...), do (...assignments...)
Event2: when (...condition...), do (...assignments...)
...
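The "when (condition), do (assignments)" pattern can be sketched as below. The sketch assumes an event fires when its trigger changes from false to true, and it leaves out delays, priorities, and events that trigger one another; the `Event` class and `run` function are illustrative names, not part of any SBML library.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

State = Dict[str, float]


@dataclass
class Event:
    trigger: Callable[[State], bool]                  # the "when" condition
    assignments: Dict[str, Callable[[State], float]]  # the "do" part


def run(state: State, events: List[Event],
        step: Callable[[State], State], n_steps: int) -> Tuple[State, List[int]]:
    """Advance the state; fire an event whenever its trigger goes false -> true."""
    previous = [ev.trigger(state) for ev in events]
    fired_at: List[int] = []
    for i in range(1, n_steps + 1):
        state = step(state)
        for k, ev in enumerate(events):
            now = ev.trigger(state)
            if now and not previous[k]:
                # Evaluate all assignment formulas first, then apply them together.
                values = {var: f(state) for var, f in ev.assignments.items()}
                state = {**state, **values}
                fired_at.append(i)
                now = ev.trigger(state)  # re-check after the discontinuity
            previous[k] = now
    return state, fired_at


# Hypothetical example: x grows by 1 per step and is reset when it reaches 3.
reset = Event(trigger=lambda s: s["x"] >= 3, assignments={"x": lambda s: 0.0})
final_state, fired = run({"x": 0.0}, [reset], lambda s: {**s, "x": s["x"] + 1}, 7)
print(final_state, fired)
```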
Annotations: machine-readable semantics and links to other resources
[Figure: compartments c and n with the species pools gene, mRNAn, mRNAc, protein A, and protein B, formulas, equations, and events, with annotations attached to model elements]
• “This event represents ...”
• “This is identified by GO id # ...”
• “This is an enzymatic reaction with EC # ...”
• “This is a transport into the nucleus ...”
• “This compartment represents the nucleus ...”
Today: spatially homogeneous models
• Metabolic network models
• Signaling pathway models
• Conductance-based models
• Neural models
• Pharmacokinetic/dynamics models
• Infectious diseases
Coming: SBML Level 3 packages to support other types
• E.g.: Spatially inhomogeneous models, also qualitative/logical
A consensus yeast metabolic network reconstruction obtained from a community approach to systems biology
Markus J Herrgård, Neil Swainston, Paul Dobson, Warwick B Dunn, K Yalçin Arga, Mikko Arvas, Nils Blüthgen, Simon Borger, Roeland Costenoble, Matthias Heinemann, Michael Hucka, Nicolas Le Novère, Peter Li, Wolfram Liebermeister, Monica L Mo, Ana Paula Oliveira, Dina Petranovic, Stephen Pettifer, Evangelos Simeonidis, Kieran Smallbone, Irena Spasić, Dieter Weichart, Roger Brent, David S Broomhead, Hans V Westerhoff, Betül Kırdar, Merja Penttilä, Edda Klipp, Bernhard Ø Palsson, Uwe Sauer, Stephen G Oliver, Pedro Mendes, Jens Nielsen & Douglas B Kell
A list of affiliations appears at the end of the paper.
Nature Biotechnology, volume 26, number 10, October 2008, p. 1155
Published online 9 October 2008; doi:10.1038/nbt1492
Genomic data allow the large-scale manual or semi-automated assembly of metabolic network reconstructions, which provide highly curated organism-specific knowledge bases. Although several genome-scale network reconstructions describe Saccharomyces cerevisiae metabolism, they differ in scope and content, and use different terminologies to describe the same chemical entities. This makes comparisons between them difficult and underscores the desirability of a consolidated metabolic network that collects and formalizes the ‘community knowledge’ of yeast metabolism. We describe how we have produced a consensus metabolic network reconstruction for S. cerevisiae. In drafting it, we placed special emphasis on referencing molecules to persistent databases or using database-independent forms, such as SMILES or InChI strings, as this permits their chemical structure to be represented unambiguously and in a manner that permits automated reasoning. The reconstruction is readily available via a publicly accessible database and in the Systems Biology Markup Language (http://www.comp-sys-bio.org/yeastnet). It can be maintained as a resource that serves as a common denominator for studying the systems biology of yeast. Similar strategies should benefit communities studying genome-scale metabolic networks of other organisms.
Accurate representation of biochemical, metabolic and signaling networks by mathematical models is a central goal of integrative systems biology. This undertaking can be divided into four stages [1]. The first is a qualitative stage in which are listed all the reactions that are known to occur in the system or organism of interest; in the modern era, and especially for metabolic networks, these reaction lists are often derived in part from genomic annotations [2,3] with curation based on literature (‘bibliomic’) data [4]. A second stage, again qualitative, adds known effectors, whereas the third and fourth stages, essentially amounting to molecular enzymology, include the known kinetic rate equations and the values of their parameters. Armed with such information, it is then possible to provide a stochastic or ordinary differential equation model of the entire metabolic network of interest. An attractive feature of metabolism, for the purposes of modeling, is that, in contrast to signaling pathways, metabolism is subject to direct thermodynamic and (in particular) stoichiometric constraints [3]. Our focus here is on the first two stages of the reconstruction process, especially as it pertains to the mapping of experimental metabolomics data onto metabolic network reconstructions.
Besides being an industrial workhorse for a variety of biotechnological products, S. cerevisiae is a highly developed model organism for biochemical, genetic, pharmacological and post-genomic studies [5]. It is especially attractive because of the availability of its genome sequence [6], a whole series of bar-coded deletion [7,8] and other [9] strains, extensive experimental ’omics data [10–14] and the ability to grow it for extended periods under highly controlled conditions [15]. The very active scientific community that works on S. cerevisiae has a history of collaborative research projects that have led to substantial advances in our understanding of eukaryotic biology [6,8,13,16,17]. Furthermore, yeast metabolic physiology has been the subject of intensive study and most of the components of the yeast metabolic network are relatively well characterized. Taken together, these factors make yeast metabolism an attractive topic to test a community approach to build models for systems biology.
Several groups [18–21] have reconstructed the metabolic network of yeast from genomic and literature data and made the reconstructions freely available. However, due to different approaches used to create them, as well as different interpretations of the literature, the existing reconstructions have many differences. Additionally, the naming of metabolites and enzymes in the existing reconstructions was, at best, inconsistent, and there were no systematic annotations of the chemical species in the form of links to external databases that store chemical compound information. This lack of model annotation complicated the use of the models for data analysis and integration. Members of the yeast systems biology community therefore recognized that a single ‘consensus’ reconstruction and annotation of the metabolic network was highly desirable as a starting point for further investigations.
A crucial factor that enabled the building of a consensus network reconstruction is the ability to describe and exchange biochemical network
Software can use SBO terms to help you work with models
MIRIAM (Minimum Information Requested In the Annotation of Models)
Addresses 2 general areas of annotation needs:
• Requirements for reference correspondence
• Scheme for encoding annotations
- Annotations for attributing model creators & sources
- Annotations for referring to external data resources
MIRIAM is not specific to SBML
Annotations for attributing model creators and sources
Goal: permit tracing model’s origins & people involved in its creation
Minimal info required:
• Name for the model
• Citation for a description of what is being modeled & its author
• Contact info for the model creator(s)
• Creation date & time
• Last modification date & time
• Statement of the model’s terms of distribution
- Specific terms not mandated, just a statement of the terms
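One way software might check the minimal attribution information listed above is sketched here. The field names and the example metadata are hypothetical; MIRIAM specifies what information is needed, not these dictionary keys.

```python
from typing import Dict, List

# One key per item in the list of minimal information (names are illustrative).
REQUIRED_FIELDS = (
    "name",                   # name for the model
    "citation",               # description of what is modeled and its author
    "creators",               # contact info for the model creator(s)
    "created",                # creation date & time
    "modified",               # last modification date & time
    "terms_of_distribution",  # a statement of the terms, whatever they are
)


def missing_attribution(metadata: Dict[str, object]) -> List[str]:
    """Return the required fields that are absent or empty."""
    return [field for field in REQUIRED_FIELDS if not metadata.get(field)]


# Hypothetical metadata that lacks a statement of distribution terms.
example = {
    "name": "Toy model",
    "citation": "hypothetical reference",
    "creators": ["A. Modeler <modeler@example.org>"],
    "created": "2012-08-01T12:00:00Z",
    "modified": "2012-08-02T09:30:00Z",
}
missing = missing_attribution(example)
print(missing)
```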
Annotations for external references
Goal: link model constituents to corresponding entities in bioinformatics resources (e.g., databases, controlled vocabularies)
• Supports:
- Precise identification of model constituents
- Discovery of models that concern the same thing
- Comparison of model constituents between different models
MIRIAM approach avoids putting data content directly in the model; instead, it points at external resources that contain the knowledge.
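The "point at external resources" idea can be sketched as an RDF annotation built with Python's standard library. The `metaid` value, the choice of the `bqbiol:is` qualifier, and the identifiers.org form of the URI are assumptions for this example (GO:0005634 is the Gene Ontology term for the nucleus); the element is left un-namespaced for brevity, so this is an illustration rather than a validated SBML fragment.

```python
import xml.etree.ElementTree as ET
from typing import Iterable

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BQBIOL = "http://biomodels.net/biology-qualifiers/"


def make_annotation(metaid: str, qualifier: str, resources: Iterable[str]) -> ET.Element:
    """Build an <annotation> linking a model element to external resource URIs."""
    ET.register_namespace("rdf", RDF)
    ET.register_namespace("bqbiol", BQBIOL)

    annotation = ET.Element("annotation")
    rdf = ET.SubElement(annotation, "{%s}RDF" % RDF)
    description = ET.SubElement(
        rdf, "{%s}Description" % RDF, {"{%s}about" % RDF: "#" + metaid})
    relation = ET.SubElement(description, "{%s}%s" % (BQBIOL, qualifier))
    bag = ET.SubElement(relation, "{%s}Bag" % RDF)
    for uri in resources:
        # Only a pointer is stored; the knowledge stays in the external resource.
        ET.SubElement(bag, "{%s}li" % RDF, {"{%s}resource" % RDF: uri})
    return annotation


nucleus_annotation = make_annotation(
    "meta_nucleus", "is", ["http://identifiers.org/go/GO:0005634"])
annotation_text = ET.tostring(nucleus_annotation, encoding="unicode")
print(annotation_text)
```

Two models that annotate a compartment with the same URI can be recognized as describing the same thing, even if they use different names for it.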
National Institute of General Medical Sciences (USA)
European Molecular Biology Laboratory (EMBL)
JST ERATO Kitano Symbiotic Systems Project (Japan) (to 2003)
JST ERATO-SORST Program (Japan)
ELIXIR (UK)
Beckman Institute, Caltech (USA)
Keio University (Japan)
International Joint Research Program of NEDO (Japan)
Japanese Ministry of Agriculture
Japanese Ministry of Educ., Culture, Sports, Science and Tech.
BBSRC (UK)
National Science Foundation (USA)
DARPA IPTO Bio-SPICE Bio-Computation Program (USA)
Air Force Office of Scientific Research (USA)
STRI, University of Hertfordshire (UK)
Molecular Sciences Institute (USA)