FAIR Data and Model Management for Systems Biology (and SOPs too!)
Prof Carole Goble, The University of Manchester
The Software Sustainability Institute, ELIXIR UK, SynBioChem Centre
MultiScale Biology Network Springboard meeting, Nottingham, UK, 1 June
files…Big: NGS, Mass Spec, specialist repositories…
ODE, SBML, Native Matlab, PDE, Fortran, CellML…
versioning, provenance tracking, parameter tracking, citation tracking, links to articles
STANDARDS
Asset Management
public archives
cloud services
Public-Centric Asset Management
public archives
cloud services
Public-Centric Asset Management
Challenge: Most quantitative databases provide kinetic constants for enzymes, and sometimes binding constants, but little to help build quantitative descriptions, i.e. concentrations, sizes, diffusion… Exceptions: gene expression data, proteomics, metabolomics. Localisation: the average concentration of a protein in a piece of brain is of limited use (it is a mix of tissues and subcellular compartments).
[Nicolas Le Novere, 2015]
FAIR for the Researcher
Collaborative, data/model-driven science
Publication
Local and Public Resources
Skills and Productivity
Compliance
Collaboration, asset management
Pop-up projects
Dynamic groups
Internal / external visibility
Project-Centric Asset Management
Is there any group generating kinetic data?
Who is working with which organism?
Is this data available?
What methods are being used to determine enzyme activity?
Under which experimental conditions are my partners measuring glucose concentration?
What is the provenance of the parameters for this version of the model?
What SOP was used for this sample?
Where is the validation data for this model?
• Retain results beyond a project / the PhD student
• Exchange & find assets.
• Share, disseminate and publish assets sensitively
• Consistent reporting for interpretation, interop & comparison
• Promote standardised metadata practices.
• Organise and link assets
• Reuse results
Find
• Data, models, protocols, projects, people
• Catalogued and linked assets
• Link studies to assets
Access
• Control sharing, versioning, gateway to scattered public/local archives
Interoperate
• Standards (SBML, SED-ML…), vocabs, formats, ids
• Harvesting, export, API
Reuse
• Download assets
• Run models with experimental data
• DOI citation
The Neylon Equation
FAIRDOM Provenance
[Timeline figure: 2008, 2010, 2014, 2019; de.NBI]
SEEK: Science Commons
Web-based cataloguing and rich web interface for describing, finding, linking and promoting ongoing research and outcomes. Small files, aggregates across data archives.
openBIS: Scaled local LIMS and analytics
Extract, Transform and Load tooling direct from the instrumentation, data analysis pipelines. Automatic archiving. Handles large data.
FAIRDOM Suite
Personal Data
Local Stores
LIMS
External Databases
Articles
Models
Standards
SOPs
Aggregated Commons Infrastructure
Über metadata, cataloguing
Stores: SOPs, Models, data files
NGS
Proteomics
LIMS
iPortal
BeeWM
https://doi.org/10.15490/seek.1.investigation.56
[Snoep, 2015]
Standard Operating Procedures
Challenge: Machine processable SOPs
Models: simulate and annotate in browser
Metadata standards & templates to link studies and link assets
Just Enough Results Model
Describes common elements and relationships between things produced and used in experiments.
Structured descriptions for consistency and comparison
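As a rough illustration of what "common elements and relationships" can mean in practice, here is a minimal sketch of a study that links the assets it produced and used. The class names, fields and example titles are hypothetical and are not the actual Just Enough Results Model schema; the point is only that once assets are linked to a study, a question such as "What SOP was used for this sample?" becomes a simple lookup.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Asset:
    """Something produced or used in an experiment (hypothetical schema)."""
    kind: str   # e.g. "data", "model", "sop"
    title: str


@dataclass
class Study:
    """A study that keeps explicit links to its assets (hypothetical schema)."""
    title: str
    assets: list = field(default_factory=list)

    def link(self, asset):
        # Record the relationship between this study and the asset.
        self.assets.append(asset)
        return asset

    def find(self, kind):
        # Return the titles of all linked assets of the given kind.
        return [a.title for a in self.assets if a.kind == kind]


# Illustrative records only; none of these come from a real project.
study = Study("Glucose uptake kinetics")
study.link(Asset("sop", "Enzyme activity assay protocol"))
study.link(Asset("data", "Glucose concentration time course"))
study.link(Asset("model", "Glycolysis ODE model v2"))

sops_used = study.find("sop")
print(sops_used)
```

Keeping the link on the study, rather than in file names or lab notebooks, is what lets the asset outlive the project or the PhD student who produced it.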
standards
NuML
[Adapted, Le Novere]
ISO/TC 276/WG5
Data processing and Integration
Martin Golebiewski, HITS
Validation data
Construction data
FAIRDOM Suite
Resource
FAIRDOMHub
Self-managed, customised local installation.
Independent, self-managed private space on shared, hosted installation.
Funder CRIS
Publisher Companion Site
Managed Safe haven
FAIRDOMHub.org
Plus Hybrids!
FAIRDOM Initiative
Facilities
Community
Networks
Forums
Workshops
Tools
Standards
Support
Sustainability
de.NBI
Sys Bio Developers Foundry, Oct 2014 Heidelberg, Germany
EraSysAPP meeting, April 2015, Berlin, Germany
Intl Practical Course in Sys Biology, June 1-12, 2015, Gothenburg, Sweden
Data Citation and Models Workshop, 14-16 Sept 2015, Rostock, Germany
Data Integration in the Life Sciences, Feb 2015, Leiden, Netherlands
PALs
http://seek.virtual-liver.de/
• Navigation
• Single standards at one scale
• Multi-type hosting
“To integrate the detailed knowledge that we have at the molecular level up to the functional level at tissue/organ/whole body level”
Multi-scale? Multi-silos…
Handling/converting data of different levels of detail to make the model run. Representing in the SBML model the DNA bindings at the level of detail that had been measured in the experiments
Whole Cell model by Jonathan Karr (Rostock Summer School, Dagmar Waltemath)
Support for aggregating data to find the appropriate level of representation for a given model.
Karr JR, Sanghvi JC, Macklin DN, et al. A Whole-Cell Computational Model Predicts Phenotype from Genotype. Cell. 2012;150(2):389-401. doi:10.1016/j.cell.2012.05.044.
Challenge: mismatches
• Systems on different scales
– incompatible time scales, data may be too sparse or need to be aggregated to work with another module
• Different levels of complexity
– comparing results from different modelling approaches.
• Linking models needs thinking and standards
– connecting the single standards
– interfacing between the different scales
– connecting (experimental/simulation) data to models
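To make the time-scale mismatch concrete, the sketch below averages a finely sampled series into coarser bins so that a slower module could consume it. This is only one possible aggregation, assumed here for illustration; the function name, the 60-unit step and the sample values are all hypothetical, and a real coupling would also need to decide how to handle units, gaps and sparse data.

```python
from statistics import mean


def aggregate(times, values, step):
    """Average samples into consecutive bins of width `step`.

    `times` and `step` are assumed to share the same time unit.
    Returns a list of (bin_start_time, mean_value) pairs.
    """
    bins = {}
    for t, v in zip(times, values):
        # Samples with the same integer bin index fall into the same bin.
        bins.setdefault(int(t // step), []).append(v)
    return [(k * step, mean(vs)) for k, vs in sorted(bins.items())]


# Hypothetical output of a fast module, sampled every 30 time units.
fast_times = [0, 30, 60, 90]
fast_values = [1.0, 2.0, 3.0, 4.0]

# A slower module that only accepts one value per 60 time units.
coarse = aggregate(fast_times, fast_values, step=60)
print(coarse)
```

Averaging is a deliberate simplification: depending on the quantity, a sum, a maximum or an interpolation may be the appropriate way to move between scales.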
ISO/TC 276/WG5
Data processing and Integration
Martin Golebiewski, HITS
Challenge: model evolution
BiVeS tool: diff in versions of computational models
Provenance, Versioning, Parameter tracking
Releasing updated versions into the literature
Identifying, Interpreting, and Communicating Changes in XML-encoded Models of Biological Systems. Scharm et al. 2015, under revision at Bioinformatics
Haus et al., BMC Systems Biology, 2011, 5:10
Solvent production by Clostridium acetobutylicum
[Martin Scharm]
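The idea of parameter tracking between model versions can be sketched in a few lines. This is not the BiVeS algorithm, which compares the structure of XML-encoded models; it is a much simpler stand-in that only compares parameter values. The two model snippets are invented, simplified SBML-like XML, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def parameters(xml_text):
    """Map parameter id -> value for every <parameter> element."""
    root = ET.fromstring(xml_text)
    return {
        el.get("id"): el.get("value")
        for el in root.iter()
        # Strip any "{namespace}" prefix so namespaced documents also match.
        if el.tag.rsplit("}", 1)[-1] == "parameter"
    }


def diff_parameters(old_xml, new_xml):
    """Report parameters added, removed or changed between two versions."""
    old, new = parameters(old_xml), parameters(new_xml)
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "changed": {
            k: (old[k], new[k])
            for k in sorted(set(old) & set(new))
            if old[k] != new[k]
        },
    }


# Two invented versions of a toy model (not taken from any published model).
V1 = """<model><listOfParameters>
  <parameter id="k1" value="0.5"/>
  <parameter id="k2" value="2.0"/>
</listOfParameters></model>"""

V2 = """<model><listOfParameters>
  <parameter id="k1" value="0.7"/>
  <parameter id="k3" value="1.0"/>
</listOfParameters></model>"""

changes = diff_parameters(V1, V2)
print(changes)
```

Values are compared as strings, so "0.50" and "0.5" would count as a change; a real tool would need to compare numerically and also track changes to reactions, species and annotations.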
F1000Research Living Figures, versioned articles, in-article data manipulation
R Lawrence Force2015, Vision Award Runner Up http://f1000.com/posters/browse/summary/1097482
Simply data + code. Can change the definition of a figure, and ultimately the journal article.
Colomb J and Brembs B. Sub-strains of Drosophila Canton-S differ markedly in their locomotor behavior [v1; ref status: indexed, http://f1000r.es/3is]. F1000Research 2014, 3:176
Other labs can replicate the study, or contribute their data to a meta-analysis or disease model - figure automatically updates.