FAIR Data and Model Management for Systems Biology (and SOPs too!) Prof Carole Goble, The University of Manchester, The Software Sustainability Institute, ELIXIR UK, SynBioChem Centre [email protected] MultiScale Biology Network Springboard meeting, Nottingham, UK, 1 June 2015
Transcript
Page 1: FAIR Data and Model Management for Systems Biology(and SOPs too!)

FAIR Data and Model Management for Systems Biology (and SOPs too!)

Prof Carole Goble
The University of Manchester

The Software Sustainability Institute
ELIXIR UK, SynBioChem Centre

[email protected]

MultiScale Biology Network Springboard meeting, Nottingham, UK, 1 June 2015

Page 2: FAIR Data and Model Management for Systems Biology(and SOPs too!)

• Project-centric data and model management

• Respects & expects other systems

• Forged in fire of national & international projects

• PhDs/postgrads/PIs

• Context

• FAIRDOM Initiative

• Challenges

http://www.fair-dom.org

http://www.fairdomhub.org

Page 3: FAIR Data and Model Management for Systems Biology(and SOPs too!)

republic of science*

regulation of science

institutions, libraries

*Merton’s four norms of scientific behaviour (1942)

public archives

cloud services

Page 5: FAIR Data and Model Management for Systems Biology(and SOPs too!)

https://sems.uni-rostock.de/reproducible-and-citable-data-and-models/

Advert!!!

Page 6: FAIR Data and Model Management for Systems Biology(and SOPs too!)

Publishers

• Reproducibility

• New publishable assets

• New business models and services

Funders, Managers

• Capitalising

• Skills

• Justification, Audit & Compliance

Page 7: FAIR Data and Model Management for Systems Biology(and SOPs too!)

UK Funder Data Policies

http://www.dcc.ac.uk/resources/policy-and-legal/overview-funders-data-policies

Page 8: FAIR Data and Model Management for Systems Biology(and SOPs too!)

Tools, Standards, Formats, Reporting, Policies, Practices, Initiatives

Page 9: FAIR Data and Model Management for Systems Biology(and SOPs too!)

Asset Management

Data: ‘omics, images, reaction kinetics, samples, specimens… Small: spreadsheets, files… Big: NGS, Mass Spec, specialist repositories…

Models: ODE, SBML, native Matlab, PDE, Fortran, CellML…

SOPs: consistency, comparability

Samples…

Versioning, provenance tracking, parameter tracking, citation tracking, links to articles

STANDARDS

Page 10: FAIR Data and Model Management for Systems Biology(and SOPs too!)

public archives

cloud services

Public-Centric Asset Management

Page 11: FAIR Data and Model Management for Systems Biology(and SOPs too!)

public archives

cloud services

Public-Centric Asset Management

Page 12: FAIR Data and Model Management for Systems Biology(and SOPs too!)

Challenge: Most quantitative databases provide kinetic constants for enzymes, sometimes binding constants… Little to help with building quantitative descriptions, i.e. concentrations, sizes, diffusions… Exceptions: gene expression data, proteomics, metabolomics. Localisation: The average concentration of a protein in a piece of brain is of limited use (mix of tissues and subcellular compartments).

[Nicolas Le Novere, 2015]

Public-Centric Asset Management

public archives

Page 13: FAIR Data and Model Management for Systems Biology(and SOPs too!)

FAIR for the Researcher

Collaborative, data/model-driven science

Publication

Local and Public Resources

Skills and Productivity

Compliance

Page 14: FAIR Data and Model Management for Systems Biology(and SOPs too!)

Collaboration, asset management

Pop-up projects

Dynamic groups

Internal / external visibility

Page 15: FAIR Data and Model Management for Systems Biology(and SOPs too!)

Pop-up projects

Dynamic groups

Internal / external visibility

Collaboration, asset management

Page 16: FAIR Data and Model Management for Systems Biology(and SOPs too!)


Page 17: FAIR Data and Model Management for Systems Biology(and SOPs too!)

Project-Centric Asset Management

Is there any group generating kinetic data?

Who is working with which organism?

Is this data available?

What methods are being used to determine enzyme activity?

Under which experimental conditions are my partners working for the measurement of glucose concentration?

What is the provenance of the parameters for this version of the model?

What SOP was used for this sample?

Where is the validation data for this model?

• Retain results beyond a project / the PhD student

• Exchange & find assets.

• Share, disseminate and publish assets sensitively

• Consistent reporting for interpretation, interop & comparison

• Promote standardised metadata practices.

• Organise and link assets

• Reuse results

Page 18: FAIR Data and Model Management for Systems Biology(and SOPs too!)

Find: Data, models, protocols, projects, people. Catalogued and linked assets. Link studies to assets.

Access: Control sharing, versioning, gateway to scattered public/local archives.

Interoperate: Standards (SBML, SED-ML…), vocabs, formats, ids. Harvesting, export, API.

Reuse: Download assets. Run models with experimental data. DOI citation.

Page 19: FAIR Data and Model Management for Systems Biology(and SOPs too!)

The Neylon Equation

Page 20: FAIR Data and Model Management for Systems Biology(and SOPs too!)

FAIRDOM Provenance

2008

2010

2014

de.NBI

2019

Page 21: FAIR Data and Model Management for Systems Biology(and SOPs too!)

SEEK: Science Commons

Web-based cataloguing and rich web interface for describing, finding, linking and promoting ongoing research and outcomes. Small files, aggregates across data archives.

openBIS: Scaled local LIMS and analytics

Extract, Transform and Load tooling direct from the instrumentation, data analysis pipelines. Automatic archiving. Handles large data.

FAIRDOM Suite

Page 22: FAIR Data and Model Management for Systems Biology(and SOPs too!)

Aggregated Commons Infrastructure

Personal Data

Local Stores

LIMS

External Databases

Articles

Models

Standards

SOPs

Über metadata, cataloguing

Stores: SOPs, Models, data files

Page 23: FAIR Data and Model Management for Systems Biology(and SOPs too!)

NGS

Proteomics

LIMS

iPortal

BeeWM

Page 24: FAIR Data and Model Management for Systems Biology(and SOPs too!)

https://doi.org/10.15490/seek.1.investigation.56

Page 25: FAIR Data and Model Management for Systems Biology(and SOPs too!)

[Snoep, 2015]

https://doi.org/10.15490/seek.1.investigation.56

Page 26: FAIR Data and Model Management for Systems Biology(and SOPs too!)

Standard Operating Procedures

Challenge: Machine processable SOPs

Page 27: FAIR Data and Model Management for Systems Biology(and SOPs too!)

Models: simulate and annotate in browser

Page 28: FAIR Data and Model Management for Systems Biology(and SOPs too!)

Metadata standards & templates to link studies and link assets

Just Enough Results Model: Describes common elements and relationships between things produced and used in experiments.

Structured descriptions for consistency and comparison
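
To make the idea of linking studies to assets concrete, here is a minimal sketch in Python. It is not the actual Just Enough Results Model schema: the Investigation / Study / Assay nesting, the class names, the `kind` values and the example titles are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Asset:
    kind: str   # assumed vocabulary: "data", "model" or "sop"
    title: str


@dataclass
class Assay:
    title: str
    assets: List[Asset] = field(default_factory=list)


@dataclass
class Study:
    title: str
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    title: str
    studies: List[Study] = field(default_factory=list)


def assets_of_kind(investigation: Investigation, kind: str) -> List[Asset]:
    """Collect every asset of one kind linked anywhere under an investigation."""
    return [
        asset
        for study in investigation.studies
        for assay in study.assays
        for asset in assay.assets
        if asset.kind == kind
    ]


# Hypothetical example: one study with an experimental and a modelling assay.
example = Investigation("Glycolysis", [
    Study("Kinetics", [
        Assay("Enzyme assay", [Asset("data", "rates.xlsx"),
                               Asset("sop", "assay protocol")]),
        Assay("Simulation", [Asset("model", "glycolysis.xml")]),
    ]),
])
```

With a structure like this, a question such as "What SOP was used for this sample?" becomes a traversal over explicit links rather than a search through file names.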

Page 29: FAIR Data and Model Management for Systems Biology(and SOPs too!)

standards

NuML

[Adapted, Le Novere]

ISO/TC 276/WG5

Data processing and Integration

Martin Golebiewski, HITS

Validation data

Construction data

Page 30: FAIR Data and Model Management for Systems Biology(and SOPs too!)

Validation data

Construction data

Page 31: FAIR Data and Model Management for Systems Biology(and SOPs too!)

FAIRDOMSuite

Resource

FAIRDOMHub

Self-managed, customised local installation.

Independent, self-managed private space on shared, hosted installation.

Funder CRIS

Publisher Companion Site

Managed Safe haven

FAIRDOMHub.org

Plus Hybrids!!

Page 32: FAIR Data and Model Management for Systems Biology(and SOPs too!)

FAIRDOMSuite

Resource

FAIRDOMHub

FAIRDOM Initiative

Facilities

Community

Networks

Forums

Workshops

Tools

Standards

Support

Sustainability

de.NBI

Page 33: FAIR Data and Model Management for Systems Biology(and SOPs too!)

Sys Bio Developers Foundry, Oct 2014 Heidelberg, Germany

EraSysAPP meeting, April 2015, Berlin, Germany

Intl Practical Course in Sys Biology, June 1-12, 2015, Gothenburg, Sweden

Data Citation and Models Workshop, 14-16 Sept 2015, Rostock, Germany

Data Integration in the Life Sciences, Feb 2015, Leiden, Netherlands

Page 34: FAIR Data and Model Management for Systems Biology(and SOPs too!)

PALs

Page 35: FAIR Data and Model Management for Systems Biology(and SOPs too!)

http://seek.virtual-liver.de/

• Navigation

• Single standards at one scale

• Multi-type hosting

“To integrate the detailed knowledge that we have at the molecular level up to the functional level at tissue/organ/whole body level”

Multi-scale? Multi-silos…

Page 36: FAIR Data and Model Management for Systems Biology(and SOPs too!)

Handling/converting data of different levels of detail to make the model run. Representing in the SBML model the DNA bindings at the level of detail that had been measured in the experiments

Whole Cell model by Jonathan Karr (Rostock Summer School, Dagmar Waltemath)

Support for aggregating data to find the appropriate level of representation for a given model.

Karr JR, Sanghvi JC, Macklin DN, et al. A Whole-Cell Computational Model Predicts Phenotype from Genotype. Cell. 2012;150(2):389-401. doi:10.1016/j.cell.2012.05.044.

Page 37: FAIR Data and Model Management for Systems Biology(and SOPs too!)

Challenge: mismatches

• Systems on different scales
– incompatible time scales, data may be too sparse or need to be aggregated to work with another module

• Different levels of complexity
– comparing results from different modelling approaches

• Linking models needs thinking and standards
– connecting the single standards
– interfacing between the different scales
– connecting (experimental/simulation) data to models
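
As an illustration of the time-scale mismatch, the sketch below averages a finely sampled series into coarser bins so that it could feed a module running on a slower time step. It is an assumption-laden example, not part of any FAIRDOM tool: the function name, the tuple layout and the choice of a plain mean are all hypothetical.

```python
from statistics import mean
from typing import Dict, List, Tuple


def aggregate(samples: List[Tuple[float, float]],
              bin_width: float) -> List[Tuple[float, float]]:
    """Average (time, value) samples into bins of width bin_width.

    Each bin is labelled by its start time. Bins without samples are
    simply absent from the result; no interpolation is attempted.
    """
    bins: Dict[float, List[float]] = {}
    for t, value in samples:
        start = (t // bin_width) * bin_width
        bins.setdefault(start, []).append(value)
    return [(start, mean(values)) for start, values in sorted(bins.items())]


# Hypothetical fast-scale measurements (time in seconds, arbitrary units).
fast = [(0, 1.0), (1, 3.0), (10, 5.0), (12, 7.0)]
coarse = aggregate(fast, 10)  # → [(0, 2.0), (10, 6.0)]
```

The mean is only one possible choice; whether to average, sum or take the last value depends on what the quantity represents, which is exactly the kind of decision that needs to be recorded alongside the data.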

ISO/TC 276/WG5

Data processing and Integration

Martin Golebiewski, HITS

Page 38: FAIR Data and Model Management for Systems Biology(and SOPs too!)
Page 39: FAIR Data and Model Management for Systems Biology(and SOPs too!)

Challenge: model evolution

BiVeS tool: diff in versions of computational models

Provenance, Versioning, Parameter tracking

Releasing updated versions into the literature

Identifying, Interpreting, and Communicating Changes in XML-encoded Models of Biological Systems. Scharm et al., 2015, under revision at Bioinformatics

Haus et al., BMC Systems Biology, 2011, 5:10

Solvent production by Clostridium acetobutylicum

[Martin Scharm]
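
Parameter tracking between model versions can be sketched with plain dictionaries. This is not the BiVeS algorithm, which works on the XML structure of the model; it is a simplified stand-in, and the function name and the parameter names in the example are hypothetical.

```python
from typing import Dict, Tuple


def diff_parameters(old: Dict[str, float], new: Dict[str, float]) -> dict:
    """Report parameters added, removed or changed between two model versions."""
    added = {name: new[name] for name in new.keys() - old.keys()}
    removed = {name: old[name] for name in old.keys() - new.keys()}
    changed: Dict[str, Tuple[float, float]] = {
        name: (old[name], new[name])
        for name in old.keys() & new.keys()
        if old[name] != new[name]
    }
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical parameter sets for two versions of one model.
v1 = {"Vmax": 1.0, "Km": 0.5}
v2 = {"Vmax": 1.2, "Km": 0.5, "Ki": 2.0}
report = diff_parameters(v1, v2)
# report["changed"] holds Vmax with its old and new value; Ki is in "added".
```

A report of this kind, stored with each version, is one way to answer "What is the provenance of the parameters for this version of the model?".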

Page 40: FAIR Data and Model Management for Systems Biology(and SOPs too!)

F1000Research Living Figures, versioned articles, in-article data manipulation

R Lawrence, Force2015, Vision Award Runner Up http://f1000.com/posters/browse/summary/1097482

Simply data + code. Can change the definition of a figure, and ultimately the journal article.

Colomb J and Brembs B.

Sub-strains of Drosophila Canton-S differ markedly in their locomotor behavior [v1; ref status: indexed, http://f1000r.es/3is]

F1000Research 2014, 3:176

Other labs can replicate the study, or contribute their data to a meta-analysis or disease model - figure automatically updates.

Data updates time-stamped.

New conclusions added via versions.

Models

Page 41: FAIR Data and Model Management for Systems Biology(and SOPs too!)

Challenge: reproducibility

bridging from research to FAIR publishing

Bergmann, Rodriguez, Le Novère. COMBINE archive specification. <http://identifiers.org/combine.specifications/omex.version-1> (2014)

Describe, Access, Port
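
A COMBINE archive is a ZIP file whose manifest.xml lists each entry with a format identifier. The sketch below packs files that way using only the Python standard library. The namespace and format URIs are written as recalled from the OMEX specification cited above and should be checked against it; the function names and the file name `model.xml` are hypothetical.

```python
import io
import zipfile
from typing import Dict, Tuple
from xml.sax.saxutils import quoteattr

# Assumed identifiers; verify against the COMBINE archive specification.
OMEX_FORMAT = "http://identifiers.org/combine.specifications/omex"
MANIFEST_FORMAT = "http://identifiers.org/combine.specifications/omex-manifest"
SBML_FORMAT = "http://identifiers.org/combine.specifications/sbml"


def build_manifest(entries: Dict[str, str]) -> str:
    """Return manifest XML for a mapping of file name to format URI."""
    rows = [(".", OMEX_FORMAT), ("./manifest.xml", MANIFEST_FORMAT)]
    rows += [("./" + name, fmt) for name, fmt in entries.items()]
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             "<omexManifest xmlns=%s>" % quoteattr(MANIFEST_FORMAT)]
    for location, fmt in rows:
        lines.append("  <content location=%s format=%s/>"
                     % (quoteattr(location), quoteattr(fmt)))
    lines.append("</omexManifest>")
    return "\n".join(lines)


def write_archive(target, files: Dict[str, Tuple[str, bytes]]) -> None:
    """Write files (name -> (format URI, content)) plus a manifest to target."""
    manifest = build_manifest({name: fmt for name, (fmt, _) in files.items()})
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("manifest.xml", manifest)
        for name, (_, payload) in files.items():
            archive.writestr(name, payload)


# Usage: build an archive in memory with one placeholder model file.
buffer = io.BytesIO()
write_archive(buffer, {"model.xml": (SBML_FORMAT, b"<sbml/>")})
```

Writing to an in-memory buffer keeps the example free of side effects; passing a file path as `target` would produce an archive on disk instead.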

Page 42: FAIR Data and Model Management for Systems Biology(and SOPs too!)

Challenge: reproducibility

bridging from research to FAIR publishing

Deposit, Model simulation, Differentiated data

Page 43: FAIR Data and Model Management for Systems Biology(and SOPs too!)

ADVERT!!

Page 44: FAIR Data and Model Management for Systems Biology(and SOPs too!)

Challenge: Samples

Descriptions

SOP-Centric

Page 45: FAIR Data and Model Management for Systems Biology(and SOPs too!)

Challenge: Releasing

Page 46: FAIR Data and Model Management for Systems Biology(and SOPs too!)

Challenge: Releasing

SysMO Projects (2009-2014)

me, my team, close colleagues, collaborators, potential collaborators, trading partners, publishers, funders, reviewers, peers, policy makers, public

• Self-publication & Journal companionship.

• Staged & Selective Hugging & Flirting. Reciprocity.

• Tribal & Trading behaviours

• Forgetfulness, Embargos

• Resources, Benefit

• Individuals more likely to share than consortia

• Post-hoc rationalised Data/Model Cycles

Page 47: FAIR Data and Model Management for Systems Biology(and SOPs too!)

Challenges: (meta)data wrangling

Offsetting curation debt

http://rightfield.org.uk

Instrumented Spreadsheets

ELN

OneStop

Platform

Harvesting

Page 48: FAIR Data and Model Management for Systems Biology(and SOPs too!)

FAIRDOM Challenge: Sustainability

Free. Like a Free Puppy.

Page 49: FAIR Data and Model Management for Systems Biology(and SOPs too!)

Enabling multi-scale modelling in systems medicine

1. Exploit existing data for multi-scale modelling.

2. Develop SOPs and quality standards for systematic collection of quantitative data and information.

3. Identify required standards and ontologies for models and data repositories in systems medicine.

4. Develop modelling workflows for the integration of data and models; support data management, model construction and analysis.

5. Develop mathematical formalism to analyze and compare multi-scale models (parameter estimation, sensitivity analysis, identifiability analysis and image analysis).

Wolkenhauer et al, Enabling multiscale modeling in systems medicine, 2014, Genome Medicine 6(3)

Social

Page 50: FAIR Data and Model Management for Systems Biology(and SOPs too!)

Carole Goble Stuart Owen

Finn Bacall

Jacky Snoep

Wolfgang Mueller

Olga Krebs Quyen Nguyen

Natalie Stanford

Katy Wolstencroft

Peter Kunszt Bernd Rinn

[email protected]@fair-dom.org

http://www.fair-dom.org
http://www.fairdomhub.org
http://seek4science.org
http://www.rightfield.org.uk
http://jjj.biochem.sun.ac.za
http://sybit.net/software/openBIS

Donal Fellows

Alan Williams

Rostyslav Kuzyakiv

Jakub Straszewski

Chandrasekhar Ramakrishnan

Caterina Barillari

Norman Morrison