FAIR Data and Model Management for Systems Biology (and SOPs too!)

Post on 12-Aug-2015

Prof Carole Goble

The University of Manchester

The Software Sustainability Institute, ELIXIR UK, SynBioChem Centre

carole.goble@manchester.ac.uk

MultiScale Biology Network Springboard meeting, Nottingham, UK, 1 June 2015

• Project-centric data and model management

• Respects & expects other systems

• Forged in fire of national & international projects

• PhDs/postgrads/PIs

• Context

• FAIRDOM Initiative

• Challenges

http://www.fair-dom.org

http://www.fairdomhub.org

republic of science*

regulation of science

institutions libraries

*Merton’s four norms of scientific behaviour (1942)

public archives

cloud services

https://sems.uni-rostock.de/reproducible-and-citable-data-and-models/

Advert!!!

Publishers

• Reproducibility

• New publishable assets

• New business models and services

Funders, Managers

• Capitalising

• Skills

• Justification, Audit & Compliance

UK Funder Data Policies

http://www.dcc.ac.uk/resources/policy-and-legal/overview-funders-data-policies

Tools, Standards, Formats, Reporting, Policies, Practices, Initiatives

Data: ‘omics, images, reaction kinetics, samples, specimens… Small: spreadsheets, files… Big: NGS, Mass Spec, specialist repositories…

Models: ODE, SBML, Native Matlab, PDE, Fortran, CellML…

SOPs: consistency, comparability

Samples…

versioning, provenance tracking, parameter tracking, citation tracking, links to articles

STANDARDS

Asset Management

public archives

cloud services

Public-Centric Asset Management

Challenge: Most quantitative databases provide kinetic constants for enzymes, sometimes binding constants… Little to help building quantitative descriptions, i.e. concentrations, sizes, diffusions… Exceptions: gene expression data, proteomics, metabolomics. Localisation: The average concentration of a protein in a piece of brain is of limited use (mix of tissues and subcellular compartments).

[Nicolas Le Novere, 2015]

FAIR for the Researcher

Collaborative, data/model-driven science

Publication

Local and Public Resources

Skills and Productivity

Compliance

Collaboration, asset management

Pop-up projects

Dynamic groups

Internal / external visibility

Project-Centric Asset Management

Is there any group generating kinetic data?

Who is working with which organism?

Is this data available?

What methods are being used to determine enzyme activity?

Under which experimental conditions are my partners measuring glucose concentration?

What is the provenance of the parameters for this version of the model?

What SOP was used for this sample?

Where is the validation data for this model?

• Retain results beyond a project / the PhD student

• Exchange & find assets.

• Share, disseminate and publish assets sensitively

• Consistent reporting for interpretation, interop & comparison

• Promote standardised metadata practices.

• Organise and link assets

• Reuse results

Find: Data, models, protocols, projects, people. Catalogued and linked assets. Link studies to assets.

Access: Control sharing, versioning, gateway to scattered public/local archives.

Interoperate: Standards (SBML, SED-ML…), vocabs, formats, ids. Harvesting, export, API.

Reuse: Download assets. Run models with experimental data. DOI citation.

The Neylon Equation

FAIRDOM Provenance

[Timeline figure: 2008, 2010, 2014, 2019; de.NBI]

SEEK: Science Commons. Web-based cataloguing and rich web interface for describing, finding, linking and promoting ongoing research and outcomes. Small files, aggregates across data archives.

openBIS: Scaled local LIMS and analytics. Extract, Transform and Load tooling direct from the instrumentation, data analysis pipelines. Automatic archiving. Handles large data.

FAIRDOM Suite

Personal Data, Local Stores, LIMS

External Databases

Articles

Models

Standards

SOPs

Aggregated Commons Infrastructure

Über metadata, cataloguing

Stores: SOPs, Models, data files

NGS

Proteomics

LIMS

iPortal

BeeWM

https://doi.org/10.15490/seek.1.investigation.56

[Snoep, 2015]

Standard Operating Procedures

Challenge: Machine processable SOPs

Models: simulate and annotate in browser

Metadata standards & templates to link studies and link assets

Just Enough Results Model

Describes common elements and relationships between things produced and used in experiments.
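The idea of recording what an experiment used and produced can be illustrated with a small data structure. This is only a sketch: the class names `Asset` and `Experiment`, their fields, and the helper `sops_for` are hypothetical and are not taken from the Just Enough Results Model or from SEEK. It shows how linked records let a question such as "What SOP was used for this sample?" be answered by following links.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Asset:
    """A catalogued item; `kind` is e.g. "data", "model", "sop" or "sample"."""
    kind: str
    title: str


@dataclass
class Experiment:
    """Hypothetical record linking the assets an experiment used and produced."""
    title: str
    uses: list = field(default_factory=list)
    produces: list = field(default_factory=list)


def sops_for(asset, experiments):
    """Return the SOPs used by every experiment that used or produced `asset`."""
    found = []
    for exp in experiments:
        if asset in exp.uses or asset in exp.produces:
            found.extend(a for a in exp.uses if a.kind == "sop")
    return found


# Illustrative records only; the titles are made up.
sample = Asset("sample", "yeast culture A")
sop = Asset("sop", "glucose assay protocol")
data = Asset("data", "glucose time course")
assay = Experiment("glucose measurement", uses=[sample, sop], produces=[data])

print(sops_for(sample, [assay]))
```

Keeping the links explicit, rather than implied by file names or folders, is what makes such questions answerable after the person who ran the experiment has moved on.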

Structured descriptions for consistency and comparison

standards: NuML

[Adapted, Le Novere]

ISO/TC 276/WG5: Data processing and Integration

Martin Golebiewski, HITS

Validation data

Construction data

FAIRDOMSuite

Resource

FAIRDOMHub

Self-managed, customised local installation.

Independent, self-managed private space on shared, hosted installation.

Funder CRIS

Publisher Companion Site

Managed Safe haven

FAIRDOMHub.org

Plus Hybrids!!

FAIRDOM Initiative

Facilities

Community, Networks

Forums, Workshops

Tools

Standards, Support

Sustainability

de.NBI

Sys Bio Developers Foundry, Oct 2014 Heidelberg, Germany

EraSysAPP meeting, April 2015, Berlin, Germany

Intl Practical Course in Sys Biology, June 1-12, 2015, Gothenburg, Sweden

Data Citation and Models Workshop, 14-16 Sept 2015, Rostock, Germany

Data Integration in the Life Sciences, Feb 2015, Leiden, Netherlands

PALs

http://seek.virtual-liver.de/

• Navigation

• Single standards at one scale

• Multi-type hosting

“To integrate the detailed knowledge that we have at the molecular level up to the functional level at tissue/organ/whole body level”

Multi-scale? Multi-silos…

Handling/converting data of different levels of detail to make the model run. Representing in the SBML model the DNA bindings at the level of detail that had been measured in the experiments

Whole Cell model by Jonathan Karr (Rostock Summer School, Dagmar Waltemath)

Support for aggregating data to find the appropriate level of representation for a given model.

Karr JR, Sanghvi JC, Macklin DN, et al. A Whole-Cell Computational Model Predicts Phenotype from Genotype. Cell. 2012;150(2):389-401. doi:10.1016/j.cell.2012.05.044.

Challenge: mismatches

• Systems on different scales

– incompatible time scales, data may be too sparse or need to be aggregated to work with another module

• Different levels of complexity

– comparing results from different modelling approaches.

• Linking models needs thinking and standards

– connecting the single standards

– interfacing between the different scales

– connecting (experimental/simulation) data to models
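As a small illustration of the time-scale mismatch, the sketch below averages finely sampled measurements into coarser bins so that they match the step of another module. The function name `aggregate_to_step`, the choice of the mean as the aggregate, and the sample values are all assumptions made for this example; a real coupling may need interpolation or a different summary statistic.

```python
from collections import defaultdict
from statistics import mean


def aggregate_to_step(samples, step):
    """Average (time, value) samples into bins of width `step`.

    Returns a list of (bin_start, mean_value) pairs ordered by time.
    Bins that contain no samples are simply absent from the result.
    """
    bins = defaultdict(list)
    for t, value in samples:
        bin_start = int(t // step) * step
        bins[bin_start].append(value)
    return [(start, mean(values)) for start, values in sorted(bins.items())]


# Made-up measurements taken at irregular times (in seconds).
fast = [(0, 1.0), (10, 3.0), (20, 5.0), (60, 2.0), (90, 6.0)]

# Aggregate to one value per 60-second step for a slower module.
per_minute = aggregate_to_step(fast, 60)
print(per_minute)
```

Empty bins are left out rather than filled in, which keeps the sparseness of the data visible to whoever consumes the aggregated series.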

ISO/TC 276/WG5: Data processing and Integration

Martin Golebiewski, HITS

Challenge: model evolution

BiVeS tool: diff in versions of computational models

Provenance, Versioning, Parameter tracking

Releasing updated versions into the literature

Identifying, Interpreting, and Communicating Changes in XML-encoded Models of Biological Systems. Scharm et al. 2015, under revision at Bioinformatics
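To make parameter tracking between model versions concrete, here is a deliberately naive sketch that compares the `parameter` elements of two XML-encoded models by their `id`. It is not the BiVeS algorithm and makes no use of the BiVeS tool; the function names and the two tiny SBML-like documents are invented for illustration.

```python
import xml.etree.ElementTree as ET


def parameters(xml_text):
    """Map parameter id -> value string for every <parameter> element."""
    root = ET.fromstring(xml_text)
    found = {}
    for element in root.iter():
        local_name = element.tag.split("}")[-1]  # drop any XML namespace
        if local_name == "parameter" and "id" in element.attrib:
            found[element.get("id")] = element.get("value")
    return found


def diff_parameters(old_xml, new_xml):
    """Report parameters added, removed or changed between two versions."""
    old, new = parameters(old_xml), parameters(new_xml)
    shared = sorted(set(old) & set(new))
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "changed": {k: (old[k], new[k]) for k in shared if old[k] != new[k]},
    }


# Two made-up versions of a minimal SBML-like model.
NS = 'xmlns="http://www.sbml.org/sbml/level3/version1/core"'
V1 = (f'<sbml {NS}><model id="m"><listOfParameters>'
      '<parameter id="k1" value="0.1"/><parameter id="k2" value="2.0"/>'
      '</listOfParameters></model></sbml>')
V2 = (f'<sbml {NS}><model id="m"><listOfParameters>'
      '<parameter id="k1" value="0.15"/><parameter id="k3" value="1.0"/>'
      '</listOfParameters></model></sbml>')

changes = diff_parameters(V1, V2)
print(changes)
```

A comparison like this only sees global parameters with matching ids. Structural changes such as renamed species or rewired reactions need a proper model-aware diff.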

Haus et al, BMC Systems Biology, 2011, 5:10

Solvent production by Clostridium acetobutylicum

[Martin Scharm]

F1000Research Living Figures, versioned articles, in-article data manipulation

R Lawrence Force2015, Vision Award Runner Up http://f1000.com/posters/browse/summary/1097482

Simply data + code. Can change the definition of a figure, and ultimately the journal article.

Colomb J and Brembs B. Sub-strains of Drosophila Canton-S differ markedly in their locomotor behavior [v1; ref status: indexed, http://f1000r.es/3is]. F1000Research 2014, 3:176

Other labs can replicate the study, or contribute their data to a meta-analysis or disease model - figure automatically updates.

Data updates time-stamped.

New conclusions added via versions.

Models

Challenge: reproducibility

bridging from research to FAIR publishing

Bergmann, Rodriguez, Le Novère. COMBINE archive specification. <http://identifiers.org/combine.specifications/omex.version-1> (2014)
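A COMBINE archive is, in essence, a ZIP file with a `manifest.xml` that lists each entry and its format. The sketch below builds one in memory with the Python standard library. The helper `build_omex`, the file names and the file contents are hypothetical, and the manifest layout and format identifiers reflect my reading of the specification cited above, so check them against it before relying on the output.

```python
import io
import zipfile
from xml.sax.saxutils import quoteattr

# Namespace and identifier prefix as understood from the OMEX specification.
OMEX_NS = "http://identifiers.org/combine.specifications/omex-manifest"
FORMATS = "http://identifiers.org/combine.specifications/"


def build_omex(entries, master=None):
    """Return the bytes of a ZIP archive holding `entries` plus a manifest.

    `entries` maps file name -> (content bytes, format suffix such as "sbml").
    `master` optionally names the entry to flag as the main file.
    """
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        f'<omexManifest xmlns="{OMEX_NS}">',
        f'  <content location="." format="{FORMATS}omex"/>',
        f'  <content location="./manifest.xml" format="{OMEX_NS}"/>',
    ]
    for name, (_, fmt) in entries.items():
        flag = ' master="true"' if name == master else ''
        location = quoteattr("./" + name)
        lines.append(
            f'  <content location={location} '
            f'format={quoteattr(FORMATS + fmt)}{flag}/>'
        )
    lines.append('</omexManifest>')

    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("manifest.xml", "\n".join(lines))
        for name, (data, _) in entries.items():
            archive.writestr(name, data)
    return buffer.getvalue()


# Placeholder contents; real entries would be full SBML and SED-ML documents.
omex_bytes = build_omex(
    {
        "model.xml": (b"<sbml/>", "sbml"),
        "simulation.sedml": (b"<sedML/>", "sed-ml"),
    },
    master="simulation.sedml",
)
print(len(omex_bytes), "bytes")
```

Writing `omex_bytes` to a file with an `.omex` extension gives something that can be handed to an archive-aware tool, subject to the caveats above.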

Describe, Access, Port

Deposit, Model simulation, Differentiated data

ADVERT!!

Challenge: Samples

Descriptions

SOP-Centric

Challenge: Releasing

SysMO Projects (2009-2014)

me

my team

close colleagues

collaborators

potential collaborators

trading partners

publishers

funders

reviewers

peers

policy makers

public

• Self-publication & Journal companionship.

• Staged & Selective Hugging & Flirting. Reciprocity.

• Tribal & Trading behaviours

• Forgetfulness, Embargos

• Resources, Benefit

• Individuals more likely to share than consortia

• Post-hoc rationalised Data/Model Cycles

Challenges: (meta)data wrangling

Offsetting curation debt

http://rightfield.org.uk

Instrumented Spreadsheets

ELN

OneStop

Platform

Harvesting

FAIRDOM Challenge: Sustainability

Free. Like a Free Puppy.

Enabling multi-scale modelling in systems medicine

1. Exploit existing data for multi-scale modelling.

2. Develop SOPs and quality standards for systematic collection of quantitative data and information.

3. Identify required standards and ontologies for models and data repositories in systems medicine.

4. Develop modelling workflows for the integration of data and models; support data management, model construction and analysis.

5. Develop mathematical formalism to analyze and compare multi-scale models (parameter estimation, sensitivity analysis, identifiability analysis and image analysis).

Wolkenhauer et al, Enabling multiscale modeling in systems medicine, 2014, Genome Medicine 6(3)

Social

Carole Goble Stuart Owen

Finn Bacall

Jacky Snoep

Wolfgang Mueller

Olga Krebs Quyen Nguyen

Natalie Stanford

Katy Wolstencroft

Peter Kunszt Bernd Rinn

fairdom@fair-dom.org

fair-dom@fair-dom.org

http://www.fair-dom.org

http://www.fairdomhub.org

http://seek4science.org

http://www.rightfield.org.uk

http://jjj.biochem.sun.ac.za

http://sybit.net/software/openBIS

Donal Fellows Alan Williams

Rostyslav Kuzyakiv

Jakub Straszewski

Chandrasekhar Ramakrishnan

Caterina Barillari

Norman Morrison
