Collect, curate, share and publish your experiments Susanna-Assunta Sansone, PhD @biosharing @isatools Data Consultant, Honorary Academic Editor Associate Director, Principal Investigator BBSRC DTP, Oxford, 15 December, 2014 http://www.slideshare.net/SusannaSansone

Oxford DTP - Sansone curation tools - Dec 2014

Jul 14, 2015

Transcript
Page 1: Oxford DTP - Sansone curation tools - Dec 2014

Collect, curate, share and publish your experiments

Susanna-Assunta Sansone, PhD

@biosharing @isatools

Data Consultant, Honorary Academic Editor

Associate Director, Principal Investigator

BBSRC DTP, Oxford, 15 December, 2014

http://www.slideshare.net/SusannaSansone

Page 2: Oxford DTP - Sansone curation tools - Dec 2014
Page 3: Oxford DTP - Sansone curation tools - Dec 2014

From made reproducible to born reproducible

“Reproducing the method took several months of effort, and required using new versions and new software that posed challenges to reconstructing and validating the results”

Page 4: Oxford DTP - Sansone curation tools - Dec 2014

Outline

•  Problem
   o  contextualize the experiment and resulting data

•  Structured Component
   o  machine-readable element of the Data Descriptor

•  Introducing solutions
   o  format
   o  registry
   o  tools

Page 5: Oxford DTP - Sansone curation tools - Dec 2014

•  We need to report sufficient information to reuse the dataset

•  We must strike a balance between depth and breadth of information

Without context data is meaningless

Page 6: Oxford DTP - Sansone curation tools - Dec 2014

Information intensive experiments

•  Not too much

•  Not too little

•  But just right

Page 7: Oxford DTP - Sansone curation tools - Dec 2014

The International Conference on Systems Biology (ICSB), 22-28 August, 2008 Susanna-Assunta Sansone www.ebi.ac.uk/net-project


Page 8: Oxford DTP - Sansone curation tools - Dec 2014


§  To make any dataset ‘FAIR’, one must have standards, tools and best practices to:

•  report sufficient details

•  capture all salient features of the experimental workflow

•  make annotation explicit and discoverable

•  structure the descriptions for consistency

•  make it machine readable

Page 9: Oxford DTP - Sansone curation tools - Dec 2014

Structured component: key information from narrative

Seven week old C57BL/6N mice were treated with low-fat diet.

Liver was dissected out, hepatocytes prepared…

Page 10: Oxford DTP - Sansone curation tools - Dec 2014

From natural language to ‘computable’ concepts

Seven week old C57BL/6N mice were treated with low-fat diet.

Liver was dissected out, hepatocytes prepared …

Concepts annotated on the text:

•  Age value

•  Unit

•  Strain name

•  Subject of the experiment

•  Type of diet and experimental condition

•  Anatomy part

Page 11: Oxford DTP - Sansone curation tools - Dec 2014

From natural language to ‘computable’ concepts

Seven week old C57BL/6N mice were treated with low-fat diet.

Liver was dissected out, hepatocytes prepared …

Concepts annotated on the text:

•  Age value

•  Unit

•  Strain name

•  Subject of the experiment

•  Type of diet and experimental condition

•  Anatomy part

•  Type of protocol – sample treatment

•  Type of protocol – liver preparation

•  Type of protocol – cell preparation
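
One way to picture the step from sentence to ‘computable’ concepts is a record whose fields carry the labels shown on this slide. The sketch below is illustrative only: the class and field names are hypothetical, and the assignment of each phrase to a label is a reading of the slide, not a format defined in the talk.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Protocol:
    # "Type of protocol" label from the slide
    protocol_type: str


@dataclass
class SubjectDescription:
    subject: str        # subject of the experiment
    strain: str         # strain name
    age_value: int      # age value
    age_unit: str       # unit
    diet: str           # type of diet and experimental condition
    anatomy_part: str   # anatomy part
    protocols: List[Protocol] = field(default_factory=list)


# Hand-transcribed from the example sentence; nothing is parsed automatically.
description = SubjectDescription(
    subject="mouse",
    strain="C57BL/6N",
    age_value=7,
    age_unit="week",
    diet="low-fat diet",
    anatomy_part="liver",
    protocols=[
        Protocol("sample treatment"),
        Protocol("liver preparation"),
        Protocol("cell preparation"),
    ],
)

# Once the information sits in fields, it can be queried rather than re-read.
print(description.strain, description.age_value, description.age_unit)
```

The point of the design is that each value has one named place, so two descriptions written by different authors can be compared field by field.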

Page 12: Oxford DTP - Sansone curation tools - Dec 2014


Example of richly annotated, computable description

Credit to: OBI consortium

Page 13: Oxford DTP - Sansone curation tools - Dec 2014

And conversely….

LS1_C2_LD_TP2_P1    file1-fastq.gz

Page 14: Oxford DTP - Sansone curation tools - Dec 2014

…how not to report the experimental information!

Sample name (?!)    Data file
LS1_C2_LD_TP2_P1    file1-fastq.gz

•  LS1: liver sample 1

•  C2: compound 2

•  LD: low dose

•  TP2: time point 2

•  P1: protocol 1

•  file1-fastq.gz: compressed data file for sequence information corresponding to this sample
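
The sample name above packs five facts into one string that only its author can read. As a rough illustration, the sketch below expands it using the legend from the slide. The `LEGEND` table, the field names ("tissue sample", "dose", and so on) and the `decode_sample_name` helper are hypothetical, introduced here only to show what the explicit version would look like.

```python
# Legend copied from the slide; without it the sample name is opaque.
# The field names on the left of each pair are assumptions for illustration.
LEGEND = {
    "LS1": ("tissue sample", "liver sample 1"),
    "C2": ("compound", "compound 2"),
    "LD": ("dose", "low dose"),
    "TP2": ("time point", "time point 2"),
    "P1": ("protocol", "protocol 1"),
}


def decode_sample_name(name: str) -> dict:
    """Expand an underscore-separated sample code into explicit fields."""
    record = {}
    for code in name.split("_"):
        if code not in LEGEND:
            raise KeyError(f"no legend entry for code {code!r}")
        field_name, meaning = LEGEND[code]
        record[field_name] = meaning
    return record


explicit = decode_sample_name("LS1_C2_LD_TP2_P1")
explicit["data file"] = "file1-fastq.gz"
for field_name, meaning in explicit.items():
    print(f"{field_name}: {meaning}")
```

A reader without the legend cannot run this decoding at all, which is the argument for reporting each fact in its own field from the start.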

Page 15: Oxford DTP - Sansone curation tools - Dec 2014

Data Descriptor: two complementary components

Article or narrative component
(PDF and HTML)

Experimental metadata or structured component
(in-house curated, machine-readable format)

Page 16: Oxford DTP - Sansone curation tools - Dec 2014


Page 17: Oxford DTP - Sansone curation tools - Dec 2014

Structured component enhances Methods & Data

“The Methods section should include detailed text describing the methods and procedures used in the study and assay(s), and the processing steps leading to the production of the data files, including any computational analyses…

… The Data Records section should be used to explain each data record associated with this work, including the repository where this information is stored, and an overview of the data files and their formats.”

Page 18: Oxford DTP - Sansone curation tools - Dec 2014

Helping authors to report the structural information

In-house editorial curator:

1. assists authors via
   - Excel templates
   - internal authoring tool

2. performs value-added semantic annotation

3. structures the information in a machine-readable format

[Figure labels: analysis method; script; data file or record in a database]

Page 19: Oxford DTP - Sansone curation tools - Dec 2014

At initial submission

•  Authors provide basic input: at minimum, information on
   o  samples and subjects
   o  experimental, computational and/or observational information, or creation of aggregations
   o  data outputs

•  Example for an experimental study:

Subjects   Protocol 1       Protocol 2         Protocol 3       Protocol 4   Data
Mouse 1    Drug treatment   Liver dissection   RNA extraction   RNA-Seq      [accession]
Mouse 2    Drug treatment   Liver dissection   RNA extraction   RNA-Seq      [accession]
Mouse n    Drug treatment   Liver dissection   RNA extraction   RNA-Seq      [accession]

Page 20: Oxford DTP - Sansone curation tools - Dec 2014

Upon acceptance

•  The curator, with the help of the authors, completes the structured description, drawing information from the narrative component, and adds
   o  information about the samples and subjects
   o  details of the experimental, computational and/or observational information, or creation of aggregations
   o  details on data manipulations

•  Also performs value-added semantic tagging
   o  replacing free text with terms from community-defined terminologies (controlled vocabularies or ontologies)
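
Semantic tagging, as described on this slide, amounts to pairing each free-text value with a term from a community terminology. The sketch below uses a tiny hand-made lookup table in place of a real ontology search; `TERM_LOOKUP` and `tag` are hypothetical names, and the term identifiers are given for illustration and should be checked against the source ontologies before reuse.

```python
# Tiny hand-made lookup standing in for an ontology search service.
# Identifiers are illustrative; verify them against the ontologies themselves.
TERM_LOOKUP = {
    "mice": ("Mus musculus", "NCBITaxon:10090"),
    "liver": ("liver", "UBERON:0002107"),
    "hepatocytes": ("hepatocyte", "CL:0000182"),
}


def tag(free_text: str) -> dict:
    """Return the free-text value together with a matching term, if one is known."""
    key = free_text.strip().lower()
    if key in TERM_LOOKUP:
        label, term_id = TERM_LOOKUP[key]
        return {"text": free_text, "label": label, "term_id": term_id}
    # Unmatched values stay as free text for a curator to review.
    return {"text": free_text, "label": None, "term_id": None}


for value in ["Mice", "Liver", "hepatocytes", "low-fat diet"]:
    print(tag(value))
```

Keeping the original text next to the term lets a curator see what the author wrote and correct a wrong match.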

Page 21: Oxford DTP - Sansone curation tools - Dec 2014

Semantic tagging key information

[Figure: subjects column from the example table]

Page 22: Oxford DTP - Sansone curation tools - Dec 2014

Semantic tagging key information

Page 23: Oxford DTP - Sansone curation tools - Dec 2014

General-purpose, machine readable format

Designed to support:

•  description of the workflow

•  use of community-defined terminologies and minimal reporting guidelines
   o  depth of description will vary contingent on the particular context

Page 24: Oxford DTP - Sansone curation tools - Dec 2014

Investigation file – overview and link to narrative

Includes fields describing:

•  authors’ details, including ORCID

•  publications

•  funding sources and funders’ name, via FundRef

•  study design

•  type of assays

•  type of protocols

•  links to relevant sections of the narrative component

Page 25: Oxford DTP - Sansone curation tools - Dec 2014


Study file – samples / subjects description

It relates samples, and their descriptions, to the data files

Page 26: Oxford DTP - Sansone curation tools - Dec 2014

Assays file - from samples to data files

•  Pointing to the
   o  location of the data files in the external repository(s)
   o  name or ID of the files
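
The study and assay files can be thought of as two tab-delimited tables joined on the sample name. The sketch below is a simplified illustration, not a valid ISA-Tab archive: the column headers follow ISA-Tab naming conventions, but the rows, file names and the `files_for_source` helper are invented, and real files carry many more columns (and may repeat headers such as `Protocol REF`, which this simple reader would not handle).

```python
import csv
import io

# Simplified, hand-written tables loosely modelled on the study and assay files.
STUDY_TSV = (
    "Source Name\tCharacteristics[strain]\tProtocol REF\tSample Name\n"
    "mouse1\tC57BL/6N\tliver preparation\tmouse1.liver\n"
    "mouse2\tC57BL/6N\tliver preparation\tmouse2.liver\n"
)
ASSAY_TSV = (
    "Sample Name\tProtocol REF\tRaw Data File\n"
    "mouse1.liver\tsequencing\tmouse1_liver.fastq.gz\n"
    "mouse2.liver\tsequencing\tmouse2_liver.fastq.gz\n"
)


def read_tsv(text: str) -> list:
    """Parse tab-delimited text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def files_for_source(source_name: str) -> list:
    """Follow a subject through its samples to the data files derived from them."""
    samples = {
        row["Sample Name"]
        for row in read_tsv(STUDY_TSV)
        if row["Source Name"] == source_name
    }
    return [
        row["Raw Data File"]
        for row in read_tsv(ASSAY_TSV)
        if row["Sample Name"] in samples
    ]


print(files_for_source("mouse1"))
```

Because the link is a shared column value rather than a naming convention, the path from subject to data file can be followed by a program.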

Page 27: Oxford DTP - Sansone curation tools - Dec 2014


What does a structured component add?

•  Supplements the scientific discourse
   o  natural language has a degree of ambiguity

•  Brings clarity in reporting research methods and procedures
   o  no trimming, no cooking
   o  clear samples to data files links and relation to methods

•  Provides the basis for search and discovery features

[Figure: multiple SciData DD entries, each with structured content, linked to one another by same tissue, same organism or same assay, and to community data repositories]
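
The search and discovery idea on this slide (finding descriptors that share a tissue, organism or assay) follows directly once those values are stored as fields. A minimal sketch, assuming invented descriptor records: the identifiers, annotation values and the `group_by` helper are placeholders and do not describe any real Data Descriptor.

```python
from collections import defaultdict

# Invented descriptors; identifiers and annotation values are placeholders.
DESCRIPTORS = [
    {"id": "DD-1", "organism": "Mus musculus", "tissue": "liver", "assay": "RNA-Seq"},
    {"id": "DD-2", "organism": "Mus musculus", "tissue": "brain", "assay": "RNA-Seq"},
    {"id": "DD-3", "organism": "Homo sapiens", "tissue": "liver", "assay": "mass spectrometry"},
]


def group_by(field_name: str) -> dict:
    """Index descriptor ids by the value of one structured field."""
    index = defaultdict(list)
    for descriptor in DESCRIPTORS:
        index[descriptor[field_name]].append(descriptor["id"])
    return dict(index)


print(group_by("tissue"))
print(group_by("organism"))
print(group_by("assay"))
```

The grouping only works if every descriptor uses the same term for the same thing, which is what the semantic tagging step provides.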

Page 28: Oxford DTP - Sansone curation tools - Dec 2014

Progressively refine guidance to authors and reviewers

In the life sciences

~ 156

~ 70

~ 334

Source: BioPortal

Databases implementing standards

MIAME, MIAPA, MIRIAM, MIQAS, MIX, MIGEN, CIMR, MIAPE, MIASE, MIQE, MISFISHIE, …, REMARK, CONSORT

MAGE-Tab, GCDML, SRAxml, SOFT, FASTA, DICOM, mzML, SBRML, SEDML, …, GELML, ISA-Tab, CML, MITAB

AAO, CHEBI, OBI, PATO, ENVO, MOD, BTO, IDO, …, TEDDY, PRO, XAO, DO, VO

Page 29: Oxford DTP - Sansone curation tools - Dec 2014

Mapping the landscape of standards and databases

Page 30: Oxford DTP - Sansone curation tools - Dec 2014

Mapping the landscape of community-developed standards, databases and data policies in the life sciences, broadly covering biological, natural and biomedical sciences

Page 31: Oxford DTP - Sansone curation tools - Dec 2014

Including minimum information reporting requirements, or checklists to report the same core, essential information

Including controlled vocabularies, taxonomies, thesauri, ontologies etc. to use the same word and refer to the same ‘thing’

Including conceptual model, conceptual schema from which an exchange format is derived to allow data to flow from one system to another

Page 32: Oxford DTP - Sansone curation tools - Dec 2014


Current content:

•  Over 500

•  Over 600

Search and filter according to your domain of study !

Page 33: Oxford DTP - Sansone curation tools - Dec 2014

Standards & databases cross-linked

[Figure labels: STANDARD, DATABASE]

Page 34: Oxford DTP - Sansone curation tools - Dec 2014

Researchers, developers and curators lack support and guidance on how to best navigate and select content standards, understand their maturity, or find databases that implement them.

Funders, journals and librarians do not have enough information to make informed decisions on which content standards or databases to recommend in policies, fund or implement.

Page 35: Oxford DTP - Sansone curation tools - Dec 2014

Outline

•  Problem
   o  contextualize the experiment and resulting data

•  Structured Component
   o  machine-readable element of the Data Descriptor

•  Introducing solutions
   o  format
   o  registry
   o  tools

Page 36: Oxford DTP - Sansone curation tools - Dec 2014
Page 37: Oxford DTP - Sansone curation tools - Dec 2014
Page 38: Oxford DTP - Sansone curation tools - Dec 2014


ISA powers data collection, curation resources and repositories, e.g.:

Page 39: Oxford DTP - Sansone curation tools - Dec 2014
Page 40: Oxford DTP - Sansone curation tools - Dec 2014

1

Page 41: Oxford DTP - Sansone curation tools - Dec 2014

1

Create template(s) to fit the type of experiments to be described

Create templates detailing the steps to be reported for different investigations, complying with community standards, e.g. configuring the value(s) allowed for each field to be

•  text (with/without regular expressions),

•  ontology terms,

•  numbers etc.

We have ‘ready to use’ community standards compliant configurations
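
To make the idea of a template concrete: each field declares which kind of value it accepts (text, optionally constrained by a regular expression; an ontology term; a number), and entries are checked against that rule. The sketch below is a hypothetical illustration in Python and does not reproduce the configuration format used by the ISA tools; `TEMPLATE`, its keys and `validate` are assumed names, and the "ontology" rule is reduced to a fixed set of allowed labels.

```python
import re

# Hypothetical template: each field declares which kind of value it accepts.
TEMPLATE = {
    "Sample Name": {"kind": "text", "pattern": r"[A-Za-z0-9_.-]+"},
    "Organism part": {"kind": "ontology", "allowed": {"liver", "kidney", "heart"}},
    "Age": {"kind": "number"},
}


def validate(field_name: str, value: str) -> bool:
    """Check one value against the rule configured for its field."""
    rule = TEMPLATE[field_name]
    if rule["kind"] == "text":
        pattern = rule.get("pattern")
        return pattern is None or re.fullmatch(pattern, value) is not None
    if rule["kind"] == "ontology":
        return value in rule["allowed"]
    if rule["kind"] == "number":
        try:
            float(value)
        except ValueError:
            return False
        return True
    raise ValueError(f"unknown rule kind {rule['kind']!r}")


row = {"Sample Name": "mouse1.liver", "Organism part": "liver", "Age": "seven"}
for name, value in row.items():
    print(name, "ok" if validate(name, value) else "invalid")
```

Checking values at entry time is what lets a template enforce a community standard, rather than leaving compliance to later curation.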

Page 42: Oxford DTP - Sansone curation tools - Dec 2014

Describe, curate your experiment using a desktop-based tool

Report and edit the description using this tool (also customized using the templates), with a spreadsheet-like look and feel, packed with functionalities such as

•  ontology search

•  term-tagging features

•  import from spreadsheets etc…

Page 43: Oxford DTP - Sansone curation tools - Dec 2014
Page 44: Oxford DTP - Sansone curation tools - Dec 2014

Describe, curate your experiment with geographically distributed collaborators

Report and edit the description of the investigation using customized Google Spreadsheets enabled with ontology search and term-tagging features.

Page 45: Oxford DTP - Sansone curation tools - Dec 2014

2

Page 46: Oxford DTP - Sansone curation tools - Dec 2014

3

Page 47: Oxford DTP - Sansone curation tools - Dec 2014

4

Page 48: Oxford DTP - Sansone curation tools - Dec 2014

transcriptomics proteomics genomics

Page 49: Oxford DTP - Sansone curation tools - Dec 2014

5

Page 50: Oxford DTP - Sansone curation tools - Dec 2014

6

Page 51: Oxford DTP - Sansone curation tools - Dec 2014

•  Assists in the curation and management of experimental metadata at source
   o  Common, structured representation of experimental information that transcends individual biological and technological domains
   o  Deals with studies with one or a combination of assays

•  Can be ‘configured’ to implement (several) community standards, facilitating their uptake

•  Elements can be plugged into existing tools/resources

•  Facilitates data sharing, use of existing analysis tools and submission to
   o  EBI public repositories
   o  data journals

Page 52: Oxford DTP - Sansone curation tools - Dec 2014

Acknowledgements

Visit nature.com/scientificdata

Email [email protected]

Tweet @ScientificData

Honorary Academic Editor: Susanna-Assunta Sansone, PhD

Managing Editor: Andrew L Hufton, PhD

Editorial Curator: Varsha Khodiyar

Publisher: Iain Hrynaszkiewicz

Advisory Panel and Editorial Board including senior researchers, funders, librarians and curators

Philippe Rocca-Serra, PhD

Alejandra Gonzalez-Beltran, PhD

Eamonn Maguire

Milo Thurston, PhD

and Advisory Boards and Collaborators