
Scientific Data overview of Data Descriptors - WT Data-Literature integration, Dec 2013

Jan 27, 2015

Page 1


Now open for submissions; launching May 2014

Advisory Panel, including senior researchers, funders, librarians and curators:
•  Michael Huerta, National Institutes of Health, USA
•  Mark Thorley, Natural Environment Research Council, UK
•  Patricia Cruse, University of California, USA
•  Susan Gregurick, Office of Biological and Environmental Research, Department of Energy, USA
•  Ioannis Xenarios, Swiss Institute of Bioinformatics, Switzerland
•  Chris Bowler, IBENS, France
•  Mark Forster, Syngenta, UK
•  Anthony Rowe, Johnson & Johnson, USA
•  Stephen Chanock, National Cancer Institute, USA
•  Weida Tong, National Center for Toxicological Research, FDA, USA
•  Albert J. R. Heck, Utrecht University, The Netherlands
•  Johanna McEntyre, EMBL-EBI, European Bioinformatics Institute, UK
•  Simon Hodson, CODATA, France
•  Joseph R. Ecker, Howard Hughes Medical Institute & Salk Institute, USA
•  Stephen Friend, Sage Bionetworks, USA
•  Jessica Tenenbaum, Duke Translational Medicine Institute, USA
•  Anne-Claude Gavin, EMBL, Germany
•  David Carr, Wellcome Trust, UK
•  Wolfram Horstmann, University of Oxford, UK
•  Piero Carninci, RIKEN Omics Science Center, Japan
•  Pascale Gaudet, Swiss Institute of Bioinformatics, Switzerland
•  Judith A. Blake, The Jackson Laboratory, USA
•  Richard H. Scheuermann, J. Craig Venter Institute, USA
•  Caroline Shamu, Harvard Medical School, USA

Susanna-Assunta Sansone, Honorary Academic Editor (University of Oxford, UK)

Andrew L Hufton, Managing Editor

Victoria Newman, Editorial Curator

Ruth Wilson, Publisher

www.nature.com/scientificdata [email protected] @ScientificData

Page 2

Introducing a new content type: Data Descriptor

•  Credit for Sharing Your Data
•  Open-access
•  Focused on Data Reuse
•  Peer-reviewed, curated
•  Promoting Community Data Repositories

Page 3

Introducing a new content type: Data Descriptor

Session 2: Publishing Data Aims of this session: to explore how data is being represented and cited in research articles; to showcase new data publishing products, and consider how the edges between articles and data are joined or defined. How can we maximize integrated utility across the different data resources used by scientists?

Session 3: Credit, Attribution, Reproducibility and Provenance Aims of this session: in an integrated information space, it is essential to have transparency about the sources and methods of scientific outputs. How do scientific articles contribute to this goal? Are they sufficiently addressing requirements, what are the most useful approaches, and how might they be put into practice?


Page 4

Data Descriptor vs. traditional article

Journal article: NARRATIVE (synthesis, analysis, interpretation, conclusions)
Data Descriptor: FACTS (summary of the Data Descriptor)
•  What is the sample?
•  What did I do to generate the data?
•  Where is the data?
•  How was the data processed?
•  Who did what when?

•  The Data Descriptor is concerned only with the facts behind the methodology of data generation/collection and processing
•  A Data Descriptor can be submitted:
   –  before the journal article
   –  at the same time as the journal article
   –  after the journal article

Page 5

Two sample Data Descriptors now online!

Page 6

A Data Descriptor has two components:
•  Article, or narrative component (PDF and HTML)
•  Experimental metadata, or structured component (in-house curated, machine-readable formats)


Page 7

Data Descriptor - article

Sections:
•  Title
•  Abstract
•  Background & Summary
•  Methods
•  Technical Validation
•  Data Records
•  Usage Notes
•  Figures & Tables
•  References
•  Data Citations

Page 8

Data Descriptor - article

In traditional publications, the information covered by these sections is not provided in sufficient detail, yet it is essential for understanding, reusing and reproducing datasets.

Page 12

Data Descriptor – experimental metadata (CC0)


Page 13

Data Descriptor – experimental metadata (CC0)


General-purpose, configurable format, designed to support:
•  description of the experimental workflow, making the annotation explicit and discoverable
•  provenance tracking
•  use of community standards, such as minimal reporting guidelines and terminologies
   o  over 300 ‘ontologies’ and over 60 guidelines
•  conversions to a growing number of other metadata formats
   o  e.g. used by EBI repositories
   o  and as linked data
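To make the idea of an explicit, ontology-annotated workflow description more concrete, here is a minimal Python sketch of a nested Investigation → Study → Assay record (the hierarchy that gives ISA its name). The class and field names (`OntologyTerm`, `Study.factors`, `annotated_terms` and so on) are hypothetical illustrations, not the ISA-Tab specification or any ISA software API; the point is only that once values carry ontology identifiers, they can be collected and searched mechanically.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class OntologyTerm:
    """A metadata value annotated with a term from a community ontology (hypothetical model)."""
    label: str      # human-readable value, e.g. "Homo sapiens"
    source: str     # ontology prefix, e.g. "NCBITaxon"
    accession: str  # term identifier within that ontology

    def curie(self) -> str:
        # Compact identifier of the form PREFIX:ACCESSION
        return f"{self.source}:{self.accession}"


@dataclass
class Assay:
    measurement_type: OntologyTerm
    technology_type: OntologyTerm
    data_files: list[str] = field(default_factory=list)


@dataclass
class Study:
    title: str
    organism: OntologyTerm
    factors: list[OntologyTerm] = field(default_factory=list)  # experimental factors
    assays: list[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    title: str
    studies: list[Study] = field(default_factory=list)


def annotated_terms(inv: Investigation) -> set[str]:
    """Collect the ontology identifiers used for organisms, factors and assay types."""
    terms: set[str] = set()
    for study in inv.studies:
        terms.add(study.organism.curie())
        terms.update(f.curie() for f in study.factors)
        for assay in study.assays:
            terms.add(assay.measurement_type.curie())
            terms.add(assay.technology_type.curie())
    return terms
```

For example, a study whose organism is `OntologyTerm("Homo sapiens", "NCBITaxon", "9606")` contributes `NCBITaxon:9606` to the collected set, which a search service could then match across studies.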

Page 14

Data Descriptor – experimental metadata (CC0)

ISA is implemented by several service providers running systems that are:
•  local, institute-based
   o  e.g. Harvard Stem Cell Institute
•  project, consortium-based
   o  e.g. ToxBank, serving a research cluster of seven EU FP7 Health projects
•  global, international repositories
   o  e.g. EBI’s MetaboLights
•  and another ‘data journal’, GigaScience, in GigaDB

Page 15

Data Descriptor – experimental metadata (CC0)

Includes fields describing:
•  each study, linking to relevant sections of the Data Descriptor article
•  authors’ details, including ORCID
•  publications
•  funding sources and funders’ names, via FundRef
•  experimental factors
•  study design
•  assays
•  protocols
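The ORCID field lends itself to an automatic check before curation: the last character of an ORCID iD is a checksum computed with the ISO 7064 MOD 11-2 algorithm over the preceding 15 digits. The sketch below validates the hyphenated form; the function names are our own, and it checks only format and checksum, not whether the iD is actually registered.

```python
import re

# Four groups of four characters; the final character may be the digit "X" (value 10).
ORCID_PATTERN = re.compile(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]")


def orcid_check_digit(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 ORCID digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """True if `orcid` has the form xxxx-xxxx-xxxx-xxxx and its checksum matches."""
    if not ORCID_PATTERN.fullmatch(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]
```

With this sketch, `is_valid_orcid("0000-0002-1825-0097")` returns `True`, while changing the final digit to `8` makes it return `False`.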

Page 18

Data Descriptor – experimental metadata (CC0)

In-house curation team:
•  helps users submit the structured content via simple templates and an internal authoring tool
•  performs value-added semantic annotation of the experimental metadata
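One part of value-added semantic annotation is mapping authors' free-text values onto ontology terms. The sketch below shows the simplest form of this, a normalized lookup in a synonym table; the table, its contents and the `annotate` function are hypothetical, and a real curation workflow would draw on full ontologies and human review rather than a hard-coded dictionary.

```python
# Hypothetical synonym table: normalized free text -> (ontology prefix, accession, preferred label)
SYNONYMS = {
    "human": ("NCBITaxon", "9606", "Homo sapiens"),
    "homo sapiens": ("NCBITaxon", "9606", "Homo sapiens"),
    "mouse": ("NCBITaxon", "10090", "Mus musculus"),
    "mus musculus": ("NCBITaxon", "10090", "Mus musculus"),
}


def annotate(value: str) -> dict[str, str]:
    """Attach an ontology term to a free-text metadata value when the table knows it."""
    key = " ".join(value.lower().split())  # normalize case and collapse whitespace
    hit = SYNONYMS.get(key)
    if hit is None:
        # No match: keep the original text and flag it for a curator.
        return {"value": value, "term": "", "status": "needs curation"}
    prefix, accession, label = hit
    return {"value": label, "term": f"{prefix}:{accession}", "status": "annotated"}
```

Here `annotate("  Human ")` yields the preferred label `Homo sapiens` with term `NCBITaxon:9606`, while an unknown value is passed through unchanged and flagged for manual curation.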

For advanced users/service providers willing to export ISA-Tab for direct submission, we will release a technical specification:

[Diagram labels: analysis method/script; data file or record in a database]

Page 19

Discover similar datasets

Structured content allows users to link, with one click, to other datasets studying the same tissue, disease or organism, or using the same experimental platform.

[Figure: Scientific Data Data Descriptors (SciData DD), each with its structured content, linked to one another by same tissue, same organism and same assay, and to community data repositories]
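The one-click links to datasets sharing a tissue, organism or assay can be served from an inverted index over the structured content. In this minimal sketch the descriptor IDs and facet values are made up and the facets are plain strings; in practice they would be the curated, ontology-annotated values.

```python
from collections import defaultdict

# Hypothetical Data Descriptor records keyed by an illustrative ID.
DESCRIPTORS = {
    "dd1": {"tissue": "liver", "organism": "Homo sapiens", "assay": "RNA-Seq"},
    "dd2": {"tissue": "liver", "organism": "Mus musculus", "assay": "RNA-Seq"},
    "dd3": {"tissue": "brain", "organism": "Homo sapiens", "assay": "mass spectrometry"},
}


def build_index(records: dict[str, dict[str, str]]) -> dict[tuple[str, str], set[str]]:
    """Map each (facet, value) pair to the set of descriptor IDs that have it."""
    index: dict[tuple[str, str], set[str]] = defaultdict(set)
    for dd_id, facets in records.items():
        for facet, value in facets.items():
            index[(facet, value)].add(dd_id)
    return index


def similar(dd_id: str, records, index) -> dict[str, list[str]]:
    """For each facet of `dd_id`, list the other descriptors sharing its value."""
    return {
        facet: sorted(index[(facet, value)] - {dd_id})
        for facet, value in records[dd_id].items()
    }
```

With these sample records, `similar("dd1", DESCRIPTORS, build_index(DESCRIPTORS))` returns `{"tissue": ["dd2"], "organism": ["dd3"], "assay": ["dd2"]}`. Building the index once keeps each lookup proportional to the number of facets rather than the number of descriptors.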

Page 20

Complementing both journal articles and data repositories
•  Export to various formats (ISA-Tab, RDF, etc.)
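As an illustration of exporting structured content as linked data, the sketch below writes a few descriptive fields for one record as RDF N-Triples using Dublin Core terms. The subject URI and the choice of properties are assumptions for illustration; this is not the export that Scientific Data actually produces.

```python
DCTERMS = "http://purl.org/dc/terms/"


def literal(text: str) -> str:
    """Quote a string literal, escaping the characters N-Triples forbids unescaped."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def to_ntriples(subject_uri: str, fields: dict[str, str]) -> str:
    """Serialize simple Dublin Core fields (e.g. title, identifier) for one record."""
    lines = [
        f"<{subject_uri}> <{DCTERMS}{prop}> {literal(value)} ."
        for prop, value in fields.items()
    ]
    return "\n".join(lines)
```

For instance, `to_ntriples("https://example.org/dd/1", {"title": "Example dataset"})` returns the single line `<https://example.org/dd/1> <http://purl.org/dc/terms/title> "Example dataset" .`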

Page 21

Other data-related activities at NPG
•  Figure source data
   -  putting data behind figures/graphs
   -  implemented at Molecular Systems Biology, rolled out at Nature and progressively across all other Nature-branded titles

Wang et al., Nature, 2013, doi:10.1038/nature12730

Page 22

Other data-related activities at NPG

•  Extended data
   -  expandable text and extra figures; rolled out at Nature
•  Data citation
   -  tackling both styling and format; monitoring community developments, such as the Data Citation Synthesis Group
   -  to be rolled out across all Nature-branded titles and Scientific Data
•  Code reproducibility
   -  peer review, availability and reuse
•  Supported community databases
   -  criteria for selection, common list across all NPG titles
•  NPG’s Linked Data release – CC0
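For the data citation item, the sketch below assembles a citation string from structured fields, following a "Creator (Year): Title. Publisher. Identifier" pattern similar to the one DataCite recommends. The class and field names are hypothetical, and this is not the Nature or Scientific Data house style, which the slides note was still being worked out.

```python
from dataclasses import dataclass


@dataclass
class DataCitation:
    """Structured fields for citing a dataset (hypothetical model)."""
    creators: list[str]
    year: int
    title: str
    repository: str
    identifier: str  # e.g. a DOI or a repository accession

    def format(self) -> str:
        """Render as 'Creators (Year): Title. Repository. Identifier' (assumed style)."""
        authors = "; ".join(self.creators)
        return f"{authors} ({self.year}): {self.title}. {self.repository}. {self.identifier}"
```

Keeping the citation as structured fields and rendering it only at the end means the same record could be restyled later without re-entering any data.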