Now open for submissions. Launching May 2014.
Advisory Panel including senior researchers, funders, librarians and curators:
• Michael Huerta, National Institutes of Health, USA
• Mark Thorley, Natural Environment Research Council, UK
• Patricia Cruse, University of California, USA
• Susan Gregurick, Office of Biological and Environmental Research, Department of Energy, USA
• Ioannis Xenarios, Swiss Institute of Bioinformatics, Switzerland
• Chris Bowler, IBENS, France
• Mark Forster, Syngenta, UK
• Anthony Rowe, Johnson & Johnson, USA
• Stephen Chanock, National Cancer Institute, USA
• Weida Tong, National Center for Toxicological Research, FDA, USA
• Albert J. R. Heck, Utrecht University, The Netherlands
• Johanna McEntyre, EMBL-EBI, European Bioinformatics Institute, UK
• Simon Hodson, CODATA, France
• Joseph R. Ecker, Howard Hughes Medical Institute & Salk Institute, USA
• Stephen Friend, Sage Bionetworks, USA
• Jessica Tenenbaum, Duke Translational Medicine Institute, USA
• Anne-Claude Gavin, EMBL, Germany
• David Carr, Wellcome Trust, UK
• Wolfram Horstmann, University of Oxford, UK
• Piero Carninci, RIKEN Omics Science Center, Japan
• Pascale Gaudet, Swiss Institute of Bioinformatics, Switzerland
• Judith A. Blake, The Jackson Laboratory, USA
• Richard H. Scheuermann, J. Craig Venter Institute, USA
• Caroline Shamu, Harvard Medical School, USA
Susanna-Assunta Sansone, Honorary Academic Editor (University of Oxford, UK)
Andrew L Hufton, Managing Editor
Victoria Newman, Editorial Curator
Ruth Wilson, Publisher
www.nature.com/scientificdata
[email protected]
@ScientificData
Scientific Data overview of Data Descriptors - WT Data-Literature integration, Dec 2013
• Credit for Sharing Your Data
• Open-access
• Focused on Data Reuse
• Peer-reviewed, curated
• Promoting Community Data Repositories
Introducing a new content type: Data Descriptor
Session 2: Publishing Data
Aims of this session: to explore how data is being represented and cited in research articles; to showcase new data publishing products, and consider how the edges between articles and data are joined or defined. How can we maximize integrated utility across the different data resources used by scientists?
Session 3: Credit, Attribution, Reproducibility and Provenance
Aims of this session: in an integrated information space, it is essential to have transparency on the sources and methods of scientific outputs. How do scientific articles contribute to this goal? Are they sufficiently addressing requirements, what are the most useful approaches and how might they be actioned?
Data Descriptor vs. traditional article
[Figure: the journal article carries the NARRATIVE (synthesis, analysis, interpretation, conclusions); the Data Descriptor carries the facts. Summary of Data Descriptor:]
• What is the sample?
• What did I do to generate the data?
• How was the data processed?
• Where is the data?
• Who did what when?
• The data descriptor is only concerned with the facts behind the methodology of data generation/collection and processing
• A data descriptor can be:
  – submitted prior to the journal article
  – submitted at the same time as the journal article
  – submitted after the journal article
Data Descriptor – experimental metadata (CC0)
General-purpose, configurable format, designed to support:
• description of the experimental workflow, making the annotation explicit and discoverable
• provenance tracking
• use of community standards, such as minimal reporting guidelines and terminologies
  o over 300 ‘ontologies’ and over 60 guidelines
• conversions to a growing number of other metadata formats
  o e.g. used by EBI repositories
  o and as linked data
ISA is implemented by several service providers running systems that are:
• local, institute-based
  o e.g. Harvard Stem Cell Institute
• project, consortium-based
  o e.g. ToxBank serving a research cluster of seven EU FP7 Health projects
• global, international repositories
  o e.g. EBI’s MetaboLights
• and another ‘data journal’, GigaScience, in GigaDB
Data Descriptor – experimental metadata (CC0)
Includes fields describing:
• each study, linking to relevant sections of the Data Descriptor article
• authors’ details, including ORCID
• publications
• funding sources and funders’ name, via FundRef
• experimental factors
• study design
• assays
• protocols
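As a rough illustration of the field list above, the record for one study could be sketched as a small Python data structure. Every class, field name and sample value here is hypothetical and chosen only to mirror the bullets; it is not the ISA-Tab layout or the journal's actual schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Author:
    name: str
    orcid: str = ""  # ORCID iD; left empty when unknown


@dataclass
class Study:
    # Hypothetical field names that mirror the slide's bullet list,
    # not real ISA-Tab column headers.
    title: str
    descriptor_section: str  # section of the Data Descriptor article this study links to
    authors: List[Author] = field(default_factory=list)
    publications: List[str] = field(default_factory=list)
    funders: List[str] = field(default_factory=list)
    experimental_factors: List[str] = field(default_factory=list)
    study_design: str = ""
    assays: List[str] = field(default_factory=list)
    protocols: List[str] = field(default_factory=list)


def to_json(study: Study) -> str:
    """Serialise a study record, including nested authors, to JSON text."""
    return json.dumps(asdict(study), indent=2, sort_keys=True)


# Invented sample values, for illustration only.
example = Study(
    title="Example study",
    descriptor_section="Methods",
    authors=[Author(name="A. Researcher", orcid="0000-0000-0000-0000")],
    funders=["Example Funder"],
    experimental_factors=["dose", "time point"],
    study_design="time course",
    assays=["transcription profiling"],
    protocols=["sample collection", "RNA extraction"],
)

print(to_json(example))
```

Keeping each bullet as its own named field is what would let a curation tool validate a submission or export it to another metadata format.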
In-house curation team:
• assists users to submit the structured content via simple templates and an internal authoring tool
• performs value-added semantic annotation of the experimental metadata
For advanced users/service providers willing to export ISA-Tab for direct submission, we will release a technical specification.
[Figure labels: analysis, method, script; data file or record in a database]
Discover similar datasets
SciData DD
Structured content
Structured content allows users to link, with one click, to other datasets studying the same tissue, disease, organism, or using the same experimental platform.
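One way to picture this kind of linking is as a match on shared annotation values. The sketch below is only an assumption about how such a lookup might work: the facet names come from the sentence above, while the dataset identifiers, records and function names are invented for illustration.

```python
from typing import Dict, List, Optional

# Facets named in the slide text.
FACETS = ("tissue", "disease", "organism", "platform")

Dataset = Dict[str, Optional[str]]


def shared_facets(a: Dataset, b: Dataset) -> List[str]:
    """Return the facets on which two datasets carry the same non-empty value."""
    return [f for f in FACETS if a.get(f) is not None and a.get(f) == b.get(f)]


def similar_datasets(query: Dataset, catalogue: List[Dataset]) -> Dict[str, List[str]]:
    """Map each other dataset's id to the facets it shares with the query."""
    result: Dict[str, List[str]] = {}
    for candidate in catalogue:
        if candidate["id"] == query["id"]:
            continue  # a dataset is not "similar" to itself
        common = shared_facets(query, candidate)
        if common:
            result[candidate["id"]] = common
    return result


# Hypothetical catalogue of structured descriptors.
catalogue: List[Dataset] = [
    {"id": "DD-1", "tissue": "liver", "organism": "Homo sapiens", "platform": "RNA-seq"},
    {"id": "DD-2", "tissue": "liver", "organism": "Mus musculus", "platform": "RNA-seq"},
    {"id": "DD-3", "tissue": "brain", "organism": "Danio rerio", "platform": "mass spectrometry"},
]

similar = similar_datasets(catalogue[0], catalogue)
print(similar)
```

A lookup like this only works if the annotations are drawn from controlled terminologies, so that identical concepts carry identical values across descriptors.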
[Figure labels: same tissue; same organism; same assay; community data repositories]
Complementing both journal articles and data repositories
Export to various formats (ISA-Tab, RDF, etc.)
Other data-related activities at NPG
• Figure source data
  - putting data behind figures/graphs
  - implemented at Molecular Systems Biology, rolled out at Nature and progressively across all other Nature-branded titles
Wang et al, Nature, 2013 doi:10.1038/nature12730
• Extended data
  - expandable text and extra figures; rolled out at Nature
• Data citation
  - tackling both styling and format; monitoring community developments, such as the Data Citation Synthesis Group
  - to be rolled out across all Nature-branded titles and Scientific Data
• Code reproducibility
  - peer review, availability and reuse
• Supported community databases
  - criteria for selection, common list across all NPG titles