Data Papers and their applications: examples from Nature Publishing Group and Ubiquity Press
SciDataCon2014, 2-5 November, 2014
1. Introduction
2. Anatomy of a data paper - case studies from specific journals
•  Nature Publishing Group - Scientific Data, Susanna-Assunta Sansone
•  Ubiquity Press - Open Health Data, Brian Hole
3. Feedback and discussion

SciDataCon 2014 Data Papers and their applications workshop - NPG Scientific Data

Jun 14, 2015


Education

Part of the SciDataCon14 workshop on "Data Papers and their applications", run by Brian Hole and me to help attendees understand current data-publishing journals and trends, as well as the editorial processes at NPG's Scientific Data and Ubiquity's Open Health Data.
Transcript
Page 1: SciDataCon 2014 Data Papers and their applications workshop - NPG Scientific Data

Data Papers and their applications: examples from Nature Publishing Group and Ubiquity Press

SciDataCon2014, 2-5 November, 2014

1.  Introduction
2.  Anatomy of a data paper - case studies from specific journals
•  Nature Publishing Group - Scientific Data, Susanna-Assunta Sansone
•  Ubiquity Press - Open Health Data, Brian Hole
3.  Feedback and discussion

Page 2: SciDataCon 2014 Data Papers and their applications workshop - NPG Scientific Data

Consultant, Honorary Academic Editor

Associate Director, Principal Investigator

Introduction

The role of publishers and data papers

Susanna-Assunta Sansone, PhD

@biosharing
@isatools
@scientificdata

SciDataCon2014, 2-5 November, 2014

Page 3: SciDataCon 2014 Data Papers and their applications workshop - NPG Scientific Data

https://projects.ac/blog/five-top-reasons-to-protect-your-data-and-practise-safe-science/

Credit to:

Page 4: SciDataCon 2014 Data Papers and their applications workshop - NPG Scientific Data

A community mobilization for “openness”

image by Greg Emmerich

http://discovery.urlibraries.org/ https://okfn.org

•  Open data is a means to do better science more efficiently
•  Licenses, copyright and IP are legal barriers to data sharing and reuse
o  Licenses are for asserting rights; waivers are for giving them up, maximising potential for data reuse, integration and discovery of new knowledge
•  Creative Commons CC0
o  interoperability: CC0 is human- and machine-readable
o  universality: CC0 is global, universal and widely recognized
o  simplicity: no need for humans to make, and respond to, individual data requests

http://pantonprinciples.org

http://opendefinition.org/licenses/

https://www.copyrightsworld.com https://creativecommons.org

Page 5: SciDataCon 2014 Data Papers and their applications workshop - NPG Scientific Data

Open access is not enough on its own

http://www.theguardian.com/higher-education-network/blog/2014/jun/26

If your research has been funded by the taxpayer, there's a good chance you'll be encouraged to publish your results on an open access basis… This final article makes publicly available the hypotheses, interpretations and conclusions of your research. But what about the data that led you to those results and conclusions?

Page 6: SciDataCon 2014 Data Papers and their applications workshop - NPG Scientific Data

Also open data is not always enough

http://www.theguardian.com/higher-education-network/blog/2014/jun/26

So data that is in theory open and free to access
•  may still be hard to get hold of
•  it may not have been stored or cited in the appropriate manner
•  it may not be interoperable with related data because it is not formatted appropriately; or
•  it may not be reusable because it may not contain enough information for others to understand it

Page 7: SciDataCon 2014 Data Papers and their applications workshop - NPG Scientific Data

Movement for FAIR data in life and medical sciences

http://bd2k.nih.gov/workshops.html#ADDS

Page 8: SciDataCon 2014 Data Papers and their applications workshop - NPG Scientific Data


Because, in all fairness, not much data is FAIR!

Page 9: SciDataCon 2014 Data Papers and their applications workshop - NPG Scientific Data

Data unavailability and incomplete annotations

Page 10: SciDataCon 2014 Data Papers and their applications workshop - NPG Scientific Data

Credit to: Iain Hrynaszkiewicz

Benefits and barriers to data sharing

Benefits
•  Reduction of error and fraud
•  Increased return on investment in research
•  Compliance with funder and journal mandates
•  Reduce duplication and bias
•  Reproduction/validation of research
•  Testing additional hypotheses
•  Use for teaching
•  Integration with other data sets
•  Increased citations

Barriers
•  Concerns over inappropriate reuse
•  Limited time/resources
•  Costs associated with data sharing
•  Human privacy concerns
•  Unclear ownership of data/authority to release data
•  Lack of academic incentives/recognition
•  Lack of repositories or lack of awareness of repositories
•  Protecting commercially sensitive information

Page 11: SciDataCon 2014 Data Papers and their applications workshop - NPG Scientific Data

Responsibilities lie across several stakeholder groups

Page 12: SciDataCon 2014 Data Papers and their applications workshop - NPG Scientific Data

Role of publishers as “agents of change”

•  Data has to become an integral part of scholarly communications
•  Publishers occupy a leverage point in this process

Page 13: SciDataCon 2014 Data Papers and their applications workshop - NPG Scientific Data

The role of data journals/articles

•  Credit
•  Unpublished data
•  Peer review focus
•  Value of data vs. analysis
•  Discoverability
•  Reusability
•  Narrative/context
•  “Intelligently open data”

Credit to: Iain Hrynaszkiewicz

Page 14: SciDataCon 2014 Data Papers and their applications workshop - NPG Scientific Data

Publishers and data/reproducibility

•  Policies on access (to data, code, reagents etc.)
o  Supporting funder & community needs
•  Format and amount of content
o  Methodological details, supplementary info, data integration and links to repositories
•  Licensing for reuse
•  Incentives to share
o  Data citations
o  Data journals and articles
•  Quality assurance through peer review

Credit to: Iain Hrynaszkiewicz

Page 15: SciDataCon 2014 Data Papers and their applications workshop - NPG Scientific Data

Nature Publishing Group: the changing landscape

Human Genome 2001: 62 pages, 150 authors, 49 figures, 27 tables

ENCODE Project 2012: 30 papers, 3 journals

Page 16: SciDataCon 2014 Data Papers and their applications workshop - NPG Scientific Data

Credit to: Iain Hrynaszkiewicz

2013

Page 17: SciDataCon 2014 Data Papers and their applications workshop - NPG Scientific Data

Data/reproducibility at NPG: some important recent events, 2013-2014

•  Figure source data
o  putting data behind figures/graphs
o  rolled out at Nature and progressively across all other Nature branded titles

Wang et al, Nature, 2013 doi:10.1038/nature12730

Page 18: SciDataCon 2014 Data Papers and their applications workshop - NPG Scientific Data

Data/reproducibility at NPG: some important recent events, 2013-2014

•  Figure source data
o  putting data behind figures/graphs
o  rolled out at Nature and progressively across all other Nature branded titles
•  Extended data
o  expandable text and extra figures; rolled out at Nature

Page 19: SciDataCon 2014 Data Papers and their applications workshop - NPG Scientific Data

Data/reproducibility at NPG: some important recent events, 2013-2014

•  Figure source data
o  putting data behind figures/graphs
o  rolled out at Nature and progressively across all other Nature branded titles
•  Extended data
o  expandable text and extra figures; rolled out at Nature
•  Data citation
o  tackling both styling and format; monitoring community developments, such as the Data Citation Synthesis Group
o  to be rolled out across all Nature branded titles and Scientific Data
•  Code reproducibility
o  peer review, availability and reuse
•  NPG’s Linked Data release – CC0
•  A new data publication platform:

Page 20: SciDataCon 2014 Data Papers and their applications workshop - NPG Scientific Data

From made reproducible to born reproducible

“Reproducing the method took several months of effort, and required using new versions and new software that posed challenges to reconstructing and validating the results”

Page 21: SciDataCon 2014 Data Papers and their applications workshop - NPG Scientific Data

Data journals everywhere?

Credit to: Iain Hrynaszkiewicz

Page 22: SciDataCon 2014 Data Papers and their applications workshop - NPG Scientific Data

Consultant, Honorary Academic Editor

Associate Director, Principal Investigator

@scientificdata

Susanna-Assunta Sansone, PhD

@biosharing
@isatools

SciDataCon2014, 2-5 November, 2014

A new open-access, online-only publication for descriptions of scientifically valuable datasets

Page 23: SciDataCon 2014 Data Papers and their applications workshop - NPG Scientific Data

•  Get Credit for Sharing Your Data
o  Publications will be listed in the major indexes and will be citeable
•  Focused on Data Reuse
o  All the information others need to reuse the data; no interpretative analysis or hypothesis testing
•  Open-access
o  Authors select from three Creative Commons licences for the main Data Descriptor. Each publication supported by curated CC0 metadata
•  Peer-reviewed
o  Rigorous peer-review managed by our Editorial Board of academic researchers ensures data quality and standards
•  Promoting Community Data Repositories
o  Data stored in community data repositories

Page 24: SciDataCon 2014 Data Papers and their applications workshop - NPG Scientific Data

Introducing a new content type: the Data Descriptor

•  Designed to make data more discoverable, interpretable and reusable
•  Concerned with the facts behind the methodology of data generation/collection and processing
•  Complements a journal article

[Figure: a journal article (narrative: synthesis, analysis, conclusions, interpretation) shown alongside a Data Descriptor (facts: What is the sample? What did I do to generate the data? Where is the data? How was the data processed? Who did what when?), with a summary of the Data Descriptor]

Page 25: SciDataCon 2014 Data Papers and their applications workshop - NPG Scientific Data

Data Descriptor: narrative and structure

Experimental metadata or structured component (in-house curated, machine-readable formats)

Article or narrative component (PDF and HTML)

Page 26: SciDataCon 2014 Data Papers and their applications workshop - NPG Scientific Data

Data Descriptor: narrative

In traditional publications this information is not provided in a sufficiently detailed manner. However, this information is essential for understanding, reusing, and reproducing datasets.

Focus on data reuse
Detailed descriptions of the methods and technical analyses supporting the quality of the measurements.
Does not contain tests of new scientific hypotheses.

Sections:
•  Title
•  Abstract
•  Background & Summary
•  Methods
•  Technical Validation
•  Data Records
•  Usage Notes
•  Figures & Tables
•  References
•  Data Citations
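As an illustration of how the section list above could be used when preparing a manuscript, here is a minimal Python sketch that reports which of the listed sections are absent from a draft. The section names come from the slide; treating every one of them as mandatory, and the function name `missing_sections`, are assumptions made for this example and not part of any Scientific Data tooling.

```python
# Section names as listed on the slide. Treating all of them as
# required is an assumption made for this sketch.
EXPECTED_SECTIONS = [
    "Title",
    "Abstract",
    "Background & Summary",
    "Methods",
    "Technical Validation",
    "Data Records",
    "Usage Notes",
    "Figures & Tables",
    "References",
    "Data Citations",
]


def missing_sections(headings):
    """Return the expected sections that do not appear in `headings`.

    Matching ignores case and surrounding whitespace; the result keeps
    the order used in EXPECTED_SECTIONS.
    """
    present = {h.strip().lower() for h in headings}
    return [s for s in EXPECTED_SECTIONS if s.lower() not in present]


if __name__ == "__main__":
    # Hypothetical draft containing only a few of the sections.
    draft = ["Title", "Abstract", "Methods", "data records"]
    for section in missing_sections(draft):
        print("missing:", section)
```

A check like this only confirms that headings exist; whether each section contains enough detail for reuse is what peer review assesses.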

Page 27: SciDataCon 2014 Data Papers and their applications workshop - NPG Scientific Data

Data Descriptor: narrative

Sections:
•  Title
•  Abstract
•  Background & Summary
•  Methods
•  Technical Validation
•  Data Records
•  Usage Notes
•  Figures & Tables
•  References
•  Data Citations

Focus on data reuse
Detailed descriptions of the methods and technical analyses supporting the quality of the measurements.
Does not contain tests of new scientific hypotheses.

Page 28: SciDataCon 2014 Data Papers and their applications workshop - NPG Scientific Data

Data Descriptor: narrative

Sections:
•  Title
•  Abstract
•  Background & Summary
•  Methods
•  Technical Validation
•  Data Records
•  Usage Notes
•  Figures & Tables
•  References
•  Data Citations

Focus on data reuse
Detailed descriptions of the methods and technical analyses supporting the quality of the measurements.
Does not contain tests of new scientific hypotheses.

Joint Declaration of Data Citation Principles by the Data Citation Synthesis Group

Page 29: SciDataCon 2014 Data Papers and their applications workshop - NPG Scientific Data

Data Descriptor: structure - content

General-purpose, machine-readable format, designed to support:
•  description of the experimental workflow
•  explicit and discoverable annotations
•  provenance tracking
•  use of community-defined minimal reporting guidelines and terminologies

[Figure labels: analysis, method, script; data file or record in a database]

Page 30: SciDataCon 2014 Data Papers and their applications workshop - NPG Scientific Data

Data Descriptor: structure - content

Includes fields describing:
•  each study, linking to relevant sections of the Data Descriptor article
•  authors’ details, including ORCID
•  publications
•  funding sources and funders’ names, via FundRef
•  experimental factors
•  study design
•  assays
•  protocols

[Figure labels: analysis, method, script; data file or record in a database]

Page 31: SciDataCon 2014 Data Papers and their applications workshop - NPG Scientific Data

Data Descriptor: structure - content !

It relates samples, and their descriptions, to the data files
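To make the idea of relating samples to data files concrete, here is a minimal Python sketch that reads a small tab-delimited table and groups data files by sample. The table is a heavily simplified stand-in written in the spirit of ISA-Tab, not a complete or validated ISA-Tab file; the sample names, file names and the helper `files_by_sample` are all invented for illustration.

```python
import csv
import io
from collections import defaultdict

# Hypothetical, simplified tab-delimited table: one row per data file,
# each row naming the sample it was generated from. All values are
# invented for this sketch.
ASSAY_TABLE = (
    "Sample Name\tRaw Data File\n"
    "sample_A\trun1.csv\n"
    "sample_A\trun2.csv\n"
    "sample_B\trun3.csv\n"
)


def files_by_sample(table_text):
    """Map each sample name to the list of data files recorded for it."""
    mapping = defaultdict(list)
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    for row in reader:
        mapping[row["Sample Name"]].append(row["Raw Data File"])
    return dict(mapping)


if __name__ == "__main__":
    for sample, files in files_by_sample(ASSAY_TABLE).items():
        print(sample, "->", ", ".join(files))
```

Because the link between sample and file is recorded explicitly in a structured table, a reader or a program can go from a sample description straight to the files derived from it, without searching the narrative text.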

Page 32: SciDataCon 2014 Data Papers and their applications workshop - NPG Scientific Data

Data Descriptor: structure - content

In-house editorial curator:
•  assists users to submit the structured content via simple templates and an internal authoring tool
•  performs value-added semantic annotation of the experimental metadata

For advanced users/service providers willing to export ISA-Tab for direct submission, we have released a technical specification:

[Figure labels: analysis, method, script; data file or record in a database]

Page 33: SciDataCon 2014 Data Papers and their applications workshop - NPG Scientific Data

Green: author; Purple: repository; Blue: SciData; Red: production

Workflow overview

Page 34: SciDataCon 2014 Data Papers and their applications workshop - NPG Scientific Data

Publish your data early

Collect Data
Follow-up experiments
Publish Findings
Publish Data

Scientific Data’s prior publication policy with other NPG journals protects your ability to publish the screen data and the hits later

Credit to: Andrew Hufton

Page 35: SciDataCon 2014 Data Papers and their applications workshop - NPG Scientific Data

Hao et al.: Environmental

Data sets from the Global Integrated Drought Monitoring and Prediction System (GIDMaPS), which provides drought information based on multiple drought indicators

8 citations

Page 36: SciDataCon 2014 Data Papers and their applications workshop - NPG Scientific Data

Hao et al.: Environmental

New Dataset
•  Data in figshare
•  Code in figshare

8 citations

Page 37: SciDataCon 2014 Data Papers and their applications workshop - NPG Scientific Data

Hao et al.: Environmental

New Dataset
•  Data in figshare
•  Code in figshare
•  Cited in Science

8 citations

Page 38: SciDataCon 2014 Data Papers and their applications workshop - NPG Scientific Data

Or your data and findings simultaneously/after

Collect Data
Follow-up experiments
Publish Findings
Submit Data
Hold publication

Scientific Data will hold a Data Descriptor publication that has been accepted for publication, while your other related research publications clear peer review

Credit to: Andrew Hufton

Page 39: SciDataCon 2014 Data Papers and their applications workshop - NPG Scientific Data

Messina et al.: Epidemiology

The most comprehensive geographic collection of human dengue virus occurrence data (1960-2012), linked to point or polygon locations, derived from peer-reviewed literature and case reports as well as informal online sources

4 citations

Page 40: SciDataCon 2014 Data Papers and their applications workshop - NPG Scientific Data

Messina et al.: Epidemiology

4 citations

Associated Nature Article
•  Data in figshare

Scientific hypotheses:
•  Synthesis
•  Analysis
•  Conclusions

Methods and technical analyses supporting the quality of the measurements:
•  What did I do to generate the data?
•  How was the data processed?
•  Where is the data?
•  Who did what when

Page 41: SciDataCon 2014 Data Papers and their applications workshop - NPG Scientific Data

Value added component integrated in a growing ecosystem

[Figure labels: Research papers; Data records; Data Descriptors]

Page 42: SciDataCon 2014 Data Papers and their applications workshop - NPG Scientific Data

Progressively refine the guidance to authors

A web-based, curated and searchable portal works to ensure the standards and databases are registered, informative, discoverable and accessible, monitoring the development and evolution of standards, their use in databases and the adoption of both in data policies.

Over 500 Over 600

Page 43: SciDataCon 2014 Data Papers and their applications workshop - NPG Scientific Data

Helping authors find the right place for the data

[Chart: “Omics” is emphasized among basic life-sciences repositories. Categories: DNA and protein sequence; Functional genomics; Genetic association and genome variation; Metagenomics; Molecular interactions; Organism- or disease-specific; Proteomics; Taxonomy and species diversity; Traces and sequencing reads. Values as extracted: 24, 3, 10, 4, 1, 4, 3, 4]

•  We currently recognize over 60 public data repositories, and provide advice on the best place for authors to archive their data
•  We have integrated systems with both:

Page 44: SciDataCon 2014 Data Papers and their applications workshop - NPG Scientific Data


Repositories criteria
1.  Broad support and recognition within their scientific community
2.  Ensure long-term persistence and preservation of datasets
3.  Provide expert curation
4.  Implement relevant, community-endorsed reporting requirements
Progressively monitor this via
5.  Provide for confidential review of submitted datasets
6.  Provide stable identifiers for submitted datasets
7.  Allow public access to data without unnecessary restrictions

Page 45: SciDataCon 2014 Data Papers and their applications workshop - NPG Scientific Data

Citations of and links to data files - databases

Page 46: SciDataCon 2014 Data Papers and their applications workshop - NPG Scientific Data

Peer review process focused on quality and reuse

Evaluation is not based on the perceived impact or novelty of the findings or size of the data

•  Experimental rigour and technical data quality
o  Methodologically sound
o  Technical validation experiments and statistical analyses
o  Depth, coverage, size, and/or completeness of data sufficient for the types of applications
•  Completeness of the description
o  Sufficient details to allow others to reproduce the results, reuse the data or integrate it with other data
o  Compliance with relevant minimum information or reporting standards
•  Integrity of the data files and repository record
o  Data files match the descriptions in the Data Descriptor
o  Deposited in the most appropriate available data repository

Page 47: SciDataCon 2014 Data Papers and their applications workshop - NPG Scientific Data

Current content is diverse - bimonthly releases

•  Neuroscience, ecology, epidemiology, environmental science, functional genomics, metabolomics, toxicology etc.
•  New and previously published individual datasets, curated aggregations and citizen science:
o  a fuller, more in-depth look at the data processing steps, supported by additional data files and code from each step
o  additional tutorial-like information for scientists interested in reusing or integrating the data with their own
•  Datasets in figshare, Dryad and domain specific databases
•  Code deposited in figshare and GitHub
•  First collection:

Page 48: SciDataCon 2014 Data Papers and their applications workshop - NPG Scientific Data
Page 49: SciDataCon 2014 Data Papers and their applications workshop - NPG Scientific Data

Open Access – APC supported

Data: the primary datasets reside in public repositories. Partnering with figshare and Dryad, which are both CC0

Data Descriptor - structured component (ISA-Tab): as NPG has already done with its existing Linked Data Portal, the metadata about Data Descriptors in Scientific Data is CC0

Data Descriptor - narrative component: describing the methodology of data generation/collection and processing is licensed under either of the following, by author choice:

OA Article processing charges: $1,000 USD / £650 GBP / €750 for each accepted article

Page 50: SciDataCon 2014 Data Papers and their applications workshop - NPG Scientific Data

An open access, peer-reviewed publication for descriptions of scientifically valuable datasets

Supported by:

Advisory Panel including senior researchers, funders, librarians and curators
•  Michael Huerta, National Institutes of Health, USA
•  Mark Thorley, Natural Environment Research Council, UK
•  Patricia Cruse, University of California, USA
•  Susan Gregurick, Office of Biological and Environmental Research, Department of Energy, USA
•  Ioannis Xenarios, Swiss Institute of Bioinformatics, Switzerland
•  Chris Bowler, IBENS, France
•  Mark Forster, Syngenta, UK
•  Anthony Rowe, Johnson & Johnson, USA
•  Stephen Chanock, National Cancer Institute, USA
•  Weida Tong, National Center for Toxicological Research, FDA, USA
•  Albert J. R. Heck, Utrecht University, The Netherlands
•  Johanna McEntyre, EMBL-EBI, European Bioinformatics Institute, UK
•  Simon Hodson, CODATA, France
•  Joseph R. Ecker, Howard Hughes Medical Institute & Salk Institute, USA
•  Stephen Friend, Sage Bionetworks, USA
•  Jessica Tenenbaum, Duke Translational Medicine Institute, USA
•  Anne-Claude Gavin, EMBL, Germany
•  David Carr, Wellcome Trust, UK
•  Wolfram Horstmann, Göttingen State and University Library, Germany
•  Piero Carninci, RIKEN Omics Science Center, Japan
•  Pascale Gaudet, Swiss Institute of Bioinformatics, Switzerland
•  Judith A. Blake, The Jackson Laboratory, USA
•  Richard H. Scheuermann, J. Craig Venter Institute, USA
•  Caroline Shamu, Harvard Medical School, USA

Susanna-Assunta Sansone, Honorary Academic Editor (University of Oxford, UK)

Andrew L Hufton, Managing Editor

Varsha Khodiyar, Editorial Curator

Iain Hrynaszkiewicz, Publisher

Launched May 2014

Page 51: SciDataCon 2014 Data Papers and their applications workshop - NPG Scientific Data

Data Papers and their applications: examples from Nature Publishing Group and Ubiquity Press

SciDataCon2014, 2-5 November, 2014

Feedback and discussion
•  Based on what you have heard today, how well do these journals fit with your own publication and data management workflow, or that of researchers at your institute?
•  What are the benefits to data publication?
•  What are the risks/barriers?
•  What can publishers/journals do to incentivise data publication?