Consultant, Honorary Academic Editor Associate Director, Principal Investigator Better Data = Better Science Susanna-Assunta Sansone, PhD @biosharing @isatools NC3Rs Publication Bias Workshop, London, 24-25 February, 2015 http://www.slideshare.net/SusannaSansone

NC3Rs Publication Bias workshop - Sansone - Better Data = Better Science

Jul 14, 2015

Transcript
Page 1: NC3Rs Publication Bias workshop - Sansone - Better Data = Better Science

Consultant, Honorary Academic Editor

Associate Director, Principal Investigator

Better Data = Better Science

Susanna-Assunta Sansone, PhD

@biosharing @isatools

NC3Rs Publication Bias Workshop, London, 24-25 February, 2015

http://www.slideshare.net/SusannaSansone

Page 2: NC3Rs Publication Bias workshop - Sansone - Better Data = Better Science

Plagued by selective reporting of data and methods

Page 3: NC3Rs Publication Bias workshop - Sansone - Better Data = Better Science

Plagued by selective reporting of data and methods

Why? For example:

•  Researchers still have little or no motivation

o  Focus on big discovery and impact; because they “have to”

•  Hypothesis-confirming results get prioritized

o  Difficulties with reviews of other results

•  Agreements, disagreements and timing

o  Unclear or missing data sharing agreements and timing of disclosure

•  Loose requirements and monitoring by journals and funders

o  Publish and release just enough; keep the rest, move to next grant

Page 4: NC3Rs Publication Bias workshop - Sansone - Better Data = Better Science

Are open data and methods understandable, reusable?

Page 5: NC3Rs Publication Bias workshop - Sansone - Better Data = Better Science

Are open data and methods understandable, reusable?

Not always…

•  Outputs are multi-dimensional, diverse, not always well cited / stored

•  Software, code, workflows etc. are hard(er) to get hold of

•  Data often distributed and fragmented to fit (siloed) databases

o  Do not contain enough information for others to understand them

•  Uneven level of details and annotation across different databases

o  Specialized, generalist, public and institutional

•  Data curation activities are perceived as time consuming

o  Collection and harmonization of detailed methods and experimental steps is done/rushed at publication stage

Page 6: NC3Rs Publication Bias workshop - Sansone - Better Data = Better Science

Worldwide movement for FAIR data

Page 7: NC3Rs Publication Bias workshop - Sansone - Better Data = Better Science

Role of data papers / data journals

•  Incentive, credit for sharing

•  Data-focused peer review

•  Value of data vs. analysis, results

•  Support of the FAIR concept

Page 8: NC3Rs Publication Bias workshop - Sansone - Better Data = Better Science

market research (2011)

•  What do researchers want from a data publication?

o  96% - increased visibility and discovery

o  95% - increased usability of their research data

o  93% - credit mechanism for deposit of data

o  80% - peer review of content/datasets

Respondent characteristics: 387 respondents (329 active researchers)

•  Physics (24%)

•  Earth and environmental science (21%)

•  Biology (20%)

•  Chemistry (19%)

•  Others (16%)

Page 9: NC3Rs Publication Bias workshop - Sansone - Better Data = Better Science

Because of the importance of formal publications in the academic incentive structure

Publishers occupy a leverage point

Page 10: NC3Rs Publication Bias workshop - Sansone - Better Data = Better Science

Role of publishers as “agents of change”

Page 11: NC3Rs Publication Bias workshop - Sansone - Better Data = Better Science


Helping you publish, discover and reuse research data

Credit for sharing your data

Focused on reuse and reproducibility

Peer reviewed, curated

Promoting community data and code repositories

Open Access

•  Currently covering life, natural and environmental sciences

•  Big and small data

o  the power of small data is in their aggregation and integration with other datasets

•  New and previously published individual datasets, curated collections and citizen science

o  a fuller, more in-depth look at the data processing steps, additional data files, code etc.

o  tutorial-like information for scientists interested in reusing or integrating the data with their own

Page 12: NC3Rs Publication Bias workshop - Sansone - Better Data = Better Science

Methods and technical analyses supporting the quality of the measurements:

•  What did I do to generate the data?

•  How was the data processed?

•  Where is the data?

•  Who did what when?

•  How can the data be used or reused?

Introducing a new content type: Data Descriptor

Designed to make data more FAIR

Focused mainly on:

•  Methods

•  Technical Validation

•  Data Records

•  Usage Notes

Page 13: NC3Rs Publication Bias workshop - Sansone - Better Data = Better Science

Scientific hypotheses:

•  Synthesis

•  Analysis

•  Conclusions

Methods and technical analyses supporting the quality of the measurements:

•  What did I do to generate the data?

•  How was the data processed?

•  Where is the data?

•  Who did what when?

•  How can the data be used or reused?

Relation with traditional article - content

Page 14: NC3Rs Publication Bias workshop - Sansone - Better Data = Better Science

AFTER: expand on your research articles, adding further information for reuse of the data

AT THE SAME TIME: publish your Data Descriptor(s) alongside research article(s)

OR BEFORE

Relation with traditional article - time

Publish Data

Page 15: NC3Rs Publication Bias workshop - Sansone - Better Data = Better Science

Code in GitHub

Data in OpenfMRI

Share your data, get credited and cited

Page 16: NC3Rs Publication Bias workshop - Sansone - Better Data = Better Science

Evaluation is not based on the perceived impact or novelty of the findings, or on the size of the data

•  Experimental rigour and technical data quality

o  Methodologically sound

o  Technical validation experiments and statistical analyses

o  Depth, coverage, size, and/or completeness of data sufficient for the types of applications

•  Completeness of the description

o  Sufficient details to allow others to reproduce the results, and to reuse the data or integrate it with other data

o  Compliance with relevant minimum information or reporting standards

•  Integrity of the data files and repository record

o  Data files match the descriptions in the Data Descriptor

o  Deposited in the most appropriate available databases

Peer review process focused on quality and reuse

Page 17: NC3Rs Publication Bias workshop - Sansone - Better Data = Better Science

Experimental metadata or "structured component"

(in-house curated, machine-readable formats)

Article or "narrative component"

(PDF and HTML)

Data Descriptor: narrative and structure

Page 18: NC3Rs Publication Bias workshop - Sansone - Better Data = Better Science

Sections:

•  Title

•  Abstract

•  Background & Summary

•  Methods

•  Technical Validation

•  Data Records

•  Usage Notes

•  Figures & Tables

•  References

•  Data Citations

Focus on data reuse

Detailed descriptions of the methods and technical analyses supporting the quality of the measurements.

Does not contain tests of new scientific hypotheses

Joint Declaration of Data Citation Principles by the Data Citation Synthesis Group

Data Descriptor: narrative

Page 19: NC3Rs Publication Bias workshop - Sansone - Better Data = Better Science

In-house editorial curator assists authors via

•  Excel spreadsheet templates

•  internal authoring tool

to create the structured component, also performing value-added semantic annotation

analysis, method, script

Data file or record in a database

Data Descriptor: structure (CC0)

Page 20: NC3Rs Publication Bias workshop - Sansone - Better Data = Better Science

Because we do not want cryptic experimental info, e.g.:

LS1_C2_LD_TP2_P1    file1-fastq.gz

Page 21: NC3Rs Publication Bias workshop - Sansone - Better Data = Better Science

…how not to report the experimental information

Sample name (?!)    Data file

LS1_C2_LD_TP2_P1    file1-fastq.gz

•  LS1: liver sample 1

•  C2: compound 2

•  LD: low dose

•  TP2: time point 2

•  P1: protocol 1

•  file1-fastq.gz: compressed data file for sequence information corresponding to this sample
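The legend above can be turned into explicit, named fields rather than left inside a file or sample name. The following is a minimal sketch only: the field names (`sample`, `compound`, `dose`, `time_point`, `protocol`) and the function `decode_sample_name` are illustrative assumptions, not part of any Scientific Data tooling, and the prefix table covers only the codes shown on the slide.

```python
import re

# Prefix codes from the slide legend; the field names are illustrative assumptions.
NUMBERED_CODES = {
    "LS": ("sample", "liver sample"),
    "C": ("compound", "compound"),
    "TP": ("time_point", "time point"),
    "P": ("protocol", "protocol"),
}
# Codes that carry no number.
PLAIN_CODES = {"LD": ("dose", "low dose")}


def decode_sample_name(name: str) -> dict[str, str]:
    """Expand a cryptic underscore-separated sample name into labelled fields."""
    record: dict[str, str] = {}
    for token in name.split("_"):
        if token in PLAIN_CODES:
            field, label = PLAIN_CODES[token]
            record[field] = label
            continue
        match = re.fullmatch(r"([A-Z]+)(\d+)", token)
        if match is None or match.group(1) not in NUMBERED_CODES:
            # Fail loudly: an unknown code is exactly the ambiguity to avoid.
            raise ValueError(f"cannot decode token {token!r}")
        field, label = NUMBERED_CODES[match.group(1)]
        record[field] = f"{label} {match.group(2)}"
    return record


decoded = decode_sample_name("LS1_C2_LD_TP2_P1")
for field, value in decoded.items():
    print(f"{field}: {value}")
```

The point of the sketch is that the decoding table has to exist somewhere; if it lives only in the author's head, the sample name cannot be interpreted by anyone else.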

Page 22: NC3Rs Publication Bias workshop - Sansone - Better Data = Better Science

Structured component: key information from narrative

Seven week old C57BL/6N mice were treated with low-fat diet.

Liver was dissected out, hepatocytes prepared…

Page 23: NC3Rs Publication Bias workshop - Sansone - Better Data = Better Science

Age value Unit

Strain name Subject of the experiment

Type of diet and experimental condition Anatomy part

Seven week old C57BL/6N mice were treated with low-fat diet.

Liver was dissected out, hepatocytes prepared …

From natural language to ‘computable’ concepts

Type of protocol – cell preparation

Type of protocol - sample treatment

Type of protocol – liver preparation
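As a rough illustration of what "computable" could mean here, the example sentence can be written as a typed record whose fields mirror the labels on the slide (age value, unit, strain name, subject, diet, anatomy part, protocol types). This is a sketch under stated assumptions: the class and field names are hypothetical, the mapping of each label to a phrase is an interpretation of the slide, and it is not the format used for the journal's structured component.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Subject:
    organism: str    # "Subject of the experiment"
    strain: str      # "Strain name"
    age_value: int   # "Age value"
    age_unit: str    # "Unit"


@dataclass(frozen=True)
class Sample:
    subject: Subject
    diet: str                  # "Type of diet and experimental condition"
    anatomy_part: str          # "Anatomy part"
    protocols: tuple[str, ...]  # "Type of protocol" labels, in slide order


# "Seven week old C57BL/6N mice were treated with low-fat diet.
#  Liver was dissected out, hepatocytes prepared ..."
sample = Sample(
    subject=Subject(organism="mouse", strain="C57BL/6N", age_value=7, age_unit="week"),
    diet="low-fat diet",
    anatomy_part="liver",
    protocols=("cell preparation", "sample treatment", "liver preparation"),
)

record = asdict(sample)
print(json.dumps(record, indent=2))
```

In a real annotation each value would typically point to a term in a community ontology rather than hold a free-text string; plain strings are used here only to keep the sketch self-contained.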

Page 24: NC3Rs Publication Bias workshop - Sansone - Better Data = Better Science

Semantic tagging key information

Page 25: NC3Rs Publication Bias workshop - Sansone - Better Data = Better Science


What does a structured component add?

•  Supplements the scientific discourse

o  natural language has a degree of ambiguity

•  Brings clarity in reporting research methods and procedures

o  no trimming, no cooking

o  clear samples to data files links and relation to methods

•  Provides the basis for search and discovery features!

[Figure: several Data Descriptors (SciData DD), each with its structured content, linked to one another by "Same tissue", "Same organism" and "Same assay", and connected to Community Data Repositories]
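One way to picture how structured content could support search and discovery is to compare the annotations of different Data Descriptors and report what they share. The sketch below is illustrative only: the identifiers `DD-1` to `DD-3`, the annotation values and the function `shared_annotations` are invented for the example and do not describe real Data Descriptors or the journal's actual search features.

```python
from itertools import combinations

# Hypothetical structured content for three Data Descriptors.
descriptors = {
    "DD-1": {"organism": "mouse", "tissue": "liver", "assay": "RNA-seq"},
    "DD-2": {"organism": "mouse", "tissue": "brain", "assay": "RNA-seq"},
    "DD-3": {"organism": "human", "tissue": "liver", "assay": "proteomics"},
}


def shared_annotations(records: dict[str, dict[str, str]]) -> dict[tuple[str, str], dict[str, str]]:
    """Return, for every pair of descriptors, the annotations they have in common."""
    links: dict[tuple[str, str], dict[str, str]] = {}
    for id_a, id_b in combinations(sorted(records), 2):
        a, b = records[id_a], records[id_b]
        common = {key: a[key] for key in a if b.get(key) == a[key]}
        if common:
            links[(id_a, id_b)] = common
    return links


links = shared_annotations(descriptors)
for (id_a, id_b), common in links.items():
    print(id_a, "<->", id_b, common)
```

Such pairwise links are only possible because the tissue, organism and assay are recorded as separate fields; the same comparison over free-text narrative would require text mining.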

Page 26: NC3Rs Publication Bias workshop - Sansone - Better Data = Better Science

Citation of and link to data files and databases

Page 27: NC3Rs Publication Bias workshop - Sansone - Better Data = Better Science

Research papers

Data records

Data Descriptors

We currently recognize over 60 public data repositories

Helping the authors to find the right place for the data

Page 28: NC3Rs Publication Bias workshop - Sansone - Better Data = Better Science


Repositories criteria

1.  Broad support and recognition within their scientific community

2.  Ensure long-term persistence and preservation of datasets

3.  Provide expert curation

4.  Implement relevant, community-endorsed reporting requirements

5.  Provide for confidential review of submitted datasets

6.  Provide stable identifiers for submitted datasets

7.  Allow public access to data without unnecessary restrictions
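The seven criteria above lend themselves to a simple checklist. The following sketch assumes a hypothetical self-assessment in which each criterion is recorded as true or false; the criterion keys, the function `unmet_criteria` and the example repository are all invented for illustration and do not reflect how the journal actually evaluates repositories.

```python
# Short keys for the seven criteria, in the order listed on the slide.
CRITERIA = (
    "community_recognition",
    "long_term_preservation",
    "expert_curation",
    "reporting_requirements",
    "confidential_review",
    "stable_identifiers",
    "public_access",
)


def unmet_criteria(assessment: dict[str, bool]) -> list[str]:
    """List the criteria a repository does not (yet) satisfy.

    A criterion missing from the assessment counts as unmet.
    """
    return [name for name in CRITERIA if not assessment.get(name, False)]


# Hypothetical repository that lacks confidential review and was never
# assessed for expert curation.
example_repository = {
    "community_recognition": True,
    "long_term_preservation": True,
    "reporting_requirements": True,
    "confidential_review": False,
    "stable_identifiers": True,
    "public_access": True,
}

missing = unmet_criteria(example_repository)
print("unmet:", missing)
```

Treating an unassessed criterion as unmet is a deliberate design choice in this sketch: it keeps gaps in the assessment visible instead of silently passing them.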

Page 29: NC3Rs Publication Bias workshop - Sansone - Better Data = Better Science

[Figure: landscape of community standards. Counts shown on the slide: ~156, ~70, ~334. Source: BioPortal]

MIAME, MIAPA, MIRIAM, MIQAS, MIX, MIGEN, ARRIVE, MIAPE, MIASE, MIQE, MISFISHIE, REMARK, CONSORT…

MAGE-Tab, GCDML, SRAxml, SOFT, FASTA, DICOM, MzML, SBRML, SEDML, GELML, ISA-Tab, CML, MITAB…

AAO, CHEBI, OBI, PATO, ENVO, MOD, BTO, IDO, TEDDY, PRO, XAO, DO, VO…

Databases implementing standards

Progressively refine guidance to authors and reviewers

Page 30: NC3Rs Publication Bias workshop - Sansone - Better Data = Better Science

Mapping the landscape of standards and databases

Page 31: NC3Rs Publication Bias workshop - Sansone - Better Data = Better Science

Nature 515, 312 (20 November 2014) doi:10.1038/515312a http://www.nature.com/news/data-access-practices-strengthened-1.16370

Key part of NPG data access & reproducible research policies

Page 32: NC3Rs Publication Bias workshop - Sansone - Better Data = Better Science

Responsibilities lie across several stakeholder groups

Understand the benefits of sharing FAIR datasets and enact them

Engage and assist researchers to enable them to share FAIR datasets

Release or endorse practices and policies, but also incentive and credit mechanisms for researchers, curators and developers

Page 33: NC3Rs Publication Bias workshop - Sansone - Better Data = Better Science

Acknowledgements

Visit nature.com/scientificdata

Email [email protected]

Tweet @ScientificData

Honorary Academic Editor: Susanna-Assunta Sansone, PhD

Managing Editor: Andrew L Hufton, PhD

Editorial Curator: Varsha Khodiyar

Publisher: Iain Hrynaszkiewicz

Advisory Panel and Editorial Board including senior researchers, funders, librarians and curators

and our Advisory Boards and Collaborators

Funds:

Philippe Rocca-Serra, PhD, Senior Research Lecturer

Alejandra Gonzalez-Beltran, PhD, Research Lecturer

Eamonn Maguire, DPhil, Contractor

Milo Thurston, PhD, Senior Bioinformatician

Allyson Lister, PhD, Knowledge Engineer

Alfie Abdul-Rahman, PhD, Research Software Engineer