On community-standards, FAIR data and scholarly communication
Susanna-Assunta Sansone, PhD
ORCID: 0000-0001-5306-5690
INSERM Workshop 246 “Management and reuse of health data: methodological issues”, Bordeaux, 14-17 May 2017
Data Consultant, Founding Academic Editor
Associate Director, Principal Investigator
www.slideshare.net/SusannaSansone
INSERM - Data Management & Reuse of Health Data - May 2017
Simplified research data life cycle
Source: https://www.dataone.org/best-practices
To do better science more efficiently, we need data that are…
• Available in a public repository
• Findable through some sort of search facility
• Retrievable in a standard format
• Self-describing so that third parties can make sense of them
• The product of careful planning, organization and stewardship
• Intended to outlive the experiment for which they were collected
Key problem: low findability and understandability
• Not always well cited and stored
o True for data as well as for any other digital asset
• Poorly described for third-party reuse
o Different levels of detail and annotation
• Reporting and annotation activities are perceived as time-consuming
o Often rushed and minimally done
We need content or reporting standards
• To harmonize datasets with respect to the structure and level of annotation of their:
§ experimental components (e.g., design, conditions, parameters),
§ fundamental biological entities (e.g., samples, genes, cells),
§ complex concepts (such as bioprocesses, tissues, diseases),
§ analytical processes and mathematical models, and
§ their instantiation in computational simulations (from the molecular level through to whole populations of individuals)
Types of content standards
Guidelines: minimum information reporting requirements, checklists
o Report the same core, essential information
o e.g. MIAME guidelines
Terminologies: controlled vocabularies, taxonomies, thesauri, ontologies etc.
o Unambiguous identification and definition of concepts
o e.g. Gene Ontology
Formats: conceptual models, schemas, exchange formats etc.
o Define the structure and interrelation of information, and the transmission format
o e.g. FASTA
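The three types are often applied together when a metadata record is checked. The minimal Python sketch below assumes a hypothetical checklist (standing in for a guideline) and hypothetical rules that certain fields must hold ontology terms written as PREFIX:ID compact identifiers (standing in for a terminology); the field names, allowed prefixes and example identifiers are illustrative only and are not the actual MIAME requirements or any repository's validation rules.

```python
# Hypothetical minimum-information checklist (a "guideline"): these field
# names are illustrative only and are NOT the actual MIAME requirements.
REQUIRED_FIELDS = ("organism", "sample_type", "assay_type", "protocol")

# Hypothetical "terminology" rule: these fields must hold ontology terms written
# as compact identifiers (PREFIX:ID), using one of the allowed prefixes.
ONTOLOGY_FIELDS = {"sample_type": {"UBERON", "CL"}, "assay_type": {"OBI"}}


def check_record(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record passes both checks."""
    problems = []
    # Guideline check: every required field is present and non-empty.
    for name in REQUIRED_FIELDS:
        if not str(record.get(name) or "").strip():
            problems.append(f"missing required field: {name}")
    # Terminology check: annotated fields use an allowed ontology prefix.
    for name, prefixes in ONTOLOGY_FIELDS.items():
        value = record.get(name)
        if not value:
            continue  # absence is already reported by the guideline check
        prefix, sep, local_id = str(value).partition(":")
        if not sep or prefix not in prefixes or not local_id:
            problems.append(f"{name} is not a term from {sorted(prefixes)}: {value!r}")
    return problems


if __name__ == "__main__":
    # The identifiers below only illustrate the PREFIX:ID format.
    record = {"organism": "Homo sapiens", "sample_type": "liver", "assay_type": "OBI:0000070"}
    for problem in check_record(record):
        print(problem)
```

Keeping the two checks separate mirrors the distinction on the slide: one asks whether the core information is reported at all, the other whether it is expressed with unambiguous, shared terms.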
Community-driven efforts, just a few examples
Figure: formats, terminologies and guidelines developed by bodies ranging from de jure standard organizations to de facto grass-roots groups, with a Nanotechnology Working Group among the examples shown
Content standards in numbers
Figure: counts for formats, terminologies and guidelines (224, 115 and 500+, each with its source), with examples:
Guidelines: MIAME, MIRIAM, MIQAS, MIX, MIGEN, ARRIVE, MIAPE, MIASE, MIQE, MISFISHIE, REMARK, CONSORT…
Formats: SRAxml, SOFT, FASTA, DICOM, MzML, SBRML, SEDML, GELML, ISA, CML, MITAB…
Terminologies: AAO, CHEBI, OBI, PATO, ENVO, MOD, BTO, IDO, TEDDY, PRO, XAO, DO, VO…
How to discover the ‘right’ standards for your data?
A web-based, curated and searchable portal that monitors the development and…
A new open-access, online-only publication for descriptions of scientifically valuable datasets
New article type
• A peer-reviewed description of data, to maximize usage
• Citable publications that give credit for reusable data
• Requires data deposition in the appropriate repository(ies)
• Complementary to traditional articles, with which it may or may not be associated
Value-added component: complementing articles and repositories
Figure: Research papers, Data records, Data Descriptors
Article structure
• Follows the Joint Declaration of Data Citation Principles
• Detailed description of the methods and technical analyses supporting the quality of the measurements; no scientific hypotheses
Focus on data peer review
• Completeness = can others reproduce?
• Consistency = were community standards followed?
• Integrity = are data in the best repository?
• Experimental rigour, technical quality = were the methods sound?
Does not focus on perceived impact, importance, size, complexity of data
Credit for data producers, data managers/curators etc.
Credit to: Varsha Khodiyar
Data (re)use made easier
“The Data Descriptor made it easier to use the data, for me it was critical that everything was there…all the technical details like voxel size.”
Professor Daniele Marinazzo
Credit to: Varsha Khodiyar
What makes a good Data Descriptor?
• Decades-old dataset
• Aggregated or curated data resources
• Computationally produced data products
• Large consortium dataset
• Data from a single experiment
• Data that YOU find valuable and that others might find useful too
• Data associated with a high-impact analysis article
Data Descriptors have two components
• Experimental metadata or structured component (in-house curated, machine-readable formats)
• Article or narrative component (PDF and HTML)
Curation and discoverability
The Data Curation Editor is responsible for creating and curating the machine-readable structured component, which:
• Enables browsing and searching the articles
• Facilitates links to related journal articles and repository records
Data Descriptors: structured component
Created with the input of the authors; includes value-added semantic annotation of the experimental metadata
Figure labels: analysis method script; data file or record in a database
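As a rough illustration of why a machine-readable, semantically annotated structured component helps browsing and searching, the Python sketch below stores ontology-annotated metadata per article and filters or counts on it. The record layout, the `DescriptorRecord`, `search` and `facet_counts` names, the DOIs and the URLs are all hypothetical; this is not the ISA-Tab/ISA-JSON model nor the journal's actual curation pipeline.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class DescriptorRecord:
    """Hypothetical structured component of one Data Descriptor (not the ISA model)."""

    doi: str
    title: str
    # Ontology-annotated experimental metadata, e.g. {"organism": "NCBITaxon:9606"}.
    annotations: dict[str, str] = field(default_factory=dict)
    # Links to the data files or repository records the article describes.
    data_links: list[str] = field(default_factory=list)


def search(records: list[DescriptorRecord], key: str, term: str) -> list[str]:
    """Return DOIs of records whose annotation `key` is exactly the ontology term `term`."""
    return [r.doi for r in records if r.annotations.get(key) == term]


def facet_counts(records: list[DescriptorRecord], key: str) -> Counter:
    """Count records per annotation value, as a browse (facet) view might."""
    return Counter(r.annotations[key] for r in records if key in r.annotations)


if __name__ == "__main__":
    # Placeholder DOIs and URLs; NCBITaxon:9606 is Homo sapiens, NCBITaxon:10090 is Mus musculus.
    records = [
        DescriptorRecord("example-doi-1", "Human dataset", {"organism": "NCBITaxon:9606"},
                         ["https://repository.example/record/1"]),
        DescriptorRecord("example-doi-2", "Mouse dataset", {"organism": "NCBITaxon:10090"}),
        DescriptorRecord("example-doi-3", "Another human dataset", {"organism": "NCBITaxon:9606"}),
    ]
    print(search(records, "organism", "NCBITaxon:9606"))  # ['example-doi-1', 'example-doi-3']
    print(facet_counts(records, "organism"))
```

Because the annotation values are ontology identifiers rather than free text, a search for NCBITaxon:9606 finds every human dataset regardless of how authors spelled the species in the narrative component.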
Complementary roles of ISA and nanopublications
From Peer-Reviewed to Peer-Reproduced in Scholarly Publishing: The Complementary Roles of Data Models and Workflows in Bioinformatics. PLOS ONE (2015). https://doi.org/10.1371/journal.pone.0127612
The (long) road to FAIR
Responsibilities lie across several stakeholder groups
Understand the benefits of sharing FAIR datasets and enact them
Engage and assist researchers to enable them to share FAIR datasets
Release or endorse practices and policies, but also incentive and credit mechanisms for researchers, curators and developers
“As Data Science culture grows, digital research outputs (such as data, computational analysis and software) are being established as first-class citizens.
This cultural shift is required to go one step further: to recognize interoperability standards as digital objects in their own right, with their associated research, development and educational activities”.
Sansone, Susanna-Assunta; Rocca-Serra, Philippe (2016). Interoperability Standards - Digital Objects in Their Own Right. Wellcome Trust. https://dx.doi.org/10.6084/m9.figshare.4055496.v1
Philippe Rocca-Serra, PhD, Senior Research Lecturer
Alejandra Gonzalez-Beltran, PhD, Research Lecturer
Milo Thurston, DPhil, Research Software Engineer
Massimiliano Izzo, PhD, Research Software Engineer
Peter McQuilton, PhD, Knowledge Engineer
Allyson Lister, PhD, Knowledge Engineer
Eamonn Maguire, DPhil, Contractor
David Johnson, PhD, Research Software Engineer
Melanie Adekale, PhD, Biocurator Contractor
Delphine Dauga, PhD, Biocurator Contractor
We work with and for
to make data and other digital research assets
enabling open science, driving science and discoveries
Susanna-Assunta Sansone, PhD, Principal Investigator, Associate Director and Data Consultant for Springer Nature