FAIR and metadata standards - FAIRsharing and Neuroscience

Posted on 21 January 2018


FAIR digital research assets: beyond the acronym

Susanna-Assunta Sansone, PhD (@SusannaASansone)

ORCID 0000-0001-5306-5690

Consultant, Founding Academic Editor

Associate Director, Principal Investigator

Neuroinformatics, Kuala Lumpur, 20-21 August 2017

To do better science more efficiently, we need data that are…

• Available in a public repository

• Findable through some sort of search facility

• Retrievable in a standard format

• Self-described so that third parties can make sense of them

• Intended to outlive the experiment for which they were collected

A set of principles for those wishing to enhance the value of their data holdings

Wider adoption of the FAIR principles by research infrastructure programmes, e.g.

Defining FAIRness

Defining a framework for evaluating FAIRness, by the fairmetrics.org Working Group

NOTE: The Principles are high-level; they do not prescribe any specific technology, standard, or implementation solution

The Principles emphasize enhancing the ability of machines to automatically find and use data, in addition to supporting their reuse by individuals

Interoperability standards – the pillars of FAIR

The invisible machinery

• Identifiers and metadata are to be implemented by technical experts in tools, registries, catalogues, databases and services

• It is essential to make standards ‘invisible’ to lay users, who often have little or no familiarity with them

http://nometadata.org/logo

Metadata standards – fundamentals

• Descriptors for a digital object that help to understand what it is, where to find it, how to access it etc.

• The type of metadata also depends on the type of digital object (e.g. software, dataset)

• The depth and breadth of metadata vary according to their purpose
  § e.g. reproducibility requires richer metadata than citation

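To make the point about purpose and object type concrete, here is a minimal sketch, assuming a flat key/value metadata record. The descriptor names (`identifier`, `protocol`, `access_url`, …), the `REQUIRED_BY_PURPOSE` and `REQUIRED_BY_TYPE` tables and the `missing_fields` helper are all hypothetical, chosen only to show that reproducibility asks for a superset of what citation needs and that extra descriptors depend on whether the object is a dataset or software; they are not taken from any published metadata standard.

```python
from dataclasses import dataclass, field

# Hypothetical descriptor sets, for illustration only: citation needs enough
# to identify and locate an object, while reproducibility also needs to know
# how it was produced.
REQUIRED_BY_PURPOSE = {
    "citation": {"identifier", "title", "creators", "year"},
    "reproducibility": {"identifier", "title", "creators", "year",
                        "protocol", "instrument", "parameters"},
}

# Hypothetical extra descriptors that depend on the type of digital object.
REQUIRED_BY_TYPE = {
    "dataset": {"access_url"},
    "software": {"version", "license"},
}


@dataclass
class MetadataRecord:
    object_type: str                          # e.g. "dataset" or "software"
    fields: dict = field(default_factory=dict)


def missing_fields(record: MetadataRecord, purpose: str) -> set:
    """Return the descriptors still needed for this object type and purpose."""
    required = REQUIRED_BY_PURPOSE[purpose] | REQUIRED_BY_TYPE.get(record.object_type, set())
    # Treat None, empty strings and empty lists as "not provided".
    present = {name for name, value in record.fields.items()
               if value not in (None, "", [])}
    return required - present


record = MetadataRecord("dataset", {
    "identifier": "example-0001",
    "title": "Example dataset",
    "creators": ["A. Researcher"],
    "year": 2017,
    "access_url": "https://example.org/data/0001",
})
print(sorted(missing_fields(record, "citation")))         # → []
print(sorted(missing_fields(record, "reproducibility")))  # → ['instrument', 'parameters', 'protocol']
```

The same record is complete enough to cite but not to reproduce, which is the gap domain-level metadata standards are meant to close.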

Metadata standards - datasets

• Domain-level descriptors that are essential for interpretation, verification and reproducibility of datasets

• The depth and breadth of descriptors vary according to the domain, broadly covering the what, who, when, how and why, allowing:
  § experimental components (e.g. design, conditions, parameters),
  § fundamental biological entities (e.g. samples, genes, cells),
  § complex concepts (such as bioprocesses, tissues and diseases),
  § analytical processes and mathematical models, and
  § their instantiation in computational simulations (from the molecular level through to whole populations of individuals)
to be harmonized with respect to structure, format and annotation

Metadata standards - datasets

Metadata for discovery: model and related formats

Metadata for discovery, but not only…

Domain-specific metadata standards for datasets

Guidelines: MIAME, MIRIAM, MIQAS, MIX, MIGEN, ARRIVE, MIAPE, MIASE, MIQE, MISFISHIE, REMARK, CONSORT…

Formats: SRAxml, SOFT, FASTA, DICOM, mzML, SBRML, SED-ML, GelML, ISA, CML, MITAB…

Terminologies: AAO, ChEBI, OBI, PATO, ENVO, MOD, BTO, IDO, TEDDY, PRO, XAO, DO, VO…

From de jure standard organizations to de facto grass-roots groups

Formats, terminologies, guidelines: 220+ / 115+ / 548+ / ~1000

https://doi.org/10.6084/m9.figshare.3795816.v2

https://doi.org/10.6084/m9.figshare.4055496.v1

• Perspective and focus vary, ranging:
  § from standards with a specific biological or clinical domain of study (e.g. neuroscience) or significance (e.g. model processes)
  § to the technology used (e.g. imaging modality)

• Motivations differ, spanning:
  § creation of new standards (to fill a gap)
  § mapping and harmonization of complementary or contrasting efforts
  § extensions and repurposing of existing standards

• Stakeholders are diverse, including those:
  § involved in managing, serving, curating, preserving, publishing or regulating data and/or other digital objects
  § in academia, industry, governmental sectors, and funding agencies
  § producers but also consumers of the standards, as domain (and not just technical) expertise is a must

A complex landscape

Standards’ life cycle

• Formulation
  § use cases, scope, prioritization and expertise

• Development
  § iterations, tests, feedback and evaluation
  § harmonization of different perspectives and available options

• Maintenance
  § (exemplar) implementations, technical documentation, education material, metrics
  § sustainability, evolution (versions) and conversion modules

Technologically-delineated views of the world: arrays & scanning, columns, gels, MS, FTIR, NMR… (transcriptomics, proteomics, metabolomics)

Biologically-delineated views of the world: plant biology, epidemiology, neuroscience

Generic features ('common core'): description of source biomaterial; experimental design components

Fragmentation, duplications and gaps

Modularization to combine and validate

Proteomics-based investigations of neurodegenerative diseases

Proteomics- and metabolomics-based investigations of neurodegenerative diseases
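The modular idea (a shared common core plus technology-specific modules that are combined per investigation and then validated) can be sketched as set unions. This is a minimal illustration under stated assumptions: the field names in `COMMON_CORE` and `MODULES` and the `required_fields`/`validate` helpers are hypothetical and do not come from any published checklist.

```python
# Hypothetical field names, for illustration only; they are not taken from
# any published checklist.
COMMON_CORE = {"source_biomaterial", "organism", "study_design"}
MODULES = {
    "proteomics": {"ms_instrument", "protein_database"},
    "metabolomics": {"analytical_platform", "metabolite_library"},
}


def required_fields(modules: list[str]) -> set[str]:
    """Union of the shared common core and every selected domain module."""
    required = set(COMMON_CORE)
    for name in modules:
        required |= MODULES[name]
    return required


def validate(record: dict, modules: list[str]) -> list[str]:
    """Return the required fields missing from the record, sorted by name."""
    return sorted(required_fields(modules) - set(record))


record = {
    "source_biomaterial": "post-mortem brain tissue",
    "organism": "Homo sapiens",
    "study_design": "case-control",
    "ms_instrument": "example-ms-01",
    "protein_database": "example-db",
}
print(validate(record, ["proteomics"]))                  # → []
print(validate(record, ["proteomics", "metabolomics"]))  # → ['analytical_platform', 'metabolite_library']
```

The same record passes as a proteomics-based investigation but not as a proteomics- and metabolomics-based one; the common core is written once and reused instead of being duplicated in each domain standard.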

Working in/across multiple domains is challenging

• Requires:
  § Mapping between/among heterogeneous representations
  § A conceptual modelling framework to encompass the domain-specific metadata standards
  § Tools to handle customizable annotation, multiple conversions and validation
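Mapping between heterogeneous representations can be as simple as renaming source fields onto shared terms before comparison or validation. The sketch below assumes two hypothetical groups (`lab_a`, `lab_b`) with made-up field names and a hand-written mapping table; real mappings would typically target ontology terms rather than plain strings.

```python
# Hypothetical source schemas: two groups describe the same kind of sample
# with different field names. Each mapping sends a source field to a shared term.
MAPPINGS = {
    "lab_a": {"species": "organism", "tissue": "anatomical_entity", "dx": "disease"},
    "lab_b": {"taxon": "organism", "brain_region": "anatomical_entity",
              "condition": "disease"},
}


def to_shared(record: dict, source: str) -> tuple[dict, dict]:
    """Rename a record's fields onto the shared terms.

    Returns (shared, unmapped): fields with no mapping are kept aside so they
    can be reviewed rather than silently dropped.
    """
    mapping = MAPPINGS[source]
    shared, unmapped = {}, {}
    for key, value in record.items():
        if key in mapping:
            shared[mapping[key]] = value
        else:
            unmapped[key] = value
    return shared, unmapped


a, _ = to_shared({"species": "Mus musculus", "tissue": "hippocampus", "dx": "none"}, "lab_a")
b, extra = to_shared({"taxon": "Mus musculus", "brain_region": "hippocampus",
                      "condition": "none", "operator": "JD"}, "lab_b")
print(a == b)   # → True
print(extra)    # → {'operator': 'JD'}
```

Returning unmapped fields separately, instead of discarding them, is one way to surface the gaps between representations that the mapping does not yet cover.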

Technical and social engineering required

• Pain points include:
  § Fragmentation
  § Coordination, harmonization, extensions
  § Credit, incentives for contributors
  § Governance, ownership
  § Indicators and evaluation methods
  § Outreach and engagement with all stakeholders
  § Synergies between basic and clinical/medical areas
  § Implementations: infrastructures, tools, services
  § Education, documentation and training
  § Funding streams
  § Business models for sustainability

Too many cooks in the standards' kitchen?

Standards fusion… anyone?

doi:10.1126/science.1180598

doi:10.1038/nbt1346

OBO Portal and Foundry, doi:10.1038/nbt.1411

Doing my fair share

• Consumers:
  § How do I find the standards appropriate for my case?

• Producers:
  § How do I make my standards visible to others?

Improving discoverability of standards

Monitors the development and evolution of standards, their use in databases and the adoption of both in data policies, to inform and educate the user community

• Standard developing groups

• Journals, publishers

• Cross-links, data exchange

• Societies and organisations

• Institutional RDM services

• Projects, programmes

Working with and for producers and consumers

Interlink metadata standards (formats, terminologies, guidelines) among themselves, with databases/data repositories, and with data policies by funders, journals and other organizations…

…and to indicate 'adoption'

Figure: metadata standards, databases/data repositories, and data policies by funders, journals and other organizations

Assign ‘indicators’ to describe their status…

Paper in preparation, preliminary information as of July 2017

• Ready for use, implementation, or recommendation

• In development

• Status uncertain

• Deprecated as subsumed or superseded

All records are manually curated in-house and verified by the community behind each resource
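The four status indicators map naturally onto an enumeration, which lets a registry filter standards before recommending them. This is a minimal sketch: the `Status` enum mirrors the four labels above, but the `recommendable` helper and the record names are hypothetical and do not reflect FAIRsharing's actual data model or the status of any real standard.

```python
from enum import Enum


class Status(Enum):
    """The four status indicators used to describe a standard."""
    READY = "Ready for use, implementation, or recommendation"
    IN_DEVELOPMENT = "In development"
    UNCERTAIN = "Status uncertain"
    DEPRECATED = "Deprecated as subsumed or superseded"


def recommendable(records: dict[str, Status]) -> list[str]:
    """Names of standards whose status is READY, sorted alphabetically."""
    return sorted(name for name, status in records.items() if status is Status.READY)


# Hypothetical records; these names do not refer to real standards.
records = {
    "standard-a": Status.READY,
    "standard-b": Status.DEPRECATED,
    "standard-c": Status.IN_DEVELOPMENT,
    "standard-d": Status.READY,
}
print(recommendable(records))  # → ['standard-a', 'standard-d']
```

A fixed set of enum values, rather than free-text status notes, is what makes such indicators usable by machines as well as by curators.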

Help us map the neuroscience standards landscape

Journal recommendations: number of standards recommended by 68 journal/publisher policies (the top one in brackets)

• Models/Formats: 6 out of 223 (ISA-Tab)

• Reporting Guidelines: 26 out of 118 (MIAME)

• Terminology Artifacts: 8 out of 343 (NCBI Tax)

Paper in preparation, preliminary information as of July 2017

Activating the decision-making chain

Database implementations: number of standards implemented by 544 databases/repositories (the top one in brackets)

• Models/Formats: 146 out of 223 (FASTA)

• Reporting Guidelines: 59 out of 116 (MIAME)

• Terminology Artifacts: 121 out of 343 (GO)
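One reading of the "X out of Y (the top one)" figures is: of the Y standards in a category, X are recommended by at least one policy (or implemented by at least one database), and the bracketed standard is the one linked most often. Under that reading, which is an interpretation rather than a documented FAIRsharing method, the summary can be sketched as below; the `adoption_summary` helper and all names in the toy data are hypothetical.

```python
from collections import Counter


def adoption_summary(category: set[str], links: dict[str, set[str]]):
    """Summarize adoption of one category of standards.

    category -- names of every standard in the category
    links    -- policy (or database) name -> standards it recommends (or implements)

    Returns (adopted, total, top): how many standards in the category are linked
    at least once, the category size, and the most frequently linked standard.
    """
    counts = Counter()
    for standards in links.values():
        counts.update(standards & category)   # ignore standards from other categories
    if not counts:
        return 0, len(category), None
    top, _ = counts.most_common(1)[0]
    return len(counts), len(category), top


# Toy data, not FAIRsharing records.
formats = {"format-a", "format-b", "format-c", "format-d"}
policies = {
    "journal-1": {"format-a", "guideline-x"},
    "journal-2": {"format-a", "format-b"},
    "journal-3": {"format-a"},
}
adopted, total, top = adoption_summary(formats, policies)
print(f"{adopted} out of {total} ({top})")  # → 2 out of 4 (format-a)
```

The same function serves both views (journal policies and database implementations) because each is just a mapping from an adopter to the set of standards it links to.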


Philippe Rocca-Serra, PhD, Senior Research Lecturer

Alejandra Gonzalez-Beltran, PhD, Research Lecturer

Milo Thurston, DPhil, Research Software Engineer

Massimiliano Izzo, PhD, Research Software Engineer

Peter McQuilton, PhD, Knowledge Engineer

Allyson Lister, PhD, Knowledge Engineer

Eamonn Maguire, DPhil, Contractor

David Johnson, PhD, Research Software Engineer

Melanie Adekale, PhD, Biocurator Contractor

Delphine Dauga, PhD, Biocurator Contractor

Susanna-Assunta Sansone, PhD, Principal Investigator, Associate Director

The (long) road to FAIR

Interoperability standards are digital objects in their own right, with their associated research, development and educational activities
