Metadata challenges of reproducible research and re-usable data: BioSharing, ISA and STATO examples
Alejandra González-Beltrán, PhD, Oxford e-Research Centre, University of Oxford, @alegonbel
OpenData & Reproducibility workshop: the Good Scientist in the Open Science era, 21st April 2015, British Ecological Society, UK
A community mobilization to develop standards, e.g.:
Structural and operational differences:
• organization types (open, closed to members, society, working group, etc.)
• standards development (how to formulate, conduct and maintain standards)
• adoption, uptake and outreach (links to journals, funders and the commercial sector)
• funds (sponsors, memberships, grants, volunteering)
[Figure: standards-developing groups ranging from grass-roots groups to standard organizations, and from de facto to de jure standards; examples include the Nanotechnology Working Group]
Types of reporting standards
• Minimum information reporting requirements, or checklists, so that the same core, essential information is reported
• Controlled vocabularies, taxonomies, thesauri, ontologies, etc., so that the same word is used to refer to the same ‘thing’
• Conceptual models and conceptual schemas from which an exchange format is derived, allowing data to flow from one system to another
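A minimum information checklist and a controlled vocabulary can be combined in a simple validation step. The sketch below is a hypothetical illustration, not part of any real standard: the field names in `REQUIRED_FIELDS` and the synonym table `ORGANISM_TERMS` are assumptions chosen for the example.

```python
from __future__ import annotations

# Hypothetical checklist: the required field names are illustrative only.
REQUIRED_FIELDS = ("organism", "sample_type", "measurement_type", "technology")

# Hypothetical controlled vocabulary: maps free-text synonyms to one preferred term.
ORGANISM_TERMS = {
    "human": "Homo sapiens",
    "homo sapiens": "Homo sapiens",
    "mouse": "Mus musculus",
    "mus musculus": "Mus musculus",
}


def missing_fields(record: dict) -> list[str]:
    """Return the checklist fields that are absent or empty in a record."""
    return [f for f in REQUIRED_FIELDS if not record.get(f)]


def normalise_organism(value: str) -> str:
    """Map a free-text organism name to its preferred vocabulary term."""
    key = value.strip().lower()
    if key not in ORGANISM_TERMS:
        raise ValueError(f"term not in vocabulary: {value!r}")
    return ORGANISM_TERMS[key]


if __name__ == "__main__":
    record = {"organism": "Human", "sample_type": "blood", "technology": ""}
    print(missing_fields(record))                  # ['measurement_type', 'technology']
    print(normalise_organism(record["organism"]))  # Homo sapiens
```

Reporting the same core fields and mapping free text to one preferred term is what makes records from different groups comparable; real checklists and ontologies are far larger and maintained by their communities.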
BioSharing: a web-based, curated and searchable registry ensuring that standards and databases are registered, informative and discoverable; it also monitors the development and evolution of standards, their use in databases and the adoption of both in data policies.
Launched Jan 2011
Researchers, developers and curators lack support and guidance on how to best navigate and select content standards, understand their maturity, or find databases that implement them;
Funders, journals and librarians do not have enough information to make informed decisions on which content standards or databases to recommend in policies, fund or implement.
Goal: assist stakeholders to make informed decisions
The International Conference on Systems Biology (ICSB), 22-28 August, 2008 Susanna-Assunta Sansone www.ebi.ac.uk/net-project
Core functionalities:
• search and filtering, e.g. by funder
• submission forms to add new records
• “claim” functionality for existing records
• personal profile (as maintainer of records) linked to the person’s ORCID profile (for credit, as an incentive)
• visualization and views of content
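The search, filter and claim functions listed above can be sketched over an in-memory list of records. This is a minimal sketch assuming a hypothetical `RegistryRecord` type and hypothetical `search` and `claim` helpers; it does not reflect BioSharing’s actual data model or API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RegistryRecord:
    """Hypothetical registry entry; the fields are illustrative only."""
    name: str
    record_type: str                      # e.g. "standard" or "database"
    funders: set[str] = field(default_factory=set)
    maintainer_orcid: str | None = None   # filled in when a person claims the record


def search(records, text="", record_type=None, funder=None):
    """Case-insensitive name search combined with optional type and funder filters."""
    text = text.lower()
    hits = []
    for record in records:
        if text and text not in record.name.lower():
            continue
        if record_type is not None and record.record_type != record_type:
            continue
        if funder is not None and funder not in record.funders:
            continue
        hits.append(record)
    return hits


def claim(record, orcid):
    """Link an unclaimed record to its maintainer's ORCID iD (for credit)."""
    if record.maintainer_orcid is not None:
        raise ValueError(f"{record.name!r} is already claimed")
    record.maintainer_orcid = orcid
```

Filters are combined with a logical AND, so each extra criterion narrows the result set; refusing a second claim keeps one accountable maintainer per record.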
Search, filter, submit, claim, view and more
Curated crowdsourcing approach
Formats & Database Fragmentation
The Investigation/Study/Assay (ISA) infrastructure:
• generic format for experimental description and data exchange
• open source software tools
• community engagement
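[Figure: ISA structure. An investigation groups related studies, each study has one or more assays, and assay data are kept as external files in native or other formats, referenced by pointers to data file names/locations]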
investigation: high-level concept to link related studies
study: the central unit, containing information on the subject under study, its characteristics and any treatments applied; a study has associated assays
assay: test performed either on material taken from the subject or on the whole initial subject, which produces qualitative or quantitative measurements (data)
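The investigation/study/assay hierarchy maps naturally onto nested data classes. The class and field names below are hypothetical and chosen for illustration; they are not the ISA-Tab format or the ISA tools API. Following the figure, assays hold pointers to external data files rather than the data themselves.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Assay:
    """Test on material from the subject (or the whole subject) that yields data."""
    measurement_type: str
    data_files: list[str] = field(default_factory=list)  # pointers: file names/locations


@dataclass
class Study:
    """Central unit: the subject under study, its characteristics and treatments."""
    subject: str
    characteristics: dict[str, str] = field(default_factory=dict)
    treatments: list[str] = field(default_factory=list)
    assays: list[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    """High-level container linking related studies."""
    title: str
    studies: list[Study] = field(default_factory=list)

    def data_files(self) -> list[str]:
        """Collect every data-file pointer reachable from this investigation."""
        return [f for s in self.studies for a in s.assays for f in a.data_files]
```

Keeping the data files external means the metadata stays small and format-neutral, while each file remains in whatever native format the instrument or analysis produced.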
• stem cell discovery
• systems biology
• transcriptomics
• toxicogenomics
• communities working to build a library of cellular signatures
• Coverage of processes (e.g. statistical tests and their conditions of application) and of the information needed by or resulting from statistical methods (e.g. probability distributions, variables, spread and variation metrics)
• STATO also benefits from: (i) extensive documentation, providing both textual and formal definitions; (ii) associated R code snippets, recorded with a dedicated R-command metadata tag, to facilitate teaching and learning with the popular R language; (iii) documented query examples, showing how reviewers, tutors and students alike can harness the ontology.
Developed in collaboration with Dr Burke, Senior Statistician, Nuffield Department of Population Health, University of Oxford
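To illustrate the kind of question an ontology linking statistical tests to their conditions of application can answer, the sketch below uses a small hand-written dictionary in place of the ontology. The test names are standard, and `t.test` and `wilcox.test` are real R functions, but the simplified conditions and the `TESTS`/`applicable_tests` names are assumptions for this example, not STATO’s actual terms, axioms or R-command annotations.

```python
# Illustrative stand-in for ontology content: the simplified conditions of
# application below are assumptions, not STATO's actual axioms.
TESTS = {
    "Student's t-test": {
        "conditions": {"two independent groups", "normal distribution", "equal variances"},
        "r_command": "t.test(x, y, var.equal = TRUE)",
    },
    "Welch's t-test": {
        "conditions": {"two independent groups", "normal distribution"},
        "r_command": "t.test(x, y)",
    },
    "Wilcoxon rank-sum test": {
        "conditions": {"two independent groups"},
        "r_command": "wilcox.test(x, y)",
    },
}


def applicable_tests(properties):
    """Return sorted (test, R command) pairs whose conditions are all met by the data."""
    return sorted(
        (name, test["r_command"])
        for name, test in TESTS.items()
        if test["conditions"] <= set(properties)  # subset check: every condition holds
    )
```

Attaching an R command to each test is what lets a learner move directly from choosing a method to running it.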