Datasets with Bioschemas
Alejandra Gonzalez-Beltran(*), Philippe Rocca-Serra(*), Susanna Sansone(*) and the bioCADDIE Team and Community
(*) Oxford e-Research Centre, University of Oxford
Bioschemas community meeting, Harpenden, Hertfordshire, UK
8th-9th November 2016
The problem
How to describe scientific(*) datasets to enable data discovery
(*) considering in particular biological and biomedical datasets
Design principles
The model for data description is designed around the Dataset entity, i.e. a unit of information stored by a data repository:
● Archived experimental datasets which do not change after deposition to the repository; e.g. dbGaP, GEO, ClinicalTrials.gov
● Datasets in reference knowledge bases, describing dynamic concepts, such as “genes” whose definition morphs over time; e.g. UniProt
Additionally:
● A dataset and related datasets may be available in multiple repositories
● A dataset may be available in multiple forms
Best practices for data on the web: https://www.w3.org/TR/dwbp
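The design principles above can be pictured as a small data structure. The sketch below is only an illustration under stated assumptions: the class names, field names, repository names and URLs are hypothetical and are not taken from the DATS model itself. It shows one Dataset entity that is available in several forms and in more than one repository.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Distribution:
    """One concrete form of a dataset (hypothetical fields)."""
    repository: str    # name of the repository holding this form
    access_url: str    # where this form can be retrieved
    media_format: str  # e.g. "text/csv"


@dataclass
class Dataset:
    """A unit of information stored by a data repository (hypothetical fields)."""
    identifier: str
    title: str
    mutable: bool  # False for archived datasets, True for knowledge-base entries
    distributions: List[Distribution] = field(default_factory=list)
    related: List[str] = field(default_factory=list)  # identifiers of related datasets

    def repositories(self) -> List[str]:
        """Repositories in which this dataset is available, sorted, without duplicates."""
        return sorted({d.repository for d in self.distributions})


# Made-up example: one archived dataset, two forms, two repositories.
example = Dataset(
    identifier="example:0001",
    title="Example expression study",
    mutable=False,
    distributions=[
        Distribution("RepositoryA", "https://example.org/a/0001.csv", "text/csv"),
        Distribution("RepositoryB", "https://example.org/b/0001.json", "application/json"),
    ],
)
print(example.repositories())
```

Keeping the forms in a separate list, rather than as fields on the dataset, is one way to let the same dataset appear in multiple repositories without duplicating its description.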
Mapping DATS to schema.org
✧ Missing elements (needed by DATS) submitted to the tracker
✧ Roughly 80% of DATS entities and properties can be mapped (although the alignment is not perfect and is sometimes less precise); the remaining 20% constitute major gaps
✧ Tracking schema.org and its related Health and Life Science extension evolution (the latter focuses on clinical studies)
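A partial mapping of this kind can be sketched as a lookup table plus a list of gaps. The example below is a minimal sketch, not the actual DATS-to-schema.org alignment: the source keys in KEY_MAP and the sample record are hypothetical, and only the schema.org terms Dataset, name, description and identifier are real vocabulary. It shows how mapped keys end up in a JSON-LD document while keys without a counterpart are collected for reporting.

```python
import json

# Hypothetical key mapping, for illustration only; not the actual DATS alignment.
KEY_MAP = {
    "title": "name",
    "description": "description",
    "identifier": "identifier",
}


def to_schema_org(record: dict) -> tuple:
    """Return (JSON-LD dict, list of source keys that have no mapping)."""
    doc = {"@context": "https://schema.org", "@type": "Dataset"}
    gaps = []
    for key, value in record.items():
        target = KEY_MAP.get(key)
        if target is None:
            gaps.append(key)  # no schema.org counterpart in this sketch
        else:
            doc[target] = value
    return doc, gaps


# Made-up record; the last key stands in for an element with no counterpart.
record = {
    "title": "Example dataset",
    "description": "A made-up record used only to exercise the mapping.",
    "identifier": "example:0001",
    "unmapped_example_field": "value with no counterpart",
}
doc, gaps = to_schema_org(record)
print(json.dumps(doc, indent=2))
print("gaps:", gaps)
```

Collecting the unmapped keys, instead of silently dropping them, gives a concrete list of candidates to raise as missing elements.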