Alejandra Gonzalez-Beltran, Susanna-Assunta Sansone, Philippe Rocca-Serra
Oxford e-Research Centre, University of Oxford, UK
Smart Descriptions & Smarter Vocabularies (SDSVoc), 30 November-1 December 2016, CWI Amsterdam Science Park
Supported by the NIH grant 1U24 AI117966-01 to UCSD PI, Co-Investigators at:
The DATS model: dataset descriptions for data discovery in DataMed
What is DATS' remit?
- Enabling discoverability: find and access datasets available in multiple repositories
- Focusing on surfacing key metadata descriptors, such as
  - information and relations between datasets, creators, publications, funding sources, nature of biological signal and perturbation, etc.
- Not the perfect model to represent the experimental details
  - the level of detail and metadata needed to ensure interoperability and reusability are left to the indexed databases
  - we have aimed for maximum coverage of use cases with a minimal number of data elements and relations
  - only very few properties are required
  - follows the Best Practices for Data on the Web
The development process in a nutshell
Metadata elements identified by combining two complementary approaches:
- USE CASES: top-down approach
- SCHEMAS: bottom-up approach
Model serialized as JSON schemas and mapping to schema.org (v1.0, v1.1, v2.0, v2.1)
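As a rough illustration of what a model "serialized as JSON schemas" with very few required properties can look like, the sketch below defines a toy schema and a hand-written check. The schema, its property names (`title`, `description`, `creators`) and the helper `check_instance` are hypothetical and are not taken from the DATS specification; a real deployment would more likely rely on a full JSON Schema validator.

```python
import json

# Hypothetical, heavily simplified schema written in JSON Schema style.
# Property names are illustrative assumptions, not the actual DATS schema.
TOY_DATASET_SCHEMA = {
    "type": "object",
    "required": ["title"],
    "properties": {
        "title": {"type": "string"},
        "description": {"type": "string"},
        "creators": {"type": "array"},
    },
}

# Python types corresponding to the JSON Schema type names used above.
_JSON_TYPES = {"string": str, "array": list, "object": dict}


def check_instance(instance, schema):
    """Return a list of problems; an empty list means the instance passes."""
    problems = []
    # Every property listed under "required" must be present.
    for name in schema.get("required", []):
        if name not in instance:
            problems.append(f"missing required property: {name}")
    # Properties that are present must have the declared type.
    for name, rule in schema.get("properties", {}).items():
        if name in instance:
            expected = _JSON_TYPES[rule["type"]]
            if not isinstance(instance[name], expected):
                problems.append(f"{name} should be of type {rule['type']}")
    return problems


minimal = json.loads('{"title": "Example dataset"}')
incomplete = json.loads('{"description": "no title given"}')
print(check_instance(minimal, TOY_DATASET_SCHEMA))
print(check_instance(incomplete, TOY_DATASET_SCHEMA))
```

Keeping the `required` list short is what lets repositories with sparse metadata still produce valid instances, while richer repositories can fill in the optional properties.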
Using an existing model?
Considered multiple models; mapped and analyzed these (bottom-up approach):
Generic models
- schema.org
- DataCite
- RIF-CS (Registry Interchange Format - Collection and Services)
- W3C HCLS dataset descriptions (mapping of many models including Dublin Core, DCAT, PROV, VoID, VoID-ext)
- Project Open Metadata (used by HealthData.gov)
Life science / biomedical models
- ISA (Investigation/Study/Assay)
- BioProject
- BioSample
- MINiML
- PRIDE-ml
- MAGE-TAB
- GA4GH metadata schema
- SRA XML
- CDISC SDM / element of BRIDGE model
DATS model for scalable indexing
Convergence of elements extracted from competency questions and existing (generic and biomedical) models
Adoption
of elements extracted from
and from
core entities
extended entities
plus elements from other models (e.g.
dataset/distribution/catalog from DCAT)
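The dataset/distribution/catalog pattern borrowed from DCAT can be pictured with a small sketch: a catalog lists datasets, and each dataset has one or more distributions describing how it can be accessed. The class and field names below (`access_url`, `media_type`, `find`) are illustrative assumptions and do not reproduce the DATS entities.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Distribution:
    """One accessible form of a dataset, e.g. a downloadable file."""
    access_url: str
    media_type: str = ""


@dataclass
class Dataset:
    """A described dataset with zero or more distributions."""
    title: str
    distributions: List[Distribution] = field(default_factory=list)


@dataclass
class Catalog:
    """A collection of dataset descriptions, e.g. one indexed repository."""
    name: str
    datasets: List[Dataset] = field(default_factory=list)

    def find(self, keyword: str) -> List[Dataset]:
        """Case-insensitive keyword match on dataset titles."""
        kw = keyword.lower()
        return [d for d in self.datasets if kw in d.title.lower()]


# Hypothetical content; example.org URLs are placeholders.
catalog = Catalog(
    name="Example repository",
    datasets=[
        Dataset(
            title="Gene expression in liver tissue",
            distributions=[Distribution("https://example.org/liver.csv", "text/csv")],
        ),
        Dataset(title="Proteomics of heart tissue"),
    ],
)
hits = catalog.find("LIVER")
print([d.title for d in hits])
```

Separating the dataset from its distributions lets an index describe what the data is once, while still pointing at several files or access points.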
Serializations and use of schema.org
- DATS model represented as JSON schemas, instances as:
  - JSON* format, and
  - JSON-LD** with vocabulary from schema.org
  - serializations in other formats and with other vocabularies can also be done, as/if needed
- Benefits for DataMed and databases indexed by DataMed
  - Increased visibility (by popular search engines), accessibility (via common query interfaces) and possibly improved ranking
- Use and extensions of schema.org
  - Submitted missing DATS elements to the schema.org tracker
  - Coordinating the extension of schema.org for life science via the bioschemas.org initiative (of which ELIXIR is also part)
* JavaScript Object Notation
** JavaScript Object Notation for Linked Data
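To make the JSON versus JSON-LD distinction concrete, here is a minimal sketch that turns a plain JSON description into a JSON-LD document using schema.org terms. The input keys and the mapping table `KEY_TO_SCHEMA_ORG` are assumptions made for illustration, not the official DATS-to-schema.org mapping; the schema.org terms used (`Dataset`, `Person`, `name`, `description`, `creator`, `keywords`) are existing schema.org vocabulary.

```python
import json

# Hypothetical mapping from toy description keys to schema.org property names.
KEY_TO_SCHEMA_ORG = {
    "title": "name",
    "description": "description",
    "creators": "creator",
    "keywords": "keywords",
}


def to_json_ld(description):
    """Build a schema.org Dataset document from a plain dictionary."""
    doc = {"@context": "https://schema.org", "@type": "Dataset"}
    for key, value in description.items():
        term = KEY_TO_SCHEMA_ORG.get(key)
        if term is None:
            continue  # keys without a mapping are dropped in this sketch
        if key == "creators":
            # Represent each creator name as a schema.org Person node.
            value = [{"@type": "Person", "name": n} for n in value]
        doc[term] = value
    return doc


plain = {
    "title": "Example dataset",
    "creators": ["A. Researcher"],
    "internal_id": "not-mapped",
}
linked = to_json_ld(plain)
print(json.dumps(linked, indent=2))
```

Embedding such a document in a dataset landing page is one way search engines can pick up the description, which is the visibility benefit mentioned above.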
Implementations and documentation
- Other adopters exporting DATS in their APIs
- To evaluate DATS model capabilities
- Work in progress: documentation and curation guidelines for adopters
Thanks!
bioCADDIE Working Groups: https://biocaddie.org/group/working-groups