Bringing visibility to food security data results: an experiment in Research Data Alliance (RDA) PID tools
Quan (Gabriel) Zhou, Inna Kouper and Beth Plale, Indiana University
Jason Haga, AIST, Japan
Venice Juanillas and Ramil Mauleon, International Rice Research Institute
National Data Service Workshop, Oct 2016
[Figure: PID metadata profiles. A core profile is extended by community profiles (e.g. Climate sciences, Material sciences, ...); the attributes shown include size, checksum, timestamps, version, predecessor, successor, part-of, has-parts and replica locations.]
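The profile figure suggests a layering in which community profiles reuse a core set of attributes. A minimal Python sketch of that idea follows, assuming a profile is simply a set of required attribute names with an optional parent profile. The class and function names are hypothetical, and because the slide does not show which attributes belong to the core profile or to each community, the split used below is illustrative only.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Profile:
    """A PID metadata profile: a named set of required attribute names."""
    name: str
    attributes: FrozenSet[str]
    parent: Optional["Profile"] = None  # e.g. a community profile's parent is the core profile

    def all_attributes(self) -> FrozenSet[str]:
        # A profile inherits every attribute of its parent chain.
        inherited = self.parent.all_attributes() if self.parent is not None else frozenset()
        return inherited | self.attributes


def missing_attributes(record: dict, profile: Profile) -> set:
    """Return the profile attributes that a PID record does not supply."""
    return set(profile.all_attributes()) - set(record)


# Hypothetical attribute split, for illustration only.
CORE = Profile("core", frozenset({"size", "checksum", "version"}))
CLIMATE = Profile("climate-sciences",
                  frozenset({"timestamps", "predecessor", "successor"}),
                  parent=CORE)
```

A record that satisfies the core profile can then be checked against a community profile to see which extra attributes it still needs.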
Our Data Identity Services
- Supports persistent identifier (PID) assignment and registration of data objects (DOs) generated by scientific analyses, such as workflows run as part of scientific experiments. The data service leverages both the RDA Data Type Registry (DTR) and the RDA PID Information Types (PIT) service
- API resources:
  - Create a DO PID with a PID metadata profile (PIT model is applied)
  - Resolve a DO PID with its metadata profile in human-readable format
  - Get/set resource links (landing page, metadata URL)
  - Get a Data Type Definition using a community profile PID (interaction with the PIT service)
  - Get full inter-identifier links
- Lightweight database to keep track of all registered PIDs of DOs
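The API resources listed above can be sketched as a small in-memory registry. This is a minimal Python illustration, not the service's actual API: the class and method names and the PID format `<prefix>/do-<n>` are hypothetical, a dictionary stands in for the lightweight PID database, and the calls to the RDA DTR and PIT services (data type definitions, inter-identifier links) are left out.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class PIDRecord:
    pid: str
    profile_pid: str                           # PID of the metadata profile this record follows
    metadata: dict                             # attribute values required by that profile
    links: dict = field(default_factory=dict)  # e.g. landing page, metadata URL


class DataIdentityRegistry:
    """Hypothetical in-memory stand-in for the lightweight database of registered DO PIDs."""

    def __init__(self, prefix: str):
        self._prefix = prefix                  # placeholder Handle prefix
        self._counter = itertools.count(1)     # simple sequential PID suffixes
        self._records = {}                     # pid -> PIDRecord

    def create(self, profile_pid: str, metadata: dict) -> str:
        """Mint a new PID and store the DO's metadata together with its profile PID."""
        pid = f"{self._prefix}/do-{next(self._counter)}"
        self._records[pid] = PIDRecord(pid, profile_pid, dict(metadata))
        return pid

    def resolve(self, pid: str) -> str:
        """Render a registered PID and its profile attributes as human-readable text."""
        rec = self._records[pid]
        lines = [f"PID: {rec.pid}", f"Profile: {rec.profile_pid}"]
        lines += [f"{key}: {value}" for key, value in sorted(rec.metadata.items())]
        lines += [f"link[{kind}]: {url}" for kind, url in sorted(rec.links.items())]
        return "\n".join(lines)

    def set_link(self, pid: str, kind: str, url: str) -> None:
        """Attach or replace a resource link such as 'landing_page' or 'metadata_url'."""
        self._records[pid].links[kind] = url

    def get_links(self, pid: str) -> dict:
        """Return a copy of the resource links registered for a PID."""
        return dict(self._records[pid].links)
```

Keeping the profile PID on every record is what would let a resolver look up the matching data type definitions later, which is the role the PIT service plays in the slide.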
[Figure: Data Identity architecture. Requests from the DataIdentity Portal and DataIdentity Client go to the Data Service (Data-Identity Server), which draws on the RDA DTR and RDA PIT and returns responses.]
Data Identity Client for Galaxy
- Data Identity Client added to the Galaxy Tassel5 workflow to harvest workflow data objects
- Minimum instrumentation: interacts with the Tassel5 pipeline script without touching the Tassel core code base
- User transparency: automatically harvests DOs when the workflow is executed from the Galaxy engine
- Plug & play model: with minor updates to the client, this framework can be used to harvest DOs from applications across domains
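The harvesting step described above amounts to collecting basic metadata for each output a workflow produces. The following Python sketch assumes the client can see the pipeline's output files as local paths once the Galaxy tool finishes; the function names and record fields (`name`, `size`, `checksum`, `timestamp`, `part_of`) are hypothetical, chosen to echo attributes shown in the profile examples.

```python
import hashlib
import os
from datetime import datetime, timezone


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Stream a file through SHA-256 so large outputs need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def harvest_outputs(output_paths, workflow_id: str) -> list:
    """Build one DO metadata record per workflow output file (hypothetical record layout)."""
    records = []
    for path in output_paths:
        records.append({
            "name": os.path.basename(path),
            "size": os.path.getsize(path),                       # bytes
            "checksum": "sha256:" + sha256_of(path),
            "timestamp": datetime.now(timezone.utc).isoformat(),  # harvest time, UTC
            "part_of": workflow_id,                              # link back to the workflow run
        })
    return records
```

Because the sketch only reads the files a run leaves behind, it reflects the minimum-instrumentation goal: nothing in the Tassel code itself has to change. Each record would then be submitted to the Data Identity Services for PID assignment.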
[Figure: the DataIdentity Client in the Galaxy Workflow Platform. A Galaxy Tassel Compute Tool runs the Tassel Pipeline on Tassel5 Core; the workflow's input and output are harvested as workflow DOs.]
Our Data Repository
- Implemented the repository with a replicated MongoDB instance
- A single framework stores both metadata and data, letting users decide which information they want to keep as data object metadata
- Implemented as separate databases: a Staging DB and a permanent Repo DB
- Full CRUD operation support for DOs in the Staging DB
- DOs in the permanent Repo DB support READ only; UPDATE and DELETE are not allowed
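The split between the two databases can be illustrated with a small Python sketch in which dictionaries stand in for the MongoDB staging and permanent databases. The class names and the `promote` step are hypothetical, since the slide does not say how a DO moves from staging into the permanent Repo DB; the point is only the access rule: full CRUD in staging, READ only in the Repo DB.

```python
import copy


class StagingDB:
    """Full CRUD over data objects (in-memory stand-in for the staging database)."""

    def __init__(self):
        self._docs = {}

    def create(self, do_id, doc):
        if do_id in self._docs:
            raise KeyError(f"{do_id} already exists in staging")
        self._docs[do_id] = copy.deepcopy(doc)

    def read(self, do_id):
        return copy.deepcopy(self._docs[do_id])

    def update(self, do_id, changes):
        self._docs[do_id].update(changes)

    def delete(self, do_id):
        del self._docs[do_id]


class RepoDB:
    """Permanent repository: a DO can be added once, then only read."""

    def __init__(self):
        self._docs = {}

    def promote(self, staging, do_id):
        # Hypothetical promotion step: copy a finished DO out of staging.
        if do_id in self._docs:
            raise KeyError(f"{do_id} is already in the permanent repository")
        self._docs[do_id] = staging.read(do_id)

    def read(self, do_id):
        return copy.deepcopy(self._docs[do_id])

    def update(self, do_id, changes):
        raise PermissionError("UPDATE is not allowed in the permanent Repo DB")

    def delete(self, do_id):
        raise PermissionError("DELETE is not allowed in the permanent Repo DB")
```

In an actual MongoDB deployment the same rule could instead be enforced at the database level, for example by having the service connect to the permanent database as a user holding only the built-in `read` role.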
[Figure: Data Repository architecture. Clients (Portal, Shell, CLI) use the Data Repository Service Interface (REST API), which sits on a Data Repository Storage Access layer over the MongoDB databases.]
Handle Service
- For this experiment we used a Handle server (v8) residing at CNRI in Virginia
- Handle instance configurations: