FAIR Sequencing Data Repository based on iRODS · 2020-03-03 · Silvia D. Olabarriaga, F. Oliveira Gutierrez, P.F.G. De Geest, Diogo Ferreira

Post on 13-Mar-2020


Felipe O. Gutierrez

AMC - Academic Medical Center - Amsterdam, Netherlands

A.C.Camargo Cancer Center - São Paulo, Brazil

FAIR Sequencing Data Repository based on iRODS

Silvia D. Olabarriaga

F. Oliveira Gutierrez

P.F.G. De Geest

Diogo Ferreira Patrão

A.H.C. van Kampen

J.T. van den Berg

Aldo Jongejan

Sjoerd Repping

Problem

● Inadequate RDM (Research Data Management) solution for NGS (Next Generation Sequencing) data:
○ Individual storage and backup
○ Dispersed datasets
○ Disconnected from metadata
○ Not FAIR


Considerations

Fit within organization

● ICT culture
● Research culture
● Sustainability vision

Adhere to international community best practices

Reuse and extend existing solutions


Freeman, 1983

Fit into AMC Vision for RDM

Based on NFU Data4Lifesciences WP2


An NGS repository that is:

● Part of an ecosystem
● Controlled by AMC
● Distributed
● Scalable
● FAIR compliant
● Easy to use

System Design

● iRODS 4.1.10
○ Middleware
○ Data virtualization
● Virtuoso 7.2
○ Triplestore
○ Supports ontologies
● User interfaces:
○ Metalnx web
○ Davrods 4.1
○ iCommands


System Architecture


Stewardship: Ontologies

● EDAM: Ontology for bioinformatics operations, types of data, data identifiers, data formats, and topics
● OMIABIS: Ontologized Minimum Information About Biobank data Sharing (MIABIS)
● OBI: Ontology for Biomedical Investigations
● EFO: Experimental Factor Ontology
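As a rough illustration of how ontology terms could link repository metadata to a triplestore, the sketch below turns an iRODS-style attribute-value-unit (AVU) entry into an N-Triples statement. It is not the authors' implementation: the `REPO_NS` namespace, the `PREDICATES` mapping, the example path, and the choice of Dublin Core predicates are all assumptions made for this example, and the EDAM IRI is used only as an illustrative ontology term.

```python
from dataclasses import dataclass
from urllib.parse import quote

# Hypothetical namespace for repository objects (assumption, not from the slides).
REPO_NS = "https://example.org/ngs-repo/"

# Illustrative attribute-to-predicate mapping (assumption).
PREDICATES = {
    "data_format": "http://purl.org/dc/terms/format",
    "title": "http://purl.org/dc/terms/title",
}


@dataclass(frozen=True)
class AVU:
    """An iRODS-style attribute-value-unit metadata entry."""
    attribute: str
    value: str
    unit: str = ""


def escape_literal(text: str) -> str:
    """Escape backslashes, double quotes, and newlines for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def avu_to_ntriple(object_path: str, avu: AVU) -> str:
    """Serialize one AVU attached to a data object as a single N-Triples line."""
    subject = f"<{REPO_NS}{quote(object_path.lstrip('/'))}>"
    predicate = f"<{PREDICATES[avu.attribute]}>"
    # Values that look like IRIs (e.g. ontology terms) become resources;
    # everything else is stored as a plain string literal.
    if avu.value.startswith(("http://", "https://")):
        obj = f"<{avu.value}>"
    else:
        obj = f'"{escape_literal(avu.value)}"'
    return f"{subject} {predicate} {obj} ."


if __name__ == "__main__":
    path = "/exampleZone/home/project/sample1.fastq"  # hypothetical path
    print(avu_to_ntriple(path, AVU("data_format", "http://edamontology.org/format_1930")))
    print(avu_to_ntriple(path, AVU("title", "Sample 1 raw reads")))
```

Pointing a metadata value at an ontology IRI instead of a free-text string is what lets a triplestore query treat values from different datasets as the same concept.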


Workflow: Data Ingestion


Workflow: (meta)data Registration


Workflow: (meta)data Retrieval


Access and Security


Status

Report file


nmon read KB/s


nmon write KB/s


nmon IOPs


Qualitative & Quantitative questions

● (meta)data preparation? Clear, doable, easy, ...
● (meta)data upload? Type, size, quantity, integrity, ...
● Rule processing? Report file clear and easy, system delay feedback, ...
● (meta)data retrieval? Findable, Accessible, Organized, Interoperable, Reusable, ...
● Concurrent users, variation on the number and size of files.
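The upload integrity question above could be checked along the following lines. This is a minimal sketch, not the evaluation procedure used in the talk: it assumes the server reports SHA-256 checksums in the `sha2:` plus base64 form that iRODS 4 installations commonly use, and the function names are hypothetical.

```python
import base64
import hashlib


def irods_style_sha256(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 of a local file as 'sha2:' + base64(digest).

    The 'sha2:' prefix mirrors the format assumed for the server-side
    checksum; adjust it if the deployment uses another scheme (e.g. MD5).
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large sequencing files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return "sha2:" + base64.b64encode(digest.digest()).decode("ascii")


def upload_is_intact(local_path: str, reported_checksum: str) -> bool:
    """Compare the local checksum with the one reported after upload."""
    return irods_style_sha256(local_path) == reported_checksum
```

A test run would compute the checksum before upload, read the stored checksum back from the repository, and count mismatches per file size and per number of concurrent users.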


Acknowledgements

KEBB:
• Barbera van Schaik
• Allard van Altena

ADICT: Hans van den Berg
UvA ICTS: Joyce Nijkamp
Medical Library: Lieuwe Kool
Clinical Research Unit: Rudy Scholte
Reproductive medicine: Sjoerd Repping
Genetic Metabolic Diseases: Frédéric Vaz
Immunogenomics: Niek de Vries
