Felipe O. Gutierrez AMC - Academic Medical Center - Amsterdam, Netherlands A.C.Camargo Cancer Center - São Paulo, Brazil FAIR Sequencing Data Repository based on iRODS Silvia D. Olabarriaga F. Oliveira Gutierrez P.F.G. De Geest Diogo Ferreira Patrão A.H.C. van Kampen J.T. van den Berg Aldo Jongejan Sjoerd Repping

FAIR Sequencing Data Repository based on iRODS · 2020-03-03 · FAIR Sequencing Data Repository based on iRODS Silvia D. Olabarriaga F. Oliveira Gutierrez P.F.G. De Geest Diogo Ferreira

Mar 13, 2020


Felipe O. Gutierrez
AMC - Academic Medical Center - Amsterdam, Netherlands

A.C.Camargo Cancer Center - São Paulo, Brazil

FAIR Sequencing Data Repository based on iRODS

Silvia D. Olabarriaga

F. Oliveira Gutierrez

P.F.G. De Geest

Diogo Ferreira Patrão

A.H.C. van Kampen

J.T. van den Berg

Aldo Jongejan

Sjoerd Repping


Problem
● Inadequate RDM (Research Data Management) solution for NGS (Next Generation Sequencing) data:
  ○ Individual storage and backup
  ○ Dispersed datasets
  ○ Disconnected from metadata
  ○ Not FAIR


Considerations

Fit within organization
● ICT culture
● Research culture
● Sustainability vision

Adhere to international community best practices

Reuse and extend existing solutions


Freeman, 1983


Fit into AMC Vision for RDM (based on NFU Data4Lifesciences WP2)


An NGS repository that is:

● Part of an ecosystem
● Controlled by AMC
● Distributed
● Scalable
● FAIR compliant
● Easy to use


System Design
● iRODS 4.1.10
  ○ Middleware
  ○ Data virtualization
● Virtuoso 7.2
  ○ Triplestore
  ○ Supports ontologies
● User interfaces:
  ○ Metalnx web
  ○ Davrods 4.1
  ○ iCommands


System Architecture


Stewardship: Ontologies

● EDAM: Ontology for bioinformatics operations, types of data, data identifiers, data formats, and topics
● OMIABIS: Ontologized Minimum Information About Biobank data Sharing (MIABIS)
● OBI: Ontology for Biomedical Investigations
● EFO: Experimental Factor Ontology
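
The slides name the ontologies but not how their terms are attached to repository files. As a minimal sketch, assuming annotations are exported as RDF for loading into the Virtuoso triplestore, the Python below writes N-Triples that link a file to an EDAM format term. The `REPO` namespace and the helper names are hypothetical; `format_1930` is, to our understanding, EDAM's FASTQ term and should be checked against the EDAM browser before use.

```python
# Sketch only: the repository namespace below is hypothetical, not from the slides.
REPO = "http://example.org/ngs-repo/"
EDAM = "http://edamontology.org/"
DCT_FORMAT = "http://purl.org/dc/terms/format"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def iri(value: str) -> str:
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value: str) -> str:
    """Quote a plain string literal, escaping backslashes, quotes and newlines."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def file_triples(file_id: str, edam_format: str, label: str) -> list[str]:
    """Return N-Triples lines describing one sequencing file."""
    subject = iri(REPO + file_id)
    return [
        f"{subject} {iri(DCT_FORMAT)} {iri(EDAM + edam_format)} .",
        f"{subject} {iri(RDFS_LABEL)} {literal(label)} .",
    ]


if __name__ == "__main__":
    # Hypothetical file; format_1930 is assumed to be EDAM's FASTQ term.
    for line in file_triples("run42/sample01_R1.fastq.gz", "format_1930", "sample01 read 1"):
        print(line)
```

Plain strings keep the sketch dependency-free; an RDF library would handle escaping and other serializations more completely.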


Workflow: Data Ingestion
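
Ingesting large NGS files raises the integrity question listed among the evaluation questions. A minimal sketch, assuming the iRODS zone is configured for SHA-256 checksums (which iRODS reports as `sha2:` followed by the base64-encoded digest): compute the same value locally and compare it with the checksum recorded for the data object, for example as shown by `ichksum`. The function names are hypothetical.

```python
import base64
import hashlib
from pathlib import Path


def irods_sha256_checksum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the file's SHA-256 digest formatted as 'sha2:<base64 digest>'.

    This mirrors the format iRODS uses when configured for SHA-256 checksums.
    """
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks (1 MiB by default) so multi-gigabyte FASTQ/BAM files
        # never have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return "sha2:" + base64.b64encode(digest.digest()).decode("ascii")


def verify_ingested(path: Path, catalog_checksum: str) -> bool:
    """True if the local copy matches the checksum recorded in the catalog."""
    return irods_sha256_checksum(path) == catalog_checksum
```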


Workflow: (meta)data Registration
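
In iRODS, metadata is stored as attribute-value-unit (AVU) triples on each data object. Assuming registration follows that model, the sketch below checks a metadata sheet before anything is registered, so incomplete rows are rejected up front. The CSV layout, the `file_name` column and `REQUIRED_ATTRIBUTES` are hypothetical.

```python
import csv
import io
from dataclasses import dataclass

# Hypothetical minimum attribute set; the slides do not list the real one.
REQUIRED_ATTRIBUTES = {"sample_id", "sequencing_platform", "data_format"}


@dataclass(frozen=True)
class AVU:
    """An iRODS-style attribute-value-unit metadata triple."""
    attribute: str
    value: str
    unit: str = ""


def parse_metadata_sheet(text: str) -> dict[str, list[AVU]]:
    """Map each file_name in a CSV sheet to its AVUs, rejecting incomplete rows."""
    result: dict[str, list[AVU]] = {}
    reader = csv.DictReader(io.StringIO(text))
    for row_no, row in enumerate(reader, start=2):  # row 1 is the header
        # Keep only non-empty string cells; ignore stray extra columns.
        cells = {
            key: value.strip()
            for key, value in row.items()
            if isinstance(key, str) and isinstance(value, str) and value.strip()
        }
        file_name = cells.pop("file_name", None)
        if file_name is None:
            raise ValueError(f"row {row_no}: missing file_name")
        missing = REQUIRED_ATTRIBUTES - set(cells)
        if missing:
            raise ValueError(f"row {row_no}: missing {sorted(missing)}")
        result[file_name] = [AVU(key, value) for key, value in sorted(cells.items())]
    return result
```

Each resulting AVU could then be attached with `imeta add -d <logical path> <attribute> <value>`.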


Workflow: (meta)data Retrieval
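
Retrieval by metadata can be illustrated with an in-memory stand-in for the iRODS catalog. This is a simplification that assumes one value per attribute (iRODS itself allows several); the catalog layout and `find_objects` are hypothetical, and in the deployed system the equivalent query would go through iRODS itself.

```python
from collections.abc import Mapping

# Hypothetical stand-in for the iRODS catalog: logical path -> {attribute: value}.
Catalog = Mapping[str, Mapping[str, str]]


def find_objects(catalog: Catalog, **conditions: str) -> list[str]:
    """Logical paths whose metadata matches every attribute=value condition.

    The AND semantics resemble an iRODS query such as
    `imeta qu -d sample_id = S01 and data_format = FASTQ`.
    """
    return sorted(
        path
        for path, avus in catalog.items()
        if all(avus.get(attr) == value for attr, value in conditions.items())
    )
```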


Access and Security
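
The slide gives no detail on the access model. As an illustrative sketch only, assuming permissions follow the ordered levels that `ichmod` accepts (null, read, write, own) and that a user receives the highest level granted to them directly or to any of their groups, an access check could look like this. The function names and principals are hypothetical.

```python
from enum import IntEnum


class Access(IntEnum):
    """Ordered access levels, modelled on the ones `ichmod` accepts."""
    NULL = 0
    READ = 1
    WRITE = 2
    OWN = 3


def effective_access(acl: dict[str, Access], user: str, groups: set[str]) -> Access:
    """Highest level granted to the user directly or through any of their groups."""
    principals = {user} | groups
    return max(acl.get(p, Access.NULL) for p in principals)


def can(acl: dict[str, Access], user: str, groups: set[str], needed: Access) -> bool:
    """True if the user's effective level meets or exceeds the needed level."""
    return effective_access(acl, user, groups) >= needed
```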


Status


Report file


nmon read KB/s


nmon write KB/s


nmon IOPS
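
The three nmon slides report disk read KB/s, write KB/s and IOPS. A sketch for summarizing such a recording, assuming the common nmon CSV layout in which the DISKREAD, DISKWRITE and DISKXFER sections (transfers per second, our assumed source for the IOPS plot) each have a header row naming the disks followed by one row per snapshot (T0001, T0002, ...). The function name is hypothetical.

```python
import csv
import io
import re
from statistics import mean
from typing import Optional

SNAPSHOT = re.compile(r"T\d+")  # nmon snapshot tags look like T0001


def average_disk_rate(nmon_text: str, section: str, disks: Optional[set[str]] = None) -> float:
    """Mean over snapshots of the per-snapshot sum across disks for one section.

    Assumes the usual nmon layout: a header row such as
    'DISKREAD,Disk Read KB/s host,sda,sdb' naming the disk columns, followed by
    rows such as 'DISKREAD,T0001,120.5,3.0'. Pass `disks` to avoid double
    counting when nmon lists both whole disks and their partitions.
    """
    columns: list[str] = []
    totals: list[float] = []
    for row in csv.reader(io.StringIO(nmon_text)):
        if len(row) < 2 or row[0] != section:
            continue
        if not SNAPSHOT.fullmatch(row[1]):
            columns = row[2:]  # header row: remember the disk names
            continue
        totals.append(sum(
            float(value)
            for name, value in zip(columns, row[2:])
            if value and (disks is None or name in disks)
        ))
    return mean(totals) if totals else 0.0
```

For example, `average_disk_rate(text, "DISKXFER", {"sda"})` would give the mean transfers per second on `sda` over the whole recording.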


Qualitative & Quantitative questions
● (meta)data preparation? Clear, doable, easy, ...
● (meta)data upload? Type, size, quantity, integrity, ...
● Rule processing? Report file clear and easy, system delay feedback, ...
● (meta)data retrieval? Findable, Accessible, Organized, Interoperable, Reusable, ...
● Concurrent users, variation in the number and size of files.
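
The quantitative questions mention concurrent users and variation in the number and size of files. One way to plan those runs is to enumerate every combination of the three factors, as in this sketch; the `Scenario` type and the factor values in the usage lines are hypothetical, not figures from the evaluation.

```python
from itertools import product
from typing import NamedTuple


class Scenario(NamedTuple):
    users: int
    files_per_user: int
    file_size_mb: int

    @property
    def total_gb(self) -> float:
        """Total volume uploaded in the scenario, in GB (1 GB = 1024 MB)."""
        return self.users * self.files_per_user * self.file_size_mb / 1024


def scenarios(users: list[int], files_per_user: list[int], file_sizes_mb: list[int]) -> list[Scenario]:
    """Every combination of the three factors, smallest total volume first."""
    grid = [Scenario(*combo) for combo in product(users, files_per_user, file_sizes_mb)]
    return sorted(grid, key=lambda s: s.total_gb)


if __name__ == "__main__":
    # Hypothetical factor levels, for illustration only.
    for s in scenarios([1, 4, 8], [10, 100], [100, 1024]):
        print(s, f"{s.total_gb:.1f} GB")
```

Ordering by total volume lets the small runs surface configuration problems before the expensive ones are attempted.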


Acknowledgements

KEBB:
• Barbera van Schaik
• Allard van Altena

ADICT: Hans van den Berg
UvA ICTS: Joyce Nijkamp

Medical Library: Lieuwe Kool
Clinical Research Unit: Rudy Scholte

Reproductive medicine: Sjoerd Repping
Genetic Metabolic Diseases: Frédéric Vaz
Immunogenomics: Niek de Vries