
The Onedata platform · 2016-07-05


The Onedata platform

Konrad Zemek, Krzysztof Trzepla

ACC Cyfronet AGH

{konrad.zemek,krzysztof.trzepla}@cyfronet.pl

e-Research Summer Hackfest

RIA-653549


Agenda

● Introduction to Onedata

● Internal Architecture

● Live Demo:

– Example scenarios for distributed data access

– Sharing

– FUSE client

– CDMI & REST Access

● Onedata in HBP

● Open Data Platform

● Hands-on demo

● Summary

INDIGO-DataCloud RIA-653549


Introduction to Onedata

Integrating distributed data infrastructures with INDIGO-DataCloud 3


Problems

● Heterogeneity of storage technologies

● High-throughput processing

● Data in large scale multi-cloud environments

● High-throughput transfers

● Replica management

● Sharing:

– Team-sharing

– Cross-community data sharing

– Instant and ad-hoc data sharing


Onedata team

● Currently the Onedata team is composed of 20 people

● We have been developing the platform for > 3 years

● Located in Krakow, Poland

● Supported by:

– ACC Cyfronet AGH

– PLGrid

– INDIGO-DataCloud

– EGI Engage


The big picture


Onedata spaces

[Figure: Onedata spaces. Labels shown on the slide:]

● Users and groups: User 1, User 2, User 3, Group

● Community Metadata Index; metadata change feeds

● Onedata providers, connected peer-to-peer (P2P), at Indigo Cloud Provider 1 and Indigo Cloud Provider 2

● Storage: LustreFS, CephFS, Amazon S3, local network-attached storage (direct access)

Each space may be supported by many providers.


Onedata system architecture

[Figure: system architecture diagram. Labels shown on the slide: Oneclient (FUSE client); HTTP (GUI, REST); Onezone.]


Oneworld


Internal architecture


Onedata system architecture

[Figure: system architecture diagram. Labels shown on the slide:]

● Clients: Oneclient (FUSE client); HTTP (GUI, REST)

● Onezone

● Storage: POSIX, Ceph, S3, Swift (testing)

● Data access: CDMI, POSIX, WebDAV (in prep.), FTP / SFTP (in prep.), OAI-PMH (in prep.)

● Authentication: OIDC, SAML (in prep.)

● Interfaces: Entry GUI, Data Mgmt. GUI, REST APIs

● Kademlia DHT


What’s new in Onedata 3.0

● Internal architecture of Onedata 2.x redesigned from scratch

● Access tokens based on macaroons

● Support for POSIX, S3, Ceph, Swift storages

● Provides CDMI, POSIX, REST access to the data

● Support for Zones

● Internal Database migrated to Couchbase

● Fully dockerized

● Batch configuration and deployment

● Many tests at several levels: unit, integration, acceptance, performance, stress
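The macaroon-based access tokens mentioned in the list above can be illustrated with a minimal sketch of the general macaroon idea: a token carries a chain of HMAC signatures, so any holder can add a restriction (a caveat) without knowing the root key, while only the issuer can verify the chain. This is a simplified illustration, not Onedata's implementation or token format; the caveat syntax `key = value` and all function names are assumptions made for the example.

```python
import hashlib
import hmac


def _hmac(key: bytes, message: bytes) -> bytes:
    return hmac.new(key, message, hashlib.sha256).digest()


def mint(root_key: bytes, identifier: str) -> dict:
    """Issue a token whose signature is HMAC(root_key, identifier)."""
    return {"identifier": identifier, "caveats": [],
            "signature": _hmac(root_key, identifier.encode())}


def add_caveat(token: dict, caveat: str) -> dict:
    """Return a more restricted token. The old signature is the new HMAC key,
    so the root key is not needed to attenuate a token."""
    return {"identifier": token["identifier"],
            "caveats": token["caveats"] + [caveat],
            "signature": _hmac(token["signature"], caveat.encode())}


def verify(root_key: bytes, token: dict, context: dict) -> bool:
    """Recompute the signature chain and check every caveat against the
    request context. Hypothetical caveat syntax: "key = value"."""
    sig = _hmac(root_key, token["identifier"].encode())
    for caveat in token["caveats"]:
        key, _, value = caveat.partition(" = ")
        if str(context.get(key)) != value:
            return False
        sig = _hmac(sig, caveat.encode())
    return hmac.compare_digest(sig, token["signature"])


root_key = b"known only to the issuer"
token = mint(root_key, "user-1")
read_only = add_caveat(token, "access = read")

# A holder cannot drop a caveat: the signature no longer matches the chain.
forged = {"identifier": "user-1", "caveats": [],
          "signature": read_only["signature"]}

ok_read = verify(root_key, read_only, {"access": "read"})
ok_write = verify(root_key, read_only, {"access": "write"})
ok_forged = verify(root_key, forged, {"access": "write"})
```

Here `ok_read` is true, while `ok_write` and `ok_forged` are false: the caveat restricts the token to read access, and stripping it invalidates the signature.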


Scalability and fault tolerance

[Figure: scalability and fault tolerance. Labels shown on the slide:]

● Parallel processing nodes using POSIX (oneclient), CDMI or REST

● Firewall; ports :443 and :53

● Protocols: CDMI, S3, POSIX VFS

● Control and remote data access; CDMI API

● Storage access: Ceph, S3, NFS, Lustre


Remote file transfers

● Distributed priority queue for cluster-to-cluster transfers (over WAN)

● Transfer started by:

– User in GUI

– APIs

– Policy

– Access to remote data

● Block-based transfer:

– Remote data access on the fly

– Pre-staging

– Data migration

– Data replication
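The idea of a priority queue of block-based transfers can be sketched as follows. This is a single-process illustration only, not the distributed queue used by Onedata; the priority values, the `reason` names and the class names are assumptions, since the slide does not say how transfers are ranked. The sketch assumes that on-the-fly remote access should be served before background work such as replication.

```python
import heapq
import itertools
from dataclasses import dataclass, field

# Hypothetical ranking (lower value is served first); not taken from the slide.
PRIORITY = {"on_the_fly": 0, "pre_staging": 1, "replication": 2, "migration": 3}


@dataclass(order=True)
class BlockTransfer:
    priority: int
    seq: int  # tie-breaker: first come, first served within a priority
    file_id: str = field(compare=False)
    offset: int = field(compare=False)
    size: int = field(compare=False)
    reason: str = field(compare=False)


class TransferQueue:
    def __init__(self):
        self._heap = []
        self._counter = itertools.count()

    def submit(self, file_id, offset, size, reason):
        item = BlockTransfer(PRIORITY[reason], next(self._counter),
                             file_id, offset, size, reason)
        heapq.heappush(self._heap, item)

    def next_block(self):
        """Pop the most urgent block, or None when the queue is empty."""
        return heapq.heappop(self._heap) if self._heap else None


def split_into_blocks(file_size, block_size):
    """Return (offset, size) pairs covering the whole file."""
    return [(off, min(block_size, file_size - off))
            for off in range(0, file_size, block_size)]


queue = TransferQueue()
blocks = split_into_blocks(10, 4)
for off, size in blocks:
    queue.submit("fileA", off, size, "replication")
# A user reads a remote file: this block jumps ahead of the replication work.
queue.submit("fileB", 0, 4, "on_the_fly")

first = queue.next_block()
second = queue.next_block()
```

Because transfers are queued per block rather than per file, an urgent read only has to wait for the block currently in flight, not for a whole background replication to finish.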


CDMI-supported capabilities

Operations                           Capabilities

Basic object GET, PUT, DELETE        cdmi_dataobjects, cdmi_read_value, cdmi_modify_value, cdmi_delete_dataobject

Basic container GET, PUT, DELETE     cdmi_list_children, cdmi_create_container, cdmi_delete_container

Metadata (container & dataobject)    cdmi_read_metadata, cdmi_modify_metadata, cdmi_size, cdmi_(atime|mtime|ctime)

Access control lists (rwx)           cdmi_acl

Big folders                          cdmi_list_children_range

File System Export (FUSE client)     -

Move and copy                        cdmi_(move|copy)_(container|dataobject)

Big files                            cdmi_read_value_range, cdmi_modify_value_range

Access by ObjectID                   cdmi_object_access_by_ID
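As an illustration of the cdmi_read_value_range capability, the sketch below builds (but does not send) a CDMI request for a byte range of a data object. The `X-CDMI-Specification-Version` header, the `application/cdmi-object` media type and the `?value:<first>-<last>` query form come from the CDMI specification. The `/cdmi` URL prefix, the `X-Auth-Token` header, the host name, the path and the version number are assumptions for the example and should be checked against the deployment in use.

```python
import urllib.request

CDMI_VERSION = "1.1.1"  # assumed; use the version your provider reports


def cdmi_read_request(base_url, path, token, byte_range=None):
    """Build a GET request for a CDMI data object, optionally for a byte range."""
    url = f"{base_url.rstrip('/')}/cdmi/{path.lstrip('/')}"  # "/cdmi" prefix is an assumption
    if byte_range is not None:
        first, last = byte_range
        url += f"?value:{first}-{last}"  # range read, used for big files
    return urllib.request.Request(url, method="GET", headers={
        "X-CDMI-Specification-Version": CDMI_VERSION,
        "Accept": "application/cdmi-object",
        "X-Auth-Token": token,  # assumed header name for the access token
    })


# Hypothetical provider, space and file names.
req = cdmi_read_request("https://provider.example.org",
                        "MySpace/big.dat", "TOKEN", byte_range=(0, 1023))
headers = {name.lower(): value for name, value in req.header_items()}
```

Passing `req` to `urllib.request.urlopen` would send the request; the response body is a JSON document whose `value` field holds the requested bytes, encoded as indicated by its `valuetransferencoding` field.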


Live demo


Demo environment

[Figure: demo environment. Labels shown on the slide:]

● VMs: onezone, nfs, oneclient, INFN VM, Catania VM

● Docker containers: Onezone, Oneclient

● NFS server


Onedata in HBP


HBP image service with dockerized client

[Figure: HBP image service. Labels shown on the slide: HBP Scans space (5 TB); Oneclient; /srv/data; HBPScans/; HBP Image Viewer; HBP Atlas Viewer.]


High-performance access


Open data platform


Open data platform

[Figure: open data workflow. Labels shown on the slide: private resources; Data-set-1; snapshot Data-set-1.1; Data-set-1.1 mounted to /localdir/; cloned Data-set-1.1; public services for data discovery; lazy replication.]

Steps shown on the slide:

1: opendata create snapshot Data-set-1

2: opendata publish collection Data-set-1.1 -> DOI.1

3: discover data -> DOI.1

4: Visit collection web page (HTTP)

5: opendata mount remote DOI.1 /localdir/

6: opendata fork DOI.1


Hands-on demo


https://tinyurl.com/onedataHackfestDemo


Summary

● Distributed multi-provider storage

● Flexible access control

● Inter-federation scenarios for sharing data

● POSIX client for mounting a user's space

● Scalable from a single NAS to a large datacentre

● Can be deployed on top of high-performance parallel storage solutions with very small overhead (< 5%)

● Support for open data scenarios

● Onedata is currently supported by: PLGrid, EGI-Engage, INDIGO-DataCloud
