Copyright 2018, The HDF Group
November 14, 2018

Birds of a Feather: Enabling Data Services for HPC

Jerome Soumagne (The HDF Group)
Matthieu Dorier, Philip Carns, Robert Ross (Argonne National Laboratory)
Johann Lombardi (Intel Corporation)
Chad Wood, Kevin Huck (University of Oregon)
Philip Davis, Manish Parashar (Rutgers University)





Introduction

§ Data services become essential to HPC workflow and productivity
  ⁃ Storage services
  ⁃ Data analysis and visualization services
  ⁃ Telemetry services
  ⁃ etc.

[Figure: diagram with boxes labeled Simulation, Storage, Visualization, Data Analysis, Telemetry OS / System]

SC18 Birds of a Feather: Enabling Data Services for HPC, November 13, 2018


Objective

§ Several frameworks already exist and/or are being developed
  ⁃ DAOS / DeltaFS / UnifyCR / Dataspaces / ParaView / Visit / SOS / Faodel

§ Any HPC data service must face similar challenges
  ⁃ Communication between applications
  ⁃ Resilience and fault tolerance
  ⁃ Deployment
  ⁃ Security

§ How can we help service developers/integrators and share knowledge?


Questions

§ How to deploy data services in HPC?
  ⁃ Scheduler integration
  ⁃ Wire-up and bootstrapping
  ⁃ Negotiation of resources with end-user application

§ How to handle resiliency?
  ⁃ How can the application recover from a service fault?

§ How is security provided?
  ⁃ Does it matter or just single user?
  ⁃ What kind of security do we need?

§ Extras
  ⁃ How to handle communication?
  ⁃ How to earn trust from application users?


Format

§ Series of Talks
  ⁃ Mochi and Mercury – Matthieu Dorier (Argonne National Laboratory) and Jerome Soumagne (The HDF Group)
  ⁃ DAOS – Johann Lombardi (Intel Corporation)
  ⁃ SOS – Chad Wood (University of Oregon)
  ⁃ Dataspaces – Philip Davis and Manish Parashar (Rutgers University)

§ Discussion and Q&A

§ Material will be posted online
  ⁃ http://mercury-hpc.github.io/news/2018/10/17/data-services-bof.html


From Mercury to Mochi


Original Motivation – HDF5

§ Widely used by HPC community
§ Provide access to storage systems
  ⁃ VFD (through MPI-IO and ROMIO)
  ⁃ VOL

§ Access HDF5 files remotely or on separate HPC nodes?
  ⁃ Execute HDF5 calls on remote nodes
  ⁃ Remote procedure call

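[Figure: HDF5 software stack. Labels, top to bottom: HDF5 API; VOL Layer (Native, Remote?, REST); VFD Layer (SEC2, MPI, and two partially legible file-system and S3 drivers); Storage; Cloud]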


RPC for HPC

§ RPC (Remote Procedure Call)
§ Widely used technique to create web services (e.g., gRPC)
§ Problem: web services frameworks are not designed for HPC
  ⁃ Typically built around TCP/UDP protocols
  ⁃ Do not handle large data transfers efficiently
  ⁃ Can potentially introduce a lot of jitter (extra threads / memory used / etc.)
  ⁃ Must be able to run in userspace

§ Initially developed as part of Exascale FastForward effort w/ Intel
§ Continued through ASCR Mochi project
§ Mercury (http://mercury-hpc.github.io) / 1.0.0 release yesterday!


Mercury RPC

§ Data service building block
  ⁃ Origin and Target definitions
  ⁃ Input / output arguments = metadata
  ⁃ Large data arguments = bulk data

§ Network abstraction layer
  ⁃ Intranode (SM) / internode (OFI)
  ⁃ Non-blocking callback-based model

[Figure: Origin and Target each run a Proc and exchange Metadata (P2P) and Bulk Data (RDMA). Below them sit the Mercury API and the Network Abstraction Layer, with plugins BMI, MPI, SM (shared-memory), and OFI (sockets, verbs, psm2, gni)]
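The origin/target split, the metadata versus bulk-data distinction, and the non-blocking callback model can be sketched with a small in-process toy. This is not the Mercury API: every class and function name below (`BulkHandle`, `Origin`, `Target`, `forward`, `progress`, `trigger`, `checksum_proc`) is a hypothetical stand-in chosen for illustration, and the "RDMA" pull is just a slice of local memory.

```python
from collections import deque


class BulkHandle:
    """Small descriptor of a large buffer; travels with the metadata."""
    def __init__(self, buffer):
        self._buffer = buffer          # memory stays on the side that exposed it
        self.size = len(buffer)

    def pull(self, offset, length):
        # Stand-in for an RDMA read issued by the target.
        return bytes(self._buffer[offset:offset + length])


class Target:
    """Runs registered procedures on behalf of origins."""
    def __init__(self):
        self._procs = {}

    def register(self, name, func):
        self._procs[name] = func

    def handle(self, name, args, bulk):
        return self._procs[name](args, bulk)


class Origin:
    """Issues non-blocking calls; callbacks run only inside trigger()."""
    def __init__(self, target):
        self._target = target
        self._pending = deque()
        self._completed = deque()

    def forward(self, name, args, bulk, callback):
        # Returns immediately: the call is only queued here.
        self._pending.append((name, args, bulk, callback))

    def progress(self):
        # Stand-in for network progress: deliver requests, collect responses.
        while self._pending:
            name, args, bulk, callback = self._pending.popleft()
            output = self._target.handle(name, args, bulk)
            self._completed.append((callback, output))

    def trigger(self):
        # Run the completion callbacks of finished calls.
        count = 0
        while self._completed:
            callback, output = self._completed.popleft()
            callback(output)
            count += 1
        return count


def checksum_proc(args, bulk):
    # Target side: small args arrive with the request, data is pulled in chunks.
    total = 0
    for offset in range(0, bulk.size, args["chunk"]):
        total += sum(bulk.pull(offset, args["chunk"]))
    return {"checksum": total % 65536, "bytes": bulk.size}


target = Target()
target.register("checksum", checksum_proc)
origin = Origin(target)

data = bytearray(range(256)) * 4          # 1024 bytes of "large" data
results = []
origin.forward("checksum", {"chunk": 100}, BulkHandle(data), results.append)
pending_before_progress = list(results)   # still empty: forward() did not block
origin.progress()
origin.trigger()
print(pending_before_progress, results)
```

Only the small `args` dictionary and the handle descriptor accompany the request; the target decides when and in what chunk size to pull the payload, which is the reason for keeping bulk data out of the metadata path.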


Mercury RPC

§ Deployment
  ⁃ Portability
  ⁃ Lightweight / small dependencies
  ⁃ Not tied to any threading model
  ⁃ Does not provide any notion of group or collectives
  ⁃ Requires out-of-band mechanism to gather peer information

§ Resiliency
  ⁃ Not resilient on its own but provides building blocks like cancelation

§ Security
  ⁃ Can pass authorization keys down to fabric layer (job granularity)
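Two of the points above lend themselves to a short illustration: peer addresses gathered out of band, and cancelation used as a building block for recovery. The sketch below is a toy under stated assumptions, not Mercury code: the shared JSON file, the `PendingCall` class, `call_with_failover`, `fake_send`, and the address strings are all hypothetical, and a real service would drive network progress where the loop sleeps.

```python
import json
import os
import tempfile
import time


class PendingCall:
    """Handle for an in-flight request that can be canceled by the caller."""
    def __init__(self, callback):
        self._callback = callback
        self.done = False

    def complete(self, status, value=None):
        if not self.done:              # a canceled call ignores late replies
            self.done = True
            self._callback(status, value)

    def cancel(self):
        self.complete("canceled")


def publish_addresses(path, addresses):
    # Service side: write peer addresses where clients can find them
    # (a shared file here; a job launcher or key-value store would also do).
    with open(path, "w") as f:
        json.dump(addresses, f)


def load_addresses(path):
    with open(path) as f:
        return json.load(f)


def call_with_failover(peers, send, timeout):
    """Try each peer in turn, canceling any call that misses its deadline."""
    log = []
    for address in peers:
        outcome = {}

        def on_complete(status, value, outcome=outcome):
            outcome["status"], outcome["value"] = status, value

        call = PendingCall(on_complete)
        send(address, call)
        deadline = time.monotonic() + timeout
        while not call.done and time.monotonic() < deadline:
            time.sleep(0.001)          # a real service would make progress here
        if not call.done:
            call.cancel()              # release the stuck operation
        log.append((address, outcome["status"]))
        if outcome["status"] == "ok":
            return outcome["value"], log
    return None, log


def fake_send(address, call):
    # Hypothetical transport: the first server never answers.
    if address != "sm://node0/1":
        call.complete("ok", "pong from " + address)


path = os.path.join(tempfile.mkdtemp(), "peers.json")
publish_addresses(path, ["sm://node0/1", "sm://node1/1"])
value, log = call_with_failover(load_addresses(path), fake_send, timeout=0.01)
print(value, log)
```

The recovery policy (timeout length, which peer to try next) lives entirely in the caller; the RPC layer in this sketch contributes only the ability to cancel a pending operation, which matches the "building blocks" framing on the slide.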
