dCache: An Overview
Data Lakes: distributed resources
dCache has over a decade of production use as a data lake: NDGF is a distributed dCache,
dCache: An Overview
Paul Millar
on behalf of the dCache team
Nordic Data Management Workshop
Oslo, Norway; 2019-02-27
Scientific data challenges
● Volume
● Fast ingest
● Chaotic Access
● Sharing data
● Access Control
● Persistence & long-term archival
● Immutability
dCache: An Overview | | 2019-02-27 | 3
Fast Analysis: NFS 4.1/pNFS
High Speed Data Ingest
Wide Area Transfers (Globus Online, FTS) by GridFTP, HTTP
Interactive analysis & Sharing
Data management & workflow control (Rucio, Kafka, SSE)
● HERA
● Tevatron
● WLCG
● Belle II
● LOFAR
● CTA
● IceCube
● EU-XFEL
● Petra3
● DUNE
● And many more ...
Flexibility that works …
● Supports many authentication schemes: username+password, X.509, Kerberos and OpenID-Connect:
● Integrates with existing infrastructure + pluggable for flexibility,
● Users have same rights, irrespective of how they authenticate.
● dCache is advanced storage software for data-intensive science.
● dCache:
● has decades of production use throughout the world,
● provides scalable resources, used by many scientific disciplines,
● offers innovative solutions that help drive the next generation of scientific discovery.
Backup slides
dCache 101: Motivation
● Data never fits into a single server
● Multiple servers
● Off-load to tape
● Growing number of client hosts
● Mainframe vs Linux cluster
● Control over hardware/OS selection
● Better tender offers
● Use and enhance local expertise
dCache 101: Design
● Single-rooted namespace, distributed data
● Client talks to namespace for metadata operations only
● Bandwidth and performance grow with number of data servers
● Standard clients (OS native or experiment)
● Some data can be offloaded to tape
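The design points above (a single namespace for metadata, data held on separate servers, bandwidth growing with the number of data servers) can be illustrated with a toy model. This is only a sketch: the class names `Pool` and `Namespace`, the round-robin placement policy and the helper functions are hypothetical and are not taken from dCache itself.

```python
class Pool:
    """A data server that stores file contents (hypothetical toy model)."""

    def __init__(self, name):
        self.name = name
        self.files = {}
        self.bytes_served = 0

    def write(self, path, data):
        self.files[path] = data

    def read(self, path):
        data = self.files[path]
        self.bytes_served += len(data)
        return data


class Namespace:
    """Single-rooted namespace: records which pool holds each path."""

    def __init__(self, pools):
        self.pools = pools
        self.location = {}

    def create(self, path):
        # Round-robin placement: a placeholder policy for this sketch only.
        pool = self.pools[len(self.location) % len(self.pools)]
        self.location[path] = pool
        return pool

    def locate(self, path):
        # Metadata operation only: no file bytes pass through the namespace.
        return self.location[path]


def client_write(namespace, path, data):
    pool = namespace.create(path)   # metadata operation
    pool.write(path, data)          # data goes straight to the pool


def client_read(namespace, path):
    pool = namespace.locate(path)   # metadata operation
    return pool.read(path)          # data comes straight from the pool
```

In this model, adding a `Pool` adds capacity and bandwidth without putting any more file data through the `Namespace` object.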
Processing data without user credentials / BOINC
2. Request a macaroon
3. Add caveats
4. Request data directly from dCache (GET, 307 redirect, GET)
What are macaroons good for?
HTTP 3rd party copies (FTS)
1. Request copy
2. Request a macaroon
3. Add caveats
4. COPY with embedded macaroon
5. GET with macaroon
What are macaroons good for?
Enforcing catalogue permissions (Rucio)
1. Request access to data
2. Request a macaroon
3. Add caveats
4. Access data
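The "request a macaroon, add caveats" steps in the flows above work because the holder of a macaroon can narrow it without knowing the server's secret. The sketch below shows the general HMAC-chaining idea behind macaroons. It is an illustration only, not dCache's token format: the dictionary layout, the function names and the caveat strings are assumptions made for this example.

```python
import hashlib
import hmac


def _mac(key: bytes, message: bytes) -> bytes:
    """One HMAC-SHA256 step of the signature chain."""
    return hmac.new(key, message, hashlib.sha256).digest()


def mint(root_key: bytes, identifier: bytes) -> dict:
    """Server side: create a macaroon with no caveats."""
    return {"id": identifier, "caveats": [], "sig": _mac(root_key, identifier)}


def add_caveat(macaroon: dict, caveat: bytes) -> dict:
    """Holder side: restrict the macaroon using only its current signature."""
    return {
        "id": macaroon["id"],
        "caveats": macaroon["caveats"] + [caveat],
        "sig": _mac(macaroon["sig"], caveat),
    }


def verify(root_key: bytes, macaroon: dict, satisfied) -> bool:
    """Server side: check every caveat and recompute the signature chain."""
    sig = _mac(root_key, macaroon["id"])
    for caveat in macaroon["caveats"]:
        if not satisfied(caveat):
            return False
        sig = _mac(sig, caveat)
    return hmac.compare_digest(sig, macaroon["sig"])


# Hypothetical usage: a service narrows a macaroon before handing it on.
root_key = b"server-secret"                      # known only to the server
token = mint(root_key, b"user=example")
token = add_caveat(token, b"activity:DOWNLOAD")  # illustrative caveat text
token = add_caveat(token, b"path:/data/run42")   # illustrative caveat text
allowed = {b"activity:DOWNLOAD", b"path:/data/run42"}
print(verify(root_key, token, lambda caveat: caveat in allowed))
```

Because each caveat is folded into the signature, removing a caveat invalidates the token, while adding one needs no contact with the server.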
Comparison: it’s what industry is doing…
Comparison: it’s what Open-Source is doing…
dCache Storage Events: Kafka
[Figure: diagram with the labels "billing", "Log", "created" and "staged"]
dCache Server-Sent Events (SSE)
● Based on HTTP v1.1
● HTML 5 standard: support for many languages and web-browsers
● Initially adding support for inotify events (it’s how Linux does namespace notification)
● Plan to add:
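For reference, the SSE wire format that this slide builds on is plain text over HTTP: `event:`, `data:` and `id:` fields, with a blank line ending each event. The sketch below is a simplified parser of that standard format (it ignores the `retry` field and the finer edge cases of the specification). The function name `parse_sse` and the sample payloads are assumptions for illustration, not dCache output.

```python
def parse_sse(lines):
    """Yield (event, data, last_id) tuples from an iterable of text lines."""
    event, data, last_id = "message", [], None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the buffered event, if it has data.
            if data:
                yield event, "\n".join(data), last_id
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space is not part of the value
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)
        elif field == "id":
            last_id = value


# Hypothetical stream; the event name and payload text are placeholders.
sample = [
    ": keep-alive\n",
    "event: inotify\n",
    "id: 7\n",
    "data: placeholder payload\n",
    "\n",
]
for event, data, last_id in parse_sse(sample):
    print(event, data, last_id)
```

In practice `lines` would come from a long-lived HTTP response read line by line; any HTTP client that can stream a response body is enough.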