
Science for the Future: Strategies for Moving and Sharing Data

May 10, 2015


Ian Foster

A talk at the National User Facility Organization (NUFO) 2013 meeting at LBNL, where the theme that year was "the future of scientific data."
Transcript
Page 1: Science for the Future: Strategies for Moving and Sharing Data

globus online

Science for the Future

Strategies for distributing and sharing data

www.globusonline.org

Ian Foster [email protected]

Page 2

Big science data should be easy

[Figure: Registry, Staging Store, Ingest Stores, Analysis Stores, Community Stores, and Archive Mirrors]

Page 3

… but it’s hard and frustrating!

[Figure: the Registry, Staging, Ingest, Analysis, Community, and Archive Mirror stores from the previous slide, annotated with warnings: "Quota exceeded", "Expired credentials", "Network failed. Retry.", "Permission denied"]

Page 4

Excerpts from ESnet reports

• “Transfers often take longer than expected based on available network capacities”

• “Lack of an easy to use interface to some of the high-performance tools”

• “Tools [are] too difficult to install and use”

• “Time and interruption to other work required to supervise large data transfers”

• “Need data transfer tools that are easy to use, well-supported, and permitted by site and facility cybersecurity organizations”

Page 5

We envisage a world where data …

… flows rapidly, reliably, and securely among:

experimental facilities, online and archival storage, computing facilities, and remote institutions

Page 6

We envisage a world where data …

… is easily integrated into dynamic datasets that also include metadata and programs necessary to understand and regenerate it

Page 7

We envisage a world where data …

… is readily discoverable and accessible to collaborators, regardless of their and the data’s location

Page 8

We believe a new approach is needed to deliver data management infrastructure

Frictionless

Affordable

Sustainable

Like … but for science!

Page 9

Focusing on “frictionless”, we’ve started to do this with the Globus Online service …

Transfer and sharing of large data sets …

… with Dropbox-like characteristics …

… directly from your own storage systems

Page 10

We started with reliable, secure, high-performance file transfer …

[Figure: Data Source → Data Destination]

1. User initiates transfer request

2. Globus Online moves and syncs files

3. Globus Online notifies user
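The three-step transfer flow above can be sketched as follows. This is a minimal, hypothetical in-memory model, not the Globus Online API: endpoints are plain Python dicts mapping paths to bytes, and `TransferRequest`, `run_transfer`, and the checksum-based sync rule are illustrative assumptions. Only files that are missing at the destination or whose checksums differ are copied, which is one common meaning of "sync".

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical endpoint model: path -> file contents (not a Globus data type).
Endpoint = Dict[str, bytes]


def checksum(data: bytes) -> str:
    """Content digest used to decide whether a destination file is out of date."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class TransferRequest:
    """Step 1: the user's request (illustrative structure only)."""
    source: Endpoint
    destination: Endpoint
    notify: Callable[[str], None]


def run_transfer(req: TransferRequest) -> List[str]:
    """Step 2: copy missing or changed files; step 3: notify the user."""
    copied = []
    for path, data in sorted(req.source.items()):
        existing = req.destination.get(path)
        if existing is None or checksum(existing) != checksum(data):
            req.destination[path] = data
            copied.append(path)
    req.notify(f"Transfer complete: {len(copied)} of {len(req.source)} files copied")
    return copied


if __name__ == "__main__":
    src = {"a.dat": b"alpha", "b.dat": b"beta", "c.dat": b"gamma"}
    dst = {"a.dat": b"alpha", "b.dat": b"stale"}
    print(run_transfer(TransferRequest(src, dst, print)))
```

Running the same request a second time copies nothing, because every destination checksum then matches its source.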

Page 11

… and then made it simple to share big data off existing storage systems

[Figure: Data Source]

1. User A selects file(s) to share, selects user or group, and sets permissions

2. Globus Online tracks shared files; no need to move files to cloud storage!

3. User B logs in to Globus Online and accesses shared file
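The sharing model on this slide, in which files stay on existing storage and only permissions are tracked, can be sketched as a simple access-control list. The `Share` and `SharedEndpoint` classes, the "user:"/"group:" naming, and prefix matching on directory paths are hypothetical choices for illustration; they are not how Globus Online actually stores sharing rules.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Share:
    """One access rule: a directory prefix shared with a user or a group."""
    path: str              # directory prefix, ending in "/"
    principal: str         # "user:<name>" or "group:<name>"
    permissions: Set[str]  # e.g. {"read"} or {"read", "write"}


@dataclass
class SharedEndpoint:
    """Files stay on the owner's storage; the service only records rules."""
    groups: Dict[str, Set[str]] = field(default_factory=dict)
    shares: List[Share] = field(default_factory=list)

    def share(self, path: str, principal: str, permissions: Set[str]) -> None:
        # Step 1: user A picks a path, a user or group, and permissions.
        self.shares.append(Share(path, principal, set(permissions)))

    def can_access(self, user: str, path: str, permission: str) -> bool:
        # Step 3: user B's request is allowed if any rule matches the user
        # (directly or via group membership), the path, and the permission.
        principals = {f"user:{user}"} | {
            f"group:{name}" for name, members in self.groups.items() if user in members
        }
        return any(
            path.startswith(s.path)
            and s.principal in principals
            and permission in s.permissions
            for s in self.shares
        )


if __name__ == "__main__":
    ep = SharedEndpoint(groups={"tomo-team": {"userB"}})
    ep.share("/data/run42/", "group:tomo-team", {"read"})
    print(ep.can_access("userB", "/data/run42/scan.h5", "read"))   # True
    print(ep.can_access("userB", "/data/run42/scan.h5", "write"))  # False
```

Ending shared paths in "/" keeps a rule for "/data/run42/" from also matching a sibling directory such as "/data/run420/".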

Page 12

Early adoption is encouraging

Page 13

Early adoption is encouraging

~18 PB and 1B files moved

10x (or better) performance vs. scp

99.9% availability
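The headline numbers imply two rules of thumb: the mean size of a transferred file, and the yearly downtime budget that 99.9% availability allows. The sketch below assumes decimal units and a 365-day year; the mean file size is only an average over all transfers and says nothing about how file sizes are distributed.

```python
# Back-of-envelope checks on this slide's figures.
# Assumptions: decimal units (1 PB = 10**15 bytes, 1 MB = 10**6 bytes)
# and a 365-day year.

PB = 10**15
MB = 10**6

bytes_moved = 18 * PB            # "~18 PB"
files_moved = 1_000_000_000      # "1B files"
mean_file_size_mb = bytes_moved / files_moved / MB

availability = 0.999             # "99.9% availability"
hours_per_year = 365 * 24
max_downtime_hours = (1 - availability) * hours_per_year

print(f"Mean file size: ~{mean_file_size_mb:.0f} MB")               # ~18 MB
print(f"Downtime budget: ~{max_downtime_hours:.2f} hours per year")  # ~8.76 hours
```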

Page 14
Page 15

B. Winjum (UCLA) moves 900K-file plasma physics datasets between UCLA and NERSC

Page 16

Dan Kozak (Caltech) replicates 1 PB of LIGO astronomy data for resilience

Page 17

Exemplar: APS Beamline 2-BM

X-ray imaging, tomography, ~a few µm to 30 nm resolution

Currently can generate >100 TB per day

<1 GB/s data rate; ~3-5 GB/s in 5-10 years
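To relate the slide's data rates to daily volumes, a sustained rate in GB/s converts to TB per day by multiplying by 86,400 seconds and dividing by 1,000. The sketch assumes decimal units and a rate held constant for a full 24 hours; acquisition need not run continuously, so these figures are ceilings for a given rate rather than predictions for the beamline.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def tb_per_day(rate_gb_per_s: float) -> float:
    """Daily volume in TB for a rate in GB/s held for a full day (1 TB = 1000 GB)."""
    return rate_gb_per_s * SECONDS_PER_DAY / 1000


for rate in (1, 3, 5):
    print(f"{rate} GB/s sustained -> {tb_per_day(rate):.1f} TB/day")
# 1 GB/s sustained -> 86.4 TB/day
# 3 GB/s sustained -> 259.2 TB/day
# 5 GB/s sustained -> 432.0 TB/day
```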

Page 18

Transforming data acquisition

Current

• Experimental parameters optimized manually

• Collected data combined with visual inspection to confirm optimal condition

• Data reconstructed and sent to users via external drive

• User team starts data reduction at home institution

Page 19

Transforming data acquisition

Envisaged

• Experimental parameters optimized automatically

• Collected data available to optimization programs

• Data are automatically reconstructed, reduced, and shared with local and remote participants

• User team leaves the APS with reduced data

Current

• Experimental parameters optimized manually

• Collected data combined with visual inspection to confirm optimal condition

• Data reconstructed and sent to users via external drive

• User team starts data reduction at home institution

Page 20

Globus Online as enabler

[Figure: Facility data acquisition, Globus Online transfer service, Analysis/Sharing, Reduced data, Globus Online sharing service, Globus Online dataset service*]

* In development
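The envisaged flow, in which acquired data is moved, reduced, and shared without manual hand-offs, amounts to a chain of stages. The sketch below is hypothetical: the stage names, the record fields, and the "keep every 10th frame" reduction are placeholders standing in for the transfer service, facility analysis code, and sharing service on this slide, not real Globus Online calls.

```python
from typing import Any, Callable, Dict, List

# Hypothetical record handed from stage to stage; not a Globus data type.
Record = Dict[str, Any]
Stage = Callable[[Record], Record]


def transfer(rec: Record) -> Record:
    # Stand-in for a transfer service moving raw data off the beamline.
    return {**rec, "location": "analysis-cluster"}


def reconstruct_and_reduce(rec: Record) -> Record:
    # Stand-in for reconstruction and reduction: keep every 10th frame.
    return {**rec, "frames": rec["frames"][::10], "reduced": True}


def share(rec: Record) -> Record:
    # Stand-in for a sharing service granting access to the user team.
    return {**rec, "shared_with": ["user-team"]}


def run_pipeline(rec: Record, stages: List[Stage]) -> Record:
    """Apply each stage in order; each stage returns a new record."""
    for stage in stages:
        rec = stage(rec)
    return rec


if __name__ == "__main__":
    raw = {"frames": list(range(100)), "location": "beamline"}
    out = run_pipeline(raw, [transfer, reconstruct_and_reduce, share])
    print(out["location"], len(out["frames"]), out["shared_with"])
```

Because each stage returns a new record instead of mutating its input, the raw acquisition record is left untouched and stages can be reordered or swapped independently.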

Page 21

Credit: Kerstin Kleese-van Dam

Erin Miller (PNNL) collects data at Advanced Photon Source, renders at PNNL, and views at ANL

Page 22

We believe a new approach is needed to deliver data management infrastructure

Frictionless

Affordable

Sustainable

Page 23

We’ve got a handle on “frictionless”

• Web interface, REST API, command line

• InCommon, OAuth, OpenID, X.509, …

• Credential management

• Group definition and management

• Transfer management and optimization

• Reliability via transfer retries

• Integration with ESnet “Science DMZs”

• One-click “Globus Connect” install

• 5-minute Globus Connect Multiuser install
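“Reliability via transfer retries” can be illustrated with a generic retry loop that uses exponential backoff and random jitter. This is a common technique, not a description of how Globus Online schedules its retries; `TransientError`, `with_retries`, and the default delays are hypothetical, and the `sleep` parameter exists so the loop can be tested without actually waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical failure worth retrying, such as a dropped connection."""


def with_retries(
    op: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run op(); after each TransientError wait about base_delay * 2**k and retry."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        try:
            return op()
        except TransientError:
            attempt += 1
            if attempt >= max_attempts:
                raise  # give up and surface the last error
            delay = base_delay * 2 ** (attempt - 1)
            # Random jitter (up to +50%) keeps many clients from retrying in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
```

For example, an operation that fails twice and then succeeds returns on its third attempt, after waits of roughly 1 s and then 2 s.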

Page 24

“Affordable” and “sustainable”?

Common expectation is either:

– High-priced commercial software (with generally higher levels of quality)

or:

– Free, open source software (with generally lower levels of quality)

We aim to offer the best of all worlds!

Page 25

We are a non-profit service provider to the non-profit research community

Page 26

Our challenge: Sustainability

We are a non-profit service provider to the non-profit research community

Page 27

Starting at $20k per year

• Managed endpoints with sharing

• Multiple GridFTP servers per endpoint

• Branded web sites

• Alternate identity provider

• Usage reporting

• Mass storage system (MSS) optimizations

• Operations monitoring and management

• Input into and access to product roadmap

Globus Online Provider Plans

Page 28

Provider Plan not required to get started

Use Globus Connect Multiuser to easily connect your resources with Globus

Go to: globusonline.org/gcmu

[Figure: Registry, Staging Store, Ingest Stores, Analysis Stores, Community Stores, and Archive Mirrors]

Page 29

We hope you will join us

Page 30

Providers are also using Globus Online as a platform

[Figure: Globus Nexus (Identity, Group, Profile), Sharing Service, Transfer Service, Dataset Services, Globus Toolkit, Globus Online APIs, Globus Connect]

Page 31

Early platform adopters

Page 32

Our research is supported by:

U.S. Department of Energy

Page 33

Questions

Contact: [email protected]

Providers: globusonline.org/provider-plans

Researchers: globusonline.org/plus

www.globusonline.org