computationinstitute.org www.globusonline.org Research data management as a service Ian Foster [email protected]
May 10, 2015
High energy physics
Molecular biology
Cosmology
Genetics
Metagenomics
Linguistics
Economics
Climate change
Visual arts
What would a “dropbox for science” look like?
[Figure: Registry, Staging Store, Ingest Store, Analysis Store, Community Store, Archive Mirror; error alerts: “Quota exceeded”, “Expired credentials”, “Network failed. Retry.”, “Permission denied”]
It should be trivial to Collect, Move, Sync, Share, Analyze, Annotate, Publish, Search, Backup, & Archive BIG DATA … but in reality it’s often very challenging
• Collect • Move • Sync • Share • Analyze
• Annotate • Publish • Search • Backup • Archive
…for BIG DATA
• Collect • Move • Sync • Share • Analyze
• Annotate • Publish • Search • Backup • Archive
• Collect • Move • Sync • Share
Capabilities delivered using Software-as-a-Service (SaaS) model
Data Source
Data Destination
1. User initiates transfer request
2. Globus Online moves/syncs files
3. Globus Online notifies user
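The three steps above can be sketched as a toy, in-memory model. This is only an illustration under stated assumptions, not the Globus Online API: `Endpoint`, `sync`, and `handle_transfer_request` are hypothetical names, endpoints are plain dictionaries of file contents, and "sync" is assumed to mean copying only files that are missing or differ at the destination.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Endpoint:
    """Hypothetical stand-in for a data source or data destination."""
    name: str
    files: Dict[str, bytes] = field(default_factory=dict)


def checksum(data: bytes) -> str:
    """Return a SHA-256 digest used to decide whether two files differ."""
    return hashlib.sha256(data).hexdigest()


def sync(source: Endpoint, destination: Endpoint) -> List[str]:
    """Step 2: copy files that are missing or different at the destination."""
    copied = []
    for path, data in sorted(source.files.items()):
        existing = destination.files.get(path)
        if existing is None or checksum(existing) != checksum(data):
            destination.files[path] = data
            copied.append(path)
    return copied


def handle_transfer_request(user: str, source: Endpoint, destination: Endpoint) -> str:
    """Step 1 arrives as this call; step 3 is the returned notification text."""
    copied = sync(source, destination)
    return (f"To {user}: {len(copied)} file(s) transferred "
            f"from {source.name} to {destination.name}")


if __name__ == "__main__":
    src = Endpoint("lab-server", {"scan1.h5": b"abc", "scan2.h5": b"def"})
    dst = Endpoint("campus-cluster", {"scan1.h5": b"abc"})
    print(handle_transfer_request("user@example.org", src, dst))
```

Because unchanged files are skipped, repeating the same request is cheap, which is the property that makes retrying after a failure safe in this simplified model.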
Data Source
1. User A selects file(s) to share; selects user/group, sets share permissions
2. Globus Online tracks shared files; no need to move files to cloud storage!
3. User B logs in to Globus Online and accesses shared file
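The key idea in the sharing flow is that the service records who may access what, while the files themselves stay on the data source. A minimal sketch of that idea follows. All names (`SharingService`, `share`, `can`, `read_shared`) are hypothetical and do not reflect the actual Globus Online interfaces; group sharing is omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set, Tuple


@dataclass
class SharingService:
    """Hypothetical permission registry; it holds no file data at all."""
    # Maps (path, user) to the set of permissions granted on that path.
    acl: Dict[Tuple[str, str], Set[str]] = field(default_factory=dict)

    def share(self, path: str, user: str, permissions: Iterable[str]) -> None:
        """Step 1: the owner grants a user permissions on a path."""
        self.acl.setdefault((path, user), set()).update(permissions)

    def can(self, user: str, path: str, permission: str) -> bool:
        """Step 2: the service only tracks and checks permissions."""
        return permission in self.acl.get((path, user), set())


def read_shared(service: SharingService, storage: Dict[str, bytes],
                user: str, path: str) -> bytes:
    """Step 3: the file is read in place from the owner's storage."""
    if not service.can(user, path, "read"):
        raise PermissionError(f"{user} may not read {path}")
    return storage[path]


if __name__ == "__main__":
    storage = {"/data/run7.h5": b"raw detector data"}  # stays on the data source
    service = SharingService()
    service.share("/data/run7.h5", "user_b", ["read"])
    print(read_shared(service, storage, "user_b", "/data/run7.h5"))
```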
Early adoption is encouraging
• 8,000 registered users; >100 daily
• ~16 PB moved; ~1B files
• 10x (or better) performance vs. scp
• 99.9% availability
• Entirely hosted on Amazon
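As a back-of-the-envelope reading of these figures, the short calculation below derives the average file size and the downtime implied by 99.9% availability. It assumes decimal units (1 PB = 10^15 bytes), treats the approximate slide numbers as exact, and assumes availability is measured over a 365-day year; none of these assumptions is stated in the slides.

```python
# Assumption: decimal units, 1 PB = 10**15 bytes and 1 MB = 10**6 bytes.
PB = 10**15
MB = 10**6

total_bytes = 16 * PB          # "~16 PB moved"
file_count = 1_000_000_000     # "~1B files"
avg_file_mb = total_bytes / file_count / MB

# Assumption: availability is measured over a 365-day year.
hours_per_year = 365 * 24
max_downtime_hours = (1 - 0.999) * hours_per_year

if __name__ == "__main__":
    print(f"Average file size: about {avg_file_mb:.0f} MB")
    print(f"Downtime allowed by 99.9% availability: about {max_downtime_hours:.2f} hours per year")
```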
Globus Online already does a lot
Globus Toolkit
Sharing Service
Transfer Service
Globus Nexus (Identity, Group, Profile)
Globus Online APIs
Globus Connect
We are also adding capabilities
Globus Toolkit
Sharing Service
Transfer Service
Dataset Services
Globus Nexus (Identity, Group, Profile)
Globus Online APIs
Globus Connect
Expanding Globus Online services
• Ingest and publication
– Imagine a DropBox that not only replicates, but also extracts metadata, catalogs, converts
• Cataloging
– Virtual views of data based on user-defined and/or automatically extracted metadata
• Computation
– Associate computational procedures, orchestrate application, catalog results, record provenance
Builds on catalog as a service
Approach
• Hosted user-defined catalogs
• Based on tag model <subject, name, value>
• Optional schema constraints
• Integrated with other Globus services
Three REST APIs
/query/
• Retrieve subjects
/tags/
• Create, delete, retrieve tags
/tagdef/
• Create, delete, retrieve tag definitions
Builds on USC Tagfiler project (C. Kesselman et al.)
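To make the <subject, name, value> tag model concrete, here is a minimal in-memory sketch. It is not the Tagfiler or Globus catalog implementation, and the slides do not give the REST request formats, so the three operations are modeled as plain methods: `define_tag` loosely corresponds to /tagdef/, `set_tag` and `delete_tag` to /tags/, and `query` to /query/. The class and method names are hypothetical, and the "optional schema constraint" is assumed here to be a simple value-type check.

```python
from typing import Any, Dict, List


class TagCatalog:
    """Hypothetical in-memory catalog of <subject, name, value> tags."""

    def __init__(self) -> None:
        self.tagdefs: Dict[str, type] = {}          # tag name -> required value type
        self.tags: Dict[str, Dict[str, Any]] = {}   # subject -> {tag name: value}

    def define_tag(self, name: str, value_type: type = str) -> None:
        """Create a tag definition (loosely, the /tagdef/ role)."""
        self.tagdefs[name] = value_type

    def set_tag(self, subject: str, name: str, value: Any) -> None:
        """Attach a tag to a subject, enforcing the definition (the /tags/ role)."""
        if name not in self.tagdefs:
            raise KeyError(f"undefined tag: {name}")
        if not isinstance(value, self.tagdefs[name]):
            raise TypeError(f"tag {name} expects {self.tagdefs[name].__name__}")
        self.tags.setdefault(subject, {})[name] = value

    def delete_tag(self, subject: str, name: str) -> None:
        """Remove a tag from a subject if it is present."""
        self.tags.get(subject, {}).pop(name, None)

    def query(self, **criteria: Any) -> List[str]:
        """Retrieve subjects whose tags match every criterion (the /query/ role)."""
        return sorted(
            subject for subject, tags in self.tags.items()
            if all(tags.get(name) == value for name, value in criteria.items())
        )


if __name__ == "__main__":
    catalog = TagCatalog()
    for tag_name in ("owner", "type", "format", "beamline"):
        catalog.define_tag(tag_name)
    # Tags taken from the mydata42 example in this deck.
    for tag_name, value in [("owner", "Francesco"), ("type", "3dtomo"),
                            ("format", "HDF5"), ("beamline", "2BM")]:
        catalog.set_tag("mydata42", tag_name, value)
    print(catalog.query(format="HDF5", beamline="2BM"))
```

A query over tags rather than file paths is what allows the "virtual views" of data mentioned earlier: the same subject can appear in any view whose criteria its tags satisfy.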
mydata42
owner: Francesco
type: 3dtomo
format: HDF5
beamline: 2BM
Tomography!
Define dataset
Infer type
Extract metadata
Populate catalog(s)
Locate datasets
Access files
Analyze
Catalog derived products
Transfer/schedule
Orchestration
Organization
Record provenance
Annotate, share, browse, search
Our challenge:
Sustainability
We are a non-profit service provider to the non-profit research community
Globus Online Provider Plans
Support ongoing operations
Offer value-added capabilities
Engage more closely with users
Provider Plans offer…
• Provider endpoints with sharing
• Multiple GridFTP servers per endpoint
• Branded web sites
• Alternate identity provider
• Usage reporting
• MSS optimizations
• Operations monitoring and management
• Input into and access to product roadmap
Starting at $20k per year
Thanks to great colleagues and collaborators
• Steve Tuecke, Rachana Ananthakrishnan, Kyle Chard, Raj Kettimuthu, Ravi Madduri, Tanu Malik, and many others at Argonne & UChicago
• Carl Kesselman, Karl Czajkowski, Rob Schuler, and others at USC/ISI
• Birali Runesha and others at UChicago Research Computing Center
Thank you to our sponsors!