Page 1: Research Data Management as a Service

computationinstitute.org www.globusonline.org    

Research data management as a service

Ian Foster

Page 2: Research Data Management as a Service

High energy physics

Molecular biology

Cosmology

Genetics

Metagenomics

Linguistics

Economics

Climate change

Visual arts

Page 3: Research Data Management as a Service

What would a “dropbox for science” look like?

Page 4: Research Data Management as a Service

[Diagram: Registry, Staging Store, Ingest Store, Analysis Store, Community Store, Archive, Mirror]

[Error alerts shown on the diagram: Quota exceeded! Expired credentials! Network failed. Retry. Permission denied!]

It should be trivial to Collect, Move, Sync, Share, Analyze, Annotate, Publish, Search, Backup, & Archive BIG DATA … but in reality it’s often very challenging

Page 5: Research Data Management as a Service

• Collect  • Move  • Sync  • Share  • Analyze  

• Annotate  • Publish  • Search  • Backup  • Archive  

…for BIG DATA

Page 6: Research Data Management as a Service

• Collect  • Move  • Sync  • Share  • Analyze  

• Annotate  • Publish  • Search  • Backup  • Archive  

• Collect  • Move  • Sync  • Share

Capabilities delivered using Software-as-a-Service (SaaS) model

Page 7: Research Data Management as a Service

Page 8: Research Data Management as a Service

Data Source

Data Destination

1. User initiates transfer request

2. Globus Online moves/syncs files

3. Globus Online notifies user
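
The three-step transfer flow on this slide can be sketched as a small simulation. This is not the Globus Online API: `TransferTask`, `run_transfer`, and the in-memory dictionaries standing in for the source and destination are hypothetical names chosen for illustration. The sketch also assumes that a sync copies only files whose checksum differs at the destination.

```python
import hashlib
from dataclasses import dataclass, field


def checksum(data: bytes) -> str:
    """Return a hex digest used to decide whether a file needs copying."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class TransferTask:
    """Hypothetical record of one transfer request."""
    copied: list = field(default_factory=list)
    skipped: list = field(default_factory=list)
    status: str = "PENDING"


def run_transfer(source: dict, destination: dict, notify) -> TransferTask:
    """Steps 2 and 3: sync files, then notify the user.

    `source` and `destination` map file names to bytes and stand in for
    storage endpoints (an assumption made for this sketch).
    """
    task = TransferTask()
    for name in sorted(source):
        data = source[name]
        if name in destination and checksum(destination[name]) == checksum(data):
            task.skipped.append(name)   # already in sync, nothing to move
            continue
        destination[name] = data        # move the file
        task.copied.append(name)
    task.status = "SUCCEEDED"
    notify(f"Transfer {task.status}: {len(task.copied)} copied, "
           f"{len(task.skipped)} already in sync")
    return task


# Step 1: the user initiates a transfer request.
messages = []
src = {"run1.h5": b"aaa", "run2.h5": b"bbb"}
dst = {"run1.h5": b"aaa"}
task = run_transfer(src, dst, messages.append)
```

Passing `notify` as a callback keeps the notification channel (email, web page, log) out of the transfer logic, so the user does not have to watch the transfer while it runs.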

Page 9: Research Data Management as a Service

Data Source

1. User A selects file(s) to share; selects user/group, sets share permissions

2. Globus Online tracks shared files; no need to move files to cloud storage!

3. User B logs in to Globus Online and accesses shared file
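
The key point of this slide, that sharing records permissions without copying data, can be illustrated with a minimal sketch. `Share`, `SharingService`, the user names, the group name, and the paths are all hypothetical; the real service's data model is not described in the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Share:
    """Hypothetical record: a path on the owner's storage plus permissions."""
    owner: str
    path: str
    permissions: dict = field(default_factory=dict)  # user or group -> mode


class SharingService:
    """Tracks shares only; file bytes stay where the owner keeps them."""

    def __init__(self, groups=None):
        self.shares = []
        self.groups = groups or {}   # group name -> set of user names

    def share(self, owner, path, principal, mode="r"):
        """Step 1: the owner picks a path, a user or group, and a permission."""
        entry = Share(owner, path, {principal: mode})
        self.shares.append(entry)    # Step 2: record the share, move nothing
        return entry

    def can_read(self, user, path):
        """Step 3: check whether `user` may access `path`."""
        for entry in self.shares:
            if not path.startswith(entry.path):
                continue
            for principal in entry.permissions:
                if principal == user or user in self.groups.get(principal, set()):
                    return True
        return False


service = SharingService(groups={"tomo-lab": {"bob", "carol"}})
service.share("alice", "/data/mydata42/", "tomo-lab", mode="r")
```

Because the service stores only the path and the permission entry, granting or revoking access never touches the data itself.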

Page 10: Research Data Management as a Service

Early adoption is encouraging

Page 11: Research Data Management as a Service

Early adoption is encouraging

8,000 registered users; >100 daily

~16 PB moved; ~1B files

10x (or better) performance vs. scp

99.9% availability

Entirely hosted on Amazon

Page 12: Research Data Management as a Service

Globus  Online  already  does  a  lot  

Globus Toolkit

Sharing Service

Transfer Service

Globus Nexus (Identity, Group, Profile)

Globus Online APIs

Globus Connect

Page 13: Research Data Management as a Service

We are also adding capabilities

Globus Toolkit

Sharing Service

Transfer Service

Globus Nexus (Identity, Group, Profile)

Globus Online APIs

Globus Connect

Page 14: Research Data Management as a Service

We are also adding capabilities

Globus Toolkit

Sharing Service

Transfer Service

Dataset Services

Globus Nexus (Identity, Group, Profile)

Globus Online APIs

Globus Connect

Page 15: Research Data Management as a Service

Expanding Globus Online services

•  Ingest and publication
   – Imagine a DropBox that not only replicates, but also extracts metadata, catalogs, converts

•  Cataloging
   – Virtual views of data based on user-defined and/or automatically extracted metadata

•  Computation
   – Associate computational procedures, orchestrate application, catalog results, record provenance
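
A rough sketch of the "replicates, but also extracts metadata, catalogs" idea follows. Everything here is an assumption for illustration: `ingest`, `extract_metadata`, the extension-to-format mapping, and the dictionaries standing in for stores and the catalog are not part of any Globus Online interface.

```python
import posixpath


def extract_metadata(path: str, size: int) -> dict:
    """Derive simple metadata from the file name alone (illustrative rule)."""
    stem, ext = posixpath.splitext(posixpath.basename(path))
    formats = {".h5": "HDF5", ".hdf5": "HDF5", ".csv": "CSV"}  # assumed mapping
    return {
        "name": stem,
        "format": formats.get(ext.lower(), "unknown"),
        "bytes": size,
    }


def ingest(path, data, replicas, catalog):
    """Replicate `data` to every store, then catalog its metadata."""
    for store in replicas:
        store[path] = data                               # replicate
    catalog[path] = extract_metadata(path, len(data))    # extract and catalog
    return catalog[path]


primary, mirror, catalog = {}, {}, {}
entry = ingest("/ingest/mydata42.h5", b"\x89HDF", [primary, mirror], catalog)
```

A real ingest step would read metadata from the file contents rather than the name; the name-based rule only keeps the sketch self-contained.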

Page 16: Research Data Management as a Service

Builds on catalog as a service

Approach

•  Hosted user-defined catalogs

•  Based on tag model <subject, name, value>

•  Optional schema constraints

•  Integrated with other Globus services

Three REST APIs

/query/
•  Retrieve subjects

/tags/
•  Create, delete, retrieve tags

/tagdef/
•  Create, delete, retrieve tag definitions

Builds on USC Tagfiler project (C. Kesselman et al.)
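
The tag model <subject, name, value> and the three API areas can be illustrated with an in-memory sketch. `TagCatalog` and its methods are hypothetical stand-ins, not the Tagfiler or Globus Online REST interface, and the subject `mydata43` and the `TIFF` value are made up for the example. The sketch assumes that a schema constraint means an optional expected type per tag name.

```python
class TagCatalog:
    """In-memory stand-in for a hosted catalog (not the Tagfiler API)."""

    def __init__(self):
        self.tagdefs = {}   # tag name -> expected Python type (optional)
        self.tags = set()   # set of (subject, name, value) triples

    def define_tag(self, name, value_type=str):
        """Rough analogue of creating a tag definition under /tagdef/."""
        self.tagdefs[name] = value_type

    def add_tag(self, subject, name, value):
        """Rough analogue of creating a tag under /tags/."""
        expected = self.tagdefs.get(name)
        if expected is not None and not isinstance(value, expected):
            raise TypeError(f"tag {name!r} expects {expected.__name__}")
        self.tags.add((subject, name, value))

    def query(self, **criteria):
        """Rough analogue of /query/: subjects matching every name=value pair."""
        subjects = {s for s, _, _ in self.tags}
        return sorted(
            s for s in subjects
            if all((s, n, v) in self.tags for n, v in criteria.items())
        )


catalog = TagCatalog()
catalog.define_tag("format", str)                  # constrained tag
catalog.add_tag("mydata42", "owner", "Francesco")  # unconstrained tag
catalog.add_tag("mydata42", "format", "HDF5")
catalog.add_tag("mydata43", "format", "TIFF")
hdf5_subjects = catalog.query(format="HDF5")
```

Storing flat triples keeps the catalog schema-free by default, while `define_tag` shows how an optional constraint can be layered on per tag name.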

Page 17: Research Data Management as a Service

mydata42

owner: Francesco; type: 3dtomo; format: HDF5; beamline: 2BM

Tomography!

Define dataset; Infer type; Extract metadata

Populate catalog(s)

Locate datasets; Access files

analyze

Catalog derived products

transfer/schedule

Orchestration; Organization

Record provenance

Annotate, share, browse, search

Page 18: Research Data Management as a Service

Our challenge:

Sustainability

We are a non-profit service provider to the non-profit research community

Page 19: Research Data Management as a Service

Globus Online Provider Plans

Support ongoing operations

Offer value-added capabilities

Engage more closely with users

Page 20: Research Data Management as a Service

Provider Plans offer…

•  Provider endpoints with sharing
•  Multiple GridFTP servers per endpoint
•  Branded web sites
•  Alternate identity provider
•  Usage reporting
•  MSS optimizations
•  Operations monitoring and management
•  Input into and access to product roadmap

Starting at $20k per year

Page 21: Research Data Management as a Service

Thanks to great colleagues and collaborators

•  Steve Tuecke, Rachana Ananthakrishnan, Kyle Chard, Raj Kettimuthu, Ravi Madduri, Tanu Malik, and many others at Argonne & UChicago

•  Carl Kesselman, Karl Czajkowski, Rob Schuler, and others at USC/ISI

•  Birali Runesha and others at UChicago Research Computing Center

Page 22: Research Data Management as a Service

Thank  you  to  our  sponsors!  

