Research Data Management as a Service

Posted on 10 May 2015


DESCRIPTION

This presentation is by Ian Foster, director of the Computation Institute at The University of Chicago. It was given at the Great Plains Network Annual Meeting on May 29, 2013. For more information on Globus Online, visit globusonline.org. "What would a Dropbox for science look like?" asks Foster. "It should be trivial to collect, move, sync, share, analyze, annotate, publish, search, backup, and archive Big Data. But in reality it's often very challenging." Globus Online, a software-as-a-service (SaaS) data management platform, addresses these problems. This slideshow explains how Globus Online does so for universities and laboratories around the world.

Transcript

computationinstitute.org www.globusonline.org    

Research data management as a service

Ian Foster foster@uchicago.edu


High energy physics

Molecular biology

Cosmology

Genetics

Metagenomics

Linguistics

Economics

Climate change

Visual arts


What would a “Dropbox for science” look like?


[Figure: data flows among a registry, staging, ingest, analysis, and community stores, an archive, and a mirror, interrupted by errors such as “Quota exceeded”, “Expired credentials”, “Network failed. Retry.”, and “Permission denied”]

It should be trivial to collect, move, sync, share, analyze, annotate, publish, search, back up, and archive BIG DATA … but in reality it’s often very challenging.


• Collect • Move • Sync • Share • Analyze

• Annotate • Publish • Search • Backup • Archive

…for BIG DATA


• Collect • Move • Sync • Share: capabilities delivered using a Software-as-a-Service (SaaS) model


Data Source → Data Destination

1. User initiates transfer request
2. Globus Online moves/syncs files
3. Globus Online notifies user
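The three-step transfer flow can be sketched in a few lines. This is a minimal, hypothetical illustration and not the Globus Online API. `Endpoint`, `sync_transfer`, and `TransientNetworkError` are invented names, and endpoints are modeled as in-memory path-to-bytes maps. The "sync" rule is also an assumption: copy only files that are missing at the destination or whose SHA-256 checksum differs. The sketch shows the kind of automatic retry that keeps users from handling a "Network failed. Retry." error themselves.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, List


class TransientNetworkError(Exception):
    """Hypothetical error for a copy attempt that fails mid-transfer."""


@dataclass
class Endpoint:
    # Hypothetical stand-in for a storage endpoint: path -> file contents.
    files: Dict[str, bytes] = field(default_factory=dict)
    failures_left: int = 0  # simulated transient failures before writes succeed

    def write(self, path: str, data: bytes) -> None:
        if self.failures_left > 0:
            self.failures_left -= 1
            raise TransientNetworkError(f"network failed while writing {path}")
        self.files[path] = data


def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def sync_transfer(src: Endpoint, dst: Endpoint, notify: Callable[[str], None],
                  max_attempts: int = 3) -> List[str]:
    """Step 1 is the user's call to this function. Step 2: copy files that are
    missing or differ at the destination, retrying transient failures.
    Step 3: notify the user of the outcome."""
    copied: List[str] = []
    for path, data in sorted(src.files.items()):
        existing = dst.files.get(path)
        if existing is not None and checksum(existing) == checksum(data):
            continue  # already in sync, nothing to move
        for attempt in range(1, max_attempts + 1):
            try:
                dst.write(path, data)
                copied.append(path)
                break
            except TransientNetworkError:
                if attempt == max_attempts:
                    notify(f"transfer failed on {path} after {max_attempts} attempts")
                    raise
    notify(f"transfer complete: {len(copied)} file(s) copied")
    return copied
```

The service owns the retry loop and the final notification, so the user can submit a request and walk away.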


Data Source

1. User A selects file(s) to share, selects a user or group, and sets share permissions
2. Globus Online tracks shared files; no need to move files to cloud storage!
3. User B logs in to Globus Online and accesses the shared file
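The key point of the sharing flow is that the service stores permissions, not copies. Below is a minimal sketch of that idea. It assumes a single endpoint owner and read-only shares. `SharedEndpoint` and its methods are hypothetical and do not correspond to the real Globus sharing interface.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class SharedEndpoint:
    """Hypothetical model of in-place sharing: the service keeps an access list,
    while file contents stay on the owner's endpoint."""
    owner: str
    files: Dict[str, bytes]                                    # data never leaves here
    groups: Dict[str, Set[str]] = field(default_factory=dict)  # group -> members
    acl: Dict[str, Set[str]] = field(default_factory=dict)     # path -> users/groups

    def share(self, caller: str, path: str, principal: str) -> None:
        # Step 1: user A picks a file and grants read access to a user or group.
        if caller != self.owner:
            raise PermissionError(f"{caller} cannot share files on this endpoint")
        if path not in self.files:
            raise FileNotFoundError(path)
        # Step 2: only the permission is recorded; nothing is copied elsewhere.
        self.acl.setdefault(path, set()).add(principal)

    def can_read(self, user: str, path: str) -> bool:
        if user == self.owner:
            return True
        principals = self.acl.get(path, set())
        return user in principals or any(
            user in self.groups.get(p, set()) for p in principals)

    def read(self, user: str, path: str) -> bytes:
        # Step 3: user B reads the shared file straight from the source endpoint.
        if not self.can_read(user, path):
            raise PermissionError(f"permission denied: {user} on {path}")
        return self.files[path]
```

Granting access to a group rather than to individuals means that later membership changes take effect without re-sharing the file.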


Early adoption is encouraging


• 8,000 registered users; >100 daily
• ~16 PB moved; ~1B files
• 10x (or better) performance vs. scp
• 99.9% availability
• Entirely hosted on Amazon
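Two of these figures can be turned into back-of-the-envelope numbers for a sense of scale. The first line assumes availability is measured over a 365-day year. The second treats the approximate "~16 PB" and "~1B files" as exact decimal quantities, so it gives only a rough mean file size.

```latex
\begin{aligned}
\text{allowed downtime} &= (1 - 0.999) \times 365 \times 24\ \text{h} = 0.001 \times 8760\ \text{h} = 8.76\ \text{h per year} \\
\text{mean file size} &\approx \frac{16\ \text{PB}}{10^{9}\ \text{files}} = \frac{16 \times 10^{15}\ \text{B}}{10^{9}} = 16 \times 10^{6}\ \text{B} = 16\ \text{MB}
\end{aligned}
```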


Globus  Online  already  does  a  lot  

[Figure: Globus Online components: Globus Toolkit, Sharing Service, Transfer Service, Globus Nexus (Identity, Group, Profile), Globus Online APIs, and Globus Connect]


We are also adding capabilities


[Figure: Globus Online components, now including Dataset Services: Globus Toolkit, Sharing Service, Transfer Service, Dataset Services, Globus Nexus (Identity, Group, Profile), Globus Online APIs, and Globus Connect]


Expanding Globus Online services

• Ingest and publication: imagine a Dropbox that not only replicates, but also extracts metadata, catalogs, and converts

• Cataloging: virtual views of data based on user-defined and/or automatically extracted metadata

• Computation: associate computational procedures, orchestrate applications, catalog results, record provenance


Builds on catalog as a service

Approach:

•  Hosted user-defined catalogs

•  Based on tag model <subject, name, value>

•  Optional schema constraints

•  Integrated with other Globus services

Three REST APIs:

• /query/: retrieve subjects

• /tags/: create, delete, retrieve tags

• /tagdef/: create, delete, retrieve tag definitions

Builds on USC Tagfiler project (C. Kesselman et al.)
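A toy, in-memory version of the tag model shows how the three resources fit together. `TagCatalog` and `TagDef` are hypothetical names. The methods only loosely mirror the `/tagdef/`, `/tags/`, and `/query/` resources and are not the actual REST interface. Using a Python type as the "optional schema constraint" is an assumption, not Tagfiler's mechanism. Tag values must be hashable because triples are kept in a set.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class TagDef:
    name: str
    value_type: Optional[type] = None  # optional schema constraint; None = untyped


class TagCatalog:
    """Hypothetical in-memory catalog of <subject, name, value> triples. Method
    groups loosely mirror the /tagdef/, /tags/, and /query/ resources."""

    def __init__(self) -> None:
        self.tagdefs: Dict[str, TagDef] = {}
        self.triples: Set[Tuple[str, str, Any]] = set()  # values must be hashable

    # /tagdef/: create, delete, retrieve tag definitions
    def create_tagdef(self, tagdef: TagDef) -> None:
        self.tagdefs[tagdef.name] = tagdef

    def get_tagdef(self, name: str) -> TagDef:
        return self.tagdefs[name]

    def delete_tagdef(self, name: str) -> None:
        del self.tagdefs[name]
        self.triples = {t for t in self.triples if t[1] != name}  # drop its tags

    # /tags/: create, delete, retrieve tags
    def add_tag(self, subject: str, name: str, value: Any) -> None:
        tagdef = self.tagdefs.get(name)
        if tagdef is None:
            raise KeyError(f"no tag definition for {name!r}")
        if tagdef.value_type is not None and not isinstance(value, tagdef.value_type):
            raise TypeError(f"{name!r} expects {tagdef.value_type.__name__}")
        self.triples.add((subject, name, value))

    def delete_tag(self, subject: str, name: str, value: Any) -> None:
        self.triples.discard((subject, name, value))

    def get_tags(self, subject: str) -> Dict[str, Set[Any]]:
        result: Dict[str, Set[Any]] = {}
        for s, n, v in self.triples:
            if s == subject:
                result.setdefault(n, set()).add(v)
        return result

    # /query/: retrieve subjects carrying every requested name=value pair
    def query(self, constraints: Dict[str, Any]) -> List[str]:
        subjects = {s for s, _, _ in self.triples}
        return sorted(s for s in subjects
                      if all((s, n, v) in self.triples for n, v in constraints.items()))
```

You could tag the `mydata42` dataset from the next slide with string-typed `owner`, `type`, `format`, and `beamline` definitions. A query such as `{"type": "3dtomo", "beamline": "2BM"}` would then return `["mydata42"]`.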


mydata42: owner: Francesco; type: 3dtomo; format: HDF5; beamline: 2BM

Tomography!

[Figure: dataset workflow with steps Define dataset, Infer type, Extract metadata, Populate catalog(s), Locate datasets, Access files, Transfer/schedule, Analyze, Catalog derived products, Record provenance, and Annotate, share, browse, search, grouped under the labels Organization and Orchestration]
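The tomography workflow can be approximated as a chain of small functions. Everything here is hypothetical:

- the extension-based type rules;
- the `meta.txt` sidecar, which stands in for real metadata extraction (for example, reading HDF5 attributes);
- the function names `ingest`, `locate`, and `analyze`.

The sketch only shows how each step enriches a catalog entry and how derived products inherit tags and provenance.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Dataset:
    name: str
    files: Dict[str, bytes]
    tags: Dict[str, str] = field(default_factory=dict)
    provenance: List[str] = field(default_factory=list)


# Hypothetical rule table: file extension -> (format, dataset type).
TYPE_RULES = {".h5": ("HDF5", "3dtomo")}


def infer_type(ds: Dataset) -> None:
    for path in sorted(ds.files):
        for ext, (fmt, kind) in TYPE_RULES.items():
            if path.endswith(ext):
                ds.tags.update(format=fmt, type=kind)
                return


def extract_metadata(ds: Dataset) -> None:
    # Assumed convention: a sidecar "meta.txt" holds key=value lines.
    for line in ds.files.get("meta.txt", b"").decode().splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            ds.tags[key.strip()] = value.strip()


def ingest(ds: Dataset, owner: str, catalog: Dict[str, Dataset]) -> None:
    """Define dataset -> infer type -> extract metadata -> populate catalog."""
    ds.tags["owner"] = owner
    infer_type(ds)
    extract_metadata(ds)
    catalog[ds.name] = ds
    ds.provenance.append(f"ingested by {owner}")


def locate(catalog: Dict[str, Dataset], **criteria: str) -> List[Dataset]:
    return [ds for ds in catalog.values()
            if all(ds.tags.get(k) == v for k, v in criteria.items())]


def analyze(ds: Dataset, step: str, fn: Callable[[bytes], bytes],
            catalog: Dict[str, Dataset]) -> Dataset:
    """Run fn over the data files, catalog the derived product, record provenance."""
    outputs = {f"{path}.{step}": fn(data)
               for path, data in ds.files.items() if path != "meta.txt"}
    derived = Dataset(
        name=f"{ds.name}-{step}",
        files=outputs,
        tags={**ds.tags, "derived_from": ds.name},
        provenance=ds.provenance + [f"{step} applied to {ds.name}"],
    )
    catalog[derived.name] = derived
    return derived
```

Derived products inherit the parent's tags. A later search for `type="3dtomo"` will therefore find both the raw scan and its reconstruction, and the `derived_from` tag distinguishes them.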


Our challenge: Sustainability

We are a non-profit service provider to the non-profit research community.


Globus Online Provider Plans

Support ongoing operations

Offer value-added capabilities

Engage more closely with users

Provider Plans offer (starting at $20k per year):

• Provider endpoints with sharing
• Multiple GridFTP servers per endpoint
• Branded web sites
• Alternate identity provider
• Usage reporting
• Mass storage system (MSS) optimizations
• Operations monitoring and management
• Input into and access to the product roadmap


Thanks to great colleagues and collaborators

• Steve Tuecke, Rachana Ananthakrishnan, Kyle Chard, Raj Kettimuthu, Ravi Madduri, Tanu Malik, and many others at Argonne & UChicago

•  Carl Kesselman, Karl Czajkowski, Rob Schuler, and others at USC/ISI

•  Birali Runesha and others at UChicago Research Computing Center


Thank  you  to  our  sponsors!  
