The SBGrid Science Portal: An integrated environment for protein structure studies
Ian Stokes-Rees, Harvard Medical School
eScience 2012, Chicago, October 2012
May 11, 2015
j.mp/esci12-sbgrid [email protected]
What’s interesting about another Science Portal?
✦ Interface modalities
  • Web forms, RESTful interfaces, command line
✦ Access model
  • Browser SSO, X.509, LDAP, .htaccess, GACL
✦ Identity management
  • Streamlined grid account creation
✦ Computational capability
  • Local, cluster, and grid computing
✦ Data management
  • Web (HTTP), scp, GridFTP, Globus Online
  • Tiered staging of data
I’m still skeptical. What about Taverna, GridSphere, Galaxy, or HubZero?
✦ All great if
  • the portal or application plugin already exists; and
  • the application workflows closely match your requirements
✦ Not-so-great if
  • you have to implement a new portal on top of one of those frameworks
  • you want to adapt the workflow
  • your data model changes
  • you want to add a new application
  • you want to explore the data in an unanticipated way
  • command-line access is also important to you
  • you are working with others
Links
www.sbgrid.org
portal.sbgrid.org
j.mp/esci12-sbgrid
@ijstokes
Outline
✦ Community
  • Who the SBGrid Science Portal is meant to serve
✦ Objectives
  • What the vision for the Science Portal was
✦ Implementation
  • Software and service architectures
✦ Security, Collaboration, and IdM
  • ... or “How I learned to stop worrying and love X.509”
✦ Data
  • Tiered data distribution model
[Map: SBGrid member labs]
Rice University: E. Nikonowicz, Y. Shamoo, Y.J. Tao
CalTech: P. Bjorkman, W. Clemons, G. Jensen, D. Rees
Stanford: A. Brunger, K. Garcia, T. Jardetzky
UCSF: JJ Miranda, Y. Cheng
UC Davis: H. Stahlberg
UCSD: T. Nakagawa, H. Viadiu
WesternU: M. Swairjo
U. Washington: T. Gonen
Washington U. School of Med.: T. Ellenberger, D. Fremont
Vanderbilt Center for Structural Biology: W. Chazin, B. Eichman, M. Egli, B. Lacy, M. Ohi, C. Sanders, B. Spiller, M. Stone, M. Waterman
Rosalind Franklin: D. Harrison
Harvard and Affiliates: N. Beglova, S. Blacklow, B. Chen, J. Chou, J. Clardy, M. Eck, B. Furie, R. Gaudet, M. Grant, S.C. Harrison, J. Hogle, D. Jeruzalmi, D. Kahne, T. Kirchhausen, A. Leschziner, K. Miller, A. Rao, T. Rapoport, M. Samso, P. Sliz, T. Springer, G. Verdine, G. Wagner, L. Walensky, S. Walker, T. Walz, J. Wang, S. Wong
Cornell U. / NE-CAT: R. Oswald, C. Parrish, H. Sondermann, R. Cerione, B. Crane, S. Ealick, M. Jin, A. Ke
Brandeis U.: N. Grigorieff
Tufts U.: K. Heldwein
UMass Medical: W. Royer
NIH: M. Mayer
U. Maryland: E. Toth
Yale U.: T. Boggon, D. Braddock, Y. Ha, E. Lolis, K. Reinisch, J. Schlessinger, F. Sigworth, F. Zhou
Columbia U.: Q. Fan
Rockefeller U.: R. MacKinnon
Thomas Jefferson: J. Williams
Not Pictured: University of Toronto: L. Howell, E. Pai, F. Sicheri; NHRI (Taiwan): G. Liou; Trinity College, Dublin: Amir Khan
Community
Structural Biology: Study of Protein Structure and Function
[Figure: length scales of 400 m, 1 mm, and 10 nm]
• Shared scientific data collection facility
• Data intensive (10-100 GB/day)
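For a sense of scale, spreading that daily volume evenly over 24 hours (assuming decimal units, 1 GB = 10^9 bytes) gives the average sustained rate only, not the peak:

```latex
\frac{(10\text{--}100) \times 10^{9}\ \text{B}}{86{,}400\ \text{s}}
  \approx 0.12\text{--}1.2\ \text{MB/s}
  \approx 0.9\text{--}9.3\ \text{Mbit/s}
```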
Consortium By The Numbers
✦ ~200 member labs
  • representing about 1500 users
✦ ~200 software packages
  • multi-platform (Linux, OS X)
  • multi-version
✦ 4 FTE staff
✦ Automated software distribution
  • 80 GB for full package
  • rsync+ssh for updates
✦ Everything “Just Works”
  • so labs are happy to renew membership and refer friends
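The update path above (a one-time ~80 GB full package, then rsync over ssh) can be sketched as follows. This is a minimal Python sketch that only assembles the rsync command line; the mirror host, paths, and `sbgrid-sync` account are hypothetical placeholders, not SBGrid's actual distribution setup. rsync transfers only files that changed, which keeps routine updates small relative to the full package.

```python
import shlex


def build_rsync_update(remote_host: str, remote_path: str, local_path: str,
                       ssh_user: str = "sbgrid-sync") -> list[str]:
    """Return an argv list for an incremental rsync-over-ssh update.

    All parameters are hypothetical placeholders; substitute the real
    mirror host, paths, and account.
    """
    # Trailing "/" on the source: copy the directory's contents,
    # not the directory itself.
    source = f"{ssh_user}@{remote_host}:{remote_path.rstrip('/')}/"
    return [
        "rsync",
        "--archive",    # recurse; preserve permissions, times, symlinks
        "--delete",     # drop local files that were removed upstream
        "--compress",   # compress data in transit
        "-e", "ssh",    # use ssh as the remote shell/transport
        source,
        local_path,
    ]


if __name__ == "__main__":
    argv = build_rsync_update("mirror.example.org", "/programs", "/programs")
    # Print the command; pass argv to subprocess.run(argv, check=True) to run it.
    print(shlex.join(argv))
```

`--delete` keeps retired package versions from lingering on lab machines, at the cost of removing any local additions made under the synchronized tree.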
Boston Life Sciences Hub
• Biomedical researchers
• Government agencies
• Life sciences
• Universities
• Hospitals
[Map: Boston area, including Tufts University School of Medicine]
Hug a Life Scientist!
✦ Let them know you care ...
  • ... because the software we give them doesn’t
  • ... and neither do the systems we subject them to
  • ... but to be fair, a lot of the pain is self-inflicted
✦ SBGrid came into existence to fill the tech void/pain experienced by structural biologists
✦ Started with providing reliable compiled software
✦ Expanded into
  • training events and workshops
  • best practice guides
  • shared computational infrastructure (clusters! OSG! Globus Online!)
  • web-based collaborative computational and data services
Objectives
A. Extensible infrastructure to facilitate development and deployment of novel computational workflows
B. Web-accessible environment for collaborative, compute- and data-intensive science
Objectives (explained)
✦ Pareto Principle
  • 80% of the time users are happy with a basic web form interface to a standard application workflow and canned result analysis
  • addressing these routine cases takes 20% of the effort
  • Science Portals are a big win over cumbersome and complex Fortran code
✦ Corollary to Pareto Principle
  • 20% of the time users want or need a customized application workflow and/or result analysis
  • making this possible takes 80% of the effort
  • but it is rare that anyone knows in advance whether a given task will fall on the 80% or the 20% side
My Experience and Perspective (Audience Participation)
✦ The really interesting stuff happens
  • in the unpredictable 20%
✦ Innovative analytical strategies require
  • an ability to rapidly adjust workflow and data analysis
✦ You’re stuffed
  • if workflow and data are tightly coupled to the portal framework
✦ Collaboration is critical:
  • you need to be able to share your work (securely)
  • the web is the obvious (only!) way anyone wants to do this
Implementation and Architecture
Front End Interface
✦ Django (Python) web framework
✦ Apache web server
✦ Per-user protected jobs and data
✦ WebDAV access to data
✦ ssh access possible
✦ Richer access control in development
[Screenshot: Results Visualization and Analysis]
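Per-user protection of jobs and data comes down to never letting a request for one user's files resolve into another user's directory. A minimal, framework-independent Python sketch of that check, assuming a hypothetical layout with one directory per user under a common data root (`resolve_user_path` is an illustrative name, not the portal's Django code):

```python
import posixpath


def resolve_user_path(data_root: str, username: str, requested: str) -> str:
    """Map a requested path onto the user's own data directory.

    Raises PermissionError if the result would escape that directory.
    The check is purely lexical (posixpath.normpath); it does not
    resolve symlinks.
    """
    if not username or "/" in username or username in (".", ".."):
        raise ValueError(f"invalid username: {username!r}")
    user_root = posixpath.normpath(posixpath.join(data_root, username))
    # Treat the request as relative to the user's root, even if it starts with "/".
    candidate = posixpath.normpath(
        posixpath.join(user_root, requested.lstrip("/")))
    if candidate != user_root and not candidate.startswith(user_root + "/"):
        raise PermissionError(f"{requested!r} is outside {user_root}")
    return candidate


if __name__ == "__main__":
    print(resolve_user_path("/data", "alice", "jobs/42/out.log"))
```

Because symlinks inside a user's tree could still point elsewhere, a real deployment would also compare `os.path.realpath` results, or rely on POSIX ownership so the serving process cannot read other users' files in the first place.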
NoSQL hierarchical document store
✦ The SBGrid Portal’s leading workflow:
  • 100,000 jobs
  • 300,000 output files
  • 20-100k CPU-hours
✦ Need a good way to store data
  • flexible data format
  • flexible analysis output
  • fine-grained, user-driven access control
  • parallel access
  • remote access
✦ High-capacity non-relational hierarchical storage
  • ????
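Dividing the totals above gives a rough per-job scale for that workflow (simple averages; the slide gives no per-job distribution):

```latex
\frac{300{,}000\ \text{files}}{100{,}000\ \text{jobs}} = 3\ \text{files/job},
\qquad
\frac{(20{,}000\text{--}100{,}000)\ \text{CPU-hours}}{100{,}000\ \text{jobs}}
  = 0.2\text{--}1\ \text{CPU-hour/job}
```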
Operating Systems are Pretty Good
✦ File systems work well
  • organize data carefully (hierarchically)
  • include metadata (mod_cern_meta, file system)
  • serve intelligently via multiple protocols (http, gridftp)
  • leverage POSIX ownerships (user, group, other, r/w)
  • leverage user, group, and volume quotas
  • storage management and backups are easy (well, easier)
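With on the order of 100,000 jobs and 300,000 output files, organizing data hierarchically in practice means keeping any single directory small. A minimal sketch, assuming a hypothetical user/workflow/bucket/job layout with at most 1,000 job directories per bucket (`job_output_path` and the `fanout` parameter are illustrative):

```python
import posixpath


def job_output_path(root: str, user: str, workflow: str, job_id: int,
                    filename: str, fanout: int = 1000) -> str:
    """Build a hierarchical path for one job's output file.

    Jobs are grouped into buckets of `fanout` consecutive IDs, so no
    bucket directory holds more than `fanout` job directories.
    The layout itself is an illustrative assumption.
    """
    if job_id < 0:
        raise ValueError("job_id must be non-negative")
    bucket = job_id // fanout
    return posixpath.join(root, user, workflow,
                          f"{bucket:04d}", f"{job_id:07d}", filename)


if __name__ == "__main__":
    print(job_output_path("/data", "alice", "wf1", 123456, "out.pdb"))
```

Putting the user name at the top of the tree means POSIX ownership, group permissions, and per-user quotas apply to a whole subtree at once.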
✦ Process management works well
  • execute as the actual user, where possible
  • setuid, su, ssh, suexec, and gsexec can all help with this
  • process accounting is your friend! (pacct)
  • leverage ulimit for process resource limits
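`ulimit` is a shell front end to the `setrlimit` system call, so a job launcher can apply the same limits itself. A minimal Unix-only Python sketch using the standard `resource` and `subprocess` modules; the helper names and the choice of CPU-time and address-space limits are illustrative assumptions, not the portal's configuration:

```python
import resource
import subprocess
import sys


def _set_limit(limit_id: int, value: int) -> None:
    """Set soft and hard limits to `value`, never above the current hard limit."""
    _soft, hard = resource.getrlimit(limit_id)
    if hard != resource.RLIM_INFINITY:
        value = min(value, hard)
    resource.setrlimit(limit_id, (value, value))


def run_with_limits(argv: list[str], cpu_seconds: int,
                    max_address_space: int) -> subprocess.CompletedProcess:
    """Run argv with CPU-time and address-space limits (Unix only).

    Comparable to `ulimit -t` / `ulimit -v` in the job's shell: the
    limits are applied in the child after fork() and before exec().
    """
    def apply_limits() -> None:
        _set_limit(resource.RLIMIT_CPU, cpu_seconds)
        _set_limit(resource.RLIMIT_AS, max_address_space)

    return subprocess.run(argv, preexec_fn=apply_limits,
                          capture_output=True, text=True, check=False)


if __name__ == "__main__":
    probe = "import resource; print(resource.getrlimit(resource.RLIMIT_CPU))"
    result = run_with_limits([sys.executable, "-c", probe],
                             cpu_seconds=3600, max_address_space=4 * 1024**3)
    print(result.stdout.strip())
```

`preexec_fn` runs in the child between fork and exec and is not safe to use in a multi-threaded parent; a threaded service would instead wrap the job in a small shell script that calls `ulimit`.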
Data Access
Same data served by web and available from command line
Open Science Grid http://opensciencegrid.org
✦ US National Cyberinfrastructure
✦ Primarily used for high energy physics computing
✦ 80 sites
✦ O(1e5) job slots
✦ O(1e6) core-hours per day
✦ PB scale aggregate storage
5,073,293 hours (~580 years)
[Figure: Service Architecture of the SBGrid Science Portal @ Harvard Medical School. Labelled components, grouped as in the figure where recoverable:
• interfaces: Apache, GridSite, Django, Sage Math, R-Studio; shell CLI for the User
• ID mgmt: FreeIPA, LDAP, VOMS, GUMS, GACL
• data: fileserver, SQL DB, scp, GridFTP, SRM, WebDAV
• computation: cluster, Condor, Cycle Server, VDT, Globus
• monitoring: Ganglia, Nagios, RSV, pacct
• external services: DOEGrids CA @ Lawrence Berkeley Labs; MyProxy @ NCSA, UIUC; Gratia Accounting @ FermiLab; UC San Diego; glideinWMS factory; Open Science Grid; GridFTP + Hadoop; GlobusOnline @ Argonne; glideinWMS; Monitoring @ Indiana]
SBGrid Portal: Current Status
✦ 262 users (lifetime), 72 active in past quarter
✦ 2.4 million hours on OSG in the last 12 months
✦ Seamless data sharing from web to ssh?
  • requires NFSv4 to allow >12 POSIX groups/user
  • suexec or gsexec possibility
✦ Account integration
  • PAM (ssh/command line) + web through FreeIPA LDAP
  • prototype of X.509 + VOMS + MyProxy (next section!)
✦ Collaboration
  • shared secret (password)
  • manual .htaccess or .gacl
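Manual `.htaccess` sharing boils down to keeping one access list per shared directory. A minimal sketch of generating that list, assuming Apache's `Require user` directive (mod_authz_user), with authentication itself configured at the server level and `AllowOverride AuthConfig` enabled for the directory; `render_htaccess` is a hypothetical helper:

```python
import re

# Conservative username syntax; anything else is rejected so that a
# crafted name cannot inject extra Apache directives.
_VALID_USER = re.compile(r"[A-Za-z0-9._-]+")


def render_htaccess(collaborators: list[str]) -> str:
    """Return .htaccess text that admits only the listed users.

    Assumes authentication (AuthType, provider, etc.) is configured by
    the server and that AllowOverride permits AuthConfig here.
    """
    users = sorted(set(collaborators))
    if not users:
        raise ValueError("at least one collaborator is required")
    for user in users:
        if not _VALID_USER.fullmatch(user):
            raise ValueError(f"invalid username: {user!r}")
    return "Require user " + " ".join(users) + "\n"


if __name__ == "__main__":
    print(render_htaccess(["bob", "alice", "bob"]), end="")
```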
Identity Management*
* or “How I learned to stop worrying and love X.509”
Big Picture
✦ Federated environment requires
  • federated identity management
  • trusted identity providers (“roots of trust”)
✦ Collaboration requires
  • user-driven capacity to form cross-organization user groups (aka “Virtual Organizations”)
  • roles (or at least privilege levels) within a VO
✦ State of Play
  • InCommon will get us part way there (waiting on adoption!)
  • OpenID is nice for users, but offers no trust or delegated permissions
  • X.509 process and details are still tough for the end user
  • SSH keys lack a standard root of trust and roles
X.509 Digital Certificates
✦ Analogy to a passport:
  • Application form
  • Sponsor’s attestation
  • Consular services
    • verification of application, sponsor, and accompanying identification and eligibility documents
  • Passport issuing office
✦ Portable, digital passport
  • fixed and secure user identifiers
    • name, email, home institution
  • signed by widely trusted issuer
  • time limited
  • ISO standard
X.509 Challenges
✦ Lots of “humans in the loop” to get a usable cert
  • Registration Agent, Sponsor, VO Manager, User
✦ Awkward working with X.509 certs
  • multiple formats
  • proxy certs and VOMS ACs
  • proxy servers (MyProxy)
  • expiry (of proxy, of base cert, of VO membership)
  • browser integration and import process
  • CA cert chain
  • digital token needs to be available on all devices
    • particularly challenging for phones and tablets
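The three expiry dates listed above interact: a proxy carrying VO membership attributes is only usable until the earliest of them. A minimal sketch of the check a portal could run at login, assuming the expiry times have already been extracted from the proxy, the base certificate, and the VOMS AC (parsing is omitted; the function names and the 12-hour threshold are hypothetical):

```python
from datetime import datetime, timedelta, timezone


def limiting_credential(expiries: dict[str, datetime]) -> tuple[str, datetime]:
    """Return (name, expiry) of the credential that expires first.

    A proxy is only usable while the proxy itself, the base certificate
    it was derived from, and the VO membership are all still valid.
    """
    name = min(expiries, key=expiries.__getitem__)
    return name, expiries[name]


def needs_renewal(expiries: dict[str, datetime], now: datetime,
                  min_remaining: timedelta = timedelta(hours=12)) -> bool:
    """True if the usable lifetime left is shorter than min_remaining."""
    _name, expiry = limiting_credential(expiries)
    return expiry - now < min_remaining


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    creds = {
        "proxy": now + timedelta(hours=6),
        "base certificate": now + timedelta(days=300),
        "VO membership": now + timedelta(days=30),
    }
    print(limiting_credential(creds)[0], needs_renewal(creds, now))
```

If the limiting credential is the base certificate or the VO membership, generating a fresh proxy does not help; that underlying credential has to be renewed first.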
X.509 Nirvana (ours at least)
✦ User never sees X.509 anything
  • unless they want to
✦ X.509 request + VO membership + account creation completed in one step by one person
  • single step for user
  • single step for one administrator
✦ Goodbye passphrases (and forgotten passphrases)
  • hold private key in LDAP and use LDAP authentication to access it
✦ Automate everything
  • login (web or command line) triggers an X.509 proxy request with the (default) VOMS AC, and loading to the MyProxy server
✦ VO Management System run by users
  • users need to be able to self-manage their (sub-)VOs
Addressing Certificate Problems
[Diagram: certificate request, swimlanes RA Agent, CA, Sponsor, and User against time, with steps U1, R1, S1, R2, and U2a. Step labels: request signed cert; return tracking number; notify agents; review request; verify user eligibility; confirm eligibility; approve cert; notify availability; generate cert key pair; sign cert; retrieve cert; export signed cert key pair.]
T0 = late Saturday night lab session
T+40h = mid-Monday response
T+60h = early-Tuesday response
T+66h = late-Tuesday response
T+70h = late-Tuesday: STAGE 1
VO (Group) Membership Registration
[Diagram: swimlanes VO Admin, VOMS, Sponsor, and User against time, with steps U2b, S2, V1, and V2. Step labels: notify admin; confirm eligibility; approve membership, groups, and roles; notify; verify user eligibility; present cert to request membership by DN; add DN to VOMS; request VOMS AC; return VOMS AC; add AC to proxy cert; request VO groups and roles.]
T+82h = mid-Wednesday: ask “What next?”
T+95h = early-Thursday response (time zone!)
T+100h = early-Thursday response
T+105h = mid-Thursday response
T+105h = 4.5 days waiting
[Diagram: streamlined account and certificate registration, lanes labelled Portal, CA, Sponsor, and User against time, with steps U1, U2*, S1*, A1a, and A1b; components shown: VOMS, LDAP, VO Admin, MyProxy. Step labels: verify eligibility; confirm eligibility; generate cert key pair; set retrieval serial number; request portal account; verification email sent; email verified; notify agents; create local acct; sign cert; “user may now access federated resources”; set VO rights; notify availability; portal login; request signed cert; return tracking number; approve cert; account ready notification; request signed certificate; return signed certificate; 1 = pair signed cert into PKCS#12 file; 2 = create local proxy cert; register proxy cert with MyProxy; “user may now access web portal”.]
T0 = late Saturday night lab session
T+40h = mid-Monday response
T+40h = 1.7 day wait
Data Management
Data Tiers - Scoping
• VO-wide: all sites, admin managed, very stable
• User archive: single site, user managed, very stable, 10+ GB
• User project: all sites, user managed, 1-10 weeks, 1-3 GB
• User static: all sites, user managed, indefinite, 10 MB
• Job set: all sites, infrastructure managed, 1-10 days, 0.1-1 GB
• Job: direct to worker node, infrastructure managed, 1 day, <10 MB
• Job indirect: to worker node via UCSD, infrastructure managed, 1 day, <10 GB
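The two job-level tiers differ mainly in how much input they can carry to a worker node. A minimal sketch of that routing decision, assuming the <10 MB and <10 GB figures are hard cutoffs in decimal units (`job_input_route` and the route names are hypothetical); anything larger would have to be pre-staged in one of the longer-lived tiers:

```python
MB = 10**6
GB = 10**9


def job_input_route(size_bytes: int) -> str:
    """Pick a delivery route for one job's input data, by size."""
    if size_bytes < 0:
        raise ValueError("size must be non-negative")
    if size_bytes < 10 * MB:
        return "job-direct"     # shipped directly to the worker node
    if size_bytes < 10 * GB:
        return "job-indirect"   # staged to the worker node via UCSD
    return "prestage"           # too big for per-job staging


if __name__ == "__main__":
    for size in (2 * MB, 500 * MB, 50 * GB):
        print(size, job_input_route(size))
```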
About 2 PB with 40 front-end servers for high-bandwidth parallel file transfer
Data Movement
• scp (users)
• rsync (VO-wide)
• grid-ftp (UCSD)
• curl (WNs)
• cp (NFS)
• htcp (secure web)
• http(s) (web)
Globus Online: High Performance Reliable 3rd Party File Transfer
[Diagram: Globus Online file transfer service connecting the portal, cluster, desktop, laptop, lab fileserver, and data collection facility; supporting services: GUMS (DN to user mapping), VOMS (VO membership), Certificate Authority (root of trust)]
[Diagram: SBGrid Science Portal, laptop, desktop, lab fileserver, facility fileserver]
Ryan, a postdoc in the Frank Lab at Columbia
Access NRAMM facilities securely and transfer data back to home institute
/data/columbia/frank
/nfs/data/rsmith
/Users/Ryan
✦ Ryan applies for an account at the SBGrid Science Portal
• automated X.509 application
• automated Globus Online application and X.509 linking (wish list!)
• verification of lab membership
✦ Request access to the NRAMM facility using the credential held by SBGrid
• check SBGrid for Ryan's group membership
• Ryan is in the Frank Lab, so grant access to files
✦ Use Globus Online to manage the transfer from NRAMM back to the lab
• initiate transfer at NRAMM
• transfer data to lab
• notify user of completion
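The three phases of Ryan's scenario can be simulated with in-memory stand-ins. This is a sketch under stated assumptions: `VOMS`, `Facility` and `TransferService` are hypothetical classes, not the VOMS or Globus Online APIs; the DN and file name are invented; and which host each path lives on is assumed. It shows the design point of the scenario: the facility delegates the membership decision to the VO, and the transfer service moves data between two endpoints rather than through the user's laptop.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Hypothetical in-memory stand-ins; none of these mirror a real SBGrid,
# VOMS, or Globus Online interface.


@dataclass
class VOMS:
    groups: Dict[str, Set[str]]  # group name -> member DNs

    def members(self, group: str) -> Set[str]:
        return set(self.groups.get(group, set()))


@dataclass
class Facility:
    name: str
    voms: VOMS
    lab_dirs: Dict[str, str]  # group name -> lab directory at the facility

    def authorize(self, dn: str, group: str) -> str:
        # The facility does not keep its own member list: it asks the VO.
        if dn not in self.voms.members(group):
            raise PermissionError(f"{dn} is not a member of {group}")
        return self.lab_dirs[group]


@dataclass
class TransferService:
    notifications: List[str] = field(default_factory=list)

    def transfer(self, user: str, src: str, dst: str, files: List[str]) -> Set[str]:
        # Third-party transfer: data flows src -> dst; the user only gets notified.
        copied = {f"{dst}/{f}" for f in files}
        self.notifications.append(f"{user}: {len(files)} file(s) {src} -> {dst} done")
        return copied


if __name__ == "__main__":
    dn = "/DC=org/DC=example/CN=Ryan"  # hypothetical DN
    nramm = Facility("NRAMM", VOMS({"frank": {dn}}), {"frank": "/data/columbia/frank"})
    service = TransferService()
    src = nramm.authorize(dn, "frank")
    service.transfer("ryan", src, "/nfs/data/rsmith", ["img001.mrc"])
    print(service.notifications[-1])
```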
[Figure: three storage tiers (Tier 1, Tier 2, Tier 3). Data collection at the NE-CAT beamline at APS feeds 6+ months of staging storage and a 10+ TB per group permanent archive; a 10+ PB public archive serves the general public over the WWW. "Scott" from the Sliz lab at Harvard (lab users ~andy, ~sue, ~scott) reaches the data through the SBGrid Science Portal, Globus Online, and VOMS. Paths shown: /stage/sliz, /stage/murphy, /data/murphy, /data/deacon, /data/sliz, /embarg/2010/, /embarg/2011/, /public/2009/.]
• Local accounts within lab infrastructure
• Shared (lab-level) accounts at facility
• User can directly access lab or facility data from laptop
• Public access available to archived data through web interface
• Embargo policy to hold deposited data for agreed time
• Tiered storage
• VO management
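The archive paths in the tiered-storage figure (/public/2009/, /embarg/2010/, /embarg/2011/) suggest a year-granularity embargo. A minimal sketch, assuming the "agreed time" is a whole number of years (the `embargo_years` parameter and the `archive_path` function are hypothetical), decides which directory a deposit belongs in:

```python
def archive_path(deposit_year: int, current_year: int, embargo_years: int) -> str:
    """Directory for a deposit under a simple, year-granularity embargo.

    `embargo_years` stands in for the "agreed time"; it is an assumed parameter,
    not a documented SBGrid or facility setting.
    """
    if deposit_year + embargo_years <= current_year:
        return f"/public/{deposit_year}/"   # embargo elapsed: publicly visible
    return f"/embarg/{deposit_year}/"       # still held back
```

With an illustrative embargo of 3 years and a current year of 2012, `[archive_path(y, 2012, 3) for y in (2009, 2010, 2011)]` returns `['/public/2009/', '/embarg/2010/', '/embarg/2011/']`, which reproduces the layout in the figure; the real embargo period is not stated.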
Data at Shared Scientific Facilities
✦ SBGrid
• manages all user account creation and credential mgmt
• hosts MyProxy, VOMS, GridFTP, and user interfaces
✦ Facility
• knows about lab groups
• e.g. "Harrison", "Sliz"
• delegates knowledge of group membership to SBGrid VOMS
• facility can poll VOMS for list of current members
• uses X.509 for user identification
• deploys GridFTP server
✦ Lab group
• designates group manager that adds/removes individuals
• deploys GridFTP server or Globus Connect client
✦ Individual
• username/password to access facility and lab storage
• Globus Connect for personal GridFTP server to laptop
• Globus Online web interface to "drive" transfers
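The facility side of this division of labor, where the facility polls VOMS for the list of current members, can be sketched as a membership cache refreshed on a fixed interval. `MembershipCache`, the `fetch` callable (standing in for a real VOMS query) and the TTL are assumptions. The sketch makes one trade-off visible: a change made by a lab's group manager becomes visible at the facility only after the next poll.

```python
import time
from typing import Callable, Dict, Optional, Set

Groups = Dict[str, Set[str]]  # group name -> member DNs


class MembershipCache:
    """Facility-side cache of lab-group membership, refreshed by polling.

    `fetch` is a hypothetical callable standing in for a VOMS query.
    """

    def __init__(self, fetch: Callable[[], Groups], ttl_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl_s
        self._clock = clock
        self._groups: Groups = {}
        self._fetched_at: Optional[float] = None

    def _refresh_if_stale(self) -> None:
        # Poll only when there is no snapshot yet or the snapshot is older than the TTL.
        now = self._clock()
        if self._fetched_at is None or now - self._fetched_at >= self._ttl:
            self._groups = self._fetch()
            self._fetched_at = now

    def is_member(self, dn: str, group: str) -> bool:
        self._refresh_if_stale()
        return dn in self._groups.get(group, set())
```

The injectable `clock` keeps the polling behavior testable without waiting in real time.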
Summary
✦ Don't discount the unpredictable 20%
• need flexibility to innovate and explore (data and compute)
✦ "Last mile" challenge
• to the desktop
• to the laptop
✦ Unified and simplified identity management
• centralized set of credentials for each person
• tight links to CA/X.509, LDAP, MyProxy and VOMS
✦ Empower collaborations to self-manage
✦ Shift of focus from "compute" to "data"
• for users
• for facilities where data is the main challenge
Q&A and Acknowledgements
✦ Piotr Sliz
• Supervisor and PI at Harvard Medical School
• Chair of SBGrid Consortium
✦ SBGrid Science Portal
• Daniel O'Donovan, Meghan Porter-Mahoney, Mick Timoney
✦ SBGrid System Administrators
• Ian Levesque, Peter Doherty, Steve Jahl
✦ Facility Collaborators
• Frank Murphy (NE-CAT/APS)
• Ashley Deacon (JCSG/SLAC)
✦ Globus Online Team
• Steve Tuecke, Ian Foster, Rachana Ananthakrishnan, Raj Kettimuthu
✦ OSG Collaborators
• Ruth Pordes, Director of OSG, for championing SBGrid
• Terrence Martin, for UCSD HDFS support
• Steve Timm and Keith Chadwick (FNAL) for helping resolve OSG problems