March 20, 2014
XSEDE: A Digital Ecosystem Enhancing
Productivity for All Science & Engineering
John Towns
PI and Project Director, XSEDE
Director, Collaborative Cyberinfrastructure Programs, NCSA
License terms
• Please cite as: Towns, John. XSEDE: A Digital Ecosystem Enhancing Productivity for All Science & Engineering, March 2014, [http://www.slideshare.net/jtownsil/xsede-overview-march2014]
• ORCID ID: http://orcid.org/0000-0001-7961-2277
• Except where otherwise noted, by inclusion of a source URL or some other note, the contents of this presentation are © by the Board of Trustees of the University of Illinois. This content is released under the Creative Commons Attribution 3.0 Unported license (http://creativecommons.org/licenses/by/3.0/). This license includes the following terms: You are free to share – copy and redistribute the material in any medium or format; and to adapt – remix, transform, and build upon the material for any purpose, even commercially.
• This can be done under the following conditions: attribution – You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
Step-by-Step Instructions
• How to create an open cyberinfrastructure ecosystem in 1,200 easy steps
Motivation for XSEDE:
• Scientific advancement across multiple disciplines requires a variety of resources and services
• XSEDE is about increased productivity of the community and providing expanded capabilities
  – leads to more science
  – is sometimes the difference between a feasible project and an impractical one
  – lowers barriers to adoption
• XSEDE provides a comprehensive eScience infrastructure composed of expertly managed and evolving advanced heterogeneous digital resources and services integrated into a general-purpose infrastructure
Boundary Conditions and Principles for XSEDE
• XSEDE inherited the TeraGrid environment
• XSEDE inherited the TeraGrid (TG) community and its expectations
• Point of view has changed
  – not an HPC/CS/tech play
  – about productivity and creating the environment necessary to be productive
    • focus on the success of researchers!
• Finally figured out that the project must define a solution that is designed to evolve!
  – technologically and organizationally!
• Identify the greatest needs and start there
  – Don’t forget what you have learned – both good and bad!
• Oh yeah… the researchers don’t care about your existence (per se)
  – they care about access to resources, services and support
XSEDE – accelerating scientific discovery
• XSEDE’s Vision: a world of digitally enabled researchers, engineers, and scholars participating in multidisciplinary collaborations to tackle society’s grand challenges
• XSEDE’s Mission: to substantially enhance the productivity of a growing community of researchers, engineers, and scholars through access to advanced digital services that support open research
XSEDE’s Strategic Goals
• Deepen and extend the use of the advanced digital research services ecosystem
  – deepen use by existing researchers, engineers, and scholars
  – extend use to new communities
  – prepare the current and next generation via education, training, and outreach
  – raise the general awareness of the value of advanced digital services
• Advance the advanced digital research services ecosystem
  – create an open and evolving e-infrastructure
  – enhance the array of technical expertise and support services offered
• Sustain the advanced digital research services ecosystem
  – assure and maintain a reliable and secure infrastructure
  – provide excellent user support services
  – operate an effective and innovative virtual organization
What is XSEDE?
• An ecosystem of advanced digital services accelerating scientific discovery
  – support a growing portfolio of resources and services
    • advanced computing, high-end visualization, data analysis, and other resources and services
    • interoperability with other infrastructures
• A virtual organization (partnership!) providing
  – dynamic distributed infrastructure
  – support services and technical expertise to enable researchers, engineers, and scholars
    • addressing the most important and challenging problems facing the nation and world
• A project funded by the National Science Foundation
XSEDE Factoids: high order bits
• 5-year, US$121M project
  – plus US$9M, 5-year Technology Investigation Service
    • separate award from NSF
  – option for additional 5 years of funding upon major review after PY3
• No funding for major hardware
  – coordination, support and creating a national/international cyberinfrastructure
  – coordinate allocations, support, training and documentation for >$100M of concurrent project awards from NSF
• ~140 FTE / ~250 individuals funded across 20 partner institutions
  – this requires solid partnering!
Total Research Funding Supported by XSEDE in CY2013
US$750 million in research supported by XSEDE in CY2013
Innovation: proactively looking to expand scope of capabilities
• Striking a balance between providing stable, reliable services and fostering innovation
  – both in what we are doing and how we do it
• Campus Bridging use cases are mostly for capabilities we have not traditionally supported
• Novel and Innovative Projects team is seeking out new communities and identifying new capabilities necessary to support them
• Architecture design processes explicitly support innovation by the project
  – more importantly, facilitate innovation by the community
Our goal is to deliver new capabilities – and thus new science – faster
Convenience requirements will always increase
Each generation of users requires more convenience than the former: thus we must always be adding new layers of software while maintaining and extending existing reliability and capability.
Change is the only Constant
– Heraclitus, 535 BC–475 BC
No, his mind is not for rent
To any god or government.
Always hopeful, yet discontent,
He knows changes aren't permanent,
But change is.
– Rush - Tom Sawyer
What do you mean by “Advanced Digital Services?”
• Often use the terms “resources” and “services”
  – these should be interpreted very broadly
  – most are likely not operated by XSEDE
• Examples of resources
  – compute engines: HPC, HTC (high throughput computing), campus, departmental, research group, project, …
  – data: simulation output, input files, instrument data, repositories, public databases, private databases, …
  – instruments: telescopes, beam lines, sensor nets, shake tables, microscopes, …
  – infrastructure: local networks, wide-area networks, …
• Examples of services
  – collaboration: wikis, forums, telepresence, …
  – data: data transport, data management, sharing, curation, provenance, …
  – access/use: authentication, authorization, accounting, …
  – coordination: meta-queuing, …
  – support: helpdesk, consulting, ECSS, training, …
  – and many more: education, outreach, community building, …
Some Unexpected Challenges:
XSEDE is a socio-technical ecosystem
• Highly distributed organization
  – challenges in managing a project that involves staff at 20 partner institutions
• A completely virtual organization
  – breaking new ground from an organizational structure and management point of view
• Highly distributed engineering project
  – developing new methodologies to adapt traditional practices to the unusual context of XSEDE
XSEDE offers access to a variety of resources
• Leading-edge distributed memory systems
• Very large shared memory systems
• High throughput systems, including Open Science Grid (OSG)
• Visualization engines
• Accelerators like GPUs and Xeon Phis
Many scientific problems have components that call for use of more than one architecture.
XSEDE User Portal: THE User Site
portal.xsede.org
• XSEDE User Portal (XUP) is designed to be the only site a user needs to use XSEDE
• XUP presents information relevant to users
  – user info is easier to find
  – XUP also provides dynamic data about XSEDE systems
  – capabilities to manage usage, files, data
• As a user you can
  – request an allocation, and manage allocations
  – sign up for training
  – request help
  – manage files and data, and much more!
  – Portal provides single sign-on to all XSEDE resources
XSEDE offers more in-depth support
Extended Collaborative Support Service
• Support people who understand the discipline as well as the systems (perhaps more than one support person working with a project).
• 37 FTEs, spread over >70 people at more than half a dozen sites.
• Distributed support
– Easier to find the right expert for the project
– allows us to cover many more disciplines than if every site had to staff the common applications.
– support does not have to move with platform change
Current XSEDE Compute Resources
• Stampede @ TACC – 9.5 PFLOPS (PF) Dell Cluster w/ GPUs and Xeon Phis
• Kraken @ NICS – 1.2 PF Cray XT5
• Keeneland @ GaTech/NICS – 615 TF HP GPU cluster
• Gordon @ SDSC – 341 TF Appro Distributed SMP cluster
• Lonestar (4) @ TACC – 302 TF Dell Cluster
• Trestles @ SDSC – 100 TF Appro Cluster
• Blacklight @ PSC – 37 TF SGI UV (2 x 16TB shared memory SMP)
• Mason – 3.8 TF HP Cluster with large memory nodes (2TB/node)
https://www.xsede.org/web/xup/resource-monitor
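As a rough worked example of the arithmetic behind the compute resource list, the peak figures can be totaled in a few lines of Python. This is only a sketch: the numbers are copied from the slide, 1 PF is taken as 1,000 TF, and the dictionary layout and function names are illustrative assumptions rather than anything provided by XSEDE.

```python
# Peak performance figures copied from the slide, in TFLOPS (1 PF = 1,000 TF).
# The dictionary layout and function names are illustrative, not an XSEDE API.
PEAK_TFLOPS = {
    "Stampede": 9500.0,
    "Kraken": 1200.0,
    "Keeneland": 615.0,
    "Gordon": 341.0,
    "Lonestar (4)": 302.0,
    "Trestles": 100.0,
    "Blacklight": 37.0,
    "Mason": 3.8,
}


def total_peak_pflops(systems):
    """Return the combined peak performance of all systems, in PFLOPS."""
    return sum(systems.values()) / 1000.0


def share_of_total(systems, name):
    """Return the fraction of the combined peak contributed by one system."""
    return systems[name] / sum(systems.values())


if __name__ == "__main__":
    print(f"Aggregate peak: {total_peak_pflops(PEAK_TFLOPS):.1f} PF")
    print(f"Stampede share: {share_of_total(PEAK_TFLOPS, 'Stampede'):.0%}")
```

Under these figures the eight systems total roughly 12.1 PF of peak capability, with Stampede contributing close to four-fifths of it.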
Current XSEDE Visualization and Data Resources
• Visualization
  – Longhorn @ TACC
    • 20.7 TF Dell/NVIDIA cluster
    • 18.7 TB disk
• Storage
  – Ranch @ TACC
    • 40 PB tape
  – HPSS @ NICS
    • 12 PB tape
  – Data Supercell @ PSC
    • 4 PB disk
  – Data Oasis @ SDSC
    • 4 PB tape
https://www.xsede.org/web/xup/resource-monitor#advanced_vis_systems
https://www.xsede.org/web/xup/resource-monitor#storage_systems
Approach to Other Infrastructures: Active Interactions
• OSG is a significant CI in the US
  – Level 2 Service Provider in XSEDE
  – the nation’s premier high-throughput computing infrastructure
    • complement traditional HPC resources inherited from TeraGrid
  – ties to CI (eScience infrastructure) providers internationally
• PRACE is a significant HPC CI in Europe
  – PRACE represents both large scale HPC and distributed resources
    • subsumed DEISA in 2011
  – joint Summer School series
  – working on joint call for collaborations support later this calendar year
• EGI is a significant HTC CI in Europe
  – initiating organizational benchmarking effort
  – identifying collaborating research teams spanning XSEDE-EGI
• HPC Wales
  – Champions programs, Science Gateways
  – training content
Objectives for Coming Year+
Accelerating the realization of the XSEDE vision
• Deliver new or improved software, services and capabilities on a regular basis
  – XSEDE Wide Area Filesystem; Global Federated Filesystem; enhanced single sign-on; science gateway APIs; Canonical Use Case components
• Campus Bridging will promote "XSEDE Compatible" cluster build tools and use of Globus Online and GFFS for data movement and access
• Incorporate the third cadre of under-represented students into the XSEDE Scholars program
• Expand Champions Program to include Regional, Student, and Domain Champions
• Redesign and implement a new allocations request system
• Complete baseline architecture and expanded set of defined Use Cases
• Develop joint activities with industry
• Further develop relationships with other resource, service and infrastructure providers
Call for participation to be announced before SC13!
Questions?