Page 1:

Kathy Benninger, Pittsburgh Supercomputing Center

Workshop on the Development of a Next-Generation Cyberinfrastructure

1-Oct-2014

NSF Collaborative Research: CC-NIE Integration:

Developing Applications with Networking Capabilities via End-to-End Software

Defined Networking (DANCES)

Page 2:

© 2014 Pittsburgh Supercomputing Center

What is DANCES?

• The DANCES project, an NSF-funded CC-NIE collaborative award, is developing mechanisms for managing network bandwidth by adding end-to-end software-defined networking (SDN) capability and interoperability to selected CI applications and to application end-point network infrastructure

Page 3:

DANCES Participants and Partner Sites

• Pittsburgh Supercomputing Center (PSC)

• National Institute for Computational Sciences (NICS)

• Pennsylvania State University (Penn State)

• National Center for Supercomputing Applications (NCSA)

• Texas Advanced Computing Center (TACC)

• Georgia Institute of Technology (GaTech)

• eXtreme Science and Engineering Discovery Environment (XSEDE)

• Internet2

Page 4:

DANCES Partner Sites on AL2S XSEDEnet

Page 5:

DANCES Application Integration Targets

• Add network bandwidth scheduling capability using SDN to supercomputing infrastructure applications

• Resource management and scheduling
  – Torque/MOAB scheduling software
  – Enable bandwidth reservation for file transfer

• Wide area distributed file systems
  – XSEDE-wide file system (XWFS)
  – SLASH2 wide area distributed file system developed by PSC

Page 6:

File System Application Integration Research

• XWFS – Based on IBM’s GPFS, this WAN file system is deployed across several XSEDE Service Providers. The research activity is XWFS data flow integration with SDN/OpenFlow across XSEDEnet/Internet2.

• SLASH2 – PSC’s SLASH2 WAN file system is deployed at PSC and partner sites. The research activity is SLASH2 data flow integration with SDN/OpenFlow and resource scheduling across XSEDEnet/Internet2.

Page 7:

Application Integration Research

• GridFTP – Integration of SDN/OpenFlow capability with the resource management and scheduling subsystems of XSEDE’s advanced computational cyberinfrastructure to support the GridFTP data transfer application.

Page 8:

DANCES System Diagram

Page 9:

SDN/OpenFlow Infrastructure Integration

• Application interface with SDN/OF environment
  – Torque Prologue and Epilogue scripts to set up and tear down network reservation for scheduled file transfer via file system (XWFS, SLASH2) or GridFTP
  – Map SLASH2 and XWFS file system interfaces to network bandwidth reservation
  – Interface to Internet2’s Open Exchange Software Suite (OESS)
    • AL2S VLAN provisioning
    • Establish end-to-end path between file transfer source and destination sites

• SDN/OF-capable switches
  – Existing infrastructure at some sites (e.g., CC-NIE and CC*IIE recipients)
  – Evaluating hardware for deployment
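The OESS interaction above can be made concrete with a minimal sketch of assembling a provisioning request. The action name, field names, and node names below are hypothetical placeholders, not the actual OESS API; the sketch only illustrates what an end-to-end AL2S VLAN request would need to carry.

```python
# Hypothetical sketch of building an AL2S VLAN provisioning request for OESS.
# All field names and values are illustrative placeholders, not the real OESS
# API; a prologue script would submit something like this to the controller
# before a scheduled transfer starts.

def build_circuit_request(src_node, dst_node, vlan, bandwidth_mbps):
    """Describe an end-to-end path between the transfer source and
    destination sites as a single provisioning request."""
    return {
        "action": "provision_circuit",         # hypothetical action name
        "endpoints": [
            {"node": src_node, "vlan": vlan},  # file transfer source site
            {"node": dst_node, "vlan": vlan},  # file transfer destination site
        ],
        "bandwidth_mbps": bandwidth_mbps,      # rate reserved for the transfer
    }

# Example: a 2 Gbps reservation between two (hypothetical) edge switches.
request = build_circuit_request("psc-edge", "nics-edge", 1101, 2000)
```

Tearing the path down after the transfer would send a matching removal request using the circuit identifier returned by the provisioning call.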

Page 10:

Workflow Example: SDN-enabled SLASH2

Note: SLASH2 supports file replication and multiple residency.

1. User requests file residency at a particular site

2. SLASH2 checks and returns file residency status

3. Check user authorization for bandwidth scheduling

4. SLASH2 initiates path setup with end-site OpenFlow configuration and a transaction with Internet2’s FlowSpace Firewall and OESS for wide-area authorization and path provisioning

5. During the transfer, SLASH2 polls for remote residency completion

6. Upon completion of transfer, remove the provisioned path
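The six steps above can be sketched as one coordinating routine. Everything here is hypothetical scaffolding, assuming stand-in `fs` and `net` objects with invented method names (real SLASH2 interfaces differ); the point is the ordering of the checks and the guaranteed teardown of the provisioned path.

```python
# Sketch of the SDN-enabled SLASH2 workflow (steps 1-6 above). The `fs` and
# `net` objects are hypothetical stand-ins, not real SLASH2 interfaces.

def replicate_with_reservation(fs, net, path, site, user):
    # Step 2: check residency; nothing to do if the file is already there.
    if fs.is_resident(path, site):
        return "already-resident"
    # Step 3: check user authorization for bandwidth scheduling.
    if not net.authorized(user):
        return "unauthorized"
    # Step 4: provision the path (end-site OpenFlow config plus the
    # FlowSpace Firewall / OESS transaction for the wide area).
    circuit = net.provision(site)
    try:
        fs.start_replication(path, site)
        # Step 5: poll for remote residency completion during the transfer
        # (a real poller would sleep between checks).
        while not fs.is_resident(path, site):
            pass
    finally:
        # Step 6: always remove the provisioned path when done.
        net.teardown(circuit)
    return "replicated"
```

Putting the teardown in a `finally` block ensures the wide-area reservation is released even if the replication fails partway through.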

Page 11:

Workflow Example: Torque/MOAB with GridFTP

1. User creates DANCES-GridFTP job and submits it

2. Torque/MOAB schedules the job when resources are available

3. DANCES-GridFTP job initiated

4. Torque uses Prologue script to send Northbound API instruction to SDN controller to create end-to-end path

5. Path setup will include local OpenFlow configuration and a transaction with Internet2’s FlowSpace Firewall and OESS for wide-area authorization and path provisioning

6. A Torque/MOAB Epilogue script tears down the provisioning when the transfer finishes
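Steps 4–6 above can be sketched as a Prologue/Epilogue pair. The `controller` object and its `create_path`/`delete_path` methods are hypothetical stand-ins for the SDN controller's northbound API, shown only to illustrate the job lifecycle.

```python
# Sketch of steps 4-6 above: the Torque Prologue asks the SDN controller
# (via its northbound API) for an end-to-end path before the DANCES-GridFTP
# job runs, and the Epilogue tears the path down afterward. The controller
# interface here is a hypothetical stand-in.

def prologue(controller, job):
    """Before the job: request path creation (step 4; step 5's wide-area
    authorization and provisioning happen inside the controller)."""
    job["path_id"] = controller.create_path(
        src=job["src_site"],
        dst=job["dst_site"],
        bandwidth_mbps=job["bandwidth_mbps"],
    )
    return job["path_id"]

def epilogue(controller, job):
    """After the job: tear down the provisioning (step 6)."""
    controller.delete_path(job.pop("path_id"))
```

Recording the path identifier in the job's state during the Prologue is what lets the Epilogue remove exactly the provisioning that this job created.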

Page 12:

User Interaction

• The user community consists primarily of domain researchers and scientists; DANCES therefore emphasizes transparent operation of the bandwidth scheduling mechanism

• Administratively, a user requests bandwidth reservation capability
  – As a computational resource from the XRAC (typically one year)
  – To support a limited-time large data set transfer need (< one year)

• Operationally, a user’s bandwidth reservation request may
  – Succeed: bandwidth is scheduled and the transfer proceeds
  – Be deferred by the scheduler, with permission, until bandwidth is available
  – Fail: the request is declined, the user is notified, and the transfer proceeds as best-effort along with the unscheduled traffic
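The three operational outcomes can be summarized in a small decision sketch. The inputs (`requested_mbps`, `available_mbps`, `can_defer`) are hypothetical simplifications of the state a real scheduler would consult.

```python
# The three reservation outcomes above as a decision sketch. The inputs are
# simplified placeholders for the scheduler's real state.

def handle_reservation(requested_mbps, available_mbps, can_defer):
    if requested_mbps <= available_mbps:
        return "scheduled"    # bandwidth reserved; transfer proceeds
    if can_defer:
        return "deferred"     # wait, with permission, until bandwidth frees up
    return "best-effort"      # declined; user notified, transfer runs unscheduled
```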

Page 13:

Cyberinfrastructure Issues - Policy

• Criteria for allocating bandwidth scheduling capability to users/projects

• Agreement on the dedicated bandwidth that each site commits for scheduled transfers

• Monitoring and accounting of bandwidth usage

Page 14:

Cyberinfrastructure Issues - Technical

• Authentication and authorization mechanism for users/projects to allow bandwidth reservation request– Site/XSEDE context– Internet2 AL2S context

• Real-time cross-site tracking and management of allocated bandwidth resources

• Extend Torque/MOAB, XWFS, and SLASH2 to support SDN commands

• Vendor support for OpenFlow 1.3 flow metering
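The OpenFlow 1.3 flow-metering item can be made concrete with a semantics-only sketch: a meter entry carries one or more bands, and a drop band discards traffic once the flow exceeds the band's rate. The dictionary layout below is illustrative, not an actual protocol encoding.

```python
# Semantics-only sketch of an OpenFlow 1.3 meter with a single drop band.
# The dict layout is illustrative, not a wire-format encoding; in a real
# switch the meter lives in the datapath and flows reference it by meter_id.

meter = {
    "meter_id": 1,
    "flags": "KBPS",  # band rates measured in kbit/s
    "bands": [{"type": "DROP", "rate": 1_000_000}],  # drop above ~1 Gbps
}

def meter_decide(measured_kbps, meter):
    """A drop band discards traffic once the measured rate exceeds its rate."""
    for band in meter["bands"]:
        if band["type"] == "DROP" and measured_kbps > band["rate"]:
            return "drop"
    return "forward"
```

This is the behavior DANCES depends on for enforcing a scheduled bandwidth reservation, which is why consistent vendor support for it matters.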

Page 15:

Research Questions

• How do multiple SDN/OF controllers overlay into the CI?

• Does OpenFlow 1.3 flow metering meet the performance needs?

• Are there significant SDN/OF operational differences between wide area and machine room environments?

• How well do multi-vendor OpenFlow 1.3 implementations interoperate?

• How can bandwidth scheduling be used to optimize network bandwidth utilization?

• What verification by the project team is sufficient to pave the way for production deployment at XSEDE and campus sites?