
Midrange Computing

Page 1: Midrange Computing

DOE Perspective on Cyberinfrastructure - LBNL

Gary Jung, Manager, High Performance Computing Services, Lawrence Berkeley National Laboratory

Educause CCI Working Group Meeting, November 5, 2009

Page 2: Midrange Computing


Midrange Computing

• DOE ASCR hosted a workshop in October 2008 to assess the role of midrange computing in the Office of Science; the workshop found that this class of computing plays an increasingly important role in enabling Office of Science research.

• Although it is not part of ASCR's mission, midrange computing and the associated data management play a vital and growing role in advancing science in disciplines where capacity is as important as capability.

• Demand for midrange computing services is:
  o growing rapidly at many sites (>30% annually at LBNL; a compounding sketch follows this list)
  o the direct expression of a broad scientific need

• Midrange computing is a necessary adjunct to leadership-class facilities
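To make the growth figure concrete, here is a minimal sketch assuming a steady 30% annual rate (the slide's lower bound); the printed numbers are illustrative, not from the presentation.

```python
import math

# Illustrative only: what ">30% growth annually" implies if sustained.
annual_growth = 0.30  # the slide's lower bound

# Doubling time: solve (1 + r)^t = 2 for t.
doubling_years = math.log(2) / math.log(1 + annual_growth)
print(f"demand doubles roughly every {doubling_years:.1f} years")  # ~2.6

# Compounded demand after five years, relative to today.
print(f"five-year multiple: {(1 + annual_growth) ** 5:.1f}x")  # ~3.7x
```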

Page 3: Midrange Computing


Berkeley Lab Computing

• Gap between the desktop and the National Centers
• Midrange Computing Working Group formed in 2001
• Cluster support program started in 2002
  o Services for PI-owned clusters include: pre-purchase consulting, development of specs and RFPs, facilities planning, installation and configuration, ongoing cluster support, user services consulting, cybersecurity, and computer room colocation

• Currently 32 clusters in production, with over 1,400 nodes and 6,500 processor cores

• Funding: the institution pays for infrastructure costs and technical development; researchers pay for the cluster and the incremental cost of support.

Page 4: Midrange Computing


Cluster Support Phase II: Perceus Metacluster

• All clusters interconnected into a shared cluster infrastructure
  o Permits sharing of resources and storage (global home file system)
  o One 'super master' node used to boot nodes across all clusters; multiple system images supported
  o One master job scheduler submitting to all clusters
  o Simplifies provisioning of new systems and ongoing support

• Metacluster model made possible by Perceus software
  o successor to Warewulf (http://www.perceus.org)
  o can run jobs across clusters, recapturing stranded capacity (a scheduling sketch follows this list)
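A minimal, hypothetical sketch of the metacluster scheduling idea: one master scheduler sees free nodes on every cluster and places jobs wherever capacity is stranded. All names and numbers are invented for illustration; this is not the Perceus API or LBNL's actual scheduler.

```python
# Hypothetical metacluster state: idle nodes visible across all clusters.
clusters = {
    "cluster_a": {"free_nodes": 4},
    "cluster_b": {"free_nodes": 0},
    "cluster_c": {"free_nodes": 12},
}

def place_job(job_name: str, nodes_needed: int) -> str | None:
    """Send the job to the cluster with the most idle nodes that fits it."""
    candidates = [
        (info["free_nodes"], name)
        for name, info in clusters.items()
        if info["free_nodes"] >= nodes_needed
    ]
    if not candidates:
        return None  # job waits in the shared queue
    free, name = max(candidates)  # largest pool of stranded capacity
    clusters[name]["free_nodes"] = free - nodes_needed
    return name

print(place_job("md_sim", 8))  # -> cluster_c (recaptures its idle nodes)
print(place_job("qchem", 6))   # -> None: no single cluster has 6 free now
```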

Page 5: Midrange Computing


Page 6: Midrange Computing


Laboratory-Wide Cluster - Drivers

“Computation lets us understand everything we do.” – LBNL Acting Lab Director Paul Alivisatos

• 38% of scientists depend on cluster computing for research.
• 69% of scientists are interested in cycles on a Lab-owned cluster.
  o early-career scientists are twice as likely to be 'very interested' as their later-career peers

• Why do scientists at LBNL need midrange computing resources?
  o 'on ramp' activities in preparation for running at supercomputing centers (development, debugging, benchmarking, optimization)
  o scientific inquiry not connected with 'on ramp' activities

Page 7: Midrange Computing


Laboratory-Wide Cluster “Lawrencium”

• Overhead-funded program
  o Capital equipment dollars shifted from business computing
  o Overhead-funded staffing - 2 FTE
• Production in Fall 2008
• General-purpose Linux cluster suitable for a wide range of applications
  o 198 nodes, 1,584 cores, DDR InfiniBand interconnect (see the arithmetic after this list)
  o 40TB NFS home directory storage; 100TB Lustre parallel scratch
  o Commercial job scheduler and banking system
  o #500 on the Nov 2008 Top500
• Open to all LBNL PIs and collaborators on their projects
• Users are required to complete a survey when applying for accounts and later provide feedback on science results
• No user allocations at this time; this has been successful to date.
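A quick sanity check using only the slide's own figures; the per-node interpretation (dual-socket quad-core, typical for 2008) is an inference, not stated in the deck.

```python
# Derived from the slide: 198 nodes, 1,584 cores, 100TB Lustre scratch.
nodes, cores = 198, 1584
print(cores / nodes)             # 8.0 cores per node (inference: 2 sockets x 4 cores)
scratch_tb = 100
print(scratch_tb * 1000 / cores)  # ~63 GB of parallel scratch per core
```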

Page 8: Midrange Computing


Networking - LBLNet

• Peers at 10GbE with ESnet
• 10GbE at the core; moving to 10GbE to the buildings
• Goal is sustained high-speed data flows with cybersecurity
• Network-based IDS approach - traffic is innocent until proven guilty (a conceptual sketch follows this list)
  o Reactive firewall
  o Does not impede data flow; no stateful firewall
  o Bro cluster allows us to scale our IDS to 10GbE
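A conceptual sketch (in Python, not Bro's own policy language) of the "innocent until proven guilty" reactive model: traffic flows freely, and a block is installed only after the IDS accumulates evidence against a source. The threshold, event names, and functions here are all hypothetical.

```python
from collections import defaultdict

failed_logins: dict[str, int] = defaultdict(int)
blocked: set[str] = set()
ALERT_THRESHOLD = 20  # hypothetical: flag a host after 20 failed logins

def observe_event(src_ip: str, event: str) -> None:
    """Passively watch traffic; react only when evidence accumulates."""
    if src_ip in blocked:
        return
    if event == "failed_login":
        failed_logins[src_ip] += 1
        if failed_logins[src_ip] >= ALERT_THRESHOLD:
            block(src_ip)

def block(src_ip: str) -> None:
    """Stand-in for pushing a drop rule to the border router."""
    blocked.add(src_ip)
    print(f"reactive block installed for {src_ip}")

# Until the threshold is crossed, nothing impedes the flow.
for _ in range(20):
    observe_event("203.0.113.7", "failed_login")  # documentation IP range
```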

Page 9: Midrange Computing


Communications and Governance

• General announcements at IT council
• Steering committees used for scientific computing
  o Small group of stakeholders, technical experts, and decision makers
  o Helps to validate and communicate decisions
  o Accountability

Page 10: Midrange Computing


Challenges

• Funding (past)
  o Difficult for IT to shift funding from other areas of computing to support for science
  o Recharge can constrain adoption; full cost recovery definitely will.

• New technology (ongoing)

• Facilities (current)
  o Computer room is approaching capacity despite upgrades:
    - Environmental monitoring
    - Plenum in ceiling converted to a hot-air return
    - Tricks to boost underfloor pressure
    - Water-cooled doors
  o Underway:
    - DCiE measurement in process (a worked example follows this list)
    - Tower and heat exchanger replacement
    - Data center container investigation
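DCiE (data center infrastructure efficiency) is the ratio of IT equipment power to total facility power, the reciprocal of PUE. A minimal worked example; the wattages are invented purely to show the calculation.

```python
# Hypothetical readings, chosen only to illustrate the DCiE formula.
it_power_kw = 450.0        # power reaching IT equipment
facility_power_kw = 750.0  # total draw (IT + cooling + losses)

dcie = it_power_kw / facility_power_kw
print(f"DCiE = {dcie:.0%}  (PUE = {1 / dcie:.2f})")  # DCiE = 60%, PUE = 1.67
```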

Page 11: Midrange Computing


Next Steps

• Opportunities presented by cloud computing
  o Amazon investigation earlier this year; others ongoing
    - Latency-sensitive applications ran poorly, as expected
    - Performance dependent on the specific use case
    - Data migration: economics of storing vs. moving (a toy cost model follows this list)
    - Certain LBNL cost factors favor build over buy

• Large storage and computation for data analysis
• GPU investigation
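A toy cost model for the "storing vs. moving" question raised by the Amazon investigation. All prices are hypothetical placeholders, not actual 2009 AWS rates; the point is the crossover structure between resident storage and round-trip transfer.

```python
def monthly_store_cost(tb: float, usd_per_tb_month: float = 150.0) -> float:
    """Cost to keep a dataset resident in cloud storage for one month."""
    return tb * usd_per_tb_month

def transfer_cost(tb: float, usd_per_tb: float = 100.0) -> float:
    """One-time cost to move a dataset in or out over the network."""
    return tb * usd_per_tb

dataset_tb = 20.0
months = 6
store = monthly_store_cost(dataset_tb) * months
move = 2 * transfer_cost(dataset_tb)  # round trip: upload + download
print(f"store for {months} months: ${store:,.0f}; move round-trip: ${move:,.0f}")
```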

Page 12: Midrange Computing


Points of Collaboration

• UC Berkeley HPCC
  o Recent high-profile joint projects between UCB and LBNL encourage close collaboration
  o 25-30% of scientists have dual appointments
  o UC Berkeley's proximity to LBNL facilitates the use of cluster services

• University of California Shared Research Computing Services (SRCS) pilot
  o LBNL and SDSC joint pilot for the ten UC campuses
  o Two 272-node clusters located at UC Berkeley and SDSC
  o Shared computing is more cost-effective
  o Dedicated CENIC L3 network connecting the sites for integration
  o Pilot consists of 24 research projects

Page 13: Midrange Computing
