NOAA R&D High Performance Computing
Colin Morgan, CISSP
High Performance Technologies Inc (HPTI)
National Oceanic and Atmospheric Administration
Geophysical Fluid Dynamics Laboratory, Princeton, NJ
Jan 26, 2016
R&D HPCS Background Information - Scientific
• Large-scale heterogeneous supercomputing architecture
• Provides cutting-edge technology for weather and climate model developers
• Models are developed for weather forecasts, storm warnings, and climate change forecasts
• 3 R&D HPCS locations
  • Princeton, NJ
  • Gaithersburg, MD
  • Boulder, CO
R&D HPCS Background Information - Supercomputing
• Princeton, NJ – GFDL
  • SGI Altix 4700 cluster
  • 8,000 cores
  • 18 PB of data
• Gaithersburg, MD
  • IBM POWER6 cluster
  • ~1,200 POWER6 cores
  • 3 PB of data
• Boulder, CO – ESRL
  • 2 Linux clusters
  • ~4,000 Xeon Harpertown/Woodcrest cores
  • ~1 PB of data
• Remote computing allocated hours
  • Oak Ridge National Laboratory – 104 million hours
  • Argonne National Laboratory – 150 million hours
  • NERSC – 10 million hours
R&D HPCS Information - Data Requirements
• Current data requirements
  • GFDL current data capacity – 32 PB
  • GFDL current data total – 18 PB
  • GFDL – growing by 1 PB every 2 months
  • Remote compute – 6-8 TB a day of data ingest
How does that much data get transferred?
• Future data requirements
  • 30-50 TB a day from remote computing
  • 150-200 PB of total data in the next 3 years
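To put these daily volumes in network terms, a rough back-of-the-envelope conversion (my own illustrative sketch, not from the slides) from TB/day to the sustained line rate required:

```python
# Back-of-the-envelope: sustained throughput needed for a daily ingest volume.
# Assumes 1 TB = 10**12 bytes and a full 24-hour transfer window.

def tb_per_day_to_gbps(tb_per_day):
    """Convert a daily data volume (TB/day) to a sustained rate in Gb/s."""
    bits_per_day = tb_per_day * 10**12 * 8   # bytes -> bits
    return bits_per_day / 86400 / 10**9      # per second, then -> Gb

for tb in (8, 30, 50):
    print(f"{tb} TB/day needs ~{tb_per_day_to_gbps(tb):.2f} Gb/s sustained")
```

At the current 8 TB/day ingest this already works out to roughly 0.74 Gb/s sustained, and the projected 30-50 TB/day pushes past 4 Gb/s, which is why the connectivity slides later in the deck center on 10G links.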
R&D HPCS Information - Current Data Transfer Methods - BBCP
• BBCP – transfer rates are affected while the file is being closed out
Source  Destination  Window Size  MTU   File Size  Transfer Rate (MB/s)  Transfer Rate (Mb/s)
GFDL    ORNL         1024         1500  410 MB     11.5                  92.0
GFDL    ORNL         1024         1500  410 MB     14.4                  115.2
ORNL    GFDL         1024         1500  410 MB     9.0                   72.0
ORNL    GFDL         1024         1500  410 MB     9.8                   78.4
• 400-500 Mb/s is the typical transfer rate, limited by disk I/O, not the network
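The two rate columns above differ only by the byte-to-bit factor; a quick consistency check (an illustrative script, not part of the deck):

```python
# Verify that each Mb/s figure in the BBCP table is the MB/s figure times 8.
rows = [
    ("GFDL", "ORNL", 11.5, 92.0),
    ("GFDL", "ORNL", 14.4, 115.2),
    ("ORNL", "GFDL", 9.0, 72.0),
    ("ORNL", "GFDL", 9.8, 78.4),
]
for src, dst, mb_per_s, mbit_per_s in rows:
    assert mb_per_s * 8 == mbit_per_s  # 1 byte = 8 bits
    print(f"{src} -> {dst}: {mb_per_s} MB/s = {mbit_per_s} Mb/s")
```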
R&D HPCS Information - Future Data Transfer Methods - GRIDFTP

[Network diagram: GridFTP data path from Oak Ridge, Argonne, and NERSC over ESnet, through the perimeter switch, perimeter firewall, and R&D core switch/FW (10G links), to the GridFTP servers on dedicated VLANs (receive interface external, write interface to the NetApp, plus management and private VLANs). ntt1 and ntt2 are receive hosts and ntt3 is the IC0/IC9 pull host (ntt1-ntt4 run RHEL5; IC0 and IC9 run OpenSuse 10.1). A 100TB NetApp disk cache sits behind the GridFTP servers and feeds the SGI cluster. Switches are Cisco 6500 series; the firewall is a Cisco FWSM.]

• Require 6-8 TB/day of inbound data ingest from ORNL
• ANL & NERSC do not have the same data ingest requirements
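One way to size such a GridFTP setup is to ask how many concurrent transfers are needed if each transfer is capped by the disk-I/O-limited 400-500 Mb/s quoted on the BBCP slide. This is a hypothetical sizing sketch, not NOAA's actual planning method:

```python
import math

# Hypothetical sizing sketch: concurrent GridFTP transfers needed to sustain
# a daily ingest volume, if each transfer is limited by disk I/O to roughly
# the per-transfer rate observed with BBCP (assumption from the BBCP slide).

def transfers_needed(tb_per_day, per_transfer_mbps):
    """Concurrent transfers to move tb_per_day, each capped at per_transfer_mbps (Mb/s)."""
    required_mbps = tb_per_day * 10**12 * 8 / 86400 / 10**6
    return math.ceil(required_mbps / per_transfer_mbps)

print(transfers_needed(8, 400))   # current 6-8 TB/day ingest -> 2
print(transfers_needed(50, 400))  # projected 30-50 TB/day -> 12
```

In practice GridFTP throughput is tuned per link (parallel TCP streams, buffer sizes) rather than computed this way, but the arithmetic shows why a single disk-bound transfer cannot meet the projected ingest.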
R&D HPCS Information - Fiber Networking
What are we doing now?
What do we plan to do?
R&D HPCS Information - Current Connectivity

[Map: current wide-area connectivity via ESnet, Internet2/NLR, and SDN, with regional networks and GigaPoPs including MAX, NyserNet, Bison, 3ROX, FRGP, and MagPi.]
R&D HPCS Information - Network Connectivity

[Diagram: site connectivity for Boulder, CO; Gaithersburg, MD; and Princeton, NJ to the commodity Internet, Internet2, and ESnet, with link speeds of 45 Mb/s, 1 Gb/s, and 10 Gb/s.]
R&D HPCS Information - Potential Future Networks
• Tooth Fairy $$
• Working on preliminary designs
• Design review scheduled in early May
• Deployment in Q2 of FY10
• Looking to talk with
  • ESNET
  • Internet2
  • National Lambda Rail
  • Indiana University Global NOC
  • Interested GigaPoPs
• The primary focus is to provide a high-speed network to NOAA's research facilities
R&D HPCS Information
QUESTIONS?