Aim High…Fly, Fight, Win
NWP Transition from AIX to Linux: Lessons Learned
Dan Sedlacek, AFWA Chief Engineer
AFWA A5/8
14 MAR 2011
Overview
- Introduction
- AFWA Architecture
- Applications Run on HPC
- Original NWP Environment
- Linux Configuration
- TCO Comparison
- Lessons Learned
- Future Linux Plans
- Summary
Introduction
AFWA has a long history of operating an AIX HPC environment.

Air Force Weather environment:
- Worldwide, 24x7x365 support for systems, weather data, and products
- Serves Headquarters, Operational Weather Squadrons (OWS), Combat Weather Teams (CWTs), and the Climatological Center (14th WS)
- 600+ systems across 4 distinct security enclaves
- 16 million+ lines of code
- ~1,000 software applications supported

As model resolutions improve and processing requirements soar, AFWA's requirements for NWP processing capability have increased dramatically.
- SEMS (the in-house support contractor) performed a study evaluating IBM, HP, and Cray
- Result: Red Hat Linux on HP hardware
- Transitioning from IBM/AIX to HP/Linux has resulted in significant savings in Total Cost of Ownership (TCO)
AFWA Architecture (Unclassified Only)
Applications Run on HPC
- Run regional models: WRF, WRF-Chem, CDFS II (future), Dust, LIS
- Run global model: UM
- Ensembles
- Model post-processing
- Miscellaneous space products
Original NWP Environment (Unclassified)
“Free” Hardware Adventure
In 2008, AFWA evaluated JVN (available from HPCMO Modernization):
- 1,024 compute nodes
- 36 racks of equipment
- 589 kW power requirement
- 161 tons of cooling

The "free" hardware was not cost-effective, so SEMS performed a study to evaluate alternatives. New hardware proved more cost-effective:
- Less space
- Less power
- Less cooling
- More FLOPS
- Lower TCO

Decision: pursue a Linux HPC solution.
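To show why power and cooling dominate the cost of "free" hardware, the sketch below turns JVN's stated 589 kW load and 161 tons of cooling into annual energy and heat-removal figures. Only those two numbers come from the evaluation; the $0.10/kWh electricity price is a hypothetical placeholder, and the function names are illustrative.

```python
# Back-of-the-envelope energy and cooling figures for the JVN system.
# The 589 kW and 161-ton values come from the evaluation above; the
# electricity price used below is a hypothetical placeholder.

HOURS_PER_YEAR = 24 * 365          # continuous 24x7x365 operation
KW_PER_TON_REFRIGERATION = 3.517   # 1 ton of refrigeration = 12,000 BTU/h, about 3.517 kW


def annual_energy_kwh(load_kw: float) -> float:
    """Energy drawn in one year by a constant electrical load."""
    return load_kw * HOURS_PER_YEAR


def cooling_tons_to_kw(tons: float) -> float:
    """Convert refrigeration tons to kW of heat removal."""
    return tons * KW_PER_TON_REFRIGERATION


def annual_power_cost(load_kw: float, price_per_kwh: float) -> float:
    """Electricity cost of the IT load alone; cooling-plant energy is extra."""
    return annual_energy_kwh(load_kw) * price_per_kwh


if __name__ == "__main__":
    print(f"Energy:  {annual_energy_kwh(589):,.0f} kWh/year")
    print(f"Cooling: {cooling_tons_to_kw(161):,.0f} kW of heat removal")
    # $0.10/kWh is an illustrative assumption, not an AFWA rate
    print(f"Power at $0.10/kWh: ${annual_power_cost(589, 0.10):,.0f}/year")
```

At a constant 589 kW, JVN would draw about 5.16 million kWh per year before any cooling-plant energy is counted.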
AFWA Unclassified HPC Configuration
Linux Configuration: Prod 8/DC3
- OS: Red Hat Enterprise Linux (RHEL) 5.3
- File system: Lustre
- Disk: 50 TB
- I/O bandwidth: 900 Mb/s throughput
- Processors: two 2.53 GHz Intel Nehalem E5540 quad-core CPUs per node
- Compute blades: 128
- Cores/memory: 1,024 cores, 3 GB per core
- Processing capacity: 10 TeraFlops (production)
- Test and development system (DC3): 5 TeraFlops
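The per-node figures above can be cross-checked against the stated totals. The sketch below multiplies them out; it assumes each Nehalem core retires 4 double-precision FLOPs per cycle (one 2-wide SSE add plus one 2-wide SSE multiply), which is an assumption about the hardware rather than a figure from the configuration slide. Under that assumption the theoretical peak comes out at about 10.4 TeraFlops, consistent with the stated 10 TeraFlops; sustained NWP throughput will be well below peak.

```python
# Theoretical peak of the Prod 8 configuration from its per-node specs.
# Assumption: 4 double-precision FLOPs per core per cycle on Nehalem (SSE).

BLADES = 128
SOCKETS_PER_BLADE = 2
CORES_PER_SOCKET = 4          # quad-core E5540
CLOCK_GHZ = 2.53
DP_FLOPS_PER_CYCLE = 4        # assumption, see lead-in
MEM_GB_PER_CORE = 3


def total_cores() -> int:
    return BLADES * SOCKETS_PER_BLADE * CORES_PER_SOCKET


def peak_tflops() -> float:
    # cores x GHz x FLOPs/cycle gives GFLOPS; divide by 1000 for TFLOPS
    return total_cores() * CLOCK_GHZ * DP_FLOPS_PER_CYCLE / 1000.0


def total_memory_gb() -> int:
    return total_cores() * MEM_GB_PER_CORE


if __name__ == "__main__":
    print(total_cores())               # 1024
    print(round(peak_tflops(), 2))     # 10.36
    print(total_memory_gb())           # 3072
```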
TCO Comparison
- Original 10 TeraFlops of IBM/AIX HPC: O&M (non-labor) of $1.4M, nominally $133K per TeraFlop
- Projected annual O&M for Linux (now totaling 24 TeraFlops): $1M, conservatively $30K per TeraFlop
- Bottom line: the Linux HPC solution represented significant savings
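The per-TeraFlop comparison can be reproduced from the stated O&M totals and capacities. Straight division gives about $140K per TeraFlop for IBM/AIX and about $41.7K for HP/Linux, roughly a 3.4x reduction; the slide's $133K and $30K figures are its own nominal and conservative estimates, so they differ somewhat from this simple arithmetic. The function name is illustrative.

```python
# Non-labor O&M cost per TeraFlop, computed directly from the stated totals.

def cost_per_tflop(om_dollars: float, tflops: float) -> float:
    """Dollars of O&M per TeraFlop of capacity."""
    return om_dollars / tflops


aix = cost_per_tflop(1.4e6, 10)     # original IBM/AIX HPC
linux = cost_per_tflop(1.0e6, 24)   # projected HP/Linux HPC

print(f"IBM/AIX:  ${aix:,.0f} per TeraFlop")     # $140,000
print(f"HP/Linux: ${linux:,.0f} per TeraFlop")   # $41,667
print(f"Ratio:    {aix / linux:.2f}x")            # 3.36x
```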
Lessons Learned
- Not all "free" hardware is desirable (JVN)
- Linux and AIX compilers differ (minor differences, but they require code modifications)
- Tuning differs significantly between AIX and Linux
- File system configurations are significantly different (Lustre/IBRIX vs. GPFS)
- Job scheduler differences had to be worked through (IBM LoadLeveler vs. Platform LSF)
- The full TCO reduction does not occur until support for the previous OS is no longer required
- So far, Linux has proven to be a reliable and cost-effective OS for NWP
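Job-script conversion is one concrete part of the scheduler transition. The sketch below is a hypothetical helper, not AFWA's tooling: it rewrites a handful of common LoadLeveler `# @` keywords into Platform LSF `#BSUB` options. It handles only simple `hh:mm:ss` wall-clock limits and ignores node layout, LoadLeveler variables such as `$(jobid)`, and MPI launcher changes, all of which a real port would also need.

```python
import re

# Hypothetical mapping from a few LoadLeveler keywords to LSF bsub options.
LL_TO_LSF = {
    "job_name": "-J",
    "output": "-o",
    "error": "-e",
    "class": "-q",             # a LoadLeveler class plays the role of an LSF queue
    "total_tasks": "-n",
    "wall_clock_limit": "-W",
}

# Matches LoadLeveler directives such as "# @ job_name = wrf_run"
LL_DIRECTIVE = re.compile(r"^#\s*@\s*(\w+)\s*=\s*(.+?)\s*$")


def ll_walltime_to_lsf(value: str) -> str:
    """Convert a LoadLeveler hard limit ([[hh:]mm:]ss) to LSF -W [hour:]minute."""
    hard_limit = value.split(",")[0]  # ignore an optional soft limit
    parts = [int(p) for p in hard_limit.split(":")]
    while len(parts) < 3:
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    total_min = hours * 60 + minutes + (1 if seconds else 0)  # round leftover seconds up
    return f"{total_min // 60}:{total_min % 60:02d}"


def translate(ll_script: str) -> list[str]:
    """Return #BSUB lines for the supported LoadLeveler directives in a script."""
    bsub_lines = []
    for line in ll_script.splitlines():
        match = LL_DIRECTIVE.match(line.strip())
        if not match or match.group(1) not in LL_TO_LSF:
            continue  # "# @ queue" and unsupported keywords are skipped
        key, value = match.group(1), match.group(2)
        if key == "wall_clock_limit":
            value = ll_walltime_to_lsf(value)
        bsub_lines.append(f"#BSUB {LL_TO_LSF[key]} {value}")
    return bsub_lines


if __name__ == "__main__":
    example = """#!/bin/ksh
# @ job_name = wrf_run
# @ class = prod
# @ total_tasks = 256
# @ wall_clock_limit = 01:30:00
# @ queue
"""
    for directive in translate(example):
        print(directive)
```

Because LSF's `-W` takes `[hour:]minute`, any leftover seconds in the LoadLeveler limit are rounded up to the next minute rather than truncated.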
Future Linux Plans
- A 5,000+ core Linux cluster is planned for delivery in August 2011, representing 51 TeraFlops of additional capability
- Total HPC capacity by the end of 2011: > 90 TeraFlops
- Total phase-out of the IBM/AIX HPC environment
Summary
- Total Cost of Ownership is complex:
  - Initial costs
  - Transition costs
  - Facility costs
  - Support costs
- Linux scales well
- Linux is a viable and cost-effective HPC platform
- Transitioning from IBM/AIX to HP/Linux has resulted in significant TCO savings
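The TCO components listed above can be combined into a simple cost model. The sketch below is a hypothetical illustration, not AFWA's costing method: it sums one-time initial and transition costs with recurring facility and support costs, and adds legacy-system costs for the years both environments must be supported. That overlap term is why, as the lessons learned note, the full TCO reduction arrives only after the previous OS is retired. In the usage, `legacy_per_year=1.4` reuses the slide's $1.4M AIX O&M figure; every other input is an invented placeholder.

```python
from dataclasses import dataclass


@dataclass
class PlatformCosts:
    """One-time and recurring costs for an HPC platform (any currency unit)."""
    initial: float             # acquisition
    transition: float          # porting, tuning, retraining
    facility_per_year: float   # power, cooling, floor space
    support_per_year: float    # maintenance and other O&M


def tco(p: PlatformCosts, years: int,
        legacy_per_year: float = 0.0, overlap_years: int = 0) -> float:
    """Total cost over `years`, plus legacy-system costs paid while both run."""
    recurring = (p.facility_per_year + p.support_per_year) * years
    legacy = legacy_per_year * min(overlap_years, years)
    return p.initial + p.transition + recurring + legacy


if __name__ == "__main__":
    # Hypothetical inputs in $M, chosen only to show the structure.
    new_system = PlatformCosts(initial=2.0, transition=0.5,
                               facility_per_year=0.3, support_per_year=0.7)
    print(tco(new_system, years=5))
    print(tco(new_system, years=5, legacy_per_year=1.4, overlap_years=2))
```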