The Naval Research Laboratory Cray XD1 Wendell Anderson Jeanie Osburn Robert Rosenberg Naval Research Laboratory Marco Lanzagorta ITT Corporation

Mar 18, 2018

Page 1: The Naval Research Laboratory Cray XD1 - Cray User Group · PDF fileThe Naval Research Laboratory Cray XD1 Wendell Anderson Jeanie Osburn Robert Rosenberg Naval Research Laboratory

The Naval Research Laboratory Cray XD1

Wendell Anderson, Jeanie Osburn, Robert Rosenberg
Naval Research Laboratory

Marco Lanzagorta
ITT Corporation

Page 2:

Presentation Outline

1. The NRL’s Cray XD1 System
2. Scientific Supercomputing
3. Reconfigurable Supercomputing
4. FPGA Programming Tools
5. Performance Measurements
6. Problems and Issues
7. Conclusions

Page 4:

US Naval Research Laboratory

NRL is the US Navy’s corporate research laboratory under the Office of Naval Research.

Page 5:

CCS

NRL’s Center for Computational Science:
– Distributed Center under the HPCMO
– Provides leading edge HPC resources to the Navy
– Conducts evaluation, benchmarking, research, and development in HPC.

Page 6:

NRL’s XD1

• 216 nodes with 864 cores and a cumulative speed of 3.5 TF.
  – Each node consists of two Opteron 275 2.2 GHz dual-core processors with 8 GB of shared memory and a 73 GB 10K rpm 3.5 in. SATA disk.
• 144 Xilinx Virtex-II Pro and 6 Virtex-4 FPGAs.

Page 7:

Software

• Cray modified version of SUSE Linux
• PGI and GNU Fortran and C/C++ compilers.
• MPI support through mpich 2.6
• AMD Core Math Library and Cray Scientific Library.
• Xilinx software and tools, Mitrion-C, Handel-C, and DSPLogic.

Page 8:

Node Usage

The XD1 nodes are used as:
4 to support the 30 TB Lustre system
1 for monitoring
1 for login
204 compute nodes scheduled with PBS
6 Virtex-4 compute nodes scheduled with PBS
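
As a quick consistency check, the node roles listed above can be summed and compared with the 216 nodes and 864 cores quoted for the system. This is only an illustrative sketch: the dictionary keys and variable names are labels chosen here, and the figure of 4 cores per node is derived from the two dual-core Opteron 275 processors per node.

```python
# Node roles as listed on the Node Usage slide; key names are illustrative labels.
node_usage = {
    "lustre": 4,
    "monitoring": 1,
    "login": 1,
    "compute": 204,
    "virtex4_compute": 6,
}

# Two dual-core Opteron 275 processors per node.
CORES_PER_NODE = 2 * 2

total_nodes = sum(node_usage.values())
total_cores = total_nodes * CORES_PER_NODE

print(f"total nodes: {total_nodes}")
print(f"total cores: {total_cores}")
```

The totals agree with the system description: 216 nodes and 864 cores.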

Page 10:

Scientific Applications

• Popular resource for scientific computing
• Provided over 3.3 million core hours.

Code       Core Hrs     Avg. # Cores/Run
ARMS       1,350,000    128-256
NOZZLE       800,000    128-256
NRLMOL       600,000    64-96
ADF          120,000    32
CHARMM        90,000    32
STARS3D       80,000    12
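
To put the per-code figures in context, the sketch below sums the core hours of the six listed codes and compares the result with the "over 3.3 million" total. The variable names are illustrative, and 3.3 million is treated as a lower bound, so the computed share is an upper estimate.

```python
# Core hours per code, as listed in the table above.
core_hours = {
    "ARMS": 1_350_000,
    "NOZZLE": 800_000,
    "NRLMOL": 600_000,
    "ADF": 120_000,
    "CHARMM": 90_000,
    "STARS3D": 80_000,
}

# "Over 3.3 million core hours" is taken as a lower bound on the total.
TOTAL_PROVIDED = 3_300_000

listed_total = sum(core_hours.values())
share = listed_total / TOTAL_PROVIDED

print(f"core hours for listed codes: {listed_total:,}")
print(f"share of total provided (at most): {share:.1%}")
```

Under these assumptions the six codes account for roughly nine tenths of the core hours delivered.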

Page 11:

ARMS

Simulation of solar storms by Dr C. R. DeVore and Dr S. K. Antiochos.

Page 12:

NOZZLE

Simulation of Coanda wall jet experiments by Dr A Gross.

Page 13:

NRLMOL

Dr T Baruah and Dr M Pederson’s study of the molecular vibrational effects on the simulation of a light-harvesting molecule.

Page 14:

ADF

Quantum-chemical analysis of the interaction between chemical warfare agents and materials by Dr S Badescu and Dr V Bermudez.

Nitrobenzene
Ag56 + C6H5NO2

Page 15:

CHARMM

Study of the interaction between urea and P5GA RNA by Dr A MacKerell, D Priyakumar, and Dr Jeff Deschamps

Page 16:

STARS3D

Dr S Dey uses STARS3D to study wideband acoustic radiation and scattering from submerged elastic structures.

Page 18:

Reconfigurable Supercomputing

• With 150 FPGAs, NRL’s XD1 is the largest reconfigurable Cray supercomputer.
• We have started to explore the application of FPGAs to accelerate scientific codes.

Page 19:

Porting Codes

• First applications are from users who already had VHDL codes running on a local system with a single FPGA.
• Main challenge has been the porting of their codes to the XD1.

Page 20:

Neural Networks

• Ken Rice and Tarek M Taha from Clemson University study large-scale models of the neocortex.
• Modeled up to 321 nodes using 64 of the XD1’s Virtex-2 FPGAs.

Page 21:

Neural Networks

Preliminary benchmarks suggest the following speedups over a single AMD core:
– Using all 864 cores: 720
– Using all 144 V2P FPGAs, with no SDRAM use: 31,246.
– Using all 144 V2P FPGAs, with SDRAM use: 128,389.
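
The speedups above are totals for the whole machine. A rough way to compare them is to divide each by the number of devices used, which gives the average speedup contributed per core or per FPGA. The sketch below does only that arithmetic; the labels and variable names are illustrative and nothing here comes from the benchmark code itself.

```python
# (speedup over a single AMD core, number of devices used), from the slide.
benchmarks = {
    "cores": (720, 864),
    "fpga_no_sdram": (31_246, 144),
    "fpga_sdram": (128_389, 144),
}

# Average speedup contributed by each device.
per_device = {
    name: speedup / devices for name, (speedup, devices) in benchmarks.items()
}

# Gain from using SDRAM on the same 144 FPGAs.
sdram_gain = benchmarks["fpga_sdram"][0] / benchmarks["fpga_no_sdram"][0]

for name, value in per_device.items():
    print(f"{name}: {value:.2f}x per device")
print(f"SDRAM vs no SDRAM: {sdram_gain:.2f}x")
```

On these figures each Opteron core contributes about 0.83 of a single-core run, while each FPGA is worth roughly 217 cores without SDRAM and roughly 892 cores with it.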

Page 22:

Design of Optical Devices

Commander Charles Cameron has been using ray tracing software to design optical devices.

Page 23:

BLASTN

• NRL is currently working with Mitrionics to port their SGI RASC FPGA BLASTN implementation to the XD1.
• Main problems:
  – SGI uses 128-bit data paths from a pair of QDRAMs. XD1 requires 64-bit data paths from a single QDRAM.
  – Cray has not finished the Virtex-4 interface to the XD1.

Page 24:

Other

Several other scientists are in the initial stages of investigating the potential applications of FPGAs to:
– Cryptography
– Hyper-Spectral Image Processing
– Ray Tracing
– Line of Sight Calculations
– Molecular Dynamics

Page 25:

Principal Challenges

• Identification of the portions of a code that are good candidates for FPGA acceleration.
• Programming of the FPGAs.
• Lack of established FPGA programming strategies for algorithm development.
• Lack of portability across HW platforms and across FPGA programming tools.

Page 27:

High Order FPGA Programming Tools

• FPGA programming using VHDL or Verilog is a difficult and time-consuming task.
• There are a few software packages that provide simpler methods to program FPGAs.
• We are currently testing and evaluating three of these software packages: Mitrion-C, Handel-C, and DSPLogic.

Page 28:

Mitrion-C

• Developed by Mitrionics.
• Currently supported on Cray XD1, SGI RASC RC100, and Nallatech BenDATA-DD.

Page 29:

Mitrion-C (+)

Advantages:
– C-like syntax and constructions.
– Straightforward “translation” from ANSI C to Mitrion-C.
– Concurrent language with parallel data structures and parallel control flow directives.
– Easier than VHDL or Verilog.
– Good simulation, debugging, and algorithm development tools.

Page 30:

Mitrion-C (-)

Disadvantages:
– Most HPC users are Fortran programmers.
– Concurrent language.
– Mitrion software is closely tied to a specific version of the Xilinx compiler.
– Software maintenance and bug fixes present a big challenge.

Page 31:

Handel-C

• Developed by Celoxica.
• Only runs on Windows-based PCs.
• The Linux version has just been released, but there are problems with the release.

Page 32:

Handel-C (+)

Advantages:
– C-like syntax and constructions.
– Sequential programming with parallel constructors.
– Straightforward “translation” from ANSI C to Handel-C.
– Easier than VHDL or Verilog.

Page 33:

Handel-C (-)

Disadvantages:
– Most HPC users are Fortran programmers.
– The Linux version has just been released, but with many problems.
– Temporary licenses for PCs available, but imply additional work to install and support the software.
– No support for Virtex-4.
– Poor support for the XD1

Page 34:

DSPLogic

• Based on Simulink, a sophisticated graphical interface to Matlab for modelling, simulation, and analysis of dynamical systems.
• Algorithms are implemented by dragging blocks from a library into the workspace, and establishing connections between them.

Page 35:

DSPLogic (+)

Advantages:
– Potential access to low level Xilinx primitives.
– Appears ideal for digital circuit design.
– Block abstraction and code encapsulation may be valuable for very large and complex reconfigurable codes.
– Good simulator and debugging tools inherited from Simulink.

Page 36:

DSPLogic (-)

Disadvantages:
– Only runs on a Windows-based PC.
– User needs to learn/buy Matlab/Simulink.
– Simple algorithms often require dozens of interconnecting blocks.

Page 38:

Dual Core Efficiency

• The dual cores in the XD1’s Opteron 275 share the same DDR memory controller as the single chip processor version.
• This sharing of memory bandwidth can lead to a degradation of the performance of the codes running on the dual-core chips.

Page 39:

A Measure of Efficiency

• We consider two scenarios:
  – A code running using n nodes and all 4 cores on the node takes T4 time.
  – A code running using 2n nodes and only one core of each dual core processor takes T2 time.
• We define the dual core efficiency as:

$$\mathrm{DCE} = 100 \left( 1 - \frac{T_4 - T_2}{T_2} \right)$$

Page 40:

Dual Core Efficiency Results

Application   One Core   Both Cores   Efficiency %
STATIC             313          450             56
CAUSAL             275          293             93
LANCZOS            771         1371             22
NRLMOL           14283        16260             90
ARMS              2090         2524             79
NOZZLE           27498        27286            101
AVUS              1197          963            120
HYCOM              823          849             97
OOCORE        477165274 (timing digits fused in source)   25
RFCTH              279          448             39
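
The efficiency column can be recomputed from the two timing columns. The sketch below assumes the definition DCE = 100 (1 - (T4 - T2) / T2), where T4 is the "Both Cores" time and T2 the "One Core" time; this reading is an assumption, though it reproduces the listed percentages for the three rows used here. The function and variable names are illustrative.

```python
def dual_core_efficiency(t4: float, t2: float) -> float:
    """Dual core efficiency in percent.

    t4: run time on n nodes using all 4 cores per node ("Both Cores").
    t2: run time on 2n nodes using one core per processor ("One Core").
    Assumed definition: DCE = 100 * (1 - (t4 - t2) / t2).
    """
    return 100.0 * (1.0 - (t4 - t2) / t2)


# (Both Cores, One Core) timings for three rows of the results table.
timings = {
    "RFCTH": (448, 279),
    "HYCOM": (849, 823),
    "AVUS": (963, 1197),
}

results = {name: dual_core_efficiency(t4, t2) for name, (t4, t2) in timings.items()}

for name, dce in results.items():
    print(f"{name}: {dce:.1f}%")
```

A value above 100, as for AVUS, means the run using both cores of each processor was faster than the run spread over twice as many nodes.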

Page 41:

Hybrid Codes

• MPI/OpenMP Hybrid Code
  – Is it more efficient than pure MPI code?
• Developed 3 versions of Causal Code
  – Pure MPI, Pure OpenMP, Hybrid MPI/OpenMP
• Performance
  – Pure MPI and Pure OpenMP had similar performance on 4 cores
  – Pure MPI code still outperformed Hybrid code

Page 42:

Hybrid Code Efficiency

[Figure: Speed Up per MPI process on the XD1. x-axis: Number of MPI Tasks (4, 8, 16, 32); y-axis: Speed Up (0.00 to 4.00).]

Page 43:

Lustre Systems

• The Lustre system is a high speed parallel file system available to all nodes.
  – Not a mature technology
• We have recently upgraded the Lustre disk system, adding an additional controller and devoting 4 nodes to the running of Lustre (instead of 2).

Page 44:

Lustre I/O Rates

NODES   Read (MB/sec) old/new   Write (MB/sec) old/new
1       206/156                 165/417
2       1325/326                324/782
4       629/630                 646/1298
8       794/1224                709/1393
16      892/1460                862/1250
32      859/1420                893/1280
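
One way to read the old/new pairs is as an improvement ratio (new rate divided by old rate) for each node count. The sketch below computes that ratio for the 4, 8, 16, and 32 node rows only; the data layout and names are illustrative, and the column assignment (read versus write) follows the reading of the table used here, which is an assumption.

```python
# (old, new) rates in MB/sec for the four largest node counts.
lustre_rates = {
    4: {"read": (629, 630), "write": (646, 1298)},
    8: {"read": (794, 1224), "write": (709, 1393)},
    16: {"read": (892, 1460), "write": (862, 1250)},
    32: {"read": (859, 1420), "write": (893, 1280)},
}

# Improvement ratio new/old for each node count and direction.
improvement = {
    nodes: {op: new / old for op, (old, new) in ops.items()}
    for nodes, ops in lustre_rates.items()
}

for nodes, ops in improvement.items():
    print(f"{nodes:>2} nodes: read x{ops['read']:.2f}, write x{ops['write']:.2f}")
```

On this reading, write rates roughly doubled at 4 and 8 nodes, while read rates were essentially unchanged at 4 nodes and improved by about 1.5x or more from 8 nodes upward.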

Page 46:

A Few Problems

We have observed only a few significant issues with the XD1, even though our system is the largest one fielded by Cray.
– XD1 and Lustre file system interaction.
– MPI error messages

Page 47:

Disk Accesses

• Disk accesses were affected when programs were using most of the bandwidth to the Lustre nodes.
• A command to list the files in a directory could take as much as 5 minutes.
• Also the time to rebuild a RAID disk that failed would increase from 3 hrs (stand alone mode) to 3 days (with users).

Page 48:

Large Files I/O

• Some users have reported that their programs crash when writing large files to disk.
• This problem has proved to be very difficult to track down and reproduce, as it may take several days before the failure occurs.
• Tests performed by Cray appear to indicate a problem with GART on a node allocated to the job.

Page 49:

MPI Error Messages

• MPI error messages are misleading and completely useless for debugging purposes: “mpiexec:Error:read_rai_startup_ports: Failed to read barrier entry token from rank 3 process on node#”
• Cray is currently working on mpiexec to provide more meaningful messages.

Page 51:

Conclusions

• The XD1 has proved to be popular at NRL.
  – Wide variety of scientific codes and applications.
• The development of reconfigurable codes remains a daunting task.
  – Usability of FPGA programming suites is by far the greatest challenge.