Dell OpenManage Tools for High-Performance Computing Cluster Management
Remote Access (ERA), ERA Option (ERA/O), and ERA/Modular
Chassis (ERA/MC).
Array Manager console. Array Manager is a storage configuration and management tool that allows cluster administrators to configure and manage local and remote storage attached to a system.³ The Array Manager console communicates with the server management framework to provide storage management functions, including physical and logical views of storage; context-sensitive menus; dialog boxes; wizards; and property pages that display alerts, events, and so forth.
2 For more information, see “Remote Management with the Baseboard Management Controller in Eighth-Generation Dell PowerEdge Servers” by Haihong Zhuo; Jianwen Yin, Ph.D.; and Anil V. Rao in Dell Power Solutions, October 2004, www.dell.com/downloads/global/power/ps4q04-20040110-Zhuo.pdf.
3 For more information, see the Dell OpenManage Array Manager User’s Guide at support.dell.com/support/eedocs/software/smarrman.
CLUSTERING SPECIAL SECTION: HIGH-PERFORMANCE COMPUTING
BMC management utility. The BMU is designed for Intelligent Platform Management Interface (IPMI)–compliant servers.⁴ It provides out-of-band management and BMC configuration features, including the IPMI shell (ipmish), which is a CLI for communicating with a remote BMC, and a Remote Management and Control Protocol (RMCP)–based proxy that allows BIOS- and OS-level console redirection through the LAN, also known as Serial Over LAN (SOL). In the deployment phase, the BMU (via ipmish) is typically used to power up a new server remotely. In the operational phase, the BMU (via ipmish) can remotely power cycle a hung node, and it can be used (via SOL) as an out-of-band remote console.
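As an illustration of the power-control use case, the following Python sketch builds (but does not run) one ipmish command line per BMC. The option layout `ipmish -ip <address> -u <user> -p <password> power <action>` and the list of power actions are assumptions that should be checked against the BMC Management Utility documentation for the installed release; the helper names, addresses, and credentials are hypothetical.

```python
from typing import Iterable, List

# Assumed set of ipmish power subcommands; verify against the BMU documentation.
ALLOWED_ACTIONS = ("status", "on", "off", "cycle", "reset")


def build_power_command(bmc_address: str, user: str, password: str,
                        action: str) -> List[str]:
    """Return the argument list for one ipmish power command.

    Nothing is executed here; the caller decides how to run the command.
    The option names (-ip, -u, -p) are an assumption about ipmish syntax.
    """
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported power action: {action!r}")
    return ["ipmish", "-ip", bmc_address, "-u", user, "-p", password,
            "power", action]


def build_cluster_commands(bmc_addresses: Iterable[str], user: str,
                           password: str, action: str) -> List[List[str]]:
    """Build one command per BMC address, for example to power cycle hung nodes."""
    return [build_power_command(address, user, password, action)
            for address in bmc_addresses]


if __name__ == "__main__":
    # Hypothetical BMC addresses and credentials, for illustration only.
    hung_nodes = ["10.1.0.11", "10.1.0.12"]
    for argv in build_cluster_commands(hung_nodes, "admin", "secret", "cycle"):
        print(" ".join(argv))
```

Each argument list could be passed to `subprocess.run` on a management station where the BMU is installed. Building the commands separately from running them makes it easy to review exactly what will be sent to each BMC first.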
Dell OpenManage Service and Diagnostics Utilities CD
The Dell OpenManage Service and Diagnostics Utilities CD provides utilities, updates, and diagnostic tools from Dell.
Utilities and updates. This CD provides the latest BIOS, firmware, services, and OS-based online diagnostics for supported systems. Note: Updates using the CD are supported only on systems running Microsoft Windows. To use the CD on Novell NetWare or Red Hat Enterprise Linux platforms, administrators must first use a Microsoft Windows–based system to extract the required drivers from the CD; then they can share those drivers with the systems running other operating systems.
Diagnostics. Hardware and firmware diagnostic utilities can be used locally and remotely on Dell PowerEdge servers. The diagnostic utilities are supported on certain Microsoft Windows NT® and Windows 2000 versions. Administrators can select the tests to be run, simultaneously or individually, on various components of a server. However, these diagnostic utilities are limited to addressing problems on individual servers and will not identify or resolve problems that arise at the network level. Examples of components that can be diagnosed include hard disk drives, various media drives, Peripheral Component Interconnect (PCI) buses, SCSI devices, serial and parallel ports, and NICs. Easy-to-use GUIs enable administrators to access functions such as the Test Queue Viewer (lists the currently selected tests, queued to be run sequentially); the Progress Viewer (shows a test’s progress while it is running); the Diagnostic tree (lists the available tests based on the component); and the Component Selector (lists the diagnosable components).
In addition to the tools provided on the Service and Diagnostics Utilities CD, the latest BIOS, firmware, drivers, and Dell OpenManage applications can be obtained from the Dell support Web site at support.dell.com.
Dell OpenManage Documentation CD
The Dell OpenManage Documentation CD provides manuals for Dell hardware and software.
Hardware manuals. The Documentation CD contains user’s guides, installation guides, and troubleshooting guides for Dell PowerEdge systems. It also provides RAID controller–related documentation, such as user guides and driver installation guides, as well as adapter card and modem documentation.
Operating manuals for Dell OpenManage components. This CD also includes the Software Quick Installation Guide and guides for various Dell OpenManage components such as Array Manager, DRAC, ITA, the BMC, and the Server Update Utility (SUU).
Web-downloadable components
Various Dell OpenManage components are also available on the Web.
Server Update Utility. The SUU is a CD-based application that can be used to identify and apply the latest updates (BIOS, firmware, and drivers) to a Dell PowerEdge server. The application provides a comparison report differentiating component versions. It also allows administrators to update components using preconfigured System Update Sets. The SUU uses a database of firmware, drivers, and BIOS components for Dell PowerEdge servers called
Figure 1. Architecture of a typical HPC cluster (diagram labels: public network, master node, ITA node, network switches, compute nodes, metadata servers, Dell storage, cluster and administration fabric, out-of-band fabric)
4 For more information, see “Efficient BMC Configuration on Dell PowerEdge Servers Using the Dell Deployment Toolkit” by Anusha Ragunathan, Alan Brumley, and Ruoting Huang in Dell Power Solutions, February 2005,
(DHCP) server and an NFS server on the network. The kickstart
file and OS image should be housed on the NFS server, while the
networking information, boot kernel, RAM disk, and kickstart file
are placed on the BOOTP/DHCP server. This approach enables
unmanaged PXE-based OS installation across a newly deployed,
large-scale HPC cluster.
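The division of roles between the BOOTP/DHCP server and the NFS server can be sketched as generated configuration text. The Python example below assumes ISC dhcpd-style host entries and a PXELINUX-style boot entry that passes the kickstart location with `ks=nfs:<server>:<path>`; the article does not name these specific tools, and all node names, MAC addresses, IP addresses, and paths are hypothetical.

```python
def dhcp_host_entry(name: str, mac: str, ip: str, boot_server: str,
                    boot_file: str = "pxelinux.0") -> str:
    """Render one dhcpd-style host block (assumed ISC dhcpd syntax).

    The block gives a compute node its network identity and tells it
    which server and file to use for the PXE boot loader.
    """
    return (
        f"host {name} {{\n"
        f"  hardware ethernet {mac};\n"
        f"  fixed-address {ip};\n"
        f"  next-server {boot_server};\n"
        f'  filename "{boot_file}";\n'
        f"}}\n"
    )


def pxe_boot_entry(nfs_server: str, kickstart_path: str,
                   kernel: str = "vmlinuz",
                   ramdisk: str = "initrd.img") -> str:
    """Render a PXELINUX-style entry (assumed syntax) for a kickstart install.

    The boot kernel and RAM disk are fetched from the boot server, while
    the kickstart file is read from the NFS server.
    """
    return (
        "default ks\n"
        "label ks\n"
        f"  kernel {kernel}\n"
        f"  append initrd={ramdisk} ks=nfs:{nfs_server}:{kickstart_path}\n"
    )


if __name__ == "__main__":
    # Hypothetical compute nodes: (name, MAC address, IP address).
    nodes = [
        ("compute01", "00:11:22:33:44:55", "10.0.0.1"),
        ("compute02", "00:11:22:33:44:56", "10.0.0.2"),
    ]
    for name, mac, ip in nodes:
        print(dhcp_host_entry(name, mac, ip, boot_server="10.0.0.254"))
    print(pxe_boot_entry("10.0.0.253", "/export/kickstart/compute.cfg"))
```

Generating the entries from a single node list keeps the DHCP and boot configuration consistent as nodes are added, which matters when the same procedure is repeated across a large cluster.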
Efficient remote cluster management
The Dell OpenManage suite comprises various tools, which are available on four CDs as well as on the Dell support Web site. These tools are designed to ease the deployment and operational phases of large-scale HPC clusters. Dell OpenManage tools can be used to configure, monitor, and manage the various components within an HPC cluster environment, including the master node, compute nodes, management fabrics, switches, and other devices. With these tools, IT organizations can streamline the process of scaling out HPC clusters to support growing data centers.
5 For more information about the Dell 2161DS Console Switch, visit support.dell.com/support/edocs/systems/smarcon/en/2161DS/hardware/hardware.pdf.
6 For information about using Platform Rocks to automate BMC and BIOS configuration during deployment, see “Configuring the BMC and BIOS on Dell Platforms in HPC Cluster Environments” by Garima Kochhar, Rizwan Ali, and Arun Rajan in Dell Power Solutions, November 2005, www.dell.com/downloads/global/power/ps4q05-20050222-Kochhar.pdf.
7 For more information, see “Installing Linux High-Performance Computing Clusters” by Christopher Stanton, Rizwan Ali, Yung-Chin Fang, and Munira A. Hussain in Dell Power Solutions, Issue 4, 2001, ftp.us.dell.com/app/4q01-Lin.pdf.
Yung-Chin Fang is a senior consultant in the Scalable Systems Group at Dell. He specializes in HPC systems, advanced HPC architecture, and cyberinfrastructure management. Yung-Chin has published more than 30 conference papers and articles on these topics. He also participates in HPC cluster–related open source groups as a Dell representative.
Arun Rajan is a systems engineer in the Scalable Systems Group at Dell. He has a B.S. in Electronics and Communications Engineering from the National Institute of Technology, Tiruchirappalli, in India and an M.S. in Computer and Information Science from The Ohio State University.
Monica Kashyap is a senior systems engineer in the Scalable Systems Group at Dell. Her current interests and responsibilities include in-band and out-of-band cluster management, cluster computing packages, and product development. She has a B.S. in Applied Science and Computer Engineering from the University of North Carolina at Chapel Hill.
Saeed Iqbal, Ph.D., is a systems engineer and advisor in the Scalable Systems Group at Dell. His current work involves evaluation of resource managers and job schedulers used for standards-based clusters. He is also involved in performance analysis and system design of clusters. Saeed has a B.S. in Electrical Engineering and an M.S. in Computer Engineering from the University of Engineering and Technology in Lahore, Pakistan. He has a Ph.D. in Computer Engineering from The University of Texas at Austin.
Tong Liu is a systems engineer in the Scalable Systems Group at Dell. His current research interests are HPC cluster management, high-availability HPC clusters, and parallel file systems. Tong serves as a program committee member of several conferences and working groups on cluster computing. Before joining Dell, he was an architect and lead developer of High Availability Open Source Cluster Application Resources (HA-OSCAR). Tong has an M.S. in Computer Science from Louisiana Tech University.