EMC Proprietary and Confidential - For Internal Use Only

DD OS 4.9 Offline Diagnostics Suite User Guide

Backup Recovery Systems Division
Data Domain LLC
2421 Mission College Boulevard, Santa Clara, CA 95054
866-WE-DDUPE; 408-980-4800
759-0021-0001 Revision C
May 18, 2012
EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice.
THE INFORMATION IN THIS PUBLICATION IS PROVIDED "AS IS." EMC CORPORATION MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Use, copying, and distribution of any EMC software described in this publication requires an applicable software license.
EMC, Data Domain, and Global Compression are registered trademarks or trademarks of EMC Corporation in the United States and/or other countries.
All other trademarks used herein are the property of their respective owners.
DD OS 4.9 Offline Diagnostics Suite User Guide
This guide provides a troubleshooting flow for selecting the appropriate diagnostics for your problem and then running the diagnostics to identify the faulty field replaceable unit (FRU) and generate recommended service actions.
This document covers the following topics:
• Overview and Supported Systems
• Requirements
• Connect a Console to the System
• Select and Run Diagnostics
• Get Log Information After Running Diagnostics
• Diagnostic Test Descriptions
Overview and Supported Systems
Data Domain provides both online and offline diagnostics for its systems:
• Online diagnostics are invoked on the Data Domain operating system (DD OS) command line. Some diagnostics, such as system status and enclosure show all, which report the status of fans, power supplies, and temperature sensors, are also run automatically in the background to monitor the system during runtime. Email alerts are issued if problems are detected.
• Offline diagnostics are run in response to customer problem reports, such as when a system cannot be booted to online operation, a card or disk is absent, or memory, connectivity, or configuration problems are suspected. Offline diagnostics check FRUs such as the system controller disks, motherboard, memory (DIMMs), NVRAM card, and other hardware.
Major differences in the use of online and offline diagnostics are:
• Offline diagnostics are used when the system is unable to come online.
• Offline diagnostics are used if the system is hanging frequently or has serious performance issues. These diagnostics can isolate performance problems to specific components.
• After online diagnostics detect a problem, offline diagnostics may be needed for further fault isolation or confirmation.
• Online diagnostics detect problems only when they access the part of the component that has the problem, whereas offline diagnostics test the full range of the component (an entire disk, for example) and can detect latent faults.
Data Protection
Note: Diagnostics are run in offline mode, and require a reboot to load.
In offline mode, the Data Domain filesystem is not running, and no customer data is flowing through the system. Tests are completely data-safe and non-destructive.
Supported Systems
Table 1 shows which diagnostic tests can be run on each supported Data Domain system (DDxxx) and which FRU is tested.

Table 1: Offline Diagnostics Support for DD OS 4.9 Systems (X=Supported)
FRU Tested | Test Name | DD140, DD610, DD630, DD880, DD880g, DD670
System Controller Boot Disk | HDD Quick Test | X X X
System Controller Disks (all) | HDD Comprehensive Test | X X X
Fibre Channel HBA Card, Cable | FC Diagnostic (VTL systems are not supported) | DD880g Only
Memory (DIMMs) | Memory Diagnostics | X X X
Motherboard | CPU Test | X X X
Motherboard | CPU MCE Test | X X X
Motherboard | Motherboard PCIe Topology Test | X
Ethernet Network Interface Card (NIC) | Network Internal Loopback Test | X
Ethernet Network Interface Card (NIC) | Network External Loopback Test | X
NVRAM Card | NVRAM Card Test | X X X
Serial Attached SCSI (SAS) Daughter and HBA Expansion Cards | SAS Diagnostics Test | DD880 Only, X

Caution: The offline diagnostics described in this guide support only the DD OS 4.9 systems shown in Table 1. Do not run these diagnostics on any other DD OS version or system, as unexpected behavior may result.

For test coverage information, see “Diagnostic Test Descriptions” on page 33.
Requirements
System Controller Boot Disk with DD OS 4.9
To boot offline diagnostics, you must have a functional system controller boot disk with DD OS 4.9 installed. This release contains the Offline Diagnostics Suite.

System Console
You must have one of the following consoles connected to the system:
• PS/2 keyboard or USB keyboard with a VGA monitor.
(A USB-to-PS2 converter is needed to use a PS/2 keyboard with DD880 and DD880g.)
• KVM console.
• Serial console or laptop with terminal emulation software such as SecureCRT, PuTTY, or HyperTerminal (required for running DD OS commands). A null modem cable with a DB9 female connector is required to connect a serial console or laptop to the system. Cisco “rollover” cables cannot be used.
If you plan to run diagnostics remotely, be aware that a person is still required onsite in the following situations:
• If the system has crashed or been powered down, it must be power cycled or powered on by a person at the site to boot offline diagnostics.
• If diagnostic log files cannot be written to the system disk, you can save them to a formatted USB key inserted by a person at the site.
Console connections and scenarios are described in “Connect a Console to the System” on page 8.
System and BIOS Passwords
If the system is up and running DD OS, you must log in as sysadmin to:
• Issue the system reboot command to boot offline diagnostics or enter BIOS mode to connect a serial console for the first time.
• Run certain DD OS diagnostic commands referenced in this guide.
If the system is powered down, you do not need sysadmin access to boot offline diagnostics or enter BIOS mode.
The Data Domain default sysadmin and BIOS passwords are provided with the procedures in this guide. However, if these passwords have been changed, you need the new passwords prior to starting the procedures.
USB Key (Optional)
After running diagnostics, log files are automatically saved to the system boot disk and to an external USB key, if one is inserted (see “Inserting and Removing a USB Key for Writing Logs” on page 20).
You may want to use a USB key to store log files if:
• Diagnostic logs cannot be written to the system disk. (You are prompted to insert a USB key, or cancel without saving logs.)
• You might not be able to reboot the system to online mode to access the offline diagnostic log file (a concatenation of all logs) on the system boot disk.
Requirements for the USB key (a.k.a. keychain drive, thumb drive, or flash memory stick) are:
• FAT32 (Unix VFAT) format
• 10 MB of free space
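As a rough illustration of the free-space requirement above, the following sketch checks a mounted USB key from a workstation before the key is taken onsite. It assumes Python 3 on the workstation, interprets 10 MB as 10 × 1024 × 1024 bytes, and uses a placeholder mount path. It does not verify the FAT32 format, and it is not part of the Offline Diagnostics Suite.

```python
import shutil

# Assumption: "10 MB" is interpreted as 10 * 1024 * 1024 bytes.
REQUIRED_FREE_BYTES = 10 * 1024 * 1024


def has_enough_free_space(mount_point, required=REQUIRED_FREE_BYTES):
    """Return True if the filesystem at mount_point has at least `required` free bytes."""
    usage = shutil.disk_usage(mount_point)  # named tuple: total, used, free
    return usage.free >= required


if __name__ == "__main__":
    # "." stands in for the USB key's mount point (placeholder; use the real path).
    print(has_enough_free_space("."))
```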
For more information on viewing log files on the system or USB key after running offline diagnostics, see “Get Log Information After Running Diagnostics” on page 29.
System Downtime
The typical time required to run all tests in the suite is 45 to 70 minutes, depending on the system type and configuration.
Table 2 shows the maximum possible runtime for each test as displayed in the diagnostics user interface. Maximum runtime is the test execution time plus the time needed for the diagnostic to time out if it cannot complete. This is always greater than the test execution time.
Table 2: Maximum Runtimes for Individual Tests
Test Group | Test | Maximum Runtime
Network Interface Card | Network Internal Loopback Test | 30 minutes
Network Interface Card | Network External Loopback Test | 30 minutes
Memory | Memory Diagnostics | 63 minutes
Motherboard | CPU Test | 16 minutes
Motherboard | CPU MCE Test | 3 minutes, 10 seconds
Motherboard | Motherboard PCIe Topology Test | 1 minute
NVRAM Card | NVRAM Card Test | 3 minutes
HDD | HDD Quick Test | 10 minutes
HDD | HDD Comprehensive Test | 60 minutes
SAS | SAS Diagnostics Test | 20 or 30 minutes per board
Gateway | FC Diagnostic | 25 minutes
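To plan a maintenance window, the Table 2 maximums can be added up for the tests you intend to run. The sketch below is one way to do that arithmetic in Python; the function name and the two SAS parameters are illustrative assumptions, since Table 2 gives the SAS figure only as "20 or 30 minutes per board".

```python
from datetime import timedelta

# Maximum runtimes copied from Table 2 (SAS is handled separately, per board).
MAX_RUNTIME = {
    "Network Internal Loopback Test": timedelta(minutes=30),
    "Network External Loopback Test": timedelta(minutes=30),
    "Memory Diagnostics": timedelta(minutes=63),
    "CPU Test": timedelta(minutes=16),
    "CPU MCE Test": timedelta(minutes=3, seconds=10),
    "Motherboard PCIe Topology Test": timedelta(minutes=1),
    "NVRAM Card Test": timedelta(minutes=3),
    "HDD Quick Test": timedelta(minutes=10),
    "HDD Comprehensive Test": timedelta(minutes=60),
    "FC Diagnostic": timedelta(minutes=25),
}


def total_max_runtime(tests, sas_boards=0, sas_minutes_per_board=30):
    """Sum the Table 2 maximum runtimes for the named tests.

    sas_boards and sas_minutes_per_board are illustrative parameters: the
    caller picks 20 or 30 minutes per board (30 is the cautious default).
    """
    total = sum((MAX_RUNTIME[name] for name in tests), timedelta())
    return total + timedelta(minutes=sas_minutes_per_board * sas_boards)


if __name__ == "__main__":
    print(total_max_runtime(["Memory Diagnostics", "HDD Comprehensive Test"]))
```

These are worst-case figures that include the timeout allowance, so the result is an upper bound and not the typical 45 to 70 minutes quoted for the full suite.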
Connect a Console to the System
If your system has a console connected, skip to “Select and Run Diagnostics” on page 12.
Data Domain systems provide the following connectors for attaching a console:
• DIN-type connectors for a PS/2 keyboard and a mouse (except DD880 and DD880g)
• USB-A receptacle port for a USB keyboard
• DB15 female for a VGA monitor
• DB9 male for a serial connection
Figure 1 shows the ways in which you can connect a console or laptop to the system.
[Diagram: console options for the Data Domain system. Direct connections: KVM console; PS/2* or USB keyboard with VGA monitor; serial console/server or laptop with terminal emulation software, using a null modem cable from the PC serial port. Remote serial link: serial console/server or laptop with terminal emulation software; serial console or virtual console. System ports shown: DB15 (VGA), PS/2* or USB (keyboard), DB9 (serial).]
*Not present on DD880 and DD880g.
Figure 1: System Console Connections
PS/2 and KVM Keyboards
Note: DD880 and DD880g systems do not provide a PS/2 port. A USB-to-PS2 converter is needed to use a PS/2 keyboard with these systems.
Make sure that the system is powered off when you connect a PS/2 or KVM keyboard to the system’s DIN-type connector. The PS/2 keyboard does not function at all unless it is plugged in before the system is powered up.
The system confirms the presence of the keyboard during powerup by flashing the Num Lock, Scroll Lock, and Caps Lock LEDs once.
For more information, see the Knowledge Base article, “Connecting a PS/2 Keyboard,” on the Data Domain Support Portal:
https://my.datadomain.com/download/kb/all/Connecting_a_PS-2_Keyboard.html (Support login is required.)
Serial Consoles and Laptops
Configure the terminal software to use the correct COM port with the following settings:
• Baud rate: 9600
• Bits: 8
• Parity: None
• Stop bits: 1
• Flow control: None
• Terminal emulation type: VT100
Note: The default terminal emulation type for PuTTY is ESC[n~. Change this to VT100+; otherwise, function keys do not work properly.
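When a serial session shows garbled or no output, comparing the terminal profile against the settings above is a quick first check. The sketch below shows that comparison in Python; the dictionary keys and the function name are illustrative and do not correspond to any particular terminal program's configuration format.

```python
# Settings listed above for the serial console connection.
# Key names are illustrative; map them to your terminal program's own fields.
REQUIRED_SETTINGS = {
    "baud_rate": 9600,
    "data_bits": 8,
    "parity": "none",
    "stop_bits": 1,
    "flow_control": "none",
    "emulation": "VT100",
}


def find_mismatches(actual):
    """Return {setting: (expected, actual)} for each setting that differs or is missing."""
    return {
        key: (expected, actual.get(key))
        for key, expected in REQUIRED_SETTINGS.items()
        if actual.get(key) != expected
    }


if __name__ == "__main__":
    profile = dict(REQUIRED_SETTINGS, baud_rate=115200)
    print(find_mismatches(profile))
```

For PuTTY, the note above still applies: the function-key setting must be changed separately so that keys such as F2 work.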
For more information, see the Knowledge Base article, “Connecting to a Data Domain System with a Serial Cable,” on the Data Domain Support Portal:
https://my.datadomain.com/download/kb/all/Connecting_to_the_Data_Domain_System_with_a_Serial_Cable.html (Support login is required.)
Serial Console Redirection
If you are connecting a serial console or laptop (which use the system’s DB9 port), you may need to change system BIOS settings. Below are the steps to verify or change the BIOS settings. Make sure that there are no backups in progress before proceeding.
Note: In this section, you power up or reboot the Data Domain system. If the system is crashed or hung and cannot be powered up or rebooted from DD OS, refer to the following Knowledge Base articles on the Data Domain Support Portal:
• “New System Will Not Power On or Boot Completely”
https://my.datadomain.com/download/kb/rp/New_System_Will_Not_Boot_DOA.html?query=query%3Dwon%27t+boot&fsearch=1&pagenumber=1&size=10&index=0&filterids=194328
• “Data Domain System Will Not Boot and Has Been Working”
https://my.datadomain.com/download/kb/rp/Will_Not_Boot_Has_Been_Working.html?query=query%3Dwon%27t+boot&fsearch=1&pagenumber=1&size=10&index=1&filterids=194328
(Support login is required.)
Table 3: Data Domain Default BIOS Passwords
System Password
1. If the system is powered down, press the power button on the front of the system to power it up and skip to step 3.
2. If the system is powered up and there is a system prompt on the console, then:
a. Log in as sysadmin (the Data Domain default password is abc123) and enter
# system reboot
b. Answer yes to the Are you sure? prompt.
3. As the system boots, watch for the BIOS prompt, then press the F2 key repeatedly to enter BIOS mode. If you are prompted for a password, enter the Data Domain default password from Table 3, or the user-specified password if it has been changed.
Table 4: BIOS Settings for Serial Console Redirection
System Setting
All systems except DD880 and DD880g
Systems DD880 and DD880g only
4. Edit the system BIOS to enable serial console redirection, as shown in Table 4.
Figure 2 shows the initial BIOS setup screens for serial console redirection.
Figure 2: BIOS Setup Utility
5. Save changes and exit the BIOS to reboot (press F10, if the key is available).
DD140, DD610, DD630, DD670: BIOS > Server > Remote Access Configuration > Redirection after BIOS POST > [Always]
DD880, DD880g: BIOS > Server Management > Console Redirection > Legacy OS Redirection > [Enabled]
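The two BIOS menu paths above differ by model, so a support script or checklist generator could look up the right one from the model name. The following Python sketch shows such a lookup; the function name is an illustrative assumption, and the paths are copied from the settings listed above.

```python
# BIOS menu paths for serial console redirection, keyed by the models they apply to.
BIOS_REDIRECTION_PATH = {
    ("DD140", "DD610", "DD630", "DD670"):
        "BIOS > Server > Remote Access Configuration > "
        "Redirection after BIOS POST > [Always]",
    ("DD880", "DD880g"):
        "BIOS > Server Management > Console Redirection > "
        "Legacy OS Redirection > [Enabled]",
}


def redirection_path(model):
    """Return the BIOS menu path for the given model, or raise for unlisted models."""
    for models, path in BIOS_REDIRECTION_PATH.items():
        if model in models:
            return path
    raise ValueError(f"{model} is not a supported DD OS 4.9 system in this guide")


if __name__ == "__main__":
    print(redirection_path("DD670"))
```

Raising an error for an unlisted model mirrors the guide's caution against running these procedures on unsupported systems.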
Select and Run Diagnostics
Figure 3 shows the troubleshooting flow.

1) Find your problem in the problem list and note the diagnostics specified. (See “Find the Problem Definition and Its Specified Diagnostics” on page 13.)
2) Reboot the Data Domain system to offline diagnostics mode. (See “Reboot the System and Prepare to Run Diagnostics” on page 18.)
3) Run the diagnostics specified for your problem and check the results. (See “Run Diagnostics and Check Results” on page 21.)
4) Perform the recommended service action for failed diagnostics and get additional information from diagnostic logs. (See “Perform the Recommended Service Actions” on page 26.)
5) Save logs to the system disk and (optional) USB key, then quit diagnostics and reboot the system. (See “Save Logs and Exit Diagnostics” on page 28.)
Figure 3: Diagnostic Troubleshooting Flow
Find the Problem Definition and Its Specified Diagnostics
Figure 4 shows failures identified for supported systems and specifies which diagnostics to run. If there are additional symptoms, such as behavior, messages, or alerts, go to the table indicated to obtain the diagnostics to run. Then go to “Reboot the System and Prepare to Run Diagnostics” on page 18.
[Flowchart. Entry points: Installed Systems; New Installations or Upgrades. Failure decisions: system powerup, bootup, or filesystem startup; system is hung; system panics and automatically reboots; FRU or slot disabled, FRU absent, or FRU or connections faulty; reduced performance or throughput; new installation, upgrade, or maintenance issue. Several decisions are followed by an “Additional symptoms?” check. Outcomes: go to Table 5 on page 14; go to Table 7 on page 15 to identify additional symptoms and select the appropriate diagnostics; go to Table 8 on page 17; go to Table 9 on page 17; run all diagnostics; or test with the HDD Comprehensive Test and the SAS or FC (DD880g only, no VTL) Test.]
Note: You can access online diagnostics log messages with alerts and information in the bios.txt file using the DD OS log view command. Autosupport (ASUP) reports containing alerts from online diagnostics can be generated and viewed using the autosupport show report command. Refer to the Command Reference Guide.
Figure 4: Failure Identification and Relevant Diagnostics
Table 5: Powerup, Bootup, or Filesystem Problems with Additional Symptoms

Console Message/Alerts/Other: Boot messages during system startup indicate that one or more network port configuration operations failed.
Tests to Run: Network Internal Loopback Test

Console Message/Alerts/Other: Console displays NVRAM errors during boot. Alert indicates multi-bit uncorrectable errors on NVRAM card after power cycle. Alert indicates that NVRAM card battery is low. Alert indicates that DD OS has disabled NVRAM batteries or that batteries have not fully charged.
Tests to Run: NVRAM Card Test

Console Message/Alerts/Other: System is unable to boot with expansion shelves connected, but does boot when they are disconnected.
Tests to Run: SAS Diagnostics Test

Console Message/Alerts/Other: UNKNOWN TOPOLOGY appears on the console at powerup, indicating that there is no communication with a storage device or other IO target. An LED tells you the link status between two points.
• Cable connection is suspected.
• FC HBA problem with SFP connector is suspected.
• Zone issue (switch) configuration problem is suspected.
Tests to Run: FC Diagnostic (DD880g only, no VTL)

Console Message/Alerts/Other: Memory slot is disabled.
• Example message in bios.txt: …| Slot/Connector #0xe3 | Slot is Disabled | Asserted)
Tests to Run: Memory Diagnostics

Table 6: System Panics and Reboots with Additional Symptoms

Console Message/Alerts/Other: System panics and automatically reboots.
• Single-bit flip is logged in messages.engineering and kern.info.
• MCE error is logged in kern.info.
Tests to Run: CPU Test, CPU MCE Test

Console Message/Alerts/Other: System panic and reboot is caused by an uncorrectable ECC (POST or runtime) error.
• Console message: Uh-huh. NMI received for unknown reason 20
• Typical alert: ALERT: MSG-TOOLS-00004: DRAM uncorrected multi bit error
DIMM failed self-test during bootup.
• Slot fault or disable message is logged in bios.txt during POST.
Tests to Run: Memory Diagnostics

Console Message/Alerts/Other: System reboots early in boot cycle.
Tests to Run: HDD Quick Test

Console Message/Alerts/Other: System keeps rebooting and is unstable with a NIC card, but stable without it.
Tests to Run: Network External Loopback Test
Table 7: Slot or FRU Disabled, FRU Absent, or Faulty FRU or Connection with Additional Symptoms

Console Message/Alerts/Other: Memory slot is disabled.
• Example message in bios.txt: …| Slot/Connector #0xe3 | Slot is Disabled | Asserted)
DIMM failure is suspected.
Correctable ECC limit is exceeded (runtime error).
• ECC errors are logged in bios.txt.
• Typical alerts:
ALERT: MSG-TOOLS-00005 : DRAM slot disabled due to ECC errors
Corectable ECC limit exceeded
ALERT: MSG-TOOLS-00005 : DRAM slot disabled due to ECC errors
Tests to Run: Memory Diagnostics

Console Message/Alerts/Other: DD OS disables a memory DIMM slot because of excessive correctable errors.
• Typical alert: ALERT: MSG-TOOLS-00005 : DRAM slot disabled due to ECC errors
Faulty memory, CPU, or motherboard is suspected.
Faulty QPI link is suspected.
Tests to Run: Memory Diagnostics, CPU Test, CPU MCE Test

Console Message/Alerts/Other: System is unable to access a device when it is physically present, device fails to respond to PCIe transactions.
• Incorrect platform topology is suspected.
• Faulty connection is suspected.
Tests to Run: Motherboard PCIe Topology Test

Console Message/Alerts/Other: LUN is present but cannot do IOs (configuration issue).
Bad connection to LUN is suspected.
Bad (direct-attached) cable connection is suspected.
Tests to Run: FC Diagnostic (DD880g only, no VTL)

Console Message/Alerts/Other: System controller hard disk drive (HDD) is absent.
Tests to Run: HDD Quick Test, HDD Comprehensive Test

Console Message/Alerts/Other: NVRAM card is absent.
NVRAM card batteries are disabled.
NVRAM card battery fault is suspected.
NVRAM card battery connection fault is suspected after visual inspection.
NVRAM batteries have not fully charged.
Tests to Run: NVRAM Card Test

Console Message/Alerts/Other: DD OS SAS diagnostics command enclosure test topology detected a connectivity problem and further fault isolation is necessary.
Multiple drive failures occurred.
Repeated failures or absences of drives, especially if in the same slot.
Multipath errors on failure.
Tests to Run: SAS Diagnostics Test

Console Message/Alerts/Other: Statistics output from the DD OS command net config (alias ifconfig) shows abnormally high values for Tx/Rx and error counters. Possible issues are with the system bus or NIC IO interfaces.
Statistics output from the DD OS command ethtool -S ifname shows errors such as DMA underrun, DMA overrun, frame errors, and CRC errors. The diagnostic verifies that the controller is functional.
Note: Contact Data Domain Support for information on using the ethtool -S ifname command.
PCIe reads terminate with aborts, and all Fs are returned by the system, resulting in big values. This could be an indication of problems with the system bus and network device's IO interface. The diagnostic confirms whether there is an issue with either the IO slot or the controller.
The link light on the NIC card does not get turned on after:
• Changing the cable, transceiver, and switch port.
• Issuing the DD OS net config ifname up command, even if the ports are connected to a switch or a peer device.
Tests to Run: Network External Loopback Test, Motherboard PCIe Topology Test
Table 8: Reduced Performance or Throughput with Additional Symptoms

Console Message/Alerts/Other: System is slow.
Monitor kern.info for disk-related errors or excessive retries causing sluggish or slow response.
Tests to Run: HDD Comprehensive Test, SAS or FC (DD880g only, no VTL) Diagnostics Test

Console Message/Alerts/Other: System shows a large, random slowdown of throughput.
Network performance is low. The diagnostic shows whether or not hardware is functional.
CIFS/NFS applications fail on backup or restore, with many resets.
Tests to Run: Network External Loopback Test, SAS or FC (DD880g only, no VTL) Diagnostics Test

Console Message/Alerts/Other: System shows degraded IO performance.
System shows reduced IO throughput.
Tests to Run: Motherboard PCIe Topology Test, Network External Loopback Test, SAS or FC (DD880g only, no VTL) Diagnostics Test

Table 9: Installation, Upgrade, or Maintenance Issue with Additional Symptoms

Console Message/Alerts/Other: System fails a fresh install and drops into kernel debug mode (kdb).
Tests to Run: NVRAM Card Test

Console Message/Alerts/Other: The system is down and not booting up properly.
Tests to Run: See Table 5 on page 14.

Console Message/Alerts/Other: The system is down and you want to check for hard drive failure.
Tests to Run: HDD Quick Test, HDD Comprehensive Test

Console Message/Alerts/Other: On a new installation, you need to verify that shelves are connected according to the installation plan.
Tests to Run: SAS Diagnostics Test
Reboot the System and Prepare to Run Diagnostics
Note: Make sure a console is connected to the Data Domain system, either onsite or via a remote serial link. For more information, see “System Console” on page 6. If the system is crashed or hung and cannot be powered up or rebooted from DD OS, refer to the note on page 10.
1. If the system is powered down, press the power button on the front of the system to power it up and skip to step 3.
2. If the system is powered up and there is a system prompt on the console, stop any backups that are running or wait until those backups are completed, then:
a. Log in as sysadmin (the Data Domain default password is abc123) and enter
# system reboot
b. Answer yes to the Are you sure? prompt.
3. During bootup, the following message prints repeatedly on the console:
Press any key to continue
Within ten seconds, press and hold down the spacebar until the boot menu appears.
Caution: Do not press any other key, as unexpected behavior may result.
4. The boot menu appears.
Use the up arrow and down arrow keys to select the offline diagnostics option for the console interface being used, then press Enter.
5. If this message appears, the automatic model detection function was unable to identify the system model, or does not support it.
Dismissing the screen brings up a list of supported models. Use the up arrow and down arrow keys to select your model, then press Enter.
Caution: If your system is not on the list of supported models, do not select unknown. Contact Data Domain Support.
6. After automatic model detection or after you enter the correct model manually, the Diagnostics Main Menu appears.
Before and After Running Diagnostics
Inserting and Removing a USB Key for Writing Logs
To save log files to a USB key automatically following a diagnostics run, insert the USB key after the Main Menu appears, but before running diagnostics. Remove the USB key before exiting the Main Menu (and rebooting the system). The USB key is unmounted automatically.
Installing Loopback Cables (DD670 Only)
Loopback cables must be installed when you run the Network External Loopback Test. You can use the cables currently installed on the system, or equivalents. Cross-over cables are not necessary. Using the wrong cable will cause the test to fail.
Note: Before removing or changing any cable connections, note or mark the connector and port locations, so that you can easily restore them after running diagnostics.
Loopbacks can be implemented between built-in Ethernet ports, or between ports on the same Ethernet card (NIC) as shown in Figure 5. Refer to the Hardware Overview or Installation and Setup Guide for your system to obtain the location of built-in Ethernet ports and optional NIC slot assignments.
Note: Do not make a loopback from the maintenance port or single-port cards, which are not supported for loopback testing.
Connect the loopback cables as follows.
• Built-in Ethernet (RJ45), Intel SFP+, or Chelsio: Connect the two ports.
• Other NIC copper ports (RJ45): Connect ports on the same card. (Rx and Tx are determined automatically.)
• Other NIC fiber ports: Each port has Tx and Rx connectors. Connect the Tx side of one port to the Rx side of the other port (on the same card).
Figure 5: Loopback Connections
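The cabling rules above can be expressed as a small pairing calculation, which may help when preparing a cabling checklist for a card with two or four ports. This Python sketch assumes that adjacent ports on the same card are paired (ports 1 and 2, ports 3 and 4); the function names are illustrative.

```python
def loopback_pairs(port_count):
    """Pair adjacent ports on one card: (1, 2), (3, 4), ...

    Assumption: adjacent ports are paired. Single-port cards are rejected
    because they are not supported for loopback testing.
    """
    if port_count < 2 or port_count % 2:
        raise ValueError("loopback needs an even number of ports, at least 2")
    return [(port, port + 1) for port in range(1, port_count, 2)]


def fiber_connections(port_a, port_b):
    """Fiber ports need a cable pair: Tx of each port goes to Rx of the other."""
    return [
        (f"Port {port_a} Tx", f"Port {port_b} Rx"),
        (f"Port {port_b} Tx", f"Port {port_a} Rx"),
    ]


if __name__ == "__main__":
    for a, b in loopback_pairs(4):
        print(fiber_connections(a, b))
```

Copper RJ45 ports need only one cable per pair, because Rx and Tx are determined automatically.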
Remove the loopback connections and restore the previous network cable connections before exiting the Main Menu (and rebooting the system). The presence of loopback cables does not affect running the Network Internal Loopback Test.
Go to “Run Diagnostics and Check Results” on page 21.
Run Diagnostics and Check Results
Navigating the Diagnostics Interface
Figure 6 shows the diagnostics menus and flow.
[Flow diagram. After you boot the Offline Diagnostics Suite, the following menus and screens are available:]
• Diagnostics Main Menu: 1. Run Automatic Diagnostics; 2. Run Custom Diagnostics; 3. Settings; 4. Exit (exit diagnostics and reboot).
• Please Select Model: Select system model if autodetect is not correct.
• Diagnostic Selection Menu: Select diagnostics to run from the list of diagnostics available for your model.
• Diagnostics Selected to Run: Confirm the diagnostics to run in this session and start test execution.
• Diagnostics Run Screen: Appears when diagnostics are running. Reports status and execution information. The flow then branches on whether all tests passed or one or more tests failed.
• Diagnostic Recommendations Screen: Displays recommended service actions for all FRUs that failed diagnostics.
• Diagnostic Log Access Menu: Select logs from passed or failed diagnostics.
• Test Log: View diagnostic logs (with recommended service actions).
Log files are written to the system disk and to a USB key, if inserted. (Log files from previous runs are removed first.)
Figure 6: Diagnostic Menus and Flow
Special Keys
Table 10 lists special keys you can use in the diagnostics user interface.
Table 10: Special Keys for Diagnostics
Key | Use
F1 | Displays a help window for the current menu or screen.
H | Displays help information for a highlighted item.
Enter | Selects a highlighted item.
spacebar | Toggles to enable or disable a highlighted item.
up arrow, down arrow | Moves highlighting up or down.
Page Up, Page Down | Scrolls up or down in window (also see b and f keys).
b, f | Scrolls backward or forward in a window. If you are using a console application via a remote serial line, use the b and f keys to page up and down. Page Up and Page Down keys may return control to the Main Menu, and you are not able to view logs after that.
Esc | Returns to the previous window, or skips to the initial Main Menu.
Ctrl+c | Interrupts the diagnostic run. All diagnostics in the session are cancelled and no log files are written. Returns to Diagnostics Main Menu after a few seconds.
Ctrl+Alt+Delete | Exits the diagnostics user interface and reboots the system.
Main Menu
The Main Menu appears after you boot offline diagnostics.
[Screenshot: Main Menu. Callouts: time in offline diagnostics mode; system model.]
Main Menu selections are:
• Run Automatic Diagnostics - Run all diagnostics that apply to your system. Test execution starts immediately and the Diagnostics Run screen appears (see page 25).
The following diagnostics have special setup requirements or run times of 1 hour or longer and are enabled for Run Automatic Diagnostics:
• Network External Loopback Test (requires installing loopback cables)
• Memory Diagnostics (run time is 63 minutes)
• HDD Comprehensive Test (run time is 60 minutes)
These tests can be deselected by choosing Run Custom Diagnostics.
Note: If you select Run Automatic Diagnostics, the Network External Loopback Test is enabled by default. Before running automatic diagnostics, you must install loopback cables as described on page 20; otherwise the test will fail. If you do not want to run this test now, choose Run Custom Diagnostics and deselect the test.
• Run Custom Diagnostics - Select a subset of the diagnostics that apply to your system in the Diagnostic Selection Menu (see page 24).
Note: Before running the Network External Loopback Test, you must install loopback cables as described on page 20; otherwise the test fails.
• Settings - Specify your system’s model from a list of supported models if it is not identified correctly at top of the Main Menu (see page 19).
Caution: If your system is not on the list of supported models, do not select unknown. Contact Data Domain Support.
• Exit - Quit offline diagnostics and reboot the system (after confirming the exit).
To reboot the system to normal operation, press any key (except the spacebar) within ten seconds.
Notes:
• If you inserted a USB key to store log files, remove it before exiting diagnostics.
• If you installed loopback cables for the Network External Loopback test, remove them and restore the previous network cable connections before exiting diagnostics.
Diagnostic Selection Menu
If you selected Run Custom Diagnostics in the Main Menu, the Diagnostic Selection Menu appears. This menu lists all applicable diagnostics for the system.
[Figure: Diagnostic Selection Menu. Callouts: estimated diagnostic runtimes; diagnostic selected indicator; estimated total runtime; continue to the Diagnostics Run screen (page 25).]
1. Select or deselect individual diagnostics or groups of diagnostics according to “Find the Problem Definition and Its Specified Diagnostics” on page 13. Highlight the test or group, then press the spacebar to select ([x]) or deselect ([ ]) it.
2. Highlight Run Selected Diagnostics, and press Enter to display the Diagnostics Selected to Run screen and confirm your choices.
3. Select Run in the Diagnostics Selected to Run screen and press Enter to begin test execution (see “Diagnostics Run Screen” on page 25).
Note: The diagnostics user interface displays the maximum possible runtime, including the time for the diagnostic to time out. This is not the typical runtime.
Diagnostics Run Screen
The Diagnostics Run screen is active whenever diagnostics are executing.
[Figure: Diagnostics Run screen. Callouts: estimated total runtime remaining; results of completed tests; current test executing and time remaining. If all tests passed: write logs and return to the Main Menu (page 23). If one or more tests failed: continue to the Diagnostic Recommendations screen and diagnostic logs (page 26).]
1. If all diagnostics pass, press Enter to automatically write test logs and return to the Main Menu.
2. If one or more diagnostics fail, press Enter to go to the Diagnostic Recommendations screen, which provides suggested service actions and access to diagnostic logs.
Note: Running Memory Diagnostics causes the system to automatically reboot or hang if sufficient memory is not available. This issue is better addressed in DD OS 5.0 offline diagnostics with the availability of the Examine System Inventory feature.
Perform the Recommended Service Actions
If one or more diagnostics fail, the Diagnostic Recommendations screen displays service actions suggested by the failing diagnostics. Perform all recommended service actions before concluding that the FRU has failed.
[Figure: Diagnostic Recommendations screen. Callouts: write logs and return to the Main Menu (page 23), or go to the Diagnostic Log Access Menu (page 27).]
Note: When using a console application via a remote serial line, use the b and f keys to page up and down. The Page Up and Page Down keys may return control to the Main Menu, and you will not be able to view logs after that.
Caution: Recommended service actions can include reseating and replacing cards and other components. These activities involve powering down the system and removing system covers for FRU access. Follow the FRU’s installation guide to perform these tasks. These guides are available from the Data Domain Support Portal at:
http://my.datadomain.com/US/en/parts.jsp
(Support login is required.)
Using Log Files
Most diagnostics contain several subtests that check different conditions and, upon failure, generate a different recommended service action for each. Recommended service actions appear on the console, but the conditions that generate them are recorded only in the full test log. Before performing the recommended service actions, you can display the full log file to get more information on the failing condition (see “View Log Files” on page 27). You can also view log files after exiting diagnostics (see “Get Log Information After Running Diagnostics” on page 29).
View Log Files
Note: When using a console application via a remote serial link, use the b and f keys to page up and down. The Page Up and Page Down keys may return control to the Main Menu, and you will not be able to view logs after that. See Table 10 on page 22 for keys to navigate menus and log files.
Diagnostic Log Access Menu
From the Diagnostic Log Access Menu, you can select and display logs for all tests run. If diagnostics are run more than once, all logs from the previous run are removed before the new logs are written.
[Figure: Diagnostic Log Access Menu and test log display. Callouts: write logs and return to the Main Menu (page 23), go back to the Diagnostic Recommendations screen (page 26), or display the selected test log; from the log display, return to the Diagnostic Log Access Menu.]
For example log file listings with annotations, see:
• “Example diag_log.sub File” on page 30
• “Example USB Key Log File” on page 32
Save Logs and Exit Diagnostics
Saving Test Log Files to the System Controller Disk and a USB Key
Exiting from the following windows results in log files being written to the system boot disk and to an external USB key (if one is inserted):
• Diagnostics Run screen with All tests PASSED (select OK, then press Enter)
• Diagnostic Recommendations screen (press the Esc key or the Q key)
• Diagnostic Log Access Menu (select Return to Main Menu, then press Enter)
Status screens appear while the required drivers are loaded and logs are written, then control is returned to the Main Menu.
Note: If a core dump occurs while you are running offline diagnostics, the core file is written to the SUB area for analysis by Data Domain Support. Core files are not saved to USB keys.
Saving Logs to a USB Key Only
Logs cannot be saved to the system controller boot disk if:
• Disk hardware is not functional.
• The disk has been swapped out (serial number mismatch with system).
• DD OS 4.9 is not installed.
If logs cannot be saved to disk and no USB key is present, you are prompted to insert a USB key. You can use any FAT32 (Unix VFAT) formatted USB key with at least 10 MB of free space. If you do not insert a USB key, no diagnostic logs are saved.
Note the following messages:
• No USB key present indicates that the inserted key is defective or not formatted. Verify formatting and try a different port or a different USB key.
• Could not write log files to USB key indicates that the key has less than 10 MB of free space.
After logs are successfully saved to the USB key, you are returned to the Main Menu.
Remove the USB key before exiting the Main Menu (and rebooting the system).
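The 10 MB free-space requirement can be checked on a workstation before the key is taken to the system. The following is a minimal sketch only, not part of the diagnostics suite: the function names and the mount path are hypothetical, and it assumes that "10 MB" means 10 x 1024 x 1024 bytes. It does not verify FAT32 formatting.

```python
import shutil

# Assumption: "10 MB" is interpreted as 10 MiB; the guide does not say which.
MIN_FREE_BYTES = 10 * 1024 * 1024


def has_room_for_logs(free_bytes, minimum=MIN_FREE_BYTES):
    """Return True if the given free space meets the minimum for log files."""
    return free_bytes >= minimum


def mount_has_room(mount_path):
    """Check free space on a mounted USB key (mount_path is workstation-specific)."""
    return has_room_for_logs(shutil.disk_usage(mount_path).free)


# Example: check the current directory; substitute the USB key's mount point.
print(mount_has_room("."))
```

A key that fails this check would be expected to produce the Could not write log files to USB key message described above.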
Get Log Information After Running Diagnostics
After running diagnostics, ASCII-format logs are:
• Concatenated and written to a single diag_log.sub file on the system disk
• Written as separate files to a USB key (if one is inserted)
Log File on the System Disk
If you are able to reboot the system to online mode, you can get test logs for the last diagnostics run. These are concatenated into a single file on the system boot disk:
/ddr/var/log/debug/platform/diag_log.sub
Viewing Logs on the System Disk
Offline diagnostics logs are saved to the system boot disk and are readable using the DD OS log view command:
# log view debug/platform/diag_log.sub
Example diag_log.sub File
Figure 7 shows a diag_log.sub file containing an NVRAM test log and a framework log that lists the sequence of diagnostic tests and indicates which tests were run. Because all diagnostic logs are concatenated into a single file, the left column identifies the test log and the right column contains log information.
File created at Mon Aug 22 15:30:16 PDT 2011
...
[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] [START] Timestamp: Mon Aug 22 22:19:07 2011
[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] Diagnostic STDOUT and STDERR log for 'NVRAM Card Test' with switches '-e'
[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] (DDOS Version: 4.9.3.1-245662) (pid: 10603) (sequence number: 2) (iteration: 1) (list element: 1)
[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] ******************************
[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] Testing /dev/umema
[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] ******************************
[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] Performing ECC and Battery test on /dev/umema partition 2.
[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] Performing PRE read ECC
[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] Passed /dev/umema2 ECC check OK
[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] [STDERR]67521+0 records in
[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] 67521+0 records out
[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] Passed /dev/umema2 read only test.
[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] Performing POST read ECC check
[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] Passed /dev/umema2 ECC check OK
[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] Performing battery check
[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] Size: 1.024 MBytes
[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] Number of batteries: 3
[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] [0]: Status OK
[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] Charge in milli-Volts: 4099
[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] Charge percent: 100.09
[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] [1]: Status OK
[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] Charge in milli-Volts: 4099
[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] Charge percent: 100.09
[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] [2]: Status OK
[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] Charge in milli-Volts: 4099
[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] Charge percent: 100.09
[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] Passed /dev/umema2 battery check 3 batteries OK
[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] PASSED ECC and Battery test on /dev/umema partition 2.
[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] Performing ECC and Battery test on /dev/umema partition 3.
[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] Performing PRE read ECC
[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] Passed /dev/umema3 ECC check OK
[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] [STDERR]2025472+0 records in
[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] 2025472+0 records out
[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] Passed /dev/umema3 read only test.
[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] Performing POST read ECC check
[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] Failed /dev/umema3 ECC check. PCI errors 0 memory errors 1 detected.
Callouts: the left column is the name of the diagnostic log; the right column is the log information.
Note: Timestamps in the diag_log.sub file use the UTC time standard that is independent of time zone and daylight savings time offsets.
Figure 7: NVRAM Test and Framework Logs in diag_log.sub File (1 of 2)
Figure 7: NVRAM Test and Framework Logs in diag_log.sub File (2 of 2)
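Because every line in diag_log.sub starts with the name of the log it came from, a copy of the file can be split back into per-test logs on a workstation. The following is a minimal sketch, assuming each line has the shape "[/path/to/test.log] message" as in Figure 7; the function names are hypothetical and are not part of DD OS.

```python
import re
from collections import defaultdict

# Assumed line shape, taken from Figure 7: "[/path/to/test.log] message".
# Requiring a leading "/" keeps tags such as "[STDERR]" from being read as names.
PREFIX = re.compile(r"^\[(/[^\]]+)\]\s?(.*)$")


def group_by_log(lines):
    """Map each diagnostic log name to the list of its messages, in file order."""
    groups = defaultdict(list)
    for line in lines:
        match = PREFIX.match(line.rstrip("\n"))
        if match:  # lines without a log-name prefix (file header, "...") are skipped
            groups[match.group(1)].append(match.group(2))
    return dict(groups)


def failed_messages(groups):
    """Return (log name, message) pairs whose message starts with 'Failed'."""
    return [(name, msg)
            for name, msgs in groups.items()
            for msg in msgs
            if msg.startswith("Failed")]


SAMPLE = [
    "File created at Mon Aug 22 15:30:16 PDT 2011",
    "[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] Passed /dev/umema2 ECC check OK",
    "[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] [0]: Status OK",
    "[/tmp/ddod/ddod_NVRAM_Card_Test_tr.log] Failed /dev/umema3 ECC check. "
    "PCI errors 0 memory errors 1 detected.",
]

for name, msg in failed_messages(group_by_log(SAMPLE)):
    print(name, "->", msg)
```

Filtering on the word Failed is only a convenience based on the subtest lines shown in Figure 7; other diagnostics may report failures with different wording.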
Log Files on a USB Key
Viewing Logs on a USB Key
Log files are written to a USB key, if inserted, when you return to the Main Menu after running diagnostics or when you select Save Diagnostic Logs to USB Key on the Main Menu. These logs are displayed when running Offline Diagnostics.
The parent log directory /diag_logs is created off the USB root and a subdirectory /<log-mm-dd-hh-mm> is created (where mm = month, dd = day, hh = hour, and mm = minute the logs were saved). Individual diagnostic logs and a log of the diagnostic flow are saved in this subdirectory. These logs are saved in ASCII format for viewing on any Linux or Windows machine.
Note: Timestamps in USB log files use the UTC time standard that is independent of time zone and daylight savings time offsets.
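When correlating log entries with site events, a UTC timestamp may need to be converted to local time. The sketch below assumes that timestamps use the format of the [START] Timestamp line in Figure 7 (for example, Mon Aug 22 22:19:07 2011) and that the workstation uses an English locale; the function names and the fixed -7 hour offset (PDT) are illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone

# Assumed format, copied from the "[START] Timestamp:" line in Figure 7.
LOG_TIME_FORMAT = "%a %b %d %H:%M:%S %Y"


def parse_log_timestamp(text):
    """Parse a log timestamp and mark it as UTC, as the note above states."""
    return datetime.strptime(text, LOG_TIME_FORMAT).replace(tzinfo=timezone.utc)


def to_fixed_offset(utc_time, offset_hours):
    """Convert a UTC time to a zone with a fixed offset from UTC, in hours."""
    return utc_time.astimezone(timezone(timedelta(hours=offset_hours)))


stamp = parse_log_timestamp("Mon Aug 22 22:19:07 2011")
local = to_fixed_offset(stamp, -7)  # PDT is UTC-7
print(local.strftime("%Y-%m-%d %H:%M:%S"))
```

With these inputs the test start time of 22:19:07 UTC corresponds to 15:19:07 PDT, about eleven minutes before the "File created at" time of 15:30:16 PDT shown in Figure 7.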
Example USB Key Log File
Figure 8 shows subtest results and recommended service actions for an NVRAM card.
...
Performing ECC and Battery test on /dev/umemb partition entire device.
Performing PRE read ECC
Passed /dev/umemb ECC check OK
[STDERR]2097152+0 records in
2097152+0 records out
Passed /dev/umemb read only test.
Performing POST read ECC check
Passed /dev/umemb ECC check OK
Performing battery check
Size: 1.024 MBytes¹
Number of batteries: 3
[0]: Status DISCHARGED
Charge in milli-Volts: 3863
Charge percent: 0.03
[1]: Status OK
Charge in milli-Volts: 4150
Charge percent: 100.00
[2]: Status OK
Charge in milli-Volts: 4150
Charge percent: 100.00
Failed /dev/umemb battery check 3 batteries found, 2 were OK.
RECOMMENDATIONS_START
Leave DDR as is in this Offline Diagnostic mode for at least 3 hours to allow battery to charge. Then reboot the system and re-run this diagnostic. If this failure persists then replace the NVRAM card.
RECOMMENDATIONS_END
...
1. The correct value is 1.024 GBytes.
Callouts: PRE read ECC subtest status; POST read ECC subtest status; battery check subtest status; recommended service actions.
Figure 8: NVRAM Test Log File on a USB Key
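The recommended service actions in a USB key log sit between the RECOMMENDATIONS_START and RECOMMENDATIONS_END markers shown in Figure 8, so they can be pulled out of a saved log on a workstation. This is a minimal sketch under that assumption; the function name is hypothetical, and other diagnostics may format their recommendations differently.

```python
import re

# Markers as they appear in Figure 8; the text between them is the recommendation.
BLOCK = re.compile(r"RECOMMENDATIONS_START(.*?)RECOMMENDATIONS_END", re.DOTALL)


def extract_recommendations(log_text):
    """Return each recommendation block as one whitespace-normalized string."""
    return [" ".join(body.split()) for body in BLOCK.findall(log_text)]


SAMPLE_LOG = """\
Failed /dev/umemb battery check 3 batteries found, 2 were OK.
RECOMMENDATIONS_START
Leave DDR as is in this Offline Diagnostic mode for at least 3 hours
to allow battery to charge. Then reboot the system and re-run this
diagnostic. If this failure persists then replace the NVRAM card.
RECOMMENDATIONS_END
"""

for recommendation in extract_recommendations(SAMPLE_LOG):
    print(recommendation)
```

The line breaks inside the sample recommendation are arbitrary; the function joins the text into a single line regardless of how the log wraps it.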
Diagnostic Test Descriptions
Table 11 provides additional information on diagnostic test coverage.
Table 11: Offline Diagnostic Test Coverage
Test Name Coverage
CPU MCE Test Decodes and prints the stored machine check record generated by a machine check event.
• Most errors can be corrected by the CPU using internal error correction mechanisms. Uncorrected errors cause machine check exceptions that may panic the system.
• The MCE error condition displayed by the test unambiguously identifies the faulty component.
CPU Test Tests the processor’s ability to perform a Compress–Uncompress–Compare operation sequence.
• An MD5 fingerprint is generated for the data file before it is compressed. The compressed file is then uncompressed and its MD5 fingerprint is compared against the one generated for the original file.
FC Diagnostic Tests the Fibre Channel (FC) devices attached to FC cards.
Note: This applies to DD880g only; VTL is not supported.
• Verifies that Logical Unit Numbers (LUNs) are detected and configured correctly.
• Performs IOs to test the physical interface (path to storage).
• Directs LUNs to perform a read IO, then reads from the LUNs.
HDD Comprehensive Test Tests all system controller disks.
• Reads disk sectors and their SMART data.
HDD Quick Test Tests the system controller boot disk only.
• Reads disk sectors and their SMART data.
Memory Diagnostics Tests available free memory and reports any ECC errors (correctable and uncorrectable) detected by hardware.
• Identifies and reports a failing DIMM or DIMM pair.
Motherboard PCIe Topology Test Checks that the detected PCIe IO topology matches what the system was designed for.
• PCIe interconnect tests exercise and verify the PCIe subsystem. They ensure the connectivity for all the PCIe-based controllers and other IO targets present on the motherboard.
• The tests scan the entire PCIe fabric, starting from the PCIe root and looking for the expected PCIe topology. The tests indicate errors if an expected device or set of devices is absent.
Network External Loopback Test Tests if the network controller’s data path is functional through the NIC Tx and Rx ports. Built-in Ethernet and dual- and quad-port NICs can be tested; single-port NICs cannot be tested.
This test generates and sends out packets, and expects to receive the same number of packets.
Note: Before you run the Network External Loopback Test, loopback cables must be installed. For instructions, see “Installing Loopback Cables (DD670 Only)” on page 20. Installing loopback cables for the Network External Loopback Test does not affect running the Network Internal Loopback Test.
Network Internal Loopback Test Tests if the network controller’s data path is functional to the MAC layer. Loopback is through the internal loopback interface (MAC layer); packets do not leave the controller.
This test generates and sends out packets and expects to receive the same number of packets.
NVRAM Card Test Performs the following tests and checks:
• Tests all partitions of NV memory on one or multiple NVRAM cards. Scans through the entire range of NV memory in each partition.
• Memory ECC.
• Battery status.
• Battery state (enabled/disabled).
SAS Diagnostics Test Tests the SAS topology to determine the reliability of the connections.
• Pinpoints SAS connectivity problems to specific links.
• Tests each external SAS Host Bus Adapter (HBA) port and attached shelf chain.
• Generates read traffic to find errors that only occur under load.
Table 11: Offline Diagnostic Test Coverage (Continued)
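The Compress–Uncompress–Compare sequence described for the CPU Test in Table 11 can be illustrated as follows. This is a sketch of the general technique only, not the diagnostic's actual implementation: the guide does not name the compression algorithm, so zlib is an assumption, and the function name and test data are hypothetical.

```python
import hashlib
import os
import zlib


def compress_uncompress_compare(data):
    """Return True if data survives a compress/uncompress round trip unchanged.

    The MD5 fingerprint is taken before compression and compared with the
    fingerprint of the uncompressed result, as Table 11 describes.
    """
    original_fingerprint = hashlib.md5(data).hexdigest()
    compressed = zlib.compress(data)      # assumption: zlib stands in for the real codec
    restored = zlib.decompress(compressed)
    return hashlib.md5(restored).hexdigest() == original_fingerprint


# Mix of repetitive and random bytes so the compressor has real work to do.
payload = b"offline diagnostics " * 1000 + os.urandom(4096)
print(compress_uncompress_compare(payload))
```

On healthy hardware the fingerprints match; a mismatch would mean that data was corrupted somewhere in the round trip.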