© 2006 IBM Corporation
This presentation is intended for the education of IBM and Business Partner sales personnel. It should not be distributed to customers.
IBM Systems & Technology Group Education & Sales Enablement © 2008 IBM Corporation
System x Basic Troubleshooting
XTW01
Topic 11
Course Objectives
At the completion of this topic, you should be able to:
> Identify basic troubleshooting questions to consider
> Identify the six possible states of a system
> Identify diagnostic tools that are available to gather and analyze information for each given system state
> * IBM System x Troubleshooting Questions *
> Six System States
> Data Gathering Diagnostic Tools
Light Path Diagnostic
BMC, RSA and AMM
Dynamic System Analysis (DSA)
Topic 11 - Course Agenda
When working with problems on System x servers, consider asking the following questions:
> Will the system power up?
> Did it ever power up?
> Is there a POST error message? If yes, what is it?
> Does the NOS load?
> Are any error lights illuminated?
> Is the BMC configured for remote access?
> Is an RSA II or AMM installed?
> Can the logs be captured for analysis?
Questions To Ask
Troubleshooting IBM System x Servers
> IBM System x Troubleshooting Questions
> * Six System States *
> Data Gathering Diagnostic Tools
Light Path Diagnostic
BMC, RSA and AMM
Dynamic System Analysis (DSA)
Topic 11 - Course Agenda
[Diagram: AC, AC/DC, POST, and NOS stages; the NOS stage is marked Start, Complete, and Stop]
System state #1 – There is no AC power
System state #2 – There is AC power but there is no DC output
System state #3 – There is both AC and DC power, but the system fails to complete POST
System state #4 – There is both AC and DC power, the system completes POST, but the NOS fails to start loading
System state #5 – There is both AC and DC power, the system completes POST, but the NOS fails to complete loading
System state #6 – There is both AC and DC power, the system completes POST, and the NOS completes loading but stops during operation
> Identifying the Six System States
IBM System x – Six States PD
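Because each state assumes that every earlier stage of the power-on sequence succeeded, the six states can be identified by walking the stages in order and stopping at the first one that fails. The sketch below illustrates that ordering; the `Observations` class, its field names, and `classify_state` are hypothetical, and a separate flag is assumed for state 6 because a fully loaded NOS cannot otherwise be told apart from a healthy system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Observations:
    """What the technician has observed (field names are illustrative)."""
    ac_power: bool
    dc_power: bool = False
    post_completed: bool = False
    nos_started_loading: bool = False
    nos_finished_loading: bool = False
    stopped_during_operation: bool = False


def classify_state(obs: Observations) -> Optional[int]:
    """Return the system state (1-6), or None if the system is running normally.

    Each check assumes every earlier stage succeeded, so the first stage
    that fails determines the state.
    """
    if not obs.ac_power:
        return 1  # no AC power
    if not obs.dc_power:
        return 2  # AC power present, but no DC output
    if not obs.post_completed:
        return 3  # AC and DC power, but POST does not complete
    if not obs.nos_started_loading:
        return 4  # POST completes, but the NOS does not start loading
    if not obs.nos_finished_loading:
        return 5  # NOS starts loading, but does not complete
    if obs.stopped_during_operation:
        return 6  # NOS loaded, then stopped during operation
    return None  # no failure state applies
```

For example, `classify_state(Observations(ac_power=True, dc_power=True))` returns 3: power is present, but POST has not completed.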
Information Gathering and Analysis Tools
Information Gathering:
> Eyes and ears
> HMM and PDSG
> Light Path diagnostics
> BMC
> RSA
> Boot sequence options
  F1 setup, F2 diagnostics
  Adapter BIOS messages
> NOS start-up messages
> NOS failure messages
> Dynamic System Analysis
> NOS event logs
Information Analysis:
> HMM and PDSG
> Light Path diagnostics
> BIOS messages
  Checkpoint codes
  Adapter BIOS warnings
> SvcCon, SMBridge, F1 setup and F2 diagnostics
  Access BMC event logs
> Web browser
  Access RSA event logs
> RETAIN tips
> IBM Support Web site
> DSA
1. There is no AC power
  Data gathering: Visual
  Data analysis: PDSG/HMM
State 1 - No AC Power
2. There is AC power but no DC output
  Data gathering: BMC; RSA and AMM; Light path
  Data analysis: SvcCon, SMBridge; RSA and AMM event log
State 2 - AC Power But No DC Output
3. There is AC and DC power, but the system fails to complete POST
  Data gathering: Checkpoint codes; F1 and F2; Beep codes; Adapter BIOS messages (Adaptec, LSI, etc.)
  Data analysis: PDSG; RETAIN tips; IBM support Web site
State 3 - System Fails To Complete POST
State 4 - System Completes POST But NOS Fails To Start Loading
4. There is AC and DC power, the system completes POST, but the NOS fails to start loading
  Data gathering: ServeRAID Manager; F2 diagnostics
  Data analysis: PDSG; RETAIN tips; F2 diagnostics
5. There is AC and DC power, the system completes POST, but the NOS fails to complete loading
  Data gathering: NOS boot messages; ‘Blue screen’; ‘Safe’ mode
  Data analysis: NOS vendor messages
State 5 - NOS Fails To Complete Loading
6. There is AC and DC power, the system completes POST, and the NOS completes loading but stops during operation
  Data gathering: DSA; NOS event logs
  Data analysis: DSA
State 6 - NOS Loads But Stops During Normal Operations
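The per-state tool lists on the preceding slides can be kept as a simple lookup table, for example in a triage checklist. This is a minimal sketch: the `TOOLS` dictionary and `tools_for` function are hypothetical names, and the split of each slide's items into data gathering and data analysis columns is our reading of the flattened slide layout, not something the slides spell out.

```python
from typing import Dict, List, Tuple

# state number -> (data gathering tools, data analysis tools)
TOOLS: Dict[int, Tuple[List[str], List[str]]] = {
    1: (["Visual"], ["PDSG/HMM"]),
    2: (["BMC", "RSA and AMM", "Light path"],
        ["SvcCon", "SMBridge", "RSA and AMM event log"]),
    3: (["Checkpoint codes", "F1 and F2", "Beep codes",
         "Adapter BIOS messages (Adaptec, LSI, etc.)"],
        ["PDSG", "RETAIN tips", "IBM support Web site"]),
    4: (["ServeRAID Manager", "F2 diagnostics"],
        ["PDSG", "RETAIN tips", "F2 diagnostics"]),
    5: (["NOS boot messages", "Blue screen", "Safe mode"],
        ["NOS vendor messages"]),
    6: (["DSA", "NOS event logs"], ["DSA"]),
}


def tools_for(state: int) -> Dict[str, List[str]]:
    """Return copies of the gathering and analysis tool lists for a state."""
    if state not in TOOLS:
        raise ValueError(f"unknown system state: {state}")
    gathering, analysis = TOOLS[state]
    return {"gathering": list(gathering), "analysis": list(analysis)}
```

Returning copies keeps callers from accidentally editing the shared table.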
Gathering Information - Tip
If multiple sources are available, look for confirmations
> Two sources pointing at the same probable cause increase confidence in the information
> Two sources pointing at different probable causes reduce confidence in the information
  Search for a third source to clarify the information being presented
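The confirmation rule above can be expressed as a small function. In this sketch, `assess_sources` is a hypothetical helper, each source (light path, BMC log, DSA, and so on) is assumed to be reduced to a single probable cause, and the source and component names used in testing are illustrative only.

```python
from collections import Counter
from typing import Dict, Optional, Tuple


def assess_sources(causes: Dict[str, str]) -> Tuple[str, Optional[str]]:
    """Weigh the probable causes reported by independent sources.

    `causes` maps a source name to the probable cause it points at.
    Returns (confidence, probable_cause); probable_cause is None when the
    sources do not settle on a single answer.
    """
    if not causes:
        return ("no data", None)
    if len(causes) == 1:
        (cause,) = causes.values()
        return ("unconfirmed", cause)  # one source only: look for a second
    ranked = Counter(causes.values()).most_common()
    top_cause, top_count = ranked[0]
    tied = len(ranked) > 1 and ranked[1][1] == top_count
    if top_count >= 2 and not tied:
        return ("increased", top_cause)  # at least two sources agree
    # Sources disagree: confidence drops, so consult a further source.
    return ("reduced: consult another source", None)
```

A third source that agrees with either of two conflicting sources breaks the tie, which matches the advice to search for a third source.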
Analyzing Information - Tip
Formal reference points are proven
> RETAIN tips are based on factual evidence from previous case histories
> The PDSG is based on the collective knowledge of the system designers and senior support teams
Guessing is NOT an option
> If the information is unclear, seek help
Experience is very valuable
> Consult with team members if you are unsure of what the information is telling you
> Offer guidance to less experienced co-workers
> IBM System x Troubleshooting Questions
> Six System States
> Data Gathering Diagnostic Tools
* Light Path Diagnostic *
BMC, RSA and AMM
Dynamic System Analysis (DSA)
Topic 11 - Course Agenda
Light Path Diagnostics
> Allows quick diagnosis of server errors
  Introduced in 1998; now included in most System x, BladeCenter, and blade servers
> Level 1 – Drop-down panel containing system status LEDs
  LEDs that correspond to major server components
  Includes Remind and Reset buttons
> Level 2 – LEDs identifying the suspect component
  LEDs placed throughout the server next to individual server components
  Can be used for up to 12 hours, even without power to the server
Pop-out Operator Information Panel
Blade server front panel LEDs
> IBM System x Troubleshooting Questions
> Six System States
> Data Gathering Diagnostic Tools
Light Path Diagnostic
* BMC, RSA and AMM *
Dynamic System Analysis (DSA)
Topic 11 - Course Agenda
IBM Systems Management Hardware Portfolio
Mini-BMC, BMC, Remote Supervisor Adapter, Advanced Management Module
Mini Baseboard Management Controller
• IPMI 1.5 compliant
• Monitors voltages, temperatures, and battery
• Drives system LEDs, except Light Path
• Power control, system reset, and reboot
• Used in value servers
Baseboard Management Controller
• Same features as mini-BMC, plus the following:
• IPMI 1.5 or 2.0 compliant, depending on system
• Serial over LAN (SOL)
• Drives Light Path
• On all but value servers
Remote Supervisor Adapter
• Web interface and full SSL and other security module integrations
• LDAP integration for authentication
• Remote KVM support
• Remote disk support
• DNS, DHCP, SNMP, SLP
• Standard in select servers and optional for most other servers in the portfolio
BladeCenter Advanced Management Module
• Web interface and full SSL and other security module integrations
• LDAP integration for authentication
• Remote KVM support
• Remote disk support
• DNS, DHCP, SNMP, SLP
• USB virtualization
• With concurrent-capable blade
• Concurrent KVM capable
• Concurrent remote drive capable
• Concurrent media tray capable
> IBM System x Troubleshooting Questions
> Six System States
> Data Gathering Diagnostic Tools
Light Path Diagnostic
BMC, RSA and AMM
* Dynamic System Analysis (DSA) *
Topic 11 - Course Agenda
Product download page:
http://www.ibm.com/systems/management/dsa.html
Dynamic System Analysis
DSA collects and analyzes information about various aspects of a system to aid in troubleshooting
Creates a merged log with all the retrieved information
> Compressed XML file for IBM Support personnel
> Optionally, HTML pages can be created for all users
Portable Edition
> Runs without altering the target system
> Removes any temporary files it created
Installable Edition
> Permanent installation
> Integrates with UpdateXpress to rapidly identify down-level firmware and drivers
Analyzed components:
> System configuration
> Installed applications and hot fixes
> Device drivers and system services
> Network interfaces and settings
> Performance data and details for running processes
> Hardware inventory, including PCI information
> Vital product data, firmware, and basic input/output system (BIOS) information
> SCSI device sense data
> EXA chipset uncorrectable error register information
> ServeRAID configuration
> Event logs for the operating system, applications, security, ServeRAID controllers, and service processors
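DSA's merged log combines information from several sources into one view. DSA's actual file format and merge logic are not described in these slides, so the sketch below only illustrates the general idea of interleaving time-stamped entries from several event logs into one chronological list. The `Entry` structure, `merge_logs` function, and the sample sources are hypothetical.

```python
import heapq
from datetime import datetime
from typing import Iterable, List, Tuple

# One log entry: (timestamp, source, message). Hypothetical structure.
Entry = Tuple[datetime, str, str]


def merge_logs(*logs: Iterable[Entry]) -> List[Entry]:
    """Interleave several logs into one chronological list.

    Each input log must already be sorted by timestamp; heapq.merge then
    combines them lazily without re-sorting everything.
    """
    return list(heapq.merge(*logs, key=lambda entry: entry[0]))
```

If a source's entries are not already in time order, sort that list first; a single chronological view makes it easier to see, for example, a hardware event in the service processor log just before an operating system error.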
Dynamic System Analysis - Portable Edition
Dynamic System Analysis - Installable Edition
> Provides problem isolation, configuration analysis, and error log collection
> Primary method of testing the major components
> Results can be viewed locally or uploaded to an internal FTP server
> Standard for System x and BladeCenter servers
New Preboot Dynamic System Analysis
> Press the F2 key during POST
> By default, it takes you to the IBM Memory Test
  Select Quit to exit to DSA
> Can take up to 10 minutes to load
> Power on all attached devices before powering on the server
Preboot DSA memory tests
Preboot DSA - Access
> Preboot DSA offers several options in a command-line menu system
> IBM DSA Interactive
  Several command-line instructions are available
Preboot DSA - Command Line
Selecting ‘Diagnostics’ from the main menu will load the diagnostic tests page
Preboot DSA - Graphical Diagnostics
Preboot DSA - Graphical Interface
Select System Information GUI to enter the Graphical User Menu
Problem Determination - Information Gathering
> Machine type and model
> Microprocessor or hard disk upgrades
> Failure symptom
  Do diagnostics fail?
  What, when, where? Single or multiple systems?
  Is the failure repeatable?
  Has this configuration ever worked?
  If it has been working, what changes were made before it failed?
  Is this the original reported failure?
> Diagnostics version: type and version level
> Hardware configuration
  Print (print screen) the configuration currently in use
  BIOS level
> Operating system software: type and version level
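The checklist above can be captured as a record so that gaps are visible before a problem is escalated. This is a minimal sketch; the `ProblemReport` class and its field names are hypothetical, chosen to mirror the slide's items, and an unrecorded item is represented as `None`.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class ProblemReport:
    """Hypothetical record of the details listed on this slide."""
    machine_type_model: Optional[str] = None
    upgrades: Optional[str] = None                # microprocessor or hard disk upgrades
    failure_symptom: Optional[str] = None         # answers to the symptom questions
    diagnostics_version: Optional[str] = None     # type and version level
    hardware_configuration: Optional[str] = None  # configuration currently in use
    bios_level: Optional[str] = None
    operating_system: Optional[str] = None        # type and version level

    def missing(self) -> List[str]:
        """Names of the items that have not been recorded yet."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]
```

Recording "none" for items such as upgrades, rather than leaving them empty, distinguishes "checked, nothing to report" from "not yet asked".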
> When solving problems, especially ones that involve a component replacement, ensure the following:
  Apply code updates so that code levels are matched across all boards and provide a working system
  Run the embedded diagnostics program to test the new component
  Run a “quick test” on the entire system
  Clear the BMC event log in readiness for any subsequent events
> The embedded diagnostics programs are the primary method of testing the major components of the server following parts replacement
> Event logs are limited in capacity
  Once a problem has been resolved, clear the logs so that useful information can be captured should another fault occur
When Solving Problems
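Capturing the BMC event log before clearing it preserves the evidence for the case history. IBM's own tools for BMC access (SvcCon, SMBridge) are not shown here; as a swapped-in alternative, this sketch assumes the BMC is reachable over the LAN and uses the open-source ipmitool utility, which can list and clear the IPMI System Event Log (SEL). The host name, user, output path, and the `ipmitool_cmd` and `save_then_clear_sel` functions are hypothetical; `-E` makes ipmitool read the password from the `IPMI_PASSWORD` environment variable, and the `lanplus` interface requires IPMI 2.0 (`lan` is used for IPMI 1.5).

```python
import subprocess
from typing import List


def ipmitool_cmd(host: str, user: str, *args: str, ipmi_v2: bool = True) -> List[str]:
    """Build an ipmitool command line for a remote BMC.

    The password is not placed on the command line: -E makes ipmitool read
    it from the IPMI_PASSWORD environment variable.
    """
    interface = "lanplus" if ipmi_v2 else "lan"  # lanplus needs IPMI 2.0
    return ["ipmitool", "-I", interface, "-H", host, "-U", user, "-E", *args]


def save_then_clear_sel(host: str, user: str, out_path: str, ipmi_v2: bool = True) -> None:
    """Capture the SEL to a text file, then clear it for future events."""
    listing = subprocess.run(
        ipmitool_cmd(host, user, "sel", "elist", ipmi_v2=ipmi_v2),
        capture_output=True, text=True, check=True,
    ).stdout
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(listing)
    # Clear only after the listing has been saved successfully.
    subprocess.run(ipmitool_cmd(host, user, "sel", "clear", ipmi_v2=ipmi_v2), check=True)
```

Because `check=True` raises an exception if the listing fails, the log is never cleared unless it was captured first.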
Advanced Management Module (AMM)
Baseboard Management Controller (BMC)
Common Information Model (CIM)
Dynamic System Analysis (DSA)
Intelligent Platform Management Interface (IPMI)
Light Path Diagnostic
Multiple processing (MP)
Problem Determination and Service Guide (PDSG)
Remote Supervisor Adapter (RSA) II
Glossary of terms
Course Summary
Having completed this topic, you should be able to:
> Identify basic troubleshooting questions to consider
> Identify the six possible states of a system
> Identify diagnostic tools that are available to gather and analyze information for each given system state
Additional Resources
IBM STG SMART Zone for more education (webinars, Web lectures, etc.):
> Internal: http://lt.be.ibm.com/smartzone/modulartechnical
> BP: http://www.ibm.com/services/weblectures/dlv/partnerworld
IBM System x
> http://www-03.ibm.com/systems/x/
IBM BladeCenter Chassis
> http://www-03.ibm.com/systems/bladecenter/
IBM BladeCenter Blade Servers
> http://www-03.ibm.com/systems/bladecenter/hardware/servers/index.html
IBM BladeCenter Redbooks
> http://www.redbooks.ibm.com/
IBM ServerProven
> http://www-03.ibm.com/servers/eserver/serverproven/compat/us/
IBM System x Support
> http://www-304.ibm.com/systems/support/supportsite.wss/brandmain?brandind=5000008
End of Presentation