
EMC Isilon Customer Troubleshooting Guide: OneFS Upgrade Failures

We appreciate your help in improving this document. Submit your feedback at http://bit.ly/isi-docfeedback.

Abstract

This guide helps you troubleshoot OneFS upgrade failures and error messages received during upgrades.

September 15, 2015

Page 2

Contents and overview

Note: Follow all of these steps, in order, until you reach a resolution.

1. Follow these steps.
   Before you begin (Page 3)
   Best practices and useful information (Page 4)

2. Perform troubleshooting steps in order.
   Start troubleshooting (Page 5)
   Nodes did not all come back online (Page 8)
   Simultaneous upgrade (Page 11)
   Rolling upgrade (Page 12)

3. Appendices
   Appendix A: If you need further assistance
   Appendix B: How to use this flow chart

Page 3

Before you begin

CAUTION! If the node, subnet, or pool you are working on goes down during the course of troubleshooting and you do not have any other way to connect to the cluster, you could experience data unavailability.

Therefore, make sure you have more than one way to connect to the cluster before you start this troubleshooting process. The best method is to have a serial cable available. That way, if you are unable to connect through the network, you will still be able to connect to the cluster physically.

For specific requirements and instructions for making a physical connection to the cluster, see article 16744 on the EMC Online Support site.

Before you begin troubleshooting, confirm that you can either connect through another subnet or pool, or that you have physical access to the cluster.

Configure logging through SSH

We recommend configuring screen logging to log all session input and output during your troubleshooting session. This log file can be shared with EMC Isilon Technical Support if you require assistance at any point during troubleshooting.

Note: The screen session capability does not work in OneFS 7.1.0.6 and 7.1.1.2. If you are running either of these versions, configure logging using your local SSH client's logging feature.

1. Open an SSH connection to the cluster and log in using the root account.
   Note: If the cluster is in compliance mode, use the compadmin account to log in. All compadmin commands must be preceded by the sudo prefix.

2. Change the directory to /ifs/data/Isilon_Support by running:

   cd /ifs/data/Isilon_Support

3. Run the following command to capture all input and output of the session:

   screen -L

   This creates a file called screenlog.0 that is appended to during your session.

4. Perform troubleshooting.
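The version caveat in the logging note on this page can be captured as a small check. The following Python sketch is illustrative only: the function name `logging_method` and its return labels are hypothetical, and the only facts taken from this guide are the two OneFS versions in which the screen session feature does not work.

```python
# OneFS versions that this guide lists as having a non-working screen feature.
SCREEN_BROKEN_VERSIONS = {"7.1.0.6", "7.1.1.2"}


def logging_method(onefs_version: str) -> str:
    """Return "ssh-client" when screen logging is unavailable, else "screen".

    The labels are hypothetical; they only name the two options in the guide.
    """
    # Accept both "7.1.0.6" and "v7.1.0.6" spellings.
    version = onefs_version.strip().lstrip("vV")
    if version in SCREEN_BROKEN_VERSIONS:
        return "ssh-client"
    return "screen"
```

For example, `logging_method("v7.1.1.2")` selects the SSH client's own logging, while any version not in the set falls back to `screen -L`.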

Page 4

Best practices and useful information

Introduction: This page explains why upgrades often fail and how to prevent upgrade problems in the future.

Most upgrade problems occur during rolling upgrades that are initiated from the OneFS web administration interface.

For best results, do the following:

- Use the command-line interface (CLI) to perform upgrades.
- Initiate the upgrade from the highest-numbered node in the cluster, unless the highest-numbered node is an Accelerator. If the highest-numbered node is an Accelerator, initiate the upgrade from node 1.

Use the command-line interface

It is best to initiate the upgrade from the command-line interface. The CLI displays more detailed information than the web interface and does not rely on the WebUI services running in order to function. You can also launch a screen session, which enables you to resume from where you left off if you get disconnected.

Initiate the upgrade from the highest-numbered node

The node that you initiate the upgrade from is called the "master node." During an upgrade, each node is upgraded and rebooted in turn, in ascending numerical order, starting with the lowest-numbered node. When the master node is the highest-numbered node, the upgrade starts with node 1, and the last node to be rebooted is the master node.

The system should always upgrade and reboot the master node last, regardless of which numbered node it is, but this does not always happen. Sometimes, when the master node is not the highest-numbered node, the system starts upgrading with node 1 as usual, but when it reaches the master node, it upgrades and reboots that node in its numerical order. This stops the upgrade process because, after it is rebooted, the master node can no longer tell the rest of the nodes to upgrade. Therefore, you should always initiate the upgrade from the highest-numbered node in the cluster (unless, as stated above, the highest-numbered node is an Accelerator; in this case, you should initiate the upgrade from node 1).
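The node-selection rule on this page can be sketched in a few lines of Python. This is a minimal illustration, not part of OneFS: the `Node` class, its fields, and the function name are hypothetical, and you would have to supply the node list yourself.

```python
from dataclasses import dataclass


@dataclass
class Node:
    number: int            # node number within the cluster
    is_accelerator: bool   # True if the node is an Accelerator


def choose_initiating_node(nodes):
    """Return the node number from which to start the upgrade."""
    if not nodes:
        raise ValueError("the cluster has no nodes")
    highest = max(nodes, key=lambda node: node.number)
    # Rule from this guide: use the highest-numbered node,
    # unless it is an Accelerator, in which case use node 1.
    if highest.is_accelerator:
        return 1
    return highest.number
```

With four storage nodes the sketch returns 4; if node 4 is an Accelerator, it returns 1.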

Page 5

Troubleshooting

Analysis

Introduction: Start troubleshooting here. If you need help understanding the flow chart conventions used in this guide, see Appendix B: How to use this flow chart.

Note: Most upgrade problems occur during rolling upgrades that are initiated from the OneFS web administration interface. Therefore, we will use the command-line interface exclusively to troubleshoot your issue and get your upgrade restarted. For more information, see "Best practices and useful information" on page 4.

Start

1. If you have not done so already, log in to the cluster and configure logging through SSH, as described on page 3.

2. Did the upgrade fail with a specific error displayed on the screen?
   Yes: Follow the prompts and onscreen instructions, then continue with step 3.
   No: Go to Page 6.

3. Can the upgrade be completed successfully now?
   Yes: End troubleshooting.
   No: Go to Page 6.

Page 6

Troubleshooting, continued

Analysis, continued

You could have arrived here from:
Page 5 - Analysis
Page 8 - Nodes did not all come back online

1. Run the following command to see which nodes were successfully upgraded:

   isi_for_array -s "uname -a"

   The output provides a list of all the nodes and indicates which version of OneFS each is running. For an example of the output, see Appendix C.

   Note: If a node did not fully reboot or is down, it will not show up. Also, if the upgrade was a rolling upgrade, an error might appear stating that a node did not come back online.

2. After running the command, do you see this error?

   ERROR Client connected from an unprivileged port number 50230. Refusing the connection
   [Errno 54] RPC session disconnected

   Yes: Install a patch as described in the following article: OneFS: After a failed or paused upgrade, commands sent from nodes that are not yet upgraded might fail, article 198906. Then continue troubleshooting on Page 7.
   No: Go to Page 7.
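If you saved the command output (for example, in the screen log), the error check on this page can be automated with a short scan. The sketch below is an assumption-laden illustration: the function name `needs_patch` is hypothetical, and the pattern assumes that the port number in the message can differ from the 50230 shown in this guide.

```python
import re

# Matches the "unprivileged port" message; the port number is assumed to vary.
UNPRIVILEGED_PORT = re.compile(
    r"Client connected from an unprivileged port number \d+"
)


def needs_patch(command_output: str) -> bool:
    """Return True if the output contains the unprivileged-port error."""
    return UNPRIVILEGED_PORT.search(command_output) is not None
```

A True result corresponds to the "Yes" branch (install the patch from article 198906); False corresponds to the "No" branch.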

Page 7

Troubleshooting, continued

Analysis, continued

You could have arrived here from:
Page 6 - Analysis, continued

1. Using the output of the isi_for_array -s "uname -a" command from Page 6, are all the nodes running the new version of OneFS? (Yes or No)

2. Run the following command:

   isi status -q

   In the output, look at the Health DASR column to see if any nodes report -D- (Down). For an example of the output, see Appendix D.

3. Do any nodes report as down? A down node means that it failed to join the cluster following the upgrade. (Yes or No)
   Yes: Go to Page 8.

Other exits from this page: Go to Page 9; End troubleshooting.

Page 8

Troubleshooting, continued

Nodes did not all come back online

You could have arrived here from:
Page 7 - Analysis, continued

Has it been at least 15 minutes since the nodes rebooted as part of the upgrade?
   Yes: Note the page number that you are currently on. Upload log files and contact Isilon Technical Support, as instructed in Appendix A.
   No: Wait 15 minutes, then go back to Page 6.
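The 15-minute rule on this page is simple arithmetic on timestamps. The following Python sketch shows one way to compute how much longer to wait; the function name and the idea of recording the reboot time yourself are assumptions, not something OneFS provides.

```python
from datetime import datetime, timedelta

# Waiting period from this guide before treating a rebooted node as stuck.
REBOOT_GRACE = timedelta(minutes=15)


def remaining_wait(rebooted_at: datetime, now: datetime) -> timedelta:
    """Return the time left in the 15-minute window (zero if it has passed)."""
    elapsed = now - rebooted_at
    return max(REBOOT_GRACE - elapsed, timedelta(0))
```

A zero result means the window has elapsed and the "Yes" branch applies; any positive result means you should wait that long and then go back to Page 6.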

Page 9

Troubleshooting, continued

Analysis, continued

You could have arrived here from:
Page 7 - Analysis, continued

Did you follow the steps in the "Planning an Upgrade" and "Completing pre-upgrade tasks" sections of the OneFS Upgrade Planning and Process Guide before beginning the upgrade?
   Yes: Go to Page 10.
   No: Follow the steps in the "Planning an Upgrade" and the "Completing pre-upgrade tasks" sections of the OneFS Upgrade Planning and Process Guide, then go to Page 10.

Page 10

Troubleshooting, continued

Analysis, continued

You could have arrived here from:
Page 9 - Analysis, continued

Did you perform a simultaneous upgrade or a rolling upgrade?
   Simultaneous: Go to Page 11.
   Rolling: Go to Page 12.

Page 11

Troubleshooting, continued

Simultaneous upgrade

You could have arrived here from:
Page 10 - Analysis, continued

1. Run the following command:

   isi status -q

   In the output, look at the Health DASR column to see if any nodes report -D- (Down). For an example of the output, see Appendix D.

2. Do any nodes report as down? A down node means that it failed to join the cluster following the upgrade.
   Yes: Note the page number that you are currently on. Upload log files and contact Isilon Technical Support, as instructed in Appendix A.
   No: Go to Page 14.

Page 12

Troubleshooting, continued

Rolling upgrade

You could have arrived here from:
Page 10 - Analysis, continued

1. Run the following command to determine which nodes did not get upgraded:

   isi_for_array -s "uname -a"

   The output provides a list of all the nodes and indicates which version of OneFS each is running. For an example of the output, see Appendix C.

2. For each node that did not get upgraded, run the following command to check that node's /var/log/messages file to see if there are errors with a timestamp that occurred during the upgrade. In the command, replace <YYYY-MM-DD> with the date of the upgrade:

   grep '^<YYYY-MM-DD>' /var/log/update_engine*

   For example:

   grep '^2015-04-15' /var/log/update_engine*

3. Are there errors on a node that did not get upgraded?
   Yes: Continue with step 4.
   No: Go to Page 14.

4. Is the following error present?

   Unable to claim upgrade daemon on one or more nodes.

   Yes: Go to Page 13.
   No: Go to Page 14.
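The date filter and the routing decision on this page can be sketched together in Python. This is only an illustration under stated assumptions: the function names are hypothetical, the log lines are passed in as an iterable of strings (for example, an open file), and the line format used in the usage note is invented for demonstration.

```python
# Error text quoted on this page; its presence sends you to Page 13.
CLAIM_ERROR = "Unable to claim upgrade daemon on one or more nodes"


def lines_from_date(lines, date):
    """Keep lines that start with the YYYY-MM-DD date, like grep '^date'."""
    return [line.rstrip("\n") for line in lines if line.startswith(date)]


def next_page(matching_lines):
    """Return 13 if the claim error is present, otherwise 14."""
    if any(CLAIM_ERROR in line for line in matching_lines):
        return 13
    return 14
```

For instance, passing two hypothetical lines, one dated 2015-04-14 and one dated 2015-04-15 that contains the claim error, to `lines_from_date(..., "2015-04-15")` keeps only the second line, and `next_page` then returns 13. Both "no errors" and "other errors" lead to Page 14, which matches steps 3 and 4.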

Page 13

Troubleshooting, continued

Rolling upgrade, continued

You could have arrived here from:
Page 12 - Rolling upgrade, continued

1. Run the following command:

   isi services -a isi_upgrade_d

2. Is the service enabled or disabled?
   Enabled: You are still in the middle of an upgrade and unable to proceed. Disable the service by running the following command, then go to Page 14:

      isi services -a isi_upgrade_d disable

   Disabled: Note the page number that you are currently on. Upload log files and contact Isilon Technical Support, as instructed in Appendix A.

Page 14

Troubleshooting, continued

Restart the upgrade

You could have arrived here from:
Page 11 - Simultaneous upgrade
Page 12 - Rolling upgrade, continued
Page 13 - Rolling upgrade, continued

1. Open an SSH connection to the highest-numbered node in the cluster, and log in using the root account.

2. Open a screen session by running the following command, where <session name> is a name that you provide. Record the name in case you need to use it later. The screen session enables you to easily reconnect to the upgrade process if the session gets disconnected during the upgrade.

   screen -S <session name>

   If you get disconnected, you can use the following command to reconnect:

   screen -x <session name>

   Note: If you are running OneFS 7.1.1.2 or 7.1.0.6, skip this step. The screen session feature does not work in OneFS 7.1.1.2 or 7.1.0.6.

3. Restart the upgrade by running one of the following commands:

   For a rolling upgrade:

   isi update --rolling

   For a simultaneous upgrade:

   isi update

4. Did the upgrade restart?
   Yes: Wait for the upgrade to complete, then go to Page 15.
   No: Note the page number that you are currently on. Upload log files and contact Isilon Technical Support, as instructed in Appendix A.
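If you script the restart, the choice between the two commands on this page reduces to a lookup. The sketch below only builds the argument list and does not run anything; the function name is hypothetical, and the two commands are exactly those given on this page.

```python
def restart_command(upgrade_type: str):
    """Return the restart command from this guide as an argument list."""
    kind = upgrade_type.strip().lower()
    if kind == "rolling":
        return ["isi", "update", "--rolling"]
    if kind == "simultaneous":
        return ["isi", "update"]
    raise ValueError(f"unknown upgrade type: {upgrade_type!r}")
```

Returning a list rather than a single string keeps the arguments separate, which avoids shell quoting issues if the list is later handed to something like `subprocess.run`.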

Page 15

Troubleshooting, continued

Restart the upgrade, continued

You could have arrived here from:
Page 14 - Restart the upgrade, continued

1. Run the following command to determine whether any more nodes were upgraded:

   isi_for_array -s "uname -a"

   The output provides a list of all the nodes and indicates which version of OneFS each is running. For an example of the output, see Appendix C.

2. Have all of the nodes been upgraded?
   Yes: Go to Page 16.
   No: Note the page number that you are currently on. Upload log files and contact Isilon Technical Support, as instructed in Appendix A.

Page 16

Troubleshooting, continued

Post-upgrade checks

You could have arrived here from:
Page 15 - Restart the upgrade, continued

1. Run the following command:

   isi status -q

   In the output, look at the Health DASR column to see if any nodes report -D- (Down). For an example of the output, see Appendix D.

2. Do any nodes report as down? A down node means that it failed to join the cluster following the upgrade.
   Yes: Go to Page 17.
   No: End troubleshooting.

Page 17

Troubleshooting, continued

Nodes did not all join the cluster

You could have arrived here from:
Page 16 - Post-upgrade checks

Note: If you want to determine root cause, contact Isilon Technical Support before continuing. If you do not want root cause analysis, then continue.

Reboot each down node as follows:

1. If possible, use a serial console to connect to the node. Otherwise, log in to the node by using SSH. For instructions about connecting through a serial console, see article 16744 on the EMC Online Support site.

2. After you are connected to the node, run the following command to reboot the node:

   shutdown -r now

3. Wait for the rebooted nodes to come back online.

Run the following command:

   isi status -q

In the output, look at the Health DASR column to see if any nodes report -D- (Down). For an example of the output, see Appendix D.

Do any nodes report as down? A down node means that it failed to join the cluster following the reboot.
   Yes: Note the page number that you are currently on. Upload log files and contact Isilon Technical Support, as instructed in Appendix A.
   No: End troubleshooting.

Page 18

Appendix A: If you need further assistance

Contact EMC Isilon Technical Support

If you need to contact Isilon Technical Support during troubleshooting, reference the page or step that you need help on. This information and the log file will help Isilon Technical Support staff resolve your case more quickly.

Upload node log files and the screen log file to EMC Isilon Technical Support

1. When troubleshooting is complete, type exit to end your screen session.

2. Gather and upload the node log set and include the SSH screen log file by using the command appropriate for your method of uploading files. If you are not sure which method to use, then use FTP.

   ESRS:

   isi_gather_info --esrs --local-only -f /ifs/data/Isilon_Support/screenlog.0

   FTP:

   isi_gather_info --ftp --local-only -f /ifs/data/Isilon_Support/screenlog.0

   HTTP:

   isi_gather_info --http --local-only -f /ifs/data/Isilon_Support/screenlog.0

   SMTP:

   isi_gather_info --email --local-only -f /ifs/data/Isilon_Support/screenlog.0

   SupportIQ:

   Copy and paste the following command.

   Note: When you copy and paste the command into the command-line interface, it will appear on multiple lines (exactly as it appears on the page), but when you press Enter the command will run as it should.

   isi_gather_info --local-only -f /ifs/data/Isilon_Support/screenlog.0 --noupload \
   --symlink /var/crash/SupportIQ/upload/ftp

3. If you receive a message that the upload was unsuccessful, refer to article 16759 on the EMC Online Support site for directions for uploading files over FTP.
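The five upload variants in step 2 differ only in a few flags, so they can be generated from a lookup table. The following Python sketch is a convenience illustration, not an EMC tool: the function name and the method keys are hypothetical, while every flag and path is copied from the commands in this appendix.

```python
SCREEN_LOG = "/ifs/data/Isilon_Support/screenlog.0"

# Upload flag for each method, as listed in this appendix.
UPLOAD_FLAGS = {
    "esrs": "--esrs",
    "ftp": "--ftp",
    "http": "--http",
    "smtp": "--email",
}


def gather_command(method: str = "ftp", screen_log: str = SCREEN_LOG):
    """Build the isi_gather_info argument list for the chosen method."""
    key = method.strip().lower()
    if key == "supportiq":
        # SupportIQ gathers without uploading and links into its upload directory.
        return ["isi_gather_info", "--local-only", "-f", screen_log,
                "--noupload", "--symlink", "/var/crash/SupportIQ/upload/ftp"]
    if key not in UPLOAD_FLAGS:
        raise ValueError(f"unknown upload method: {method!r}")
    return ["isi_gather_info", UPLOAD_FLAGS[key], "--local-only", "-f", screen_log]
```

The default of `"ftp"` mirrors the advice in step 2 to use FTP when you are not sure which method applies.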

Page 19

Appendix B: How to use this flow chart

Introduction: Describes what the section helps you to accomplish.

You could have arrived here from: Page # - Page title

Decision diamond: Yes / No

Process step

Process step with command: command xyz

Optional process step

Go to Page #

Page #

Note: Provides context and additional information. Sometimes a note is linked to a process step with a colored dot.

CAUTION! Caution boxes warn that a particular step needs to be performed with great care, to prevent serious consequences.

End point

Document shape: Calls out supporting documentation for a process step. When possible, these shapes contain links to the reference document. Sometimes linked to a process step with a colored dot.

Directional arrows indicate the path through the process flow.

Page 20

Appendix C: Output of the isi_for_array -s "uname -a" command

You could have arrived here from:
Page 6 - Analysis, continued
Page 12 - Rolling upgrade, continued
Page 15 - Restart the upgrade, continued

Example output for isi_for_array -s "uname -a":

cluster-1: Isilon OneFS cluster-1 v7.0.2.5 Isilon OneFS v7.0.2.5 B_7_0_2_216(RELEASE): 0x7000250005000D8:Mon Nov 25 20:16:16 PST 2013 [email protected]:/build/mnt/obj.RELEASE/build/mnt/src/sys/IQ.amd64.release amd64
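Output in the shape shown in this appendix can be reduced to a node-to-version table with a short parser. The Python sketch below is an illustration that assumes every line follows the example's layout ("name: Isilon OneFS name vX.Y.Z.W ..."); the function names are hypothetical, and the second sample line, including its version, is invented so that the comparison has something to find.

```python
import re

# "cluster-1: Isilon OneFS cluster-1 v7.0.2.5 ..." -> node name and version.
NODE_LINE = re.compile(
    r"^(?P<node>\S+): Isilon OneFS \S+ v(?P<version>\d+(?:\.\d+)*)"
)


def versions_by_node(output: str) -> dict:
    """Map each node name to the OneFS version it reports."""
    versions = {}
    for line in output.splitlines():
        match = NODE_LINE.match(line.strip())
        if match:
            versions[match.group("node")] = match.group("version")
    return versions


def nodes_not_on(output: str, target_version: str) -> list:
    """List nodes whose reported version differs from the target version."""
    target = target_version.lstrip("vV")
    found = versions_by_node(output)
    return sorted(node for node, version in found.items() if version != target)


# First line abridged from this appendix; second line is hypothetical.
SAMPLE = (
    "cluster-1: Isilon OneFS cluster-1 v7.0.2.5 Isilon OneFS v7.0.2.5\n"
    "cluster-2: Isilon OneFS cluster-2 v7.1.1.4 Isilon OneFS v7.1.1.4\n"
)
```

Keep in mind the note on Page 6: a node that is down or did not fully reboot does not appear in the output at all, so an empty result from `nodes_not_on` does not by itself prove that every node was upgraded.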

Page 21

Appendix D: Output of the isi status -q command

You could have arrived here from:
Page 7 - Analysis, continued
Page 11 - Simultaneous upgrade
Page 16 - Post-upgrade checks
Page 17 - Nodes did not all join the cluster

Example output for isi status -q:

Cluster Name: mycluster
Cluster Health:     [ ATTN ]
Cluster Storage:  HDD                 SSD
Size:             11G (23G Raw)       0 (0 Raw)
VHS Size:         11G
Used:             573M (5%)           0 (n/a)
Avail:            11G (95%)           0 (n/a)

                   Health  Throughput (bps)  HDD Storage       SSD Storage
ID |IP Address     |DASR |   In   Out Total| Used / Size     |Used / Size
-------------------+-----+-----+-----+-----+-----------------+-----------------
  1|192.168.146.128|-A-- | 396K| 828K| 1.2M| 144M/ 2.8G( 5%) |    (No SSDs)
  2|192.168.146.129|OK   |  49K| 3.2M| 3.2M| 145M/ 2.8G( 5%) |    (No SSDs)
  3|192.168.146.130|OK   | 3.5K| 162K| 165K| 142M/ 2.8G( 5%) |    (No SSDs)
  4|192.168.146.131|OK   |  49K| 356K| 405K| 143M/ 2.8G( 5%) |    (No SSDs)
-------------------+-----+-----+-----+-----+-----------------+-----------------
Cluster Totals:          | 498K| 4.5M| 5.0M| 573M/ 11G( 5%)  |    (No SSDs)

Health Fields: D = Down, A = Attention, S = Smartfailed, R = Read-Only
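The "do any nodes report as down?" check used on several pages can be approximated by scanning the Health DASR column of this output. The Python sketch below is a hedged illustration: the function name is hypothetical, it assumes that node rows keep the pipe-separated layout shown in this appendix, and it treats any `D` in the health field as Down because the exact position of the flag is not confirmed here.

```python
import re

# Node rows look like: "1|192.168.146.128|-A-- | 396K| ..."
NODE_ROW = re.compile(
    r"^\s*(?P<id>\d+)\s*\|\s*(?P<ip>[\d.]+)\s*\|\s*(?P<health>[^|]*?)\s*\|"
)


def down_nodes(status_output: str) -> list:
    """Return the IDs of nodes whose health field contains the D (Down) flag."""
    down = []
    for line in status_output.splitlines():
        match = NODE_ROW.match(line)
        # "OK" contains no D, so healthy rows are never reported.
        if match and "D" in match.group("health"):
            down.append(int(match.group("id")))
    return down


# Rows 1 and 2 are taken from this appendix.
SAMPLE_ROWS = (
    "1|192.168.146.128|-A-- | 396K| 828K| 1.2M| 144M/ 2.8G( 5%)| (No SSDs)\n"
    "2|192.168.146.129|OK | 49K| 3.2M| 3.2M| 145M/ 2.8G( 5%)| (No SSDs)\n"
)
```

Header, separator, and totals lines are skipped automatically because they do not begin with a numeric node ID followed by an IP address.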

© 2011 - 2013 EMC Corporation. All Rights Reserved. EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice.

THE INFORMATION IN THIS PUBLICATION IS PROVIDED “AS IS.” EMC CORPORATION MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Use, copying, and distribution of any EMC software described in this publication requires an applicable software license.

EMC2, EMC, and the EMC logo are registered trademarks or trademarks of EMC Corporation in the United States and other countries. All other trademarks used herein are the property of their respective owners.