HACMP Basics
History
IBM's HACMP has existed for almost 15 years. It was not originally an IBM product: IBM bought it from CLAM, which was later renamed Availant and is now called LakeViewTech. Until August 2006, all development of HACMP was done by CLAM. Nowadays IBM does its own development of HACMP in Austin, Poughkeepsie and Bangalore.
IBM's high availability solution for AIX, High Availability Cluster Multi-Processing (HACMP), consists of two components:
High Availability: the process of ensuring an application is available for use through the use of duplicated and/or shared resources, eliminating Single Points of Failure (SPOFs).
Cluster Multi-Processing: multiple applications running on the same nodes with shared or concurrent access to the data.
A high availability solution based on HACMP provides automated
failure detection, diagnosis, application recovery and node
reintegration. With an appropriate application, HACMP can also
provide concurrent access to the data for parallel processing
applications, thus offering excellent horizontal scalability.
What needs to be protected? Ultimately, the goal of any IT
solution in a critical environment is to provide continuous service
and data protection.
High availability is just one building block in achieving the continuous-operation goal. It is based on the availability of the hardware, the software (the OS and its components), the application and the network components.
The main objective of HACMP, and a fundamental design goal of any successful cluster, is the elimination of single points of failure (SPOFs).
Cluster object     Eliminated as a single point of failure by
Node               using multiple nodes
Power source       using multiple circuits or uninterruptible power supplies
Network adapter    using redundant network adapters
Network            using multiple networks to connect nodes
TCP/IP subsystem   using non-IP networks to connect adjoining nodes and clients
Disk adapter       using redundant disk adapters or multiple adapters
Disk               using multiple disks with mirroring or RAID
Application        adding a node for takeover; configuring an application monitor
Administrator      adding a backup administrator or a very detailed operations guide
Site               adding an additional site
Cluster Components
Here are the recommended practices for important cluster
components.
Nodes
HACMP supports clusters of up to 32 nodes, with any combination of active and standby nodes. While it is possible to have all nodes in the cluster running applications (a configuration referred to as "mutual takeover"), the most reliable and available clusters have at least one standby node: one node that is normally not running any applications, but is available to take them over in the event of a failure on an active node.
Additionally, it is important to pay attention to environmental considerations. Nodes should not have a common power supply, which may happen if they are placed in a single rack. Similarly, building a cluster of nodes that are actually logical partitions (LPARs) within a single footprint is useful as a test cluster, but should not be considered for availability of production applications.

Nodes should be chosen that have sufficient I/O slots to install redundant network and disk adapters: that is, twice as many slots as would be required for single-node operation. This naturally suggests that processors with small numbers of slots should be avoided. Use of nodes without redundant adapters should not be considered best practice; blades are an outstanding example of this. And, just as every cluster resource should have a backup, the root volume group in each node should be mirrored, or be on a RAID device.

Nodes should also be chosen so that when the production applications are run at peak load, there are still sufficient CPU cycles and I/O bandwidth to allow HACMP to operate. The production application should be carefully benchmarked (preferable) or modeled (if benchmarking is not feasible) and nodes chosen so that they will not exceed 85% busy, even under the heaviest expected load.

Note that the takeover node should be sized to accommodate all possible workloads: if there is a single standby backing up multiple primaries, it must be capable of servicing multiple workloads. On hardware that supports dynamic LPAR operations, HACMP can be configured to allocate processors and memory to a takeover node before applications are started. However, these resources must actually be available, or acquirable through Capacity Upgrade on Demand. The worst-case situation, e.g. all the applications on a single node, must be understood and planned for.
Networks
HACMP is a network-centric application. HACMP networks not only provide client access to the applications but are also used to detect and diagnose node, network and adapter failures. To do this, HACMP uses RSCT, which sends heartbeats (UDP packets) over ALL defined networks. By gathering heartbeat information on multiple nodes, HACMP can determine what type of failure has occurred and initiate the appropriate recovery action. Being able to distinguish between certain failures, for example the failure of a network and the failure of a node, requires a second network! Although this additional network can be IP based, it is possible that the entire IP subsystem could fail within a given node. Therefore, in addition there should be at least one, ideally two, non-IP networks.

Failure to implement a non-IP network can potentially lead to a partitioned cluster, sometimes referred to as 'split brain' syndrome. This situation can occur if the IP network(s) between nodes become severed or, in some cases, congested. Since each node is, in fact, still very much alive, HACMP would conclude the other nodes are down and initiate a takeover. After takeover has occurred, the application(s) could potentially be running simultaneously on both nodes. If the shared disks are also online to both nodes, the result could be data divergence (massive data corruption). This is a situation which must be avoided at all costs.

The most convenient way of configuring non-IP networks is to use disk heartbeating, as it removes the problems of distance associated with RS232 serial networks. Disk heartbeat networks only require a small disk or LUN. Be careful not to put application data on these disks: although it is possible to do so, you don't want any conflict with the disk heartbeat mechanism!
Adapters
As stated above, each network defined to HACMP should have at least two adapters per node. While it is possible to build a cluster with fewer, the reaction to adapter failures is more severe: the resource group must be moved to another node. AIX provides support for EtherChannel, a facility that can be used to aggregate adapters (increase bandwidth) and provide network resilience. EtherChannel is particularly useful for fast responses to adapter or switch failures. This must be set up with some care in an HACMP cluster. When done properly, this provides the highest level of availability against adapter failure. Refer to the IBM techdocs website: http://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/TD101785 for further details.

Many System p servers contain built-in Ethernet adapters. If the nodes are physically close together, it is possible to use the built-in Ethernet adapters on two nodes and a "cross-over" Ethernet cable (sometimes referred to as a "data transfer" cable) to build an inexpensive Ethernet network between two nodes for heartbeating. Note that this is not a substitute for a non-IP network.

Some adapters provide multiple ports. One port on such an adapter should not be used to back up another port on that adapter, since the adapter card itself is a common point of failure. The same thing is true of the built-in Ethernet adapters in most System p servers and currently available blades: the ports have a common adapter. When the built-in Ethernet adapter can be used, best practice is to provide an additional adapter in the node, with the two backing up each other.

Be aware of the network failure detection settings for the cluster and consider tuning these values. In HACMP terms, these are referred to as NIM values. There are four settings per network type: slow, normal, fast and custom. With the default setting of normal for a standard Ethernet network, the network failure detection time would be approximately 20 seconds. With today's switched network technology this is a large amount of time. By switching to the fast setting, the detection time would be reduced by 50% (10 seconds), which in most cases would be more acceptable. Be careful, however, when using custom settings, as setting these values too low can cause false takeovers to occur. These settings can be viewed using a variety of techniques, including: lssrc -ls topsvcs (from a node with the cluster active), odmget HACMPnim | grep -p ether, and smitty hacmp.
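The arithmetic behind those detection times can be sketched as follows. This is a hedged illustration, assuming the commonly cited RSCT relationship (detection time ≈ 2 × heartbeat interval × failure cycle) and typical normal-Ethernet values of a 1-second interval with a failure cycle of 10; check lssrc -ls topsvcs on your own cluster for the real numbers.

```shell
# Hedged sketch: approximate RSCT failure detection time.
# detection_seconds = 2 * heartbeat_interval * failure_cycle
# (the interval is given in tenths of a second to stay in integer arithmetic)
detect_time() {
    interval_tenths=$1   # heartbeat interval, in tenths of a second
    cycle=$2             # failure cycle (missed heartbeats tolerated)
    echo $(( 2 * interval_tenths * cycle / 10 ))
}

echo "normal ether: $(detect_time 10 10)s"   # 1.0s interval, cycle 10 -> 20s
echo "fast ether:   $(detect_time 10 5)s"    # halved cycle -> 10s
```

This also shows why overly aggressive custom values are risky: halving either factor again would put detection below the length of a routine network hiccup.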
Applications
The most important part of making an application run well in an HACMP cluster is understanding the application's requirements. This is particularly important when designing the resource group policy behavior and dependencies. For high availability to be achieved, the application must have the ability to stop and start cleanly and not explicitly prompt for interactive input. Some applications tend to bond to a particular OS characteristic such as a uname, serial number or IP address. In most situations, these problems can be overcome. The vast majority of commercial software products which run under AIX are well suited to be clustered with HACMP.
Application Data Location
Where should application binaries and configuration data reside? There are many arguments in this discussion. Generally, keep all the application binaries and data, where possible, on the shared disk, as it is easy to forget to update them on all cluster nodes when they change. A stale copy can prevent the application from starting or working correctly when it is run on a backup node. However, the correct answer is not fixed. Many application vendors have suggestions on how to set up the applications in a cluster, but these are recommendations. Just when it seems to be clear cut as to how to implement an application, someone thinks of a new set of circumstances. Here are some rules of thumb:

If the application is packaged in LPP format, it is usually installed on the local file systems in rootvg. This behavior can be overcome by bffcreate-ing the packages to disk and restoring them with the preview option. This action will show the install paths; symbolic links can then be created prior to install which point to the shared storage area. If the application is to be used on multiple nodes with different data or configuration, then the application and configuration data would probably be on local disks and the data sets on shared disk, with application scripts altering the configuration files during fallover. Also, remember that the HACMP File Collections facility can be used to keep the relevant configuration files in sync across the cluster. This is particularly useful for applications which are installed locally.
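The LPP relocation trick reads roughly as follows. These are AIX-only commands, so treat this as an illustrative, hedged transcript rather than something runnable elsewhere; <fileset> and the /sharedvg paths are invented placeholders, and the real install paths are whatever the preview run reports.

```shell
bffcreate -t /tmp/images -d /dev/cd0 all     # copy the installp images to disk
installp -p -d /tmp/images <fileset>         # PREVIEW only: shows where files would land
# Suppose the preview reports /opt/<app>.  Link that path to shared storage
# BEFORE the real install, so the binaries end up on the shared disk:
mkdir -p /sharedvg/<app>
ln -s /sharedvg/<app> /opt/<app>
installp -aX -d /tmp/images <fileset>        # the real install now follows the symlink
```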
Start/Stop Scripts
Application start scripts should not assume the status of the environment. Intelligent programming should correct any irregular conditions that may occur. The cluster manager spawns these scripts off in a separate job in the background and carries on processing. Some things a start script should do are:

First, check that the application is not currently running! This is especially crucial for v5.4 users, as resource groups can be placed into an unmanaged state (the forced down action in previous versions). Using the default startup options, HACMP will rerun the application start script, which may cause problems if the application is actually running. A simple and effective solution is to check the state of the application on startup. If the application is found to be running, simply end the start script with exit 0.

Verify the environment. Are all the disks, file systems, and IP labels available?

If different commands are to be run on different nodes, store the executing HOSTNAME in a variable.

Check the state of the data. Does it require recovery? Always assume the data is in an unknown state, since the conditions that caused the takeover cannot be assumed.

Are there prerequisite services that must be running? Is it feasible to start all prerequisite services from within the start script? Is there an inter-resource-group dependency or resource group sequencing that can guarantee the previous resource group has started correctly? HACMP v5.2 and later has facilities to implement checks on resource group dependencies, including collocation rules in HACMP v5.3.

Finally, when the environment looks right, start the application. If the environment is not correct and error recovery procedures cannot fix the problem, ensure there are adequate alerts (email, SMS, SNMP traps etc.) sent out via the network to the appropriate support administrators.

Stop scripts are different from start scripts in that most applications have a documented start-up routine and not necessarily a stop routine. The assumption is: once the application is started, why stop it? Relying on a failure of a node to stop an application will be effective, but to use some of the more advanced features of HACMP the requirement exists to stop an application cleanly. Some of the issues to avoid are:

Be sure to terminate any child or spawned processes that may be using the disk resources. Consider implementing child resource groups.

Verify that the application is stopped to the point that the file system is free to be unmounted. The fuser command may be used to verify that the file system is free.

In some cases it may be necessary to double-check that the application vendor's stop script did actually stop all the processes, and occasionally it may be necessary to forcibly terminate some processes. Clearly the goal is to return the machine to the state it was in before the application start script was run.

Do not fail to exit the stop script with a zero return code, as a non-zero return code will stop cluster processing. (Note: this is not the case with start scripts!)

Remember, most vendor stop/start scripts are not designed to be cluster-proof! A useful tip is to have the stop and start scripts verbosely output, using the same format, to the /tmp/hacmp.out file. This can be achieved by including the following line in the header of the script: set -x && PS4="${0##*/}"'[$LINENO] '
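The checklist above can be sketched as a skeleton start script. Everything here is a hedged illustration, not HACMP-supplied code: the application command, the process pattern and the filesystem list are placeholder assumptions you would replace with your application's real values.

```shell
#!/bin/sh
# Hypothetical HACMP application start script skeleton (placeholders throughout).
# set -x && PS4="${0##*/}"'[$LINENO] '   # enable for verbose tracing into hacmp.out

APP_CMD="sleep 5"        # placeholder for the real application start command
APP_PATTERN="sleep 5"    # placeholder pattern identifying the app's processes
FILESYSTEMS="/tmp"       # placeholder list of filesystems the app requires

app_start() {
    # 1. Is the application already running?  Then a rerun of the start
    #    script must succeed quietly (see the v5.4 unmanaged-state note).
    if ps -ef | grep "$APP_PATTERN" | grep -v grep >/dev/null 2>&1; then
        return 0
    fi
    # 2. Verify the environment: every required filesystem must be present.
    for fs in $FILESYSTEMS; do
        df "$fs" >/dev/null 2>&1 || return 1
    done
    # 3. (Data-recovery and prerequisite-service checks would go here.)
    # 4. Start the application in the background; the cluster manager
    #    expects the script itself to return promptly.
    $APP_CMD >/dev/null 2>&1 &
    return 0
}

app_start
```

In a real cluster the script would end with exit $? rather than a bare function call, and a failed environment check should trigger the alerting described above rather than just a silent non-zero exit.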
HACMP resource groups can be configured in three ways:
1. Rotating
2. Cascading
3. Mutual failover
The cascading and rotating resource groups are the classic, pre-HA 5.1 types. The custom type of resource group was introduced in HA 5.1 onwards.
Cascading resource group: upon node failure, a cascading resource group falls over to the available node with the next priority in the node priority list. Upon node reintegration into the cluster, a cascading resource group falls back to its home node by default.
Cascading without fallback: with this option, whenever the primary node fails, the resource group fails over to the next available node in the list, and when the primary node comes back online the resource group does not fall back automatically. It must be moved back to its home node at a convenient time.
Rotating resource group: this is almost similar to cascading without fallback; whenever the resource group fails over to a standby node it will never fall back to the primary node automatically, and must be moved back manually at a convenient time.
Mutual takeover: with this option both nodes are active (active-active mode). Whenever a failover happens, the resource group on the failed node moves to the other active node and runs alongside the resource group already there. Once the failed node comes back online, the resource group can be moved back to it manually.
Useful HACMP commands
clstat - show cluster state and substate; needs clinfo.
cldump - SNMP-based tool to show cluster state.
cldisp - similar to cldump, perl script to show cluster state.
cltopinfo - list the local view of the cluster topology.
clshowsrv -a - list the local view of the cluster subsystems.
clfindres (-s) - locate the resource groups and display status.
clRGinfo -v - locate the resource groups and display status.
clcycle - rotate some of the log files.
cl_ping - a cluster ping program with more arguments.
clrsh - cluster rsh program that takes cluster node names as argument.
clgetactivenodes - which nodes are active?
get_local_nodename - what is the name of the local node?
clconfig - check the HACMP ODM.
clRGmove - online/offline or move resource groups.
cldare - sync/fix the cluster.
cllsgrp - list the resource groups.
clsnapshotinfo - create a large snapshot of the hacmp configuration.
cllscf - list the network configuration of an hacmp cluster.
clshowres - show the resource group configuration.
cllsif - show network interface information.
cllsres - show short resource group information.
lssrc -ls clstrmgrES - list the cluster manager state.
lssrc -ls topsvcs - show heartbeat information.
cllsnode - list a node-centric overview of the hacmp configuration.
HACMP log files
/usr/sbin/cluster/etc/rhosts (also seen as /usr/es/sbin/cluster/etc/rhosts) - used to accept incoming communication from clcomdES (the cluster communication daemon with enhanced security).
Note: if there is an unresolvable label in the /usr/es/sbin/cluster/etc/rhosts file, then all clcomdES connections from remote nodes will be denied.
The main cluster daemons are the cluster manager (clstrmgrES), the cluster lock daemon (cllockdES) and the cluster SMUX peer daemon (clsmuxpdES).
clcomdES is used for cluster configuration operations such as cluster synchronisation, cluster management (C-SPOC) and dynamic reconfiguration (DARE) operations.
For clcomdES there should be at least 20 MB of free space in the /var file system:
/var/hacmp/clcomd/clcomd.log - requires 2 MB
/var/hacmp/clcomd/clcomdiag.log - requires 18 MB
An additional 1 MB is required for the /var/hacmp/odmcache directory.
The clverify logs are also present in /var:
/var/hacmp/clverify/current//* - logs from the current execution of clverify
/var/hacmp/clverify/pass//* - logs from the last passed verification
/var/hacmp/clverify/pass.prev//* - logs from the second-to-last passed verification
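Since a synchronization can fail when /var fills up, a quick pre-check is worth scripting. This is a hedged sketch: the 20480 KB figure is simply the ~20 MB total described above, and the awk column assumes the "free KB" value is the fourth field on the second line of df -k output (adjust the column for your platform; on AIX, df -k reports Free in the third field).

```shell
# Hedged sketch: warn when /var lacks the ~20 MB clcomdES wants.
REQUIRED_KB=20480                      # clcomd.log 2 MB + clcomdiag.log 18 MB

var_free_kb() {
    # Free space of the filesystem holding /var, in KB.
    # NOTE: column 4 matches GNU df; on AIX "df -k" reports Free in column 3.
    df -k /var | awk 'NR==2 {print $4}'
}

if [ "$(var_free_kb)" -lt "$REQUIRED_KB" ]; then
    echo "WARNING: /var below ${REQUIRED_KB} KB free; clcomdES logging may fail"
else
    echo "/var free space OK"
fi
```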
Steps 1 to 17 to configure HACMP
Steps to configure HACMP:
1. Install the nodes, making sure redundancy is maintained for power supplies, networks and fibre networks. Then install AIX on the nodes.
2. Install all the HACMP filesets except HAView and HATivoli.
3. Install all the RSCT filesets from the AIX base CD. Make sure that the AIX and HACMP patches and the server code are at the latest (ideally the recommended) level.
4. Check that the fileset bos.clvm is present on both nodes. This is required to make the VGs enhanced concurrent capable.
5. V.IMP: reboot both nodes after installing the HACMP filesets.
6. Configure shared storage on both nodes. For a disk heartbeat, also assign a 1 GB shared storage LUN on both nodes.
7. Create the required VGs only on the first node. The VGs can be either normal VGs or enhanced concurrent VGs. Assign a particular major number to each VG while creating it, and record the major number information. To check which major numbers are in use: ls -lrt /dev | grep <vgname>. "Mount automatically at system restart" should be set to NO.
8. Vary on the VGs that were just created.
9. V.IMP: create a log LV on each VG first, before creating any other LV. Give a unique name to each log LV, and initialize its content with: logform /dev/loglvname. Repeat this step for all the VGs created.
10. Create all the necessary LVs on each VG.
11. Create all the necessary file systems on each LV; you can create the mount points as per the customer's requirements. "Mount automatically at system restart" should be set to NO.
12. Unmount all the filesystems and vary off all the VGs.
13. Run chvg -an <vgname> so that all VGs are set to not vary on automatically at system restart.
14. Go to node 2 and run cfgmgr -v to discover the shared volumes.
15. Import all the VGs on node 2 (use smitty importvg), importing with the same major numbers as assigned on node 1.
16. Run chvg -an for all VGs on node 2.
17. V.IMP: identify the boot1, boot2, service IP and persistent IP for both nodes and make the entries in /etc/hosts.
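Steps 7 to 13 can be sketched as a command transcript. These are AIX-only commands, so treat this as an illustrative, hedged outline rather than something runnable here; the VG, LV and filesystem names (and the major number 57) are invented placeholders.

```shell
# On node 1 only -- create a shared VG with an explicit major number:
mkvg -y test2vg -V 57 hdisk2            # record the major number (57) for the import on node 2
varyonvg test2vg
mklv -y test2loglv -t jfs2log test2vg 1
logform /dev/test2loglv                 # initialize the jfs2 log LV
mklv -y test2lv -t jfs2 test2vg 10
crfs -v jfs2 -d test2lv -m /test2 -A no # -A no: do not mount at system restart
umount /test2 2>/dev/null
varyoffvg test2vg
chvg -an test2vg                        # do not vary on automatically at restart
# On node 2: cfgmgr -v, then importvg -V 57 -y test2vg hdisk2, then chvg -an test2vg
```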
Step 18 to configure HACMP
18. Define the cluster name.
Step 19 Define Cluster Nodes
19. Define the cluster nodes: smitty hacmp -> Extended Configuration -> Extended Topology Configuration -> Configure an HACMP node -> Add a node to an HACMP cluster. Define both nodes, one after the other.
Step 20 Discover HACMP config for Network settings
20. Discover the HACMP configuration. This will import, for both nodes, all the node information, boot IPs and service IPs from /etc/hosts: smitty hacmp -> Extended Configuration -> Discover HACMP-related information
Step 21 Adding Communication interfaces
21. Add the HACMP networks for the communication interfaces: smitty hacmp -> Extended Configuration -> Extended Topology Configuration -> Configure HACMP networks -> Add a network to the HACMP cluster. Select ether and press Enter; then select diskhb and press Enter. The diskhb network is your non-TCP/IP heartbeat.
step 22 Adding devices for Disk Heartbeat
22. Include the interfaces/devices in the ether network and the diskhb network already defined: smitty hacmp -> Extended Configuration -> Extended Topology Configuration -> Configure HACMP communication interfaces/devices -> Add communication interfaces/devices.
step 23 Adding boot IPs & Disk Heartbeat disks
23. Include all four boot IPs (two for each node) in the ether network already defined. Then include the disk used for heartbeat on both nodes in the diskhb network already defined.
step 24 Adding persistent IPs
24. Add the persistent IPs: smitty hacmp -> Extended Configuration -> Extended Topology Configuration -> Configure HACMP persistent node IP labels/addresses
step 25 Adding Persistent IP labels
25. Add a persistent IP label for each node.
step 26 Defining service IP labels
26. Define the service IP labels for both nodes: smitty hacmp -> Extended Configuration -> Extended Resource Configuration -> HACMP extended resource configuration -> Configure HACMP service IP labels
step 27 Adding Resource Groups
27. Add the resource groups: smitty hacmp -> Extended Configuration -> Extended Resource Configuration -> HACMP extended resource group configuration
Continue similarly for all the resource groups. The node selected first while defining a resource group will be the primary owner of that resource group; the node after it is the secondary node. Make sure you set the primary node correctly for each resource group, and set the failover/fallback policies as per the requirements of the setup.
step 28 Setting attributes of the Resource Groups
28. Set the attributes of the resource groups already defined. Here you actually assign the resources to the resource groups: smitty hacmp -> Extended Configuration -> Extended Resource Configuration -> HACMP extended resource group configuration

step 29 Adding the service IP label & VGs owned by the node
29. Add the service IP label for the owner node, and also the VGs owned by the owner node of this resource group. Continue similarly for all the resource groups.

step 30 & 31 Synchronize & start the Cluster
30. Synchronize the cluster. This will sync the configuration from one node to the second node: smitty cl_sync
31. That's it. Now you are ready to start the cluster: smitty clstart. You can start the cluster on both nodes together, or start it individually on each node.

Step 32 & 33 Check that the cluster stabilizes & the VGs are varied on
32. Wait for the cluster to stabilize. You can check whether the cluster is up with the following commands:
a. netstat -i
b. ifconfig -a: look out for the service IP; it will show on each node if the cluster is up.
33. Check whether the VGs under the cluster's RGs are varied on and the filesystems in the VGs are mounted after the cluster starts. Here test1vg and test2vg are VGs which are varied on when the cluster is started, and the filesystems /test2 and /test3 are mounted when the cluster starts. /test2 and /test3 are in test2vg, which is part of the RG owned by this node.
Finally, perform all the tests, such as resource takeover, node failure and network failure, and verify the cluster before releasing the system to the customer.

Posted by Santosh Gupta, Thursday, February 28, 2008
HACMP v5.x Disk Heartbeat device configuration
Creating a Disk Heartbeat device in HACMP v5.x
Introduction
This document is intended to supplement existing documentation on how to configure, test, and monitor a disk heartbeat device and network in HACMP/ES V5.x. This feature is new in V5.1, and it provides another alternative for non-IP based heartbeats. The intent of this document is to provide step-by-step directions, as these are currently sketchy in the HACMP v5.1 pubs. This will hopefully clarify several misconceptions that have been brought to my attention. This example consists of a two-node cluster (nodes GT40 & SL55) with shared ESS vpath devices. If more than two nodes exist in your cluster, you will need N non-IP heartbeat networks, where N represents the number of nodes in the cluster (i.e. a three-node cluster requires 3 non-IP heartbeat networks). This creates a heartbeat ring.

It's worth noting that one should not confuse concurrent volume groups with concurrent resource groups, and note that there is a difference between concurrent volume groups and enhanced concurrent volume groups. A concurrent resource group is one which may be active on more than one node at a time. A concurrent volume group also shares the characteristic that it may be active on more than one node at a time. This is also true for an enhanced concurrent VG; however, in a non-concurrent resource group, the enhanced concurrent VG, while it may be active and not have a SCSI reserve residing on the disk, normally has its data accessed by only one system at a time.
Pre-Reqs
In this document, it is assumed that the shared storage devices are already made available and configured to AIX, and that the proper levels of RSCT and HACMP are already installed. Since we are utilizing enhanced concurrent volume groups, it is also necessary to make sure that bos.clvm.enh is installed. This is not normally installed as part of an HACMP installation via the installp command.

Disk Heartbeat Details
Disk heartbeating provides the ability to use existing shared disks, regardless of disk type, to provide a serial-network-like heartbeat path. A benefit of this is that one need not dedicate the integrated serial ports for HACMP heartbeats (if supported on the subject systems) or purchase an 8-port asynchronous adapter. This feature utilizes a special area on the disk previously reserved for Concurrent Capable volume groups (traditionally only for SSA disks). Since AIX 5.2 dropped support for SSA concurrent volume groups, this fit makes that area available for use. This also means that the disk chosen for serial heartbeat can be part of a data volume group. (Note the performance concerns below.)
The disk heart beating code went into RSCT version 2.2.1.30; some
recommended APARs bring that to 2.2.1.31. If you have that level
installed, along with HACMP 5.1, you can use disk heart beating.
The relevant file to look for is
/usr/sbin/rsct/bin/hats_diskhb_nim. Though it is supported mainly
through RSCT, we recommend AIX 5.2 when utilizing disk
heartbeat.
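The prerequisites named above can be checked with a quick sketch like the following (this assumes an AIX host with RSCT installed; on any other system the checks simply report the pieces as missing):

```shell
# Prerequisite check for disk heartbeating: the RSCT NIM binary and the
# bos.clvm.enh fileset, both named in the text above.
f=/usr/sbin/rsct/bin/hats_diskhb_nim
if [ -x "$f" ]; then
    echo "found: $f"
else
    echo "missing: $f (install the recommended RSCT level)"
fi
if lslpp -l bos.clvm.enh >/dev/null 2>&1; then
    echo "bos.clvm.enh installed"
else
    echo "bos.clvm.enh not installed (or not an AIX system)"
fi
```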
To use disk heartbeats, no node can issue a SCSI reserve for the
disk. This is because both nodes using it for heart beating must be
able to read and write to that disk. It is sufficient that the disk
be in an enhanced concurrent volume group to meet this requirement.
(It should also be possible to use a disk that is in no volume
group for disk heart beating. RSCT certainly won't care; but HACMP
SMIT panels may not be particularly helpful in setting this
up.)
Now, in HACMP 5.1 with AIX 5.1, enhanced concurrent mode volume
groups can be used only in concurrent (or "online on all available
nodes") resource groups. This means that disk heart beating is
useful only to people running concurrent configurations, or who can
allocate such a volume group/disk (which is certainly possible,
though perhaps an expensive approach). In other words, at HACMP 5.1
and AIX 5.1, typical HACMP clusters (with a server and idle
standby) will require an additional concurrent resource group with
a disk in an enhanced concurrent VG dedicated for heartbeat use. At
AIX 5.2, disk heartbeats can exist on an enhanced concurrent VG
that resides in a non-concurrent resource group. At AIX 5.2, one
may also use the fast disk takeover feature in non-concurrent
resource groups with enhanced concurrent volume groups. With HACMP
5.1 and AIX 5.2, enhanced concurrent mode volume groups can be used
in serial access configurations for fast disk takeover, along with
disk heart beating. (AIX 5.2 requires RSCT 2.3.1.0 or later.) That
is, the facility becomes usable to the average customer, without
commitment of additional resources, since disk heart beating can
occur on a volume group used for ordinary filesystem and logical
volume activity.
Performance Concerns with Disk Heart Beating
Most modern disks take somewhere around 15 milliseconds to
service an IO request, which means that they can't do much more
than 60 seeks per second. The sectors used for disk heart beating
are part of the VGDA, which is at the outer edge of the disk, and
may not be near the application data. This means that every time a
disk heart beat is done, a seek will have to be done. Disk heart
beating will typically (with the default parameters) require four
(4) seeks per second: each of the two nodes writes to the disk and
reads from the disk once per second, for a total of 4 IOPS. So, if
possible, select a disk for the heart beat path that does not
normally do more than about 50 seeks per second. The filemon tool
can be used to monitor the seek activity on a disk.
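The arithmetic above can be sketched as a simple headroom check (the ceiling and the current seek rate are illustrative values; on a real system the seek rate would come from filemon):

```shell
# Back-of-the-envelope check: ~15 ms per IO gives roughly 66 IOPS per
# spindle, and default disk heartbeating adds about 4 IOPS (each node
# reads and writes once per second).
hb_iops=4
service_ms=15
ceiling=$((1000 / service_ms))        # ~66 IOPS per disk
current_seeks=50                      # example value, e.g. from filemon
if [ $((current_seeks + hb_iops)) -le "$ceiling" ]; then
    echo "OK: disk has headroom for disk heartbeating"
else
    echo "WARNING: choose a quieter disk or relax heartbeat timing"
fi
```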
In cases where a disk must be used for heart beating that
already has a high seek rate, it may be necessary to change the
heart beat timing parameters to prevent long write delays from
being seen as a failure.
The above cautions as stated apply to JBOD configurations, and
should be modified based on the technology of the disk subsystem.
If the disk used for heart beating is behind a controller that
provides large amounts of cache - such as the ESS - the number of
seeks per second can be much larger. If the disk used for heart
beating is part of a RAID set without a caching front-end
controller, the disk may be able to support fewer seeks, due to
the extra activity required by RAID operations.
Pros & Cons of using Disk Heart Beating
Pros:
1. No additional hardware needed.
2. Easier to span greater distances.
3. No loss in usable storage space, and existing data volume
groups can be used.
4. Uses enhanced concurrent VGs, which also allow for fast disk
takeover.
Cons:
1. Must be aware of the devices diskhb uses and administer the
devices properly.*
2. Lose the forced-down option of stopping cluster services,
because of the enhanced concurrent VG usage.
*I have had a customer delete all their disk definitions and run
cfgmgr again to clean up number holes in their device definition
list. When they did, the device names, of course, did not come
back in the same order as before. So the diskhb device assigned to
HACMP was no longer valid: a different device had been configured
using the old device name, and it was not part of an enhanced
concurrent VG. Hence diskhb no longer worked, and since the
customer did not monitor their cluster either, they were unaware
that the diskhb no longer worked.
Configuring Disk Heartbeat
As mentioned previously, disk heartbeat utilizes enhanced
concurrent volume groups. If starting with a new configuration of
disks, you will want to create enhanced concurrent volume groups,
either manually or by utilizing C-SPOC. The example below uses
C-SPOC, which is the best practice here.
If you plan to use an existing volume group for disk heartbeats
that is not enhanced concurrent, then you will have to convert it
using the chvg command. We recommend that the VG be active on only
one node, and that the application not be running, when making
this change: run chvg -C vgname to change the VG to enhanced
concurrent mode, vary it off, then run importvg -L vgname on the
other node to make it aware that the VG is now enhanced concurrent
capable. If using this method, you can skip to the Creating Disk
Heartbeat Devices and Network section of this document.
Disk and VG Preparation
To be able to use C-SPOC successfully, it is required that some
basic IP-based topology already exists, and that the storage
devices have their PVIDs in both systems' ODMs. This can be
verified by running lspv on each system. If a PVID does not exist
on a system, it is necessary to run chdev -l <devicename> -a
pv=yes against the disk on each system. This will allow C-SPOC to
match up the device(s) as known shared storage devices. In this
example, vpath0 on GT40 is the same virtual disk as vpath3 on
SL55. Use C-SPOC to create an enhanced concurrent volume group.
Since vpath devices are being used in this example, the following
smit path was used:
smitty cl_admin -> HACMP Concurrent Logical Volume Management ->
Concurrent Volume Groups -> Create a Concurrent Volume Group with
Data Path Devices, and press Enter
Choose the appropriate nodes, and then choose the appropriate
shared storage devices based on PVIDs (vpath0 and vpath3 in this
example). Choose a name for the VG (enhconcvg in this example) and
the desired PP size, make sure that Enhanced Concurrent Mode is
set to true, and press Enter. This will create the shared
enhanced concurrent VG needed for our disk heartbeat.
It's a good idea to verify via lspv once this has completed, to
make sure the device and VG are shown appropriately, as follows:
GT40# lspv
vpath0  000a7f5af78e0cf4  enhconcvg
SL55# lspv
vpath3  000a7f5af78e0cf4  enhconcvg
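The PVID comparison can be scripted as a sketch like this (the lspv output from each node is inlined here as sample data; on the real systems you would capture it from lspv itself):

```shell
# Confirm both nodes see the same PVID for the shared disk. The PVID is
# the second field of each node's lspv line for its vpath device.
gt40_lspv='vpath0 000a7f5af78e0cf4 enhconcvg'
sl55_lspv='vpath3 000a7f5af78e0cf4 enhconcvg'
pvid_a=$(echo "$gt40_lspv" | awk '{ print $2 }')
pvid_b=$(echo "$sl55_lspv" | awk '{ print $2 }')
if [ "$pvid_a" = "$pvid_b" ]; then
    echo "PVIDs match: $pvid_a"
else
    echo "PVID mismatch: $pvid_a vs $pvid_b"
fi
```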
Creating Disk Heartbeat Devices and Network
There are two different ways to do this. Since we have already
created the enhanced concurrent vg, we can use the discovery method
(1) and let HA find it for us. Or we can do this manually via the
Pre-defined devices method (2). Following is an example of
each.
1) Creating via the Discovery Method (see Note):
Enter smitty hacmp -> Extended Configuration -> Discover
HACMP-related Information from Configured Nodes, and press Enter.
This will run automatically and create a clip_config file that
contains the information it has discovered. Once completed, go
back to the Extended Configuration menu and choose:
Extended Topology Configuration -> Configure HACMP Communication
Interfaces/Devices -> Add Communication Interfaces/Devices -> Add
Discovered Communication Interface and Devices -> Communication
Devices -> choose the appropriate devices (ex. vpath0 and vpath3)
Select Point-to-Point Pair of Discovered Communication Devices to Add
Move cursor to desired item and press F7. Use arrow keys to scroll.
ONE OR MORE items can be selected.
Press Enter AFTER making all selections.
  # Node      Device  Device Path  Pvid
  > nodeGT40  vpath0  /dev/vpath0  000a7f5af78
  > nodeSL55  vpath3  /dev/vpath3  000a7f5af78
Note: Base HA 5.1 appears to have a problem when using this
Discovered Devices method. If you get the error "ERROR: Invalid
node name 000a7f5af78e0cf4", then you will need APAR IY51594;
otherwise you will have to create the devices via the Pre-Defined
Devices method.
2) Creating via the Pre-Defined Devices Method
When using this method, it is necessary to create a diskhb
network first, then assign the disk-node pair devices to the
network. Create the diskhb network as follows:
smitty hacmp -> Extended Configuration -> Extended Topology
Configuration -> Configure HACMP Networks -> Add a Network to the
HACMP Cluster -> choose diskhb -> enter the desired network name
(ex. disknet1) and press Enter
smitty hacmp -> Extended Configuration -> Extended Topology
Configuration -> Configure HACMP Communication Interfaces/Devices
-> Add Communication Interfaces/Devices -> Add Pre-Defined
Communication Interfaces and Devices -> Communication Devices ->
choose your diskhb Network Name
Add a Communication Device
Type or select values in entry fields. Press Enter AFTER making
all desired changes.
                    [Entry Fields]
* Device Name       [GT40_hboverdisk]
* Network Type      diskhb
* Network Name      disknet1
* Device Path       [/dev/vpath0]
* Node Name         [GT40]
For Device Name, choose a unique name; it will show up in your
topology under this name, much as serial heartbeats and ttys have
in the past.
For the Device Path, enter /dev/<devicename> (ex. /dev/vpath0).
Then choose the corresponding node for this device and device name
(ex. GT40), and press Enter.
You will repeat this process for the other node (ex. SL55) and
the other device (vpath3). This will complete both devices for the
diskhb network.
Testing Disk Heartbeat Connectivity
Once the device and network definitions have been created, it is
a good idea to test it and make sure communications is working
properly. If the volume group is varied on in normal mode on one of
the nodes, the test will probably not work.
/usr/sbin/rsct/bin/dhb_read is used to test the validity of a
diskhb connection. The usage of dhb_read is as follows:
dhb_read -p devicename     // dump diskhb sector contents
dhb_read -p devicename -r  // receive data over diskhb network
dhb_read -p devicename -t  // transmit data over diskhb network
To test that disknet1, in the example configuration, can
communicate from nodeB(ex. SL55) to nodeA (ex. GT40), you would run
the following commands:
On nodeA, enter:
dhb_read -p rvpath0 -r
On nodeB, enter:
dhb_read -p rvpath3 -t
Note that the device name is the raw device, as designated by the
r preceding the device name.
If the link from nodeB to nodeA is operational, both nodes will
display:
Link operating normally.
You can run this again and swap which node transmits and which
one receives. To make the network active, it is necessary to sync
up the cluster. Since the volume group has not been added to the
resource group, we will sync up once instead of twice.
Add Shared Disk as a Shared Resource
In most cases you would have your diskhb device on a shared data
vg. It is necessary to add that vg into your resource group and
synchronize the cluster.
smitty hacmp -> Extended Configuration -> Extended Resource
Configuration -> Extended Resource Group Configuration ->
Change/Show Resources and Attributes for a Resource Group, and
press Enter.
Choose the appropriate resource group, enter the new vg
(enhconcvg) into the volume group list and press Enter.
Return to the top of the Extended Configuration menu and
synchronize the cluster.
Monitor Disk Heartbeat
Once the cluster is up and running, you can monitor the activity
of the disk (actually all) heartbeats via lssrc -ls topsvcs. An
example of the output follows:
Subsystem     Group     PID    Status
topsvcs       topsvcs   32108  active
Network Name  Indx  Defd  Mbrs  St  Adapter ID    Group ID
disknet1      [ 3]  2     2     S   255.255.10.0  255.255.10.1
disknet1      [ 3]  rvpath3         0x86cd1b02    0x86cd1b4f
HB Interval = 2 secs. Sensitivity = 4 missed beats
Missed HBs: Total: 0 Current group: 0
Packets sent: 229 ICMP 0 Errors: 0 No mbuf: 0
Packets received: 217 ICMP 0 Dropped: 0
NIM's PID: 28724
Be aware that there is a grace period for heartbeats to start
processing. This is normally around 60 seconds. So if you run this
command quickly after starting the cluster, you may not see
anything at all until heartbeat processing is started after the
grace period time has elapsed.
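For routine monitoring, the interesting counter can be pulled out of that output with a short parse; the sketch below runs against sample text inlined from the example above (on a cluster node you would pipe the lssrc command output instead):

```shell
# Extract the missed-heartbeat total from 'lssrc -ls topsvcs' output.
# "Missed HBs: Total: 0 ..." puts the total in the fourth field.
sample='disknet1 [ 3] 2 2 S 255.255.10.0 255.255.10.1
HB Interval = 2 secs. Sensitivity = 4 missed beats
Missed HBs: Total: 0 Current group: 0'
missed=$(echo "$sample" | awk '/Missed HBs/ { print $4 }')
echo "missed heartbeats (total): $missed"
```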
HACMP failover scenario
HA failover scenarios
1. Graceful: For a graceful failover, run smitty clstop and
select the graceful option. This will not change anything except
stopping the cluster on that node.
Note: If you stop the cluster, check the status using lssrc -g
cluster; sometimes the clstrmgrES daemon will take a long time to
stop. DO NOT KILL THIS DAEMON. It will stop automatically after a
while. You can do this on both nodes.
2. Takeover: For takeover, run smitty clstop with the takeover
option. This will stop the cluster on that node, and the standby
node will take over the package. You can do this on both
nodes.
3. Soft Package Failover: Run smitty
cm_hacmp_resource_group_and_application_management_menu
>>> Move a Resource Group to Another Node
>>> select the package name and node name
>>> Enter. This will move the package from that node to
the node you selected in the menu. This method gives a lot of
trouble in HA 4.5, whereas it runs well on HA 5.2 unless there are
application startup issues. You can do this on both nodes.
4. Failover Network Adapter(s): For this type of testing, run
ifconfig enX down; the package IP will then fail over to the
primary adapter. You will not see any outage at all.
We can manually bring it back to the original adapter (ifconfig
enX up), but it is better to reboot the server to bring the
package back to the original node.
5. Hardware Failure (crash): This is a standard type of testing;
run the command reboot -q and the node will go down without
stopping any apps and come up immediately. The package will fail
over to the standby node within about 2 minutes of OS downtime.
(Even though the HA failover is fast, some apps will take a long
time to start.)
Posted by Santosh Gupta at 8:31 AM
Friday, February 15, 2008
Specifying the default gateway on a specific interface in
HACMP
Specifying the default gateway on a specific interface
When you're using HACMP, you usually have multiple network
adapters installed and thus multiple network interfaces to deal
with. If AIX configured the default gateway on the wrong interface
(like on your management interface instead of the boot interface),
you might want to change this, so network traffic isn't sent over
the management interface. Here's how you can do this:
First, stop HACMP or do a take-over of the resource groups to
another node; this will avoid any problems with applications when
you start fiddling with the network configuration.
Then open up a virtual terminal window to the host on your HMC.
Otherwise you would lose the connection as soon as you drop the
current default gateway.
Now you need to determine where your current default gateway is
configured. You can do this by typing: lsattr -El inet0 and netstat
-nr. The lsattr command will show you the current default gateway
route and the netstat command will show you the interface it is
configured on. You can also check the ODM: odmget
-q"attribute=route" CuAt.
Now, delete the default gateway like this:
lsattr -El inet0 | awk '$2 ~ /hopcount/ { print $2 }' | read GW
chdev -l inet0 -a delroute=${GW}
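To illustrate what that pipeline extracts, here is the same parse run against sample lsattr output (the hostname and gateway address are invented, and matching on the first field is a slight variant of the awk test above):

```shell
# Pull the route attribute value out of 'lsattr -El inet0' style output.
# The value has the form net,-hopcount,0,,0,<gateway>, so the gateway is
# everything after the last comma.
sample='hostname gt40 Host Name True
route net,-hopcount,0,,0,10.1.1.254 Route True'
GW=$(echo "$sample" | awk '$1 == "route" { print $2 }')
echo "route attribute: $GW"
echo "gateway: ${GW##*,}"
```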
If you would now use the route command to specify the default
gateway on a specific interface, like this:
route add 0 [ip address of default gateway: xxx.xxx.xxx.254] -if enX
you will have a working entry for the default gateway. But the
route command does not change anything in the ODM, so as soon as
your system reboots, the default gateway is gone again. Not a good
idea.
A better solution is to use the chdev command:
chdev -l inet0 -a addroute=net,-hopcount,0,,0,[ip address of default gateway]
This will set the default gateway to the first interface
available. To specify the interface, use:
chdev -l inet0 -a addroute=net,-hopcount,0,if,enX,,0,[ip address of default gateway]
Substitute the correct interface for enX in the command above.
If you previously used the route add command, and after that you
use chdev to enter the default gateway, then this will fail. You
have to delete it first by using route delete 0, and then give the
chdev command.
Afterwards, check with lsattr -El inet0 and odmget
-q"attribute=route" CuAt that the new default gateway is properly
configured. And of course, try to ping the IP address of the
default gateway and some outside address. Now reboot your system
and check if the default gateway remains configured on the correct
interface. And start up HACMP again!
HACMP topology & useful commands
HACMP can be configured in 3 ways:
1. Rotating
2. Cascading
3. Mutual Failover
The cascading and rotating resource groups are the classic,
pre-HA 5.1 types. The new custom type of resource group has been
introduced in HA 5.1 onwards.
Cascading resource group: Upon node failure, a cascading resource
group falls over to the available node with the next priority in
the node priority list. Upon node reintegration into the cluster,
a cascading resource group falls back to its home node by
default.
Cascading without fallback: With this option, whenever the
primary node fails, the package fails over to the next available
node in the list; when the primary node comes back online, the
package does not fall back automatically. We need to move the
package back to its home node at a convenient time.
Rotating resource group: This is very similar to cascading
without fallback: whenever the package fails over to a standby
node, it never falls back to the primary node automatically; we
move it back manually at our convenience.
Mutual takeover: With this option, both nodes are active
(active-active). Whenever a failover happens, the package on the
failed node moves to the other active node and runs alongside the
already existing package. Once the failed node comes back online,
we can move the package back to that node manually.
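The fallback distinction above can be summarized in a toy sketch (a hypothetical helper for illustration only, not an HACMP command):

```shell
# Given a resource-group policy, report what happens when the failed
# (home) node reintegrates into the cluster, per the descriptions above.
fallback_on_reintegration() {
    case "$1" in
        cascading)      echo "falls back to its home node automatically" ;;
        cwof|rotating)  echo "stays where it is; move it back manually" ;;
        mutual)         echo "stays on the surviving node; move it back manually" ;;
        *)              echo "unknown policy" ;;
    esac
}
fallback_on_reintegration cascading
fallback_on_reintegration rotating
```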
Useful HACMP commands
clstat - show cluster state and substate; needs clinfo.
cldump - SNMP-based tool to show cluster state.
cldisp - similar to cldump, perl script to show cluster state.
cltopinfo - list the local view of the cluster topology.
clshowsrv -a - list the local view of the cluster subsystems.
clfindres (-s) - locate the resource groups and display status.
clRGinfo -v - locate the resource groups and display status.
clcycle - rotate some of the log files.
cl_ping - a cluster ping program with more arguments.
clrsh - cluster rsh program that takes cluster node names as arguments.
clgetactivenodes - which nodes are active?
get_local_nodename - what is the name of the local node?
clconfig - check the HACMP ODM.
clRGmove - online/offline or move resource groups.
cldare - sync/fix the cluster.
cllsgrp - list the resource groups.
clsnapshotinfo - create a large snapshot of the hacmp configuration.
cllscf - list the network configuration of an hacmp cluster.
clshowres - show the resource group configuration.
cllsif - show network interface information.
cllsres - show short resource group information.
lssrc -ls clstrmgrES - list the cluster manager state.
lssrc -ls topsvcs - show heartbeat information.
cllsnode - list a node-centric overview of the hacmp configuration.
HACMP Basics
Eliminating single points of failure (SPOFs):
Cluster object - Eliminated as a single point of failure by:
Node - Using multiple nodes
Power source - Using multiple circuits or uninterruptible power supplies
Network adapter - Using redundant network adapters
Network - Using multiple networks to connect nodes
TCP/IP subsystem - Using non-IP networks to connect adjoining nodes & clients
Disk adapter - Using redundant disk adapters or multiple adapters
Disk - Using multiple disks with mirroring or RAID
Application - Adding a node for takeover; configuring an application monitor
Administrator - Adding a backup administrator or a very detailed operations guide
Site - Adding an additional site
Cluster Components
Here are the recommended practices for important cluster
components.
Nodes
HACMP supports clusters of up to 32 nodes, with any combination
of active and standby nodes. While it is possible to have all
nodes in the cluster running applications (a configuration
referred to as "mutual takeover"), the most reliable and available
clusters have at least one standby node: one node that is normally
not running any applications, but is available to take them over
in the event of a failure on an active node.
Additionally, it is important to pay attention to environmental
considerations. Nodes should not have a common power supply, which
may happen if they are placed in a single rack. Similarly,
building a cluster of nodes that are actually logical partitions
(LPARs) within a single footprint is useful as a test cluster, but
should not be considered for availability of production
applications.
Nodes should be chosen that have sufficient I/O slots to install
redundant network and disk adapters: that is, twice as many slots
as would be required for single-node operation. This naturally
suggests that processors with small numbers of slots should be
avoided. Use of nodes without redundant adapters should not be
considered best practice; blades are an outstanding example of
this. And, just as every cluster resource should have a backup,
the root volume group in each node should be mirrored, or be on a
RAID device.
Nodes should also be chosen so that when the production
applications are run at peak load, there are still sufficient CPU
cycles and I/O bandwidth to allow HACMP to operate. The production
application should be carefully benchmarked (preferable) or
modeled (if benchmarking is not feasible) and nodes chosen so that
they will not exceed 85% busy, even under the heaviest expected
load.
Note that the takeover node should be sized to accommodate all
possible workloads: if there is a single standby backing up
multiple primaries, it must be capable of servicing multiple
workloads. On hardware that supports dynamic LPAR operations,
HACMP can be configured to allocate processors and memory to a
takeover node before applications are started. However, these
resources must actually be available, or acquirable through
Capacity Upgrade on Demand. The worst-case situation, e.g. all the
applications on a single node, must be understood and planned
for.
Networks
HACMP is a network-centric application. HACMP networks not only
provide client access to the applications but are used to detect
and diagnose node, network and adapter failures. To do this, HACMP
uses RSCT, which sends heartbeats (UDP packets) over ALL defined
networks. By gathering heartbeat information on multiple nodes,
HACMP can determine what type of failure has occurred and initiate
the appropriate recovery action. Being able to distinguish between
certain failures, for example the failure of a network and the
failure of a node, requires a second network! Although this
additional network can be IP-based, it is possible that the entire
IP subsystem could fail within a given node. Therefore, in
addition there should be at least one, ideally two, non-IP
networks. Failure to implement a non-IP network can potentially
lead to a partitioned cluster, sometimes referred to as 'Split
Brain' syndrome. This situation can occur if the IP network(s)
between nodes becomes severed or in some cases congested. Since
each node is, in fact, still very much alive, HACMP would conclude
the other nodes are down and initiate a takeover. After takeover
has occurred, the application(s) potentially could be running
simultaneously on both nodes. If the shared disks are also online
to both nodes, then the result could lead to data divergence
(massive data corruption). This is a situation which must be
avoided at all costs.
The most convenient way of configuring non-IP networks is to use
disk heartbeating, as it removes the problems of distance
associated with rs232 serial networks. Disk heartbeat networks
only require a small disk or LUN. Be careful not to put
application data on these disks: although it is possible to do so,
you don't want any conflict with the disk heartbeat mechanism!
Adapters
As stated above, each network defined to HACMP should have at
least two adapters per node. While it is possible to build a
cluster with fewer, the reaction to adapter failures is more
severe: the resource group must be moved to another node. AIX
provides support for EtherChannel, a facility that can be used to
aggregate adapters (increase bandwidth) and provide network
resilience. EtherChannel is particularly useful for fast responses
to adapter/switch failures. This must be set up with some care in
an HACMP cluster. When done properly, this provides the highest
level of availability against adapter failure. Refer to the IBM
techdocs website:
http://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/TD101785
for further details.
Many System p servers contain built-in Ethernet adapters. If the
nodes are physically close together, it is possible to use the
built-in Ethernet adapters on two nodes and a "cross-over"
Ethernet cable (sometimes referred to as a "data transfer" cable)
to build an inexpensive Ethernet network between two nodes for
heart beating. Note that this is not a substitute for a non-IP
network.
Some adapters provide multiple ports. One port on such an adapter
should not be used to back up another port on that adapter, since
the adapter card itself is a common point of failure. The same
thing is true of the built-in Ethernet adapters in most System p
servers and currently available blades: the ports have a common
adapter. When the built-in Ethernet adapter can be used, best
practice is to provide an additional adapter in the node, with the
two backing up each other.
Be aware of the network detection settings for the cluster and
consider tuning these values. In HACMP terms, these are referred
to as NIM values. There are four settings per network type which
can be used: slow, normal, fast and custom. With the default
setting of normal for a standard Ethernet network, the network
failure detection time would be approximately 20 seconds. With
today's switched network technology this is a large amount of
time. By switching to a fast setting, the detection time would be
reduced by 50% (10 seconds), which in most cases would be more
acceptable. Be careful, however, when using custom settings, as
setting these values too low can cause false takeovers to occur.
These settings can be viewed using a variety of techniques
including: lssrc -ls topsvcs (from a node which is active), odmget
HACMPnim | grep -p ether, and smitty hacmp.
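As a hedged sketch of the arithmetic behind those detection times, RSCT topology services documentation commonly gives detection time as heartbeat interval x failure cycle x 2; the interval and cycle values below are illustrative defaults for a standard Ethernet network, chosen to reproduce the 20- and 10-second figures quoted above:

```shell
# detection time (s) = heartbeat interval (s) * failure cycle * 2
detect_time() {
    interval=$1   # seconds between heartbeats
    cycle=$2      # missed heartbeats tolerated
    echo $((interval * cycle * 2))
}
echo "normal: $(detect_time 1 10) seconds"
echo "fast:   $(detect_time 1 5) seconds"
```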
Applications
The most important part of making an application run well in an
HACMP cluster is understanding the application's requirements.
This is particularly important when designing the resource group
policy behavior and dependencies. For high availability to be
achieved, the application must have the ability to stop and start
cleanly and not explicitly prompt for interactive input. Some
applications tend to bond to a particular OS characteristic such
as a uname, serial number or IP address. In most situations, these
problems can be overcome. The vast majority of commercial software
products which run under AIX are well suited to being clustered
with HACMP.
Application Data Location
Where should application binaries and configuration data reside?
There are many arguments in this discussion. Generally, keep all
the application binaries and data, where possible, on the shared
disk, as it is easy to forget to update them on all cluster nodes
when they change. This can prevent the application from starting
or working correctly when it is run on a backup node. However, the
correct answer is not fixed. Many application vendors have
suggestions on how to set up the applications in a cluster, but
these are recommendations. Just when it seems to be clear-cut as
to how to implement an application, someone thinks of a new set of
circumstances. Here are some rules of thumb:
If the application is packaged in LPP format, it is usually
installed on the local file systems in rootvg. This behavior can
be overcome by bffcreate-ing the packages to disk and restoring
them with the preview option. This action will show the install
paths; then symbolic links can be created prior to install which
point to the shared storage area. If the application is to be used
on multiple nodes with different data or configuration, then the
application and configuration data would probably be on local
disks and the data sets on shared disk, with application scripts
altering the configuration files during fallover. Also, remember
the HACMP File Collections facility can be used to keep the
relevant configuration files in sync across the cluster. This is
particularly useful for applications which are installed
locally.
Start/Stop Scripts
Application start scripts should not assume the status of the
environment. Intelligent programming should correct any irregular
conditions that may occur. The cluster manager spawns these
scripts off in a separate job in the background and carries on
processing. Some things a start script should do are:
First, check that the application is not currently running! This
is especially crucial for v5.4 users, as resource groups can be
placed into an unmanaged state (the forced-down action, in
previous versions). Using the default startup options, HACMP will
rerun the application start script, which may cause problems if
the application is actually running. A simple and effective
solution is to check the state of the application on startup; if
the application is found to be running, just simply end the start
script with exit 0.
Verify the environment. Are all the disks, file systems, and IP
labels available?
If different commands are to be run on different nodes, store the
executing HOSTNAME in a variable.
Check the state of the data. Does it require recovery? Always
assume the data is in an unknown state, since the conditions that
occurred to cause the takeover cannot be assumed.
Are there prerequisite services that must be running? Is it
feasible to start all prerequisite services from within the start
script? Is there an inter-resource-group dependency or resource
group sequencing that can guarantee the previous resource group
has started correctly? HACMP v5.2 and later has facilities to
implement checks on resource group dependencies, including
collocation rules in HACMP v5.3.
Finally, when the environment looks right, start the application.
If the environment is not correct and error recovery procedures
cannot fix the problem, ensure there are adequate alerts (email,
SMS, SNMP traps etc.) sent out via the network to the appropriate
support administrators.
Stop scripts are different from start scripts in that most
applications have a documented start-up routine and not
necessarily a stop routine. The assumption is: once the
application is started, why stop it? Relying on a failure of a
node to stop an application will be effective, but to use some of
the more advanced features of HACMP the requirement exists to stop
an application cleanly. Some of the issues to avoid are:
Be sure to terminate any child or spawned processes that may be using
the disk resources. Consider implementing child resource groups.
Verify that the application is stopped to the point that the file
system is free to be unmounted. The fuser command may be used to
verify that the file system is free.
In some cases it may be necessary to double check that the
application vendor's stop script did actually stop all the processes,
and occasionally it may be necessary to forcibly terminate some
processes. Clearly the goal is to return the machine to the state it
was in before the application start script was run.
Failing to exit the stop script with a zero return code will stop
cluster processing. * Note: This is not the case with start scripts!
Remember, most vendor stop/start scripts are not designed to be
cluster proof! A useful tip is to have stop and start scripts
verbosely output using the same format to the /tmp/hacmp.out file.
This can be achieved by including the following line in the header of
the script: set -x && PS4="${0##*/}"'[$LINENO] '
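The checks above can be sketched as a minimal start script. Everything specific here is a placeholder (the daemon name "myapp", the filesystem path, the recovery commands); a real HACMP start script must reflect the actual application. The sketch is written to a file so it can be inspected and syntax-checked before use.

```shell
#!/bin/sh
# Minimal sketch of an HACMP application start script, assuming a
# hypothetical daemon "myapp" and a placeholder filesystem. Saved to
# a file here so it can be reviewed; adapt before production use.
cat > /tmp/start_myapp.sh <<'EOF'
#!/bin/sh
set -x
PS4="${0##*/}"'[$LINENO] '      # verbose trace lands in /tmp/hacmp.out

APP="myapp"                     # hypothetical process name
APP_FS="/myapp/data"            # placeholder filesystem

# 1. Exit 0 immediately if the application is already running
#    (handles the v5.4 unmanaged / forced-down case).
if ps -ef | grep -v grep | grep -q "$APP"; then
    exit 0
fi

# 2. Verify the environment before starting.
if ! df "$APP_FS" >/dev/null 2>&1; then
    echo "$APP_FS unavailable, alerting support" >&2
    # mail / SNMP alert to support administrators would go here
    exit 1
fi

# 3. Assume the data is in an unknown state: recover, then start.
# /opt/myapp/bin/recover && /opt/myapp/bin/start   # placeholders
exit 0
EOF
chmod +x /tmp/start_myapp.sh
sh -n /tmp/start_myapp.sh && echo "start script syntax OK"
```

The matching stop script would mirror this structure: terminate child processes, confirm with fuser that the filesystem is free, and always exit 0.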
AIX Security Checklist
AIX Environment Procedures
The best way to approach this portion of the checklist is to do
a comprehensive physical inventory of the servers. Serial numbers
and physical location would be sufficient.
____Record server serial numbers ____Physical location of the
servers
Next we want to gather a comprehensive list of both the AIX and
pSeries inventories. By running these next 4 scripts we can gather
the information for analysis.
____Run these 4 scripts: sysinfo, tcpchk, nfschk and nethwchk.
(See Appendix A for scripts) ____sysinfo: ____Determine active
volume groups on the servers: lsvg -o ____List physical volumes in
each volume group: lsvg -p "vgname" ____List logical volumes for
each volume group: lsvg -l "vgname" ____List physical volume
information for each hard disk ____lspv hdiskx ____lspv -p hdiskx
____lspv -l hdiskx ____List server software inventory: lslpp -L
____List server software history: lslpp -h ____List all hardware
attached to the server: lsdev -C | sort -d ____List system name,
nodename, LAN network number, AIX release, AIX version and machine
ID: uname -x ____List all system resources on the server: lssrc -a
____List inetd services: lssrc -t 'service name' -p 'process id'
____List all host entries on the servers: hostent -S ____Name all
nameservers the servers have access to: namerslv -s ____Show status
of all configured interfaces on the server: netstat -i ____Show
network addresses and routing tables: netstat -nr ____Show interface
settings: ifconfig -a ____Check user and group system variables
____Check users: usrck -t ALL ____Check groups: grpck -t ALL ____Run
tcbck to verify if it is enabled: tcbck ____Examine the AIX failed
logins: who -s /etc/security/failedlogin ____Examine the AIX user
log: who /var/adm/wtmp ____Examine the processes from users logged
into the servers: who -p /var/adm/wtmp ____List all user attributes:
lsuser ALL | sort -d ____List all group attributes: lsgroup ALL
____tcpchk: ____Confirm the tcp subsystem is installed: lslpp -l |
grep bos.net ____Determine if it is running: lssrc -g tcpip
____Search for .rhosts and .netrc files: find / -name .rhosts -print
; find / -name .netrc -print ____Check for rsh functionality on the
host: cat /etc/hosts.equiv ____Check for remote printing capability:
cat /etc/hosts.lpd | grep -v '#' ____nfschk: ____Verify NFS is
installed: lslpp -L | /bin/grep nfs ____Check NFS/NIS status: lssrc
-g nfs | /bin/grep active ____Check whether it is an NFS server and
what directories are exported: cat /etc/xtab ____Show hosts that
export NFS directories: showmount ____Show what directories are
exported: showmount -e ____nethwchk ____Show network interfaces that
are connected: lsdev -Cc if ____Display active connection on boot:
odmget -q value=up CuAt | grep name | cut -c10-12 ____Show all
interface status: ifconfig -a
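The individual inventory commands above can be wrapped into a single snapshot script. This is a sketch only: the output path and the (abbreviated) command list are illustrative, and on a non-AIX host any missing command is simply noted rather than run.

```shell
#!/bin/sh
# Sketch: collect the AIX inventory checks above into one report
# file. The OUT path and command list are illustrative only.
OUT=/tmp/sysinfo.out
: > "$OUT"
for c in "lsvg -o" "lslpp -L" "lsdev -C" "uname -x" "lssrc -a" "netstat -in"; do
    echo "=== $c ===" >> "$OUT"
    set -- $c                            # split "command args" into words
    if command -v "$1" >/dev/null 2>&1; then
        "$@" >> "$OUT" 2>&1 || true      # capture output even on failure
    else
        echo "($1 not available on this host)" >> "$OUT"
    fi
done
echo "inventory written to $OUT"
```

Keeping dated copies of this output off the server makes later comparisons (and incident forensics) much easier.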
Root level access
____Limit users who can su to another UID: lsuser -f ALL
____Audit the sulog: cat /var/adm/sulog ____Verify /etc/profile
does not include the current directory ____Lock down cron access
____To allow root only: rm -i /var/adm/cron/cron.deny and rm -i
/var/adm/cron/cron.allow ____To allow all users: touch
/var/adm/cron/cron.deny (an empty deny file permits everyone) ____To
allow a user access: touch /var/adm/cron/cron.allow then echo
"UID">>/var/adm/cron/cron.allow ____To deny a user access: touch
/var/adm/cron/cron.deny then echo "UID">>/var/adm/cron/cron.deny
____Disable direct remote root login: add rlogin=false to root in
the /etc/security/user file or through smit
____Limit the $PATH variable in /etc/environment. Use the user's
.profile instead.
Authorization/authentication administration
____Report all password inconsistencies without fixing them: pwdck
-n ALL ____Report all password inconsistencies and fix them: pwdck
-y ALL ____Report all group inconsistencies without fixing them:
grpck -n ALL ____Report all group inconsistencies and fix them:
grpck -y ALL ____Browse the /etc/security/passwd, /etc/passwd and
/etc/group files weekly
SUID/SGID
____Review all SUID/SGID programs owned by root, daemon, and
bin. ____Review all SETUID programs: find / -perm -4000 -print
____Review all SETGID programs: find / -perm -2000 -print ____Review
all sticky bit programs: find / -perm -1000 -print ____Set the user
.profile in /etc/security/.profile
Permissions structures
____System directories should have 755 permissions at a minimum
____Root system directories should be owned by root ____Use the
sticky bit on the /tmp and /usr/tmp directories. ____Run a checksum
(md5) against all /bin, /usr/bin, /dev and /usr/sbin files.
____Check device file permissions: ____disk, storage, tape, network
(should be 600) owned by root. ____tty devices (should be 622)
owned by root. ____/dev/null should be 777. ____List all hidden
files in their directories (the .files). ____List all writable
directories (use the find command). ____$HOME directories should be
710 ____$HOME .profile or .login files should be 600 or 640.
____Look for un-owned files on the server: find / -nouser -print.
Note: Do not remove any /dev files. ____Do not use r-type commands:
rsh, rlogin, rcp and tftp, or .netrc or .rhosts files. ____Change
/etc/hosts file permissions to 660 and review its contents
weekly.
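The checksum baseline mentioned above might be gathered as follows. This is a sketch: csum is the AIX checksum tool (md5sum is the common equivalent elsewhere), the output path is a placeholder, and the per-directory file count is capped purely to keep the example quick; in practice hash everything and store the result off-line.

```shell
#!/bin/sh
# Sketch: build a checksum baseline of system binaries for later
# comparison. Store the result off-line, not on the server itself.
BASE=/tmp/md5.baseline
if command -v csum >/dev/null 2>&1; then
    SUM="csum -h MD5"          # AIX checksum utility
else
    SUM="md5sum"               # common fallback elsewhere
fi
: > "$BASE"
for d in /bin /usr/bin /usr/sbin; do
    [ -d "$d" ] || continue
    find "$d" -type f 2>/dev/null | head -25 |
    while read -r f; do
        $SUM "$f" >> "$BASE" 2>/dev/null   # one "hash  path" line per file
    done
done
wc -l "$BASE"
```

Re-running the same loop later and diffing against the stored baseline reveals modified binaries.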
____Check for both tcp/udp failed connections to the servers:
netstat -p tcp; netstat -p udp. ____Verify the contents of
/etc/exports (the NFS export file). ____If using ftp, make this
change to the /etc/inetd.conf file to enable logging: ftp stream
tcp6 nowait root /usr/sbin/ftpd ftpd -l ____Set NFS mounts to ro
(read only) and export only to the hosts that need them.
____Consider using extended ACLs (please review the tcb man page).
____Before making network connections, collect a full system file
listing and store it off-line: ls -Ra -la > /tmp/allfiles.system
____Make use of the strings command to check on files: strings
/etc/hosts | grep Kashmir
Recommendations
Remove unnecessary services
By default the Unix operating system defines over a thousand
services to connect to; we want to parse this down to a more
manageable set. There are 2 files in particular that we want to
parse. The first is the /etc/services file itself. A good starting
point is to eliminate all unneeded services and add services back as
you need them. Below is the /etc/services file of an existing NTP
server, one of my lab servers.
#
# Network services, Internet style
#
ssh     22/udp
ssh     22/tcp      mail
auth    113/tcp     authentication
sftp    115/tcp
ntp     123/tcp     # Network Time Protocol
ntp     123/udp     # Network Time Protocol
#
# UNIX specific services
#
login   513/tcp
shell   514/tcp     cmd     # no passwords used
Parse /etc/rc.tcpip file
This file starts the daemons that we will be using for the
tcp/ip stack on AIX servers. By default the file will start the
sendmail, snmp and other daemons. We want to parse this to reflect
what functionality we need this server for. Here is the example for
my ntp server.
# Start up the daemons
#
echo "Starting tcpip daemons:"
trap 'echo "Finished starting tcpip daemons."' 0

# Start up syslog daemon (for error and event logging)
start /usr/sbin/syslogd "$src_running"

# Start up Portmapper
start /usr/sbin/portmap "$src_running"

# Start up socket-based daemons
start /usr/sbin/inetd "$src_running"

# Start up Network Time Protocol (NTP) daemon
start /usr/sbin/xntpd "$src_running"
This helps also to better understand what processes are running
on the server.
Remove unauthorized /etc/inittab entries
Be aware of what is in the /etc/inittab file on the AIX servers.
This file works like the registry in a Microsoft environment. If an
intruder wants to hide an automated script, he would want it
launched here or in the cron file. Monitor this file closely.
Parse /etc/inetd.conf file
This is the AIX system file that starts system services, like
telnet, ftp, etc. We also want to closely watch this file to see if
any services have been enabled without authorization. If you are
using ssh, for example, this is what the inetd.conf file should look
like: nothing but the commented-out header. Because all
administrative connections come in over ssh, this file is not used
in my environment, and it should see little use in yours. ssh should
be used for all administrative connections into the environment: it
provides an encrypted tunnel so connection traffic is secure,
whereas with telnet it is trivial to sniff the UID and password.
## protocol. "tcp" and "udp" are interpreted as IPv4.
##
## service   socket   protocol   wait/    user   server    server program
## name      type                nowait          program   arguments
##
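A quick way to audit which services are actually enabled in inetd.conf is to strip the comment lines and print the service names. The helper below is a sketch; the sample file it runs against is fabricated for illustration, and in practice you would point it at /etc/inetd.conf itself.

```shell
#!/bin/sh
# Sketch: list services enabled (not commented out) in an inetd.conf.
enabled_services() {
    grep -v '^#' "$1" | awk 'NF { print $1 }' | sort -u
}

# Demonstrate on a small fabricated sample; use /etc/inetd.conf for real.
cat > /tmp/inetd.sample <<'EOF'
## service  socket  protocol  wait/  user  server        server program
#telnet  stream  tcp6  nowait  root  /usr/sbin/telnetd  telnetd -a
ftp      stream  tcp6  nowait  root  /usr/sbin/ftpd     ftpd -l
EOF
enabled_services /tmp/inetd.sample   # prints: ftp
```

Running this periodically and comparing against an approved list makes unauthorized additions easy to spot.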
Edit /etc/rc.net
This is network configuration file used by AIX. This is the file
you use to set your default network route along your no (for
network options) attributes. Because the servers will not be used
as routers to forward traffic and we do not want to use loose
source routing at you, we will be making a few changes in this
file. A lot of them are to protect from DOS and DDOS attacks from
the internet. Also protects from ACK and SYN attacks on the
internal network.
##################################################################
# Changes made on 06/07/02 to tighten up socket states on this
# server.
##################################################################
if [ -f /usr/sbin/no ] ; then
        /usr/sbin/no -o udp_pmtu_discover=0     # stops autodiscovery of MTU
        /usr/sbin/no -o tcp_pmtu_discover=0     # on the network interface
        /usr/sbin/no -o clean_partial_conns=1   # clears incomplete 3-way conn.
        /usr/sbin/no -o bcastping=0             # protects against smurf icmp attacks
        /usr/sbin/no -o directed_broadcast=0    # stops packets to broadcast add.
        /usr/sbin/no -o ipignoreredirects=1     # prevents loose
        /usr/sbin/no -o ipsendredirects=0       # source routing
        /usr/sbin/no -o ipsrcrouterecv=0        # attacks on
        /usr/sbin/no -o ipsrcrouteforward=0     # our network
        /usr/sbin/no -o ip6srcrouteforward=0    # from using indirect
        /usr/sbin/no -o icmpaddressmask=0       # dynamic routes
        /usr/sbin/no -o nonlocsrcroute=0        # to attack us from
        /usr/sbin/no -o ipforwarding=0          # stops server from acting like a router
fi
Securing root
Change the /etc/motd banner
This computer system is the private property of XYZ Insurance.
It is for authorized use only. All users (authorized or
non-authorized) have no explicit or implicit expectations of
privacy.
Any or all users of this system and all the files on this system
may be intercepted, monitored, recorded, copied, audited, inspected
and disclosed to XYZ Insurance's management personnel.
By using this system, the end user consents to such
interception, monitoring, recording, copying, auditing, inspection
and disclosure at the discretion of such personnel. Unauthorized or
improper use of this system may result in civil and/or criminal
penalties and administrative or disciplinary action, as deemed
appropriate by such personnel. By continuing to use this system, the
individual indicates his/her awareness of and consent to these
terms and conditions of use.
LOG OFF IMMEDIATELY if you do not agree to the provisions stated
in this warning banner.
Modify /etc/security/user
root:
        loginretries = 5      (failed retries until the account locks)
        rlogin = false        (disables direct remote login to a root shell; su from another UID instead)
        admgroups = system
        minage = 0            (no minimum password age)
        maxage = 4            (maximum password age of 4 weeks, i.e. 28 days)
        umask = 22
Tighten up /etc/security/limits
This is an attribute that should be changed to guard against a
runaway resource hog. An orphaned process can grow to use an
exorbitant amount of disk space. To prevent this we can set the
ulimit value here.

default:
        #fsize = 2097151
        fsize = 8388604       (soft file size limit of roughly 4 GB; fsize counts 512-byte blocks)
Variable changes in /etc/profile
Set the $TMOUT variable in /etc/profile. This will cause an open
shell to close after 15 minutes of inactivity. It works in
conjunction with the screensaver to prevent an unattended session
from being used to damage the server or, worse, corrupt data on the
server.
# Automatic logout, include in export line if uncommented
TMOUT=900
Sudo is your friend
This is a nice piece of code that system administrators can use to
allow "root-like" functionality. It allows a non-root user to run
system binaries or commands. The /etc/sudoers file is used to
configure exactly what the user can do. The service is configured
and running on ufxcpidev. The developers are running a script called
changeperms in order to tag their .ear files with their own
ownership attributes.
First we set up sudo to allow root-like ("superuser do") access for
sxnair.
# sudoers file.
#
# This file MUST be edited with the 'visudo' command as root.
#
# See the sudoers man page for the details on how to write a sudoers file.
#

# Host alias specification

# User alias specification

# Cmnd alias specification

# User privilege specification
root    ALL=(ALL) ALL
sxnair,jblade,vnaidu ufxcpidev=/bin/chown * /usr/WebSphere/AppServer/installedApps/*

# Override the built in default settings
Defaults syslog=auth
Defaults logfile=/var/log/sudo.log
For more details, please see the XYZ Company Insurance Work
Report that I compiled, or visit this URL:
http://www.courtesan.com/sudo/.
Tighten user/group attributes
Change /etc/security/user
These are some of the changes to the /etc/security/user file that
will promote a more hardened configuration of default user
attributes at your company.

default:
        umask = 077           (restrictive umask; a user's files are readable only by that UID)
        pwdwarntime = 7       (days of password expiration warnings)
        loginretries = 5      (failed login attempts before the account is locked)
        histexpire = 52       (defines how long, in weeks, a password cannot be re-used)
        histsize = 20         (defines how many previous passwords the system remembers)
        minage = 2            (minimum number of weeks a password is valid)
        maxage = 8            (maximum number of weeks a password is valid)
        maxexpired = 4        (maximum time in weeks an expired password can still be changed)
HACMP log files
/usr/sbin/cluster/etc/rhosts --- accepts incoming communication for
clcomdES (cluster communication daemon with enhanced security); see
also /usr/es/sbin/cluster/etc/rhosts
Note: If there is an unresolvable label in the
/usr/es/sbin/cluster/etc/rhosts file, then all clcomdES connections
from remote nodes will be denied.
cluster manager (clstrmgrES)
cluster lock daemon (cllockdES)
cluster SMUX peer daemon (clsmuxpdES)
The clcomdES daemon is used for cluster configuration operations
such as cluster synchronization, cluster management (C-SPOC) and
dynamic reconfiguration (DARE) operations.
For clcomdES there should be at least 20 MB of free space in the
/var file system:
/var/hacmp/clcomd/clcomd.log -- requires 2 MB
/var/hacmp/clcomd/clcomdiag.log -- requires 18 MB
An additional 1 MB is required for the /var/hacmp/odmcache directory
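The 20 MB requirement above can be checked with a short script. This is a sketch, not an official HACMP verification: it parses portable df -P output, and the threshold is simply the figure quoted above.

```shell
#!/bin/sh
# Sketch: warn if /var lacks the ~20 MB that clcomdES logging needs.
NEED_MB=20
# df -Pk gives portable output in 1024-byte blocks; field 4 is free space.
FREE_MB=$(df -Pk /var | awk 'NR==2 { print int($4/1024) }')
if [ "$FREE_MB" -ge "$NEED_MB" ]; then
    echo "/var OK: ${FREE_MB} MB free"
else
    echo "WARNING: /var has only ${FREE_MB} MB free; clcomdES needs ${NEED_MB} MB"
fi
```

A check like this fits naturally into a pre-synchronization health script run on every cluster node.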
clverify.log is also present in the /var directory.
/var/hacmp/clverify/current//* contains logs for the current
execution of clverify
/var/hacmp/clverify/pass//* contains logs from the last passed
verification
/var/hacmp/clverify/pass.prev//* contains logs from the
second-to-last passed verification