Best Practices of Oracle 10g/11g Clusterware: Configuration, Administration and Troubleshooting
Kai Yu, Dell Inc.
Transcript
Page 1

Best Practices of Oracle 10g/11g Clusterware:
Configuration, Administration and Troubleshooting

Kai Yu
Dell Inc

Page 2

About the Author:
Kai Yu
• Senior System Consultant in the Dell Oracle Solutions Engineering lab: [email protected], 512-725-0046
• Oracle Database/Applications DBA since 1995
• IOUG Collaborate 09 Committee Member and RAC SIG US Event Chair
• Frequent presenter and author for IOUG SELECT Journal, Dell Power Solutions, OOW 06/07/08, Collaborate 08/09, and RAC SIG web seminars

Page 3

Agenda

Oracle Clusterware Architecture

Shared Storage Configuration

Network Configuration

Managing Oracle Clusterware

Oracle Clusterware Troubleshooting

QA

Page 4

Oracle Clusterware Architecture

Introduction to Oracle Clusterware
– Clusterware role: manage cluster resources
– Components: OCR, voting disk, interconnect, clusterware processes

Page 5

Oracle Clusterware Architecture

Oracle clusterware components
– Voting disk: stores cluster membership; 3 copies
– OCR: stores information about clusterware resources; multiplexed OCR for high availability

– Clusterware processes:

• CRSD manages cluster resources:
root 9204 1 0 06:03 ? 00:00:00 /bin/sh /etc/init.d/init.crsd run
root 10058 9204 0 06:03 ? 00:01:23 /crs/product/11.1.0/crs/bin/crsd.bin reboot
• CSSD manages node membership through heartbeats and the voting disk, and notifies membership status changes:
root 9198 1 0 06:03 ? 00:00:22 /bin/sh /etc/init.d/init.cssd fatal
root 10095 9198 0 06:03 ? 00:00:00 /bin/sh /etc/init.d/init.cssd oprocd
root 10114 9198 0 06:03 ? 00:00:00 /bin/sh /etc/init.d/init.cssd oclsomon
root 10151 9198 0 06:03 ? 00:00:00 /bin/sh /etc/init.d/init.cssd daemon
oracle 10566 10151 0 06:03 ? 00:00:40 /crs/product/11.1.0/crs/bin/ocssd.bin
• Event Manager (EVM) publishes events through ONS; communicates between CRS and CSS

Page 6

Oracle Clusterware Architecture

root 9196 1 0 06:03 ? 00:00:00 /bin/sh /etc/init.d/init.evmd run
root 10059 9196 0 06:03 ? 00:00:00 /bin/su -l oracle -c sh -c 'ulimit -c unlimited; cd /crs/product/11.1.0/crs/log/kblade2/evmd; exec /crs/product/11.1.0/crs/bin/evmd
oracle 10060 10059 0 06:03 ? 00:00:02 /crs/product/11.1.0/crs/bin/evmd.bin
• OPROCD (process monitor) monitors the cluster node:
. Locked in memory
. Introduced in 10.2.0.4; replaces the hangcheck timer
. Reboots the node if the check fails:
root 9198 1 0 06:03 ? 00:00:22 /bin/sh /etc/init.d/init.cssd fatal
root 10095 9198 0 06:03 ? 00:00:00 /bin/sh /etc/init.d/init.cssd oprocd
root 10465 10095 0 06:03 ? 00:00:00 /crs/product/11.1.0/crs/bin/oprocd run -t 1000 -m 500 -f
. Other processes:
RACG:
oracle 12039 1 0 06:06 ? 00:00:00 /opt/oracle/product/11.1.0/asm/bin/racgimon daemon ora.kblade2.ASM2.asm
oracle 12125 1 0 06:06 ? 00:00:06 /opt/oracle/product/11.1.0/db_1/bin/racgimon startd test1db
ONS (Oracle Notification Services):
oracle 12063 1 0 06:06 ? 00:00:00 /crs/oracle/product/11.1.0/crs/opmn/bin/ons -d
oracle 12064 12063 0 06:06 ? 00:00:00 /crs/oracle/product/11.1.0/crs/opmn/bin/ons -d
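To see these daemons on a running node, a quick check sketch (standard crsctl and ps usage; the grep pattern is an assumption based on the process listings above):

$ $CRS_HOME/bin/crsctl check crs      # confirms that CSS, CRS and EVM are healthy
$ ps -ef | egrep 'crsd|ocssd|evmd|oprocd|racgimon|opmn/bin/ons' | grep -v egrep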

Page 7

Oracle Clusterware Architecture

Hardware Configuration of Oracle Clusterware
– Servers, shared storage, interconnect
– Two interconnect switches for redundant interconnects
– Butterfly connections to shared storage:
Servers <-> IO switches <-> SAN storage

Page 8

Shared Storage Configuration

Storage Requirement:
– Shared storage for OCR and voting disk
– Types: block devices, raw devices, OCFS/OCFS2, NFS on certified NAS (Oracle Storage Compatibility Program list)
– HA requirement for the shared storage
Physical connections to shared SAN storage
– Fully redundant active-active IO paths: for HA and IO load balancing
FC storage: dual HBAs and dual FC switches; each server has two independent paths to both storage processors

Page 9

Shared Storage Configuration

Fully redundant IO paths for iSCSI storage

Multiple NICs in each server; two Gigabit Ethernet switches. On each storage control module, one network interface connects to one switch and the other two network interfaces connect to the other switch.

Page 10

Shared Storage Configuration

Multipath Devices of the Shared Storage
– Multipathing device driver combines multiple IO paths
– Linux native Device Mapper (DM), or a storage vendor driver such as EMC PowerPath
– Example of configuring Linux Device Mapper (DM):
• Verify: rpm -qa | grep device-mapper
• Find the unique SCSI ID of the device:
$ /sbin/scsi_id -gus /block/sdb
36090a028e093fc906099540639aa2149
$ /sbin/scsi_id -gus /block/sde
36090a028e093fc906099540639aa2149
• Configure multipathing in /etc/multipath.conf:
multipath {
wwid 36090a028e093fc906099540639aa2149 #<---- for sdb and sde
alias votingdisk1
}
……

• service multipathd restart
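For reference, a slightly fuller /etc/multipath.conf sketch following the same pattern (the ocr1 entry and its placeholder WWID are illustrative additions, not from the slide):

# /etc/multipath.conf -- one multipath {} block per shared LUN, inside a multipaths section
multipaths {
    multipath {
        wwid  36090a028e093fc906099540639aa2149   # sdb/sde, from scsi_id above
        alias votingdisk1
    }
    multipath {
        wwid  <wwid_of_ocr_lun>                   # placeholder: replace with the real WWID
        alias ocr1
    }
}

After editing, reload and confirm the aliases:
# service multipathd restart
# multipath -ll
# ls -l /dev/mapper/votingdisk1 /dev/mapper/ocr1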

Page 11

Shared Storage Configuration

• Verify the multipath devices: multipath -ll
ls -lt /dev/mapper/*
brw-rw---- 1 root disk 253, 8 Feb 18 02:02 /dev/mapper/votingdisk1
– EMC PowerPath driver for EMC storage:
• Install the EMC PowerPath and Naviagent software:
rpm -ivh EMCpower.LINUX-5.1.2.00.00-021.rhel5.x86_64.rpm
rpm -ivh naviagentcli-6.24.2.5.0-1.noarch.rpm
• Start the naviagent and PowerPath daemons:
service naviagent start; service PowerPath start
• Verify the EMC pseudo devices in /proc/partitions:
120 32 419430400 emcpowerc
• powermt display dev=emcpowerc
Pseudo name=emcpowerc
==============================================================================
---------------- Host --------------- - Stor - -- I/O Path - -- Stats ---
### HW Path I/O Paths Interf. Mode State Q-IOs Errors
==============================================================================
2 lpfc sdc SP B1 active alive 0 0
2 lpfc sdh SP A0 active alive 0 0

Page 12

Shared Storage Configuration

Block Devices vs. Raw Devices
– For RHEL 4: use raw devices for 10g, block devices for 11g
– For RHEL 5: raw devices are deprecated:
• 11g clusterware: use block devices;
set the proper ownerships and permissions in the /etc/rc.local file:
chown root:oinstall /dev/mapper/ocr*
chmod 0640 /dev/mapper/ocr*
chown oracle:oinstall /dev/mapper/voting*
chmod 0640 /dev/mapper/voting*
• Two options for 10g RAC:
a. Use 11g clusterware for 10g RAC
b. Map block devices to raw devices with udev rules:
/etc/udev/rules.d/65-raw.rules:
ACTION=="add", KERNEL=="emcpowera1", RUN+="/bin/raw /dev/raw/raw1 %N"
/etc/udev/rules.d/89-raw_permissions.rules:
KERNEL=="raw1", OWNER="root", GROUP="oinstall", MODE="640"
c. Start udev: /sbin/start_udev
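A fuller sketch of option b, extended to several devices (the raw2/raw3 bindings and the emcpowerb1/emcpowerc1 device names are illustrative, not from the slide):

# /etc/udev/rules.d/65-raw.rules -- bind block devices to raw devices at boot
ACTION=="add", KERNEL=="emcpowera1", RUN+="/bin/raw /dev/raw/raw1 %N"
ACTION=="add", KERNEL=="emcpowerb1", RUN+="/bin/raw /dev/raw/raw2 %N"
ACTION=="add", KERNEL=="emcpowerc1", RUN+="/bin/raw /dev/raw/raw3 %N"

# /etc/udev/rules.d/89-raw_permissions.rules -- root owns the OCR, oracle owns the voting disks
KERNEL=="raw1", OWNER="root",   GROUP="oinstall", MODE="640"
KERNEL=="raw2", OWNER="oracle", GROUP="oinstall", MODE="640"
KERNEL=="raw3", OWNER="oracle", GROUP="oinstall", MODE="640"

# apply the rules and verify the bindings
/sbin/start_udev
raw -qa
ls -l /dev/raw/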

Page 13

Network Configuration

Public IP and Virtual IP Configuration
– Virtual IP for fast database connection failover
• Avoids a TCP/IP timeout of up to 10 minutes
• Fails over automatically to another node:
$ srvctl status nodeapps -n kblade1
VIP is running on node: kblade1
GSD is running on node: kblade1
Listener is running on node: kblade1
ONS daemon is running on node: kblade1
When kblade1 fails, kblade1-vip fails over to kblade2:
$ srvctl status nodeapps -n kblade1

VIP is running on node: kblade2

GSD is not running on node: kblade1

Listener is not running on node: kblade1

ONS daemon is not running on node: kblade1

$ ping kblade1-vip

PING 155.16.9.171 (155.16.9.171) 56(84) bytes of data.

From 155.16.0.1 icmp_seq=9 Destination Host Unreachable

From 155.16.0.1 icmp_seq=9 Destination Host Unreachable

….. (waiting for 2 seconds before being failed over)

64 bytes from 155.16.9.171: icmp_seq=32 ttl=64 time=2257 ms

64 bytes from 155.16.9.171: icmp_seq=33 ttl=64 time=1258 ms
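A hedged follow-up to the example above: once kblade1 has been repaired, restarting Clusterware on it lets the VIP migrate back home (standard crsctl/srvctl commands; not shown on the slide):

# on the repaired node kblade1, as root
crsctl start crs
# after the stack is up, confirm the VIP location again
srvctl status nodeapps -n kblade1
crs_stat -t | grep vip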

Page 14

Network Configuration

Private Interconnect Configuration
– Fully redundant Ethernet interconnects:
two NIC cards, two non-routable interconnect switches
– NIC teaming to bond two network interfaces for failover:
ifcfg-eth1:          ifcfg-eth2:          ifcfg-bond0:
DEVICE=eth1          DEVICE=eth2          IPADDR=192.168.9.52
USERCTL=no           USERCTL=no           NETMASK=255.255.255.0
ONBOOT=yes           ONBOOT=yes           ONBOOT=yes
MASTER=bond0         MASTER=bond0         BOOTPROTO=none
SLAVE=yes            SLAVE=yes            USERCTL=no
BOOTPROTO=none       BOOTPROTO=none
TYPE=Ethernet        TYPE=Ethernet
Add in /etc/modprobe.conf:
alias bond0 bonding
options bonding miimon=100 mode=1
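A minimal sketch for activating and verifying the bond (interface names and address from the example above; exact output varies by driver):

# /etc/modprobe.conf entries for the bonding driver
alias bond0 bonding
options bonding miimon=100 mode=1

# restart networking, then check the bond
service network restart
cat /proc/net/bonding/bond0    # shows bonding mode, link status and the eth1/eth2 slaves
ifconfig bond0                 # confirms the private address 192.168.9.52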

Page 15

Network Configuration

Private Interconnect Configuration
– Configuration best practices from Oracle (refer to [7]):
• Set UDP send/receive buffers to the maximum
• Use the same interconnect for both Oracle Clusterware and Oracle RAC communication
• NIC settings for the interconnect:
1. Flow control: rx=on, tx=off
2. Ensure the NIC names/slots order is identical on all nodes
3. Configure interconnect NICs on the fastest PCI bus
4. Compatible switch settings:
802.3ad on NICs = 802.3ad on switch ports
MTU=9000 on NICs = MTU=9000 on switch ports
• Recommended Linux kernel configuration for networking:
                          11gR1     10gR2
net.core.rmem_default     4194304   262144
net.core.rmem_max         4194304   262144
net.core.wmem_default     262144    262144
net.core.wmem_max         262144    262144
• Network heartbeat misscount: 60 secs for 10g, 30 secs for 11g
• Hangcheck timer values:
modprobe hangcheck-timer hangcheck_tick=1 hangcheck_margin=10 hangcheck_reboot=1
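A minimal sketch of persisting the kernel parameters above (values taken from the 11gR1 column; apply on every node):

# append to /etc/sysctl.conf
net.core.rmem_default = 4194304
net.core.rmem_max = 4194304
net.core.wmem_default = 262144
net.core.wmem_max = 262144

# load the new values without a reboot
sysctl -p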

Page 16

Managing Oracle clusterware

Managing the voting disk
– Locate the voting disks: crsctl query css votedisk
– Back up/restore using the dd command, for example:
dd if=/dev/mapper/votingdisk1p1 of=/backup/vd bs=4096
dd if=/backup/vd of=/dev/mapper/votingdisk1p1 bs=4096
– Add/remove a voting disk:
crsctl add css votedisk /dev/mapper/votingdisk3p1 -force
crsctl delete css votedisk /dev/mapper/votingdisk2p1 -force
Managing OCR
– Three tools: OCRCONFIG, OCRDUMP and OCRCHECK
– Add a mirror: ocrconfig -replace ocrmirror /dev/mapper/ocr2p1
– Replace OCR: ocrconfig -replace ocr /u01/ocr/ocr1
– OCR backup: Clusterware automatically backs up the OCR;
show the backups: ocrconfig -showbackup
– Restore OCR: ocrconfig -restore file_name
– Manual backup: ocrconfig -manualbackup
– OCR export/import:
ocrconfig -export /home/oracle/ocr_export
ocrconfig -import /home/oracle/ocr_export
– Verify OCR: cluvfy comp ocr
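A small backup sketch combining the commands above (run as root; the /backup path is an assumption, not from the slide):

ocrcheck                                               # verify OCR integrity
ocrconfig -showbackup                                  # list the automatic OCR backups
ocrconfig -export /backup/ocr_export_$(date +%Y%m%d)   # logical OCR export
dd if=/dev/mapper/votingdisk1p1 of=/backup/vd_$(date +%Y%m%d) bs=4096   # voting disk image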

Page 17

Managing Oracle clusterware

Extending the Cluster by Cloning Oracle Clusterware
– Task:
• Existing cluster: k52950-3-n1 and k52950-3-n2
• Add new node: k52950-3-n3
– Step 1: Prerequisite tasks on the new node k52950-3-n3:
OS install, shared storage for OCR and voting disk
Network configuration: public, private and VIP
– Step 2:
• Create a CRS home backup on the source node k52950-3-n1
• Copy the CRS home backup to the new node
• Shut down CRS on the source node
• Remove all the log files and trace files from the backup
• Create the Oracle inventory on the new node
• Set ownership for the Oracle inventory
• Run preupdate.sh on the new node
– Step 3: Run the CRS clone process on the new node:

Page 18

Managing Oracle clusterware

Page 19

Managing Oracle clusterware

Page 20

Managing Oracle clusterware

Execute ./root.sh as root on new node k52950-3-n3 as instructed

Step 4: Run addNode.sh on the source node k52950-3-n1:

Page 21

Managing Oracle clusterware

Start CRS on node 1 (k52950-3-n1) and execute rootaddnode.sh on the source node k52950-3-n1

Page 22

Managing Oracle clusterware

Execute root.sh on the new node:

Restart CRS on node 2: k52950-3-n2:

[root@k52950-3-n2 bin]# ./crsctl start crs
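To confirm that the new node has joined the cluster, a quick verification sketch (standard clusterware commands, run from CRS_HOME/bin on any node):

./olsnodes -n        # lists all cluster nodes with their node numbers, including k52950-3-n3
./crsctl check crs   # verifies that CSS, CRS and EVM are healthy
./crs_stat -t        # shows resource status across all nodes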

Page 23

Clusterware Troubleshooting

Split Brain Condition and IO Fencing Mechanism
• Split brain condition: a node failure partitions the cluster into multiple sub-clusters, each without knowledge of the others' existence
• Consequence: data collision and corruption
• IO fencing: fence the failed node off from all IO (STONITH)
• Node eviction: pick a cluster node as the victim to reboot;
always keep the largest possible sub-cluster up and evict the other nodes;
with two nodes, keep the lowest-numbered node up and evict the other
• Two CSS heartbeats and their timeouts are used to decide node eviction:
• Network heartbeat (NHB): across the private interconnect; establishes and confirms valid node membership.
CSS misscount: the maximum number of seconds a complete heartbeat can be missed without triggering a node eviction
• Disk heartbeat: between each cluster node and the voting disk.
CSS disktimeout: the default is 200 seconds for 10.2.0.1 and up

Page 24

Clusterware Troubleshooting

Node Eviction Diagnosis Case Study
– Random node evicted in an 11-node 10g cluster on Linux:
/var/log/messages:
Jul 23 11:15:23 racdb7 logger: Oracle clsomon failed with fatal status 12.
Jul 23 11:15:23 racdb7 logger: Oracle CSSD failure 134.
Jul 23 11:15:23 racdb7 logger: Oracle CRS failure. Rebooting for cluster integrity
OCSSD log: $CRS_HOME/log/<hostname>/cssd/ocssd.log file:
[ CSSD]2008-07-23 11:14:49.150 [1199618400] >WARNING:
clssnmPollingThread: node racdb7 (7) at 50% heartbeat fatal, eviction in 29.720 seconds

..

clssnmPollingThread: node racdb7 (7) at 90% heartbeat fatal, eviction in 0.550 seconds

[ CSSD]2008-07-23 11:15:19.079 [1220598112] >TRACE:

clssnmDoSyncUpdate: Terminating node 7, racdb7, misstime(60200) state(3)

– Root cause analysis:

• A network heartbeat failure triggered a node eviction on node 7

• The private IP address was not pingable right before the node eviction
• The public and private networks shared a single physical switch
– Solution: use two dedicated switches for the interconnect
– Result: no more node evictions after the switch change

Page 25

Clusterware Troubleshooting

CRS Reboot Troubleshooting Procedure
– OCSSD, OPROCD and OCLSOMON monitoring processes:
• Detect certain conditions that could impact data integrity
• Kill the clusterware and trigger a CRS reboot
• Leave critical errors in the related log files
– Troubleshooting methods:
• Review the syslog file /var/log/messages on Linux
• OCSSD log (ocssd.log): node eviction due to internal health check errors, interconnect and membership errors
• OPROCD log (/etc/oracle/oprocd/<host>.oprocd.log): node eviction due to hardware and driver freezes
• OCLSOMON log (oclsomon.log): reboots due to hangs/scheduling issues

– Troubleshooting Flowchart
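Before walking the flowchart, a hedged sketch for pulling the relevant messages out of these logs (the kblade2 hostname is a placeholder taken from earlier examples):

# reboot and clusterware messages in the syslog
grep -i oracle /var/log/messages | tail -50
# heartbeat warnings and evictions in the CSS log
grep -iE "heartbeat fatal|evict" $CRS_HOME/log/kblade2/cssd/ocssd.log
# OPROCD activity
cat /etc/oracle/oprocd/kblade2.oprocd.log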

Page 26

Clusterware Troubleshooting

Page 27

Clusterware Troubleshooting

RAC Diagnostic Tools:
– Diagwait:
• Delays the node reboot for a short time so that all diagnostic messages can be written to the logs
• Does not increase the probability of data corruption
• Setup steps (see the sketch after this list): shut down CRS,
crsctl set css diagwait 13 -force
then restart CRS
– Oracle problem detection tool (IPD/OS):
• Monitors and records resource degradation and failures related to Oracle Clusterware and Oracle RAC issues
• Historical mode goes back to the time before a node eviction
• Linux x86, kernel 2.6.9 and up
– RAC-RDDT and OSWatcher:
• Collect information leading up to the time of the reboot
• Use OS utilities: netstat, iostat, vmstat
• Start: ./startOSW.sh 60 10
• Metalink #301138.1, #301137.1
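A minimal sketch of the diagwait setup steps listed above (run as root; CRS must be stopped cluster-wide before setting the value):

crsctl stop crs                     # stop CRS on every node first
crsctl set css diagwait 13 -force   # set diagwait to 13 seconds
crsctl start crs                    # restart CRS on each node
crsctl get css diagwait             # verify the new value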

Page 28

References

[1] Oracle 10g Grid & Real Application Clusters: Oracle 10g Grid Computing with RAC, Mike Ault, Madhu Tumma
[2] Oracle Metalink Note #294430.1, CSS Timeout Computation in Oracle Clusterware
[3] Oracle Metalink Note #265769.1, Troubleshooting CRS Reboots
[4] Oracle Clusterware Administration and Deployment Guide 11g Release 1 (11.1), September 2007
[5] Deploying Oracle Database 11g R1 Enterprise Edition Real Application Clusters with Red Hat Enterprise Linux 5.1 and Oracle Enterprise Linux 5.1 on Dell PowerEdge Servers, Dell/EMC Storage, Kai Yu, http://www.dell.com/downloads/global/solutions/11gr1_ee_rac_on_rhel5_1__and_OEL.pdf?c=us&cs=555&l=en&s=biz
[6] Dell | Oracle Supported Configurations: Oracle Enterprise Linux 5.2, Oracle 11g Enterprise Edition Deployment Guide, http://www.dell.com/content/topics/global.aspx/alliances/en/oracle_11g_oracle_ent_linux_4_1?c=us&cs=555&l=en&s=biz
[7] Oracle Real Application Clusters Internals, Oracle OpenWorld 2008 presentation #298713, Barb Lundhild
[8] Looking Under the Hood at Oracle Clusterware, Oracle OpenWorld 2008 presentation #299963, Murali Vallath

Page 29

Q/A

Page 30

IOUG Session Survey

Session Title:

Best Practices of Oracle 10g/11g Clusterware:

Configuration, Administration and Troubleshooting

Session Speaker:

Kai Yu

Session Number:

355