* Other brands and names may be claimed as the property of others.
Intel, the Intel logo, and Itanium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.
What are the total business consequences of an outage?
• Tarnished company reputation and customer loyalty
• Lost opportunities and revenue
• Idle or unproductive labor
• Cost of restoration
• Penalties
• Litigation
• Loss of stock valuation
• Loss of critical data
• Single cluster of up to 16 nodes
– PA-RISC (HP 9000) and Integrity servers
• For use when all nodes are in a single data center
• Automatic failover
– up to 150 application packages (up to 900 services total)
– supports up to 200 relocatable package IP addresses per cluster
• Cluster File System (CFS) support in SG versions >= 11.17 (heartbeat must be over Ethernet)
• SCSI or Fibre Channel for disks
• Single IP subnet for each heartbeat network (IPv4)
– Multiple heartbeat networks required (2 or more)
• Ethernet
• Infiniband
• FDDI & Token Ring for legacy environments
• Local LAN failover and Auto Port Aggregation (APA)
• IPv6 support for data links only
• File systems and volume managers
– Journaled File System (JFS) and OnlineJFS
• Although HFS is supported, it is not recommended for mission-critical applications
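To make the limits above concrete, here is a minimal, hedged sketch of how an HP-UX Serviceguard cluster is typically generated, verified, and started with the standard cm* commands. The node names, file path, and editing steps are illustrative assumptions, not taken from this presentation; check the Managing Serviceguard manual for the exact options on your release.

    # Generate a cluster configuration template for two nodes (names are examples)
    cmquerycl -v -C /etc/cmcluster/cluster.ascii -n node1 -n node2

    # Edit /etc/cmcluster/cluster.ascii (CLUSTER_NAME, HEARTBEAT_IP, cluster lock, ...),
    # then verify it and distribute the binary configuration to all nodes
    cmcheckconf -v -C /etc/cmcluster/cluster.ascii
    cmapplyconf -v -C /etc/cmcluster/cluster.ascii

    # Start the cluster and confirm node and package status
    cmruncl -v
    cmviewcl -v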
• Single cluster of up to 16 nodes
– 2 to 4 nodes with ProLiant servers using SCSI
– up to 16 nodes with Integrity servers or ProLiant servers using Fibre Channel
• For use when all nodes are in a single data center
• Automatic failover
– up to 150 application packages (up to 900 services total)
– supports up to 200 relocatable package IP addresses per cluster
• SCSI or Fibre Channel for disks
• Single IP subnet for each heartbeat network (IPv4)
– Multiple heartbeat networks required (at least 2)
• Ethernet, supporting up to 7 heartbeat subnets
• Network bonding for automatic network failover
• File system and volume manager
– Reiser, XFS, and ext3 file systems (journaled file systems)
– Logical Volume Manager (LVM and LVM2) included in the Linux distribution
– Red Hat Global File System (GFS)
• Dynamically loadable modules for Serviceguard installation
• Quorum device required for 2-node clusters, optional for larger clusters
– Quorum Service with up to 16 nodes
– Cluster Lock LUN for up to 4 nodes only
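As a rough illustration of the two quorum choices above, the fragments below follow the parameter names used in the Serviceguard cluster ASCII file; the host name, device path, and timing values are assumptions for illustration only and should be checked against your release's template.

    # Option 1: Quorum Service (usable with up to 16 nodes)
    QS_HOST               qshost.example.com
    QS_POLLING_INTERVAL   300000000      # microseconds
    QS_TIMEOUT_EXTENSION  2000000        # microseconds

    # Option 2: Cluster Lock LUN (2- to 4-node clusters only), declared per node
    NODE_NAME             node1
    CLUSTER_LOCK_LUN      /dev/sdc1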
• Extended / Campus Cluster is not supported on Linux
– md implementation currently does not meet robustness requirements
• Data integrity protections are not as robust with Sistina LVM (Linux)
– Manual activation of a volume used by an SG/LX package from another server can corrupt data
– HP-UX volume managers support exclusive activation mode to protect against inadvertent activation from another server inside or outside of the cluster
• NFS failover does not include file locks on Linux
– Correcting this requires Linux kernel changes
– NFS v4 plans to support lock failover and will need the distribution vendors to deliver it
• All systems are physically connected to each disk
• Maximum cluster size is 16 nodes
• Each application runs on only one host at a time
• Hosts can run multiple applications
• Failover is possible to any node that is physically connected to the data
[Figure: four hp rx5670 servers, each running one application (App A, App B, App C, App D), all connected to an HP StorageWorks XP12000 disk array]
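Because every node is physically connected to the shared disks, a package can be halted on one node and restarted on any other eligible node. The commands below are a sketch of a manual package move; the package and node names are placeholders.

    cmviewcl -v              # where is the package running and which nodes are eligible?
    cmhaltpkg pkgA           # halt the package on its current node
    cmrunpkg -n node3 pkgA   # start it on another node that is connected to the data
    cmmodpkg -e pkgA         # re-enable automatic package switching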
• Active / Standby
– One or more nodes are reserved for failover use
– Upon failover, the applications maintain performance due to spare capacity
• Active / Active
– All nodes are running (different) applications
– Upon failover, choice of:
• Reduced capacity when multiple applications run on the same node
• Shutdown of less critical applications
• Optional use of VSE technologies to guarantee resource entitlements
• Rotating Standby
– Upon failover, the standby system becomes the new production system and the repaired system becomes the new standby system
• Active / Active (distributed application)
– All nodes are running an instance of the same application (e.g., RAC)
– Depends on shared read/write access to the data
– No failover of the application
– Upon failure of a node (or instance), the users are sent to the remaining nodes
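The failover models above map onto package configuration parameters. The fragment below is a sketch of a legacy-style package ASCII file; node and package names are placeholders. FAILOVER_POLICY and FAILBACK_POLICY are standard Serviceguard parameters, but check the package template on your release for the exact set.

    PACKAGE_NAME      pkgA
    # Node order defines the failover preference (first listed = primary)
    NODE_NAME         node1
    NODE_NAME         node2
    # CONFIGURED_NODE: fail over following the NODE_NAME order
    # MIN_PACKAGE_NODE: fail over to the node running the fewest packages (active/active)
    FAILOVER_POLICY   CONFIGURED_NODE
    # MANUAL gives rotating-standby behavior; AUTOMATIC moves the package back
    # when the repaired primary node rejoins the cluster (classic active/standby)
    FAILBACK_POLICY   MANUAL
    AUTO_RUN          YES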
[Figure: four-node cluster (nodes A, B, C, D) connected to an HP StorageWorks XP12000 disk array, split into two equal halves by a failure]
• Each “sub-cluster” tries to form a cluster and run all of the applications
• Two instances of the same application write to the same disks, resulting in data corruption
[Figure: cluster lock arbitration for a four-node cluster (A, B, C, D) on an HP StorageWorks XP12000 disk array, and a 2-node cluster of hp Integrity rx4640 servers arbitrated by a Quorum Service: a highly available quorum device that is not a member of the cluster whose quorum is being satisfied]
• Each “sub-cluster” tries to acquire the cluster lock on the cluster lock disk / lock LUN
• The algorithm guarantees that only one sub-cluster will get it
• One sub-cluster is forced to crash to prevent data corruption
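A hedged sketch of how the HP-UX tie-breaker disk is declared in the cluster ASCII file: the lock is a physical volume in a shared LVM volume group, declared once per node entry. The volume group and device names below are examples only; the Linux lock LUN equivalent was shown earlier.

    FIRST_CLUSTER_LOCK_VG    /dev/vglock
    NODE_NAME                node1
      FIRST_CLUSTER_LOCK_PV  /dev/dsk/c4t0d0
    NODE_NAME                node2
      FIRST_CLUSTER_LOCK_PV  /dev/dsk/c4t0d0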
• Alternative quorum arbitration method
• Supports up to 50 clusters and a maximum of 100 nodes
• TCP/IP network connection required
– not required to be in the same subnet, although recommended to minimize network delays
• Stand-alone HP-UX or Linux-based server(s) outside of the Serviceguard cluster whose quorum is being satisfied
• Runs as a real-time process
• The Quorum Service (QS A.02.00)
– can be configured in a package in a cluster
– cannot reside in the same cluster that uses it
– do not configure two clusters that use the same Quorum Service package
• Bonding (Linux) or APA (HP-UX) can be used to increase network availability to the Quorum Service
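A rough sketch of the Quorum Server side of the setup, assuming the HP-UX quorum server product: the file paths and the inittab entry are recalled from the product documentation and may differ by release and platform, and the host and node names are placeholders.

    # On the stand-alone quorum server: authorize the cluster nodes
    cat >> /etc/cmcluster/qs_authfile <<EOF
    node1.example.com
    node2.example.com
    EOF

    # Start the quorum server daemon (commonly respawned from /etc/inittab), e.g.:
    # qs:345:respawn:/usr/lbin/qs >> /var/adm/qs/qs.log 2>&1

    # In each client cluster's ASCII file, point at the quorum server:
    # QS_HOST qshost.example.com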
• Serviceguard CFS will be supported with:
– HP-UX 11i v2 September 2004 release and greater
– Serviceguard 11.17 (Q3/2005)
– VERITAS Storage Foundation 4.1 delivered by HP
– Both HP 9000 and HP Integrity servers
• At this time, SG/CFS will NOT be supported with:
– HP-UX 11i v1
– Earlier versions of HP-UX 11i v2
– HP-UX 11.0 or earlier
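For reference, once SG/CFS is installed the cluster file system is typically administered through the cfs* wrapper commands rather than by hand-editing packages. The sequence below is a sketch only; the disk group, volume, and mount point names are invented for illustration, and exact syntax should be checked against the SG 11.17 / Storage Foundation 4.1 documentation.

    cfscluster config             # configure the CVM/CFS system multi-node package
    cfscluster start              # start CFS across the cluster
    cfsdgadm add cfsdg01 all=sw   # make the shared disk group available to all nodes
    cfsmntadm add cfsdg01 vol01 /shared/data all=rw   # register the cluster mount point
    cfsmount /shared/data         # mount the cluster file system on all nodes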
• Workload Manager (WLM)
– WLM allows the specification of SLOs for SG packages that may not be active on the system. Each SLO is conditional on which server the package is active on.
– When a failover or package movement occurs, WLM detects it and enforces the SLO: the package gets the priority and the resources specified
– WLM automatically activates/deactivates TiCAP processors to reduce the performance impact of an application failover in active-active single or multi-site disaster tolerant configurations
– Using WLM on a Pay per use (PPU) basis reduces the cost of active-standby single or multi-site disaster tolerant configurations
Serviceguard Extensions for SAP (SGeSAP and SGeSAP/LX)
• Integrate SAP R/3 with:
• Serviceguard
– Toolkit template for easily configuring SAP with Serviceguard (HP-UX and Linux)
– Options on how to configure the Central Instance (CI) and the Database (DB) servers
• Metrocluster (HP-UX only)
– Optional template to create a disaster tolerant architecture for SAP (configuration example shown in the Metrocluster section of this presentation)
• Oracle Real Application Clusters (RAC) 10g is an option for Oracle 10g Enterprise Edition
• Differences from the previous Oracle9i RAC product include:
– Clusterware is built-in (formerly known as Cluster Ready Services or CRS)
– Automatic Storage Management (ASM) software can be used to manage storage for the RAC database
– Performance improvements
– “Zero downtime” (rolling upgrade) for certain Oracle patches
• NOTE: Oracle allows customers the choice of using the built-in Oracle cluster membership capability or utilizing platform-specific clusterware such as SGeRAC
• HP’s Virtual Server Environment for Linux was named Best Clustering Solution in the LinuxWorld Products Excellence Awards program
• The awards recognize important innovations in Linux and open source technologies
• HP released and demonstrated the first version of VSE for Linux on HP Integrity Superdome
• HP gWLM provides the policy engine to allocate virtual server resources in a Linux operating system
• HP Serviceguard for Linux provides the high availability
• The Early Access Program (EAP) gives you access to Intel® technology to support your current development cycle as well as early access to tools and information on new technologies. Your membership includes:
– Early access to pre-release software development platforms
– Access to Intel and 3rd party software and testing tools
– Training through Intel® Software College and Web events
– Technical content and how-to articles
– Protected remote access to easily evaluate and develop software safely and securely on platforms over the Internet
Intel® Early Access Program - Marketing Opportunities and Support
• Extensive marketing and business development opportunities:
– Inclusion in online and print versions of the Intel® Developer Solutions Catalog
– Intel quotes to support your PR
– Case studies
– Access to Intel’s event marketing asset kit
– Participation in selected industry events and trade shows
• Support in your development efforts provided through:
– Access to an Intel Account Representative who will act as your primary contact
– Intel® Premier Support for confidential technical support
– 24/7 online support via www.intel.com/software/support
4. What resources are used by the application
• Nodes the package can run on
• Networks required by the package
• Disk volume groups required by the package
• Services to monitor
• User-defined resources to monitor
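As a sketch of how those resources end up in a package definition (legacy-style package ASCII file plus control script), with all names invented for illustration:

    # Package ASCII file (template from cmmakepkg -p): nodes, monitored subnet, service
    PACKAGE_NAME               pkgA
    NODE_NAME                  node1
    NODE_NAME                  node2
    SUBNET                     192.168.10.0
    SERVICE_NAME               pkgA_mon
    SERVICE_FAIL_FAST_ENABLED  NO
    SERVICE_HALT_TIMEOUT       300

    # Package control script (template from cmmakepkg -s): storage, relocatable IP, service command
    VG[0]="vgpkgA"
    LV[0]="/dev/vgpkgA/lvol1";   FS[0]="/appA";   FS_MOUNT_OPT[0]="-o rw"
    IP[0]="192.168.10.50";       SUBNET[0]="192.168.10.0"
    SERVICE_NAME[0]="pkgA_mon";  SERVICE_CMD[0]="/opt/appA/bin/monitor";  SERVICE_RESTART[0]="-r 2"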
Protecting against split brain and data corruption
• Serviceguard uses a “tie-breaker” or quorum device to prevent “split-brain” of the cluster
– Cluster lock disk (HP-UX) or cluster lock LUN (SG/LX)
• single cluster lock disk (when all servers are in a single data center)
• dual cluster lock disks (when the servers are distributed across two data centers)
– Quorum Server
• a (small) server that is outside of the cluster
• Without a tie-breaker, split-brain can occur when:
– a network failure splits the cluster into 2 equal halves, OR
– exactly half of the servers in the cluster fail all at once
• Unless split-brain is prevented, data corruption will occur if the application runs concurrently on both “halves” of the cluster and modifies the same single copy of the data