Reduce risk with disaster recovery for
Oracle Fusion Middleware 11g
architectures using HP Continuous Access
EVA and HP Insight Recovery
HP Reference Architectures for Oracle
Technical white paper
Table of contents
Executive summary
Proof-of-concept description (POC)
  Disclaimer
Terminology
  Oracle disaster recovery terminology
  HP Continuous Access EVA terminology
  HP Insight software terminology
HP BladeSystem reference architecture with Insight Recovery extension overview
Overview of Insight software: Insight Dynamics, logical servers and Insight Recovery
  The logical server
  HP Insight Recovery
  HP Continuous Access EVA
Architecture summary
  Solution attributes
  HP Insight Recovery solution highlights
  Functionality
SOA disaster recovery logical topology diagram
  Why disaster recovery for middleware SOA
  High Availability (HA)
POC hardware components
POC software components
POC version details of BladeSystem infrastructure
Infrastructure diagram
Oracle database and middleware
  Oracle Fusion Middleware
  Oracle RAC database
POC site configuration
Planning DR groups
Installation and configuration of Insight software and Insight Recovery
Test case
  The application test environment
Network and DNS configuration for disaster recovery
Switchover (planned failover)
  Switch back procedures
Best practices
Example of switchover using Insight Recovery during SOA transaction (XA transaction)
Unplanned failover
Comparison with Oracle Data Guard solution for database replication
Summary
Appendix A: Documents
  EVA documentation
  HP Insight software and Insight Recovery documentation
  Oracle documentation
For more information
Executive summary
This document describes a disaster recovery solution for Oracle® environments based on synchronous
storage replication (across town) and HP server-edge virtualization that dramatically simplifies remote
site recovery. The solution uses HP Continuous Access EVA software for high-performance array-based
replication of the Oracle database as well as the server OS, Java™ Message Service (JMS) data,
transaction log files (TLogs), other metadata required for disaster recovery, and the Oracle home. In the
event of a complete site failure, HP Insight Recovery (also called Insight Dynamics Recovery
management), when directed to do so by the system administrator, automates the transition of the
production environment from the primary site to the standby recovery site, leveraging HP Virtual
Connect technology to minimize system reconfiguration and time-to-recovery.
Although virtual machines are supported in this architecture, this paper is focused on a multi-tiered
physical server architecture typical of an Oracle environment that includes the key elements of Oracle
Fusion Middleware (OFMW) architecture: an Oracle Real Application Cluster (RAC) database,
Oracle WebLogic Server middleware, Oracle SOA Suite and Oracle HTTP Server (OHS) web hosts.
The Oracle SOA Suite consists of: Oracle Business Process Execution Language (BPEL) Process
Manager (PM), Mediator, Rules, B2B (Business-to-Business), Human Workflow, and Oracle Business
Activity Monitoring (BAM).
Oracle BPEL is an XML-based language that enables task sharing across multiple enterprises using a
combination of Web services. BPEL is based on XML Schema, the Simple Object Access Protocol
(SOAP), and the Web Services Description Language (WSDL).
Oracle BPEL Process Manager provides a framework for easily designing, deploying, monitoring, and
administering processes based on BPEL standards.
Oracle BAM provides a framework for creating dashboards that display real-time data inflow and
creating rules to send alerts under specified conditions.
This Oracle foundation will support custom and packaged applications or Service-Oriented
Architecture (SOA) composite applications in a highly available and flexible way. The exact Oracle
environment is not critical to implementing the recovery scenario; a single-instance database or a
different middleware configuration would work as well. The basic concepts and design of the
recovery environment would be the same. The point is that there is more than just database content
that needs to be replicated and safeguarded to allow seamless disaster recovery.
This paper discusses how to set up and configure this recovery architecture and provides an
introduction to the various hardware and software components needed. References are provided for
more detailed information.
This white paper is part of a portfolio of information focused on the optimal integration of HP
Converged Infrastructure technologies with Oracle software technologies. As a collection, we refer to
this documentation as the HP reference architectures for Oracle. Additional reference architecture
documentation can be accessed through the HP and Oracle Alliance home page at
www.hp.com/go/oracle.
Target audience: This paper is written for system architects, managers, and others involved in, or with
a need to understand, the definition and deployment of highly available and disaster-tolerant Service-
Oriented Architecture (SOA) computer solutions.
This white paper describes testing performed in March to May 2010.
Proof-of-concept description (POC)
Oracle applications tend to be among the most mission-critical IT services. Customers require a full
range of availability options for these environments. The recommendation here is for a full three-tiered
architecture with an Oracle RAC 11gR2 database at the back end, Oracle WebLogic Server and SOA
in the middle tier, and web host processing at the front end. A demonstration application designed by
Oracle is used to validate the data consistency and recovery in our configuration. The purpose of this
proof-of-concept testing was to validate one disaster recovery option that:
1. Leverages the same HP BladeSystem infrastructure and management console used for the single-site
HP Reference Architecture for Oracle Grid.
2. Provides a cost-effective alternative to geographic cluster architectures for the majority of customers
who do not require online access at the secondary site.
3. Provides an "add-on" to the single-site HP Reference Architecture for Oracle Grid without
re-architecting.
The following document link provides more details about the above concepts:
http://h20195.www2.hp.com/V2/GetDocument.aspx?docname=4AA2-4214ENW
To validate the concepts and integration details, a small-scale POC configuration was deployed in the
HP Alliances, Performance and Solutions labs to verify the expected functionality of the key
components and identify any Oracle-specific best practices. Testing was completed for:
System level compatibility of technologies, version requirements, etc.
Site failover functionality specific to all tiers of the Oracle environment and business process
workflows
Virtual Connect (VC) functionality of the single-site HP Reference Architecture for Oracle Grid to
add, remove, or replace a server after fail-over
To provide context for the reader, the details of the POC configuration are used throughout this white
paper. It is important to understand that this particular implementation is simply an example used for
expedience. The concepts described here can be scaled to much larger systems and capacity.
Disclaimer
The performance of a disaster recovery system has to be evaluated case by case according to the
needs of the customer. This white paper only provides template demonstration examples. This
proof-of-concept does not include performance characterization because performance issues and
requirements are unique to every application. This white paper is not a step-by-step "how to" or
"install guide"; rather, it describes a recommended architecture and the key integration points with
Oracle software.
In our work we made every effort to ensure that what we implemented is fully supported by both HP
and Oracle. This white paper is not intended to imply functionality, compatibility or supportability
beyond what is documented in the individual product specifications.
Terminology
Oracle disaster recovery terminology
Disaster Recovery – The ability to safeguard against natural disasters or unplanned outages at a
production site by having a recovery strategy for failing over applications and data to a
geographically separate standby site.
Oracle Fusion Middleware (OFMW) – A collection of standards-based software products that spans a
range of tools and services from Java EE (enterprise edition) and developer tools, to integration
services, business intelligence, and collaboration. Oracle Fusion Middleware offers complete support
for development, deployment, and management.
Oracle Service-Oriented Architecture (SOA) Suite – A suite with infrastructure components, such as
Oracle Business Process Execution Language (BPEL) Process Manager (PM), with Mediator, Rules,
B2B, and Human Workflow.
Recovery Point Objective (RPO) – The maximum age of the data to be restored in the event of a
disaster. For example, if the RPO is six hours, systems must be restored to the state they were in no
more than six hours before the disaster.
Recovery Time Objective (RTO) – The time needed to recover from a disaster, usually determined by
how long you could afford to be without your systems.
Site failover – The process of making the current recovery (standby) site the new primary site
(production), after the primary or production site becomes unexpectedly unavailable (for example,
due to a disaster at the production site).
Site switchover – The process of reversing the roles of the primary site and recovery site. Switchovers
are planned operations performed for periodic validation or planned maintenance on the current
production site. During a switchover, the current standby site becomes the new production site, and
the current production site becomes the new standby site.
Site switchback – The process of reversing the roles of the new production site (old standby) and new
standby site (old production). Switchback is applicable after a previous switchover.
WebLogic Server transaction logs – Each WebLogic Server instance has a transaction log that
captures information about committed transactions that may not yet have completed. These transaction
logs enable WebLogic Server to recover transactions that could not be completed before the
server failed.
HP Continuous Access EVA terminology
Disk Group – A named group of disks selected from all available disks in an array. One or more
virtual disks can be created from a disk group.
Data replication group (DR group) – A logical group of virtual disks in a remote replication
relationship with a corresponding group on another array.
Destination – The virtual disk, DR group, or virtual array disk (at the recovery site) to which I/O is
replicated. See also Source.
Enterprise Virtual Array (EVA) – An HP StorageWorks product that consists of one or more virtual
arrays. See also virtual array.
Fabric – A network of Fibre Channel switches or hubs and other devices.
HP Continuous Access EVA (CA) – A storage-based HP StorageWorks software product that enables
two or more arrays to perform disk-to-disk replication, along with the management user interfaces that
facilitate configuring, monitoring, and maintaining the replicating capabilities of the arrays.
Management server – A server on which HP StorageWorks Enterprise Virtual Array (EVA)
management software is installed, including HP StorageWorks Command View EVA and HP
StorageWorks Replication Solutions Manager, if used. A dedicated management server runs EVA
management software exclusively.
Present LUN – The process by which the storage management console presents (makes visible) a LUN
or virtual disk to the World Wide ID (WWID) of the QLogic or Emulex HBA in the host (database or
middleware) server.
Source – The virtual disk, DR group, or virtual array (at the primary site) from which I/O is replicated
to the recovery site. See also destination.
XCS – The HP Enterprise Virtual Array software on specific EVA controller models. Controller software
manages all aspects of array operation, including communication with HP StorageWorks Command
View EVA.
HP Insight software terminology
HP Insight Recovery software (IR) – Software (also called Insight Dynamics Recovery management)
that executes the commands required to perform disaster recovery for logical servers.
HP Virtual Connect Enterprise Manager (VCEM) – Management tool that provides management of
multiple HP BladeSystem enclosures equipped with Virtual Connect modules.
Logical Server Profile – A logical server profile is composed of system services and resources whether
these are virtual, physical, shared or unshared – everything that the OS and application stack for a
given workload requires to operate.
Virtual Connect (VC) – HP VC switches plug into the backplane of an HP BladeSystem c7000
enclosure, providing Ethernet and FC switching capabilities with a virtual MAC and WWID.
HP BladeSystem reference architecture with Insight Recovery
extension overview
The single-site HP Reference Architecture for Oracle Grid is the foundation for the solution described
here. This architecture is fully redundant with no single point-of-failure. As previously described, it is a
three-tiered architecture of web hosts, Fusion Middleware and database servers. It is designed to
support an Oracle Real Application Cluster database of two or more nodes with multiple Oracle
middleware and web hosts in a scale-out configuration. Server and LAN connections, including the
dedicated RAC interconnect, are 10Gb Ethernet. Storage is a shared HP StorageWorks Enterprise
Virtual Array which hosts the database, system images and all stored elements including the HP
logical server profiles we explain below. All servers in the environment boot from this shared storage.
Ethernet and Fibre Channel storage connections are made through HP Virtual Connect (VC).
With this environment, server identities (profiles) are abstracted from the physical hardware (see The
logical server section below), making servers completely interchangeable. This means server
replacement can be handled with little to no human intervention. It also means that Oracle servers can
be pre-defined (saved to shared storage) and provisioned in minutes versus days. This model provides
opportunities for dramatic utilization improvement. For example, additional RAC nodes and
application servers could be pre-defined and used to increase capacity at month end for financial
systems or to spin up a test environment on-demand. The single-site reference architecture documents
this model in detail. For further information, see the references at the end of this paper.
Our purpose in this paper is to describe an extension to the single-site reference architecture that
provides for disaster recovery from a complete site outage. The solution assumes mirrored single-site
reference architectures that are geographically separated, with latency low enough to allow a
synchronous Fibre Channel connection between the two EVAs located at the two sites. One site is the
primary production site; the other is a recovery environment that can take over production processing
in a matter of minutes. HP Continuous Access EVA software is used to replicate the shared storage,
which includes everything needed to make the recovery. HP Insight Recovery automates the migration
of services from the production to the recovery site when an administrator determines it is necessary.
Overview of Insight software: Insight Dynamics, logical
servers and Insight Recovery
In our disaster recovery architecture, HP Insight Dynamics software provides the resource
management framework. A core concept of Insight Dynamics is the "logical server." In the context of
our solution, a logical server is a management abstraction that simplifies and optimizes the
provisioning and re-provisioning of servers. Because a logical server is abstracted from the underlying
platform, it makes those underlying resources anonymous to the application/OS stack. A logical
server can be created from a discrete physical server, from within a pool of physical resources, or
from a virtual machine. HP Insight Dynamics software uses the concept of logical servers to deliver a
common framework for planning, deploying and managing both physical and virtual servers
seamlessly.
The logical server
Logical servers bring the freedom and flexibility of virtualization to physical servers. The logical server
is a server profile that is easily created and freely moved across physical and virtual machines. By
detaching the logical identity from the physical resource, you can create or move logical servers on
any suitable virtual or physical machine—on demand.
With a logical server approach, you can even create templates for your frequently used applications
with specific configurations. These templates can be stored and reactivated in minutes, when needed.
A logical server profile describes an abstracted system image (including the system services and
resources), whether these are virtual, physical, shared, or unshared. The system image includes
everything that the OS and application stack require to operate on a particular workload. For
example, a logical server profile would include attributes describing entitlements such as power
allocation, processor and memory requirements, PCI Express devices (local I/O), network connections
(distributed I/O), and storage. The logical server is managed in software, either locally on the
platform (as firmware integrated into the hardware) or on a centralized management
server (CMS).
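To make the idea concrete, the sketch below models the kind of attributes a logical server profile carries. This is purely an illustration in Java; the names, types and values are hypothetical and do not reflect the actual Insight Dynamics profile format.

// Hypothetical illustration of the attributes a logical server profile carries.
// Names and values are invented for clarity; this is not the Insight Dynamics schema.
import java.util.List;

public record LogicalServerProfile(
        String name,                 // e.g. "oracle-rac-node1"
        int powerAllocationWatts,    // power entitlement
        int cpuCores,                // processor requirement
        int memoryGb,                // memory requirement
        List<String> pciDevices,     // local I/O (PCI Express devices)
        List<String> networkMacs,    // virtual MAC addresses assigned by Virtual Connect
        List<String> sanWwids,       // virtual WWIDs for SAN connections
        String bootLunId) {          // boot LUN on the shared EVA storage

    public static void main(String[] args) {
        // A profile like this can be activated on any compatible blade,
        // which is what makes servers interchangeable in this architecture.
        LogicalServerProfile racNode = new LogicalServerProfile(
                "oracle-rac-node1", 450, 16, 64,
                List.of("QMH2462-FC-mezzanine"),
                List.of("00-17-A4-77-00-10"),
                List.of("50:06:0B:00:00:C2:62:00"),
                "EVA6400-LUN-12");
        System.out.println("Profile ready to activate: " + racNode.name());
    }
}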
For our solution, logical server profiles will be created for all the servers used in the Oracle
environment. Not all applications are well-suited for virtual machines today, particularly those with
high I/O or deterministic latency requirements, such as an Oracle database and especially a RAC database.
Therefore, in this disaster recovery solution we will use only logical server profiles of dedicated
physical servers. These profiles enable rip and replace server recovery in minutes with no network or
operating system reconfiguration; this functionality is fully documented in the single-site HP Reference
Architecture for Oracle Grid. For our disaster recovery solution, we will extend these concepts to
remote site fail-over.
HP Insight Recovery
HP Insight Recovery (IR), also called Insight Dynamics Recovery management, provides disaster
recovery protection for logical servers configured and managed by Insight Dynamics. Logical servers
that are managed by Insight Recovery are referred to as Disaster Recovery Protected (DR Protected)
logical servers. Each DR Protected logical server is configured to run on an HP ProLiant server, either
on a c-Class blade equipped with HP VC or on a virtual machine.
An IR configuration consists of two sites, each running Insight Dynamics software and IR. At any
point in time, one site is configured with the primary site role and the other is configured with the
recovery site role. IR pairs symmetrically configured logical servers across the two sites. The DR
Protected logical servers at the primary site are in an activated state, providing services to the end-
user. The peer logical servers at the recovery site are in a deactivated state. For our Oracle solution,
this means that the database and Oracle middleware are only running at the primary site. The
physical servers standing by at the recovery site cannot be running the same Oracle logical servers
but could be running another workload, such as a test environment for the Oracle application. In the
event of a failure at the primary site, the recovery physical servers would need to be de-provisioned of
their test environment before failover could be initiated. This is a simple process handled routinely by
Insight software.
At the primary site, the boot images of the DR Protected logical servers (including the operating
system, applications code, and data) reside on HP StorageWorks EVA array volumes. The primary
site volumes are replicated to an EVA array at the recovery site. The primary and recovery site arrays
are synchronized with Continuous Access EVA. Each replicated recovery site volume is associated
with a DR Protected peer logical server at the recovery site. The combination of a DR Protected logical
server and its associated storage volume is referred to as a recovery group.
If a disaster occurs at the primary site, the administrator at the recovery site can trigger a site failover
via a push-button provided by IR. This action will fail all of the recovery groups over to the recovery
site. For each recovery group, this involves preparing its storage volume for read-write access and
activating its associated logical server. After all of the recovery groups are failed over, the role at the
recovery site is changed to the primary site.
For complete details on the Insight Recovery product (also called Insight Dynamics Recovery
management), its requirements and design considerations, see the home page at
www.hp.com/go/insightrecovery.
Figure 1 below shows the conceptual diagram of the IR solution, where the green blocks ("A", "B",
etc.) represent the OS and database LUNs on the EVA shared storage. The blade servers are part of
the VC domain group, which can be failed over.
Figure 1. Insight Recovery Solution
HP Continuous Access EVA
At the core of the Insight Recovery solution is the storage replication between two sites using the HP
StorageWorks Continuous Access EVA (CA) software, which is an array-based application that uses
advanced replication technologies to replicate data over distances between EVAs. CA utilizes a
simple graphical user interface (GUI) to create, manage and configure remote replication on the
entire EVA family of storage arrays. Furthermore, Continuous Access EVA software provides the
necessary components to achieve an enterprise's business continuity objectives in a cost-effective and
easily deployable package.
In our disaster recovery architecture, we will use CA EVA in synchronous replication mode so that
every update is posted to both the local and remote arrays before it is acknowledged, ensuring
complete recovery in the event of a site failure. In synchronous write mode, the source array
acknowledges I/O completion to the host only after replicating the data on the destination array;
synchronous replication prioritizes data currency over response time. The write sequence is as follows
(a minimal sketch of this ordering appears after the list):
1. A source array controller receives data from a host and stores it in cache.
2. The source array controller replicates the data to the destination array controller.
3. The destination array controller stores the data in cache and acknowledges I/O completion
to the source controller.
4. The source array controller acknowledges I/O completion to the host.
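The short sketch below models this ordering in code: the host's write call does not return until the destination controller has stored the data and, by returning, acknowledged it back to the source. It is a toy model for illustration only, not EVA controller behavior or an HP API.

// Toy model of synchronous replication ordering (illustration only; not an HP API).
import java.util.ArrayList;
import java.util.List;

class ArrayController {
    final String name;
    final List<byte[]> cache = new ArrayList<>();
    ArrayController remote; // destination controller, if this is the source

    ArrayController(String name) { this.name = name; }

    // Synchronous write: completion is acknowledged to the caller only after
    // the remote (destination) controller has stored the data in its cache.
    void write(byte[] data) {
        cache.add(data);          // 1. controller stores the data in cache
        if (remote != null) {
            remote.write(data);   // 2. source replicates to the destination
                                  // 3. destination stores the data and, by returning, acknowledges
        }
        // 4. returning here acknowledges I/O completion to the host
    }
}

public class SyncReplicationDemo {
    public static void main(String[] args) {
        ArrayController source = new ArrayController("primary EVA");
        ArrayController destination = new ArrayController("recovery EVA");
        source.remote = destination;
        source.write("redo block".getBytes());
        System.out.println("Host sees completion; destination holds "
                + destination.cache.size() + " block(s).");
    }
}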
Replication will be maintained for the Oracle database as well as the server boot LUNs stored on
the EVA.
Planning for inter-site latency and bandwidth is critical to ensure service levels to end users. For
complete details on planning and implementing remote replication, see the Continuous Access home
page at www.hp.com/go/caeva. The following link provides best practice white papers
and latency measurements for EVA: http://www.hp.com/go/hpcft.
Figure 2 below shows the SAN connectivity from the VC switches to the EVA storage and a disaster
recovery connection to the recovery site.
Figure 2. EVA CA diagram with HA
[Diagram: dual EVA6400 arrays with dual controllers, each managed by an EVA management server (with an alternate management server for HA), connected through redundant HP StorageWorks SAN Switch 4/16 fabrics. Flex-10 VC Ethernet and VC Fibre Channel modules in the backplane of the c7000 enclosure connect the blades, and an inter-site link (ISL) carries EVA CA replication to the remote site, which mirrors the primary site.]
Architecture summary
Solution attributes
All servers boot from LUNs on the shared EVA storage.
Logical server profiles for each physical server are defined identically at both sites.
Synchronous storage-array-based replication over a standard Fibre Channel SAN (site separation limited to Fibre Channel distances).
Standby site with an identical hardware configuration as the primary site.
HP Insight Recovery solution highlights
The OS and Oracle homes for RAC, WebLogic, SOA and the web hosts reside on the EVA storage as
a separate LUN for each host. The main advantage of this is that any changes to the OS, such as
kernel parameters, tuning, OS package updates, and version changes, are automatically copied to the
secondary site via CA. Likewise, Oracle configuration changes, log files such as JMS and TLog files,
and any patches or Java tuning applied to the WebLogic admin or managed servers are automatically
copied to the secondary site. This helps maintain a pristine production environment ready to come up
at any time on the secondary site.
Linking these LUNs to logical servers using Insight Recovery makes the solution seamless, because all
the parameters of the servers, such as CPU, memory, and both SAN and LAN networks, are
maintained in a logical server. Whenever disaster recovery is activated, IR automatically scans for
hardware with the same server attributes and then boots it from the LUN that has already been
replicated to the local storage.
The combination of these two factors ensures high reliability and availability of a disaster recovery
solution ready to be activated when necessary. It provides everything required to ensure that a
complex setup such as the Oracle database and middleware comes up smoothly on the recovery site.
This is demonstrated in the planned failover section of this POC.
Functionality
Servers at the standby site can be used for other purposes and quickly reconfigured for failover using
logical server profiles and the replicated boot LUNs from the primary site EVA.
Continuous Access EVA replicates the Oracle database and all metadata needed for complete system recovery.
Continuous Access EVA replicates the JMS data, TLogs and any Oracle software, patches or changes in the middleware.
In the event of a disaster, the system administrator initiates re-configuration of the standby site via
Insight Recovery. The standby recovery site servers have the Oracle environment server profiles
applied and boot up in their new roles as the production system.
SOA disaster recovery logical topology diagram
Figure 3 shows the logical diagram for a SOA-based disaster recovery configuration; it
represents clustering at the SOA level, WebLogic Server (WLS) and Oracle Web Services Manager
(OWSM). There are multiple web hosts for the OHS servers. The back end is replicated at both the
database and middleware tiers. The next section maps this topology onto the HP hardware and
software components that enable a working disaster recovery solution.
Why disaster recovery for middleware SOA
Critical business services may need 24/7 availability. SOA applications in particular have unique
availability requirements, and SLA compliance presents many hardware and software challenges.
Data availability is as important as service availability, and service availability depends on the
entire infrastructure. A scalable, highly available infrastructure is required to run your services.
Oracle Fusion Middleware processes XA transactions, which are stateful and write to the JMS queue
and the database. As shown later in this paper, in a test case where site failover occurs with SOA
transactions in flight, the solution ensures that any number of in-flight transactions are preserved and processed.
In addition to this state persistence requirement, the following is also necessary for a successful
disaster recovery between two sites: the Oracle home for the middleware has to be maintained in the
same state across the two sites for the SOA middleware to come up smoothly. Newly installed
composite applications, newly deployed web services, OS kernel changes, and patches to Oracle or
the OS are all kept in sync by CA. Overall, the disaster recovery solution is an extension of the
HA capability of the solution. Customers determine the level of HA and disaster recovery solution
necessary for their business.
High Availability (HA)
The disaster recovery topology shows the HA aspect of the solution by having two RAC nodes, two
SOA-managed servers and at least two web hosts. HA includes the clustering, state replication and
routing, failover, load balancing, server migration and server load balancing components. Redundant
network and SAN paths and corresponding switches are also required.
To tailor the solution to your needs, the exact set of HA and disaster recovery components should be
determined based on the SLA and other requirements of your business.
Figure 3 below shows the logical diagram representation.
Figure 3. Logical SOA disaster recovery solution
[Diagram: production site and standby site with identical topology. At each site a load balancer fronts two web hosts running OHS (Oracle HTTP Server); an admin host runs the WLS Admin Server and FMW Control; a SOA cluster of two app hosts runs WLS, SOA and SOA-Infra; and a RAC database stores its data on an HP StorageWorks EVA disk array. Clients connect through a DNS server. Continuous Access synchronizes the middleware tier and the database tier from the production site to the standby site over the WAN.]
POC hardware components
Table 1. POC hardware components
Quantity – Description
2 – HP BladeSystem c7000 enclosures; each holds up to 8 HP ProLiant BL685c blade servers. One enclosure per site for disaster recovery, here called the primary and recovery sites.
2 per enclosure – HP ProLiant G6 blade servers used for the database.
1 spare per enclosure (local database failover) – HP ProLiant G6 blade server used only if one of the database blades fails (spare DB blade).
4 per enclosure – HP ProLiant G5 blade servers used for the middleware (WebLogic) and web hosts.
1 spare per enclosure (local middleware failover) – HP ProLiant G5 blade server used for middleware in case of failover.
1 per blade server (8 per site in this setup) – QLogic QMH2462 4Gb dual-port Fibre Channel mezzanine card for HP c-Class BladeSystem. If you use Emulex cards on one site, use the same type of cards on the other site.
2 per enclosure – VC Flex-10 Ethernet module.
2 per enclosure – HP 4Gb VC-FC module.
2 per site (for HA) – HP StorageWorks SAN Switch 4/16.
2 – HP StorageWorks Enterprise Virtual Array EVA6400 (one per site); includes a 4U controller assembly with 2 HSV400 controllers and 1 DL380 EVA management station.
2 per site – ProCurve switches supporting Flex-10 (Fibre and Ethernet): HP ProCurve 2910al-24G Switch (J9145A).
GBICs – Flex-10-supported GBICs.
As needed – Shortwave fibre cables and Cat5 network cables.
2 – DL380 G5 or later with at least 8GB memory and a 146GB hard disk; one for each site, required for Insight software.
2 – DNS servers required for the primary and recovery sites (assumed to be in place in the existing infrastructure).
2 – F5 BIG-IP Load Balancer (Local Traffic Manager).
1 – DL380 G5 as a client access machine.
2 – F5 Global Traffic Manager; facilitates the smooth transfer of client connections from the primary to the recovery site. This is optional and can alternatively be done using DNS.
These are only the representative components we chose to use in this POC. Most currently shipping
ProLiant G5 or later servers and EVA models would work as well. Also, much larger configurations
with up to 50 servers and terabytes of storage are possible. For details, limitations, and requirements,
see the Continuous Access EVA and Insight Recovery support specifications.
POC software components
Table 2. POC software components
Software – Description
Red Hat Enterprise Linux – Version 5.3 on the HP ProLiant blade servers used for database and middleware.
Oracle Database – Version 11gR2 on HP ProLiant G6 blades.
Oracle Middleware – Version 11gR2 on HP ProLiant G5 blades, including the WebLogic admin server, managed servers and SOA configuration as described in the Oracle Enterprise Deployment Guide.
Insight Software – Version 6.0 on a DL380 G5 server. This is comprehensive management software that includes the following, accessible under one web interface:
  HP Version Control 6.0
  HP Insight Control 6.0, which includes:
    HP Insight Control licensing and reports 6.0 (new)
    HP Insight Control performance management 6.0 (updated)
    HP Insight Control power management 6.0 (updated)
    HP Insight Control server deployment 6.0 and 6.0.2 patch (updated)
    HP Insight Control server migration 6.0 (updated; new in Insight Control)
    HP Insight Control virtual machine management 6.0 (updated)
  HP Insight managed system setup wizard 6.0.1
  HP Insight Software Advisor 6.0
  HP Virtual Connect Enterprise Manager 6.0
  HP Insight Dynamics 6.0.1, which includes:
    Capacity planning
    Configuration management
    Recovery management (Insight Recovery)
    Infrastructure orchestration
  HP Insight Capacity Advisor Consolidation software 6.0
EVA Command View – Version 9.2
EVA Controller firmware – XCS v9.5 or later
EVA Continuous Access – CA license for each EVA6400; required for replication
POC version details of BladeSystem infrastructure
The items in Table 3 typically come along with the c-Class enclosure, or you can update with the latest
firmware available on the HP website. Table 3 lists the versions used in this setup.
Table 3. POC version details
Software – Version
Active Onboard Administrator – Version 2.60
HP Integrated Lights-Out 2 (iLO 2) for each HP ProLiant G6 blade – Version 1.78
iLO 2 for each HP ProLiant G5 blade – Version 1.30
HP VC Flex-10 Ethernet Module – Version 2.31
HP 4Gb VC-FC Module – Version 1.40
Infrastructure diagram
Table 1 listed the hardware components used for the POC. Figure 4 below maps these
components into an actual infrastructure diagram of the POC setup. Note that in the interest of
simplifying our proof of concept, we eliminated certain redundant components that would be
necessary to ensure high availability at both the primary and recovery sites. For high availability,
two load balancers would be required at each site; similarly, two ProCurve switches and two SAN
switches would be required at each site, as indicated in the POC hardware components. This diagram
provides a general guideline to the hardware infrastructure.
Evaluate your business requirements to determine an appropriate combination of HA and disaster
recovery for your enterprise.
Figure 4. Hardware representation of disaster recovery solution
[Diagram: at each site, a c7000 enclosure with ProLiant BL685c G6 blades for the database and ProLiant BL685c blades for WebLogic, the SOA server and web hosts, connected through Virtual Connect Flex-10 Ethernet modules (for seamless failover, addition or rip-and-replace) and VC Fibre Channel modules (boot from SAN). Each site also has an EVA6400 behind redundant HP StorageWorks 4/16 SAN switches (2 for HA per site), HP ProCurve 2910al-24G switches (2 for HA), a BIG-IP Model 1500 load balancer, a local DNS server, and a server running Insight software with Insight Recovery. EVA Continuous Access synchronizes the middleware, database and OS LUNs between the sites; the client runs on a DL380.]
Oracle database and middleware
Oracle Fusion Middleware
Oracle Fusion Middleware 11g is a comprehensive family of products that are seamlessly integrated
to help create, run, and manage agile and intelligent business applications. Fusion Middleware SOA Suite
provides a complete set of service infrastructure components for designing, deploying, and managing
composite applications. The suite enables services to be created, managed, and orchestrated into
composite applications and business processes. The components of the suite benefit from common
capabilities including a single deployment and management model and tooling, end-to-end security,
and unified metadata management. Oracle SOA Suite is unique in that it provides the following set of
integrated capabilities: messaging, service discovery, orchestration, activity monitoring, business
rules, events framework, web services management and security.
A few of the products from Oracle Fusion Middleware that are part of the POC configuration for
disaster recovery are:
Oracle WebLogic Server
Oracle JRockit JVM
Oracle SOA Suite
Oracle HTTP Server (OHS)
A few of the key components of the Oracle SOA Suite 11g are:
Oracle Service Bus
Oracle Complex Event Processing
Oracle Business Rules
Oracle Adapters
Oracle Business Activity Monitoring
Oracle B2B
Oracle BPEL Process Manager
Oracle Service Registry
Oracle User Messaging Service
Oracle Human Workflow
Oracle Mediator
Oracle RAC database
Oracle Real Application Cluster (RAC) supports the transparent deployment of a single database
across a cluster of servers, providing fault tolerance from hardware failures or planned outages.
RAC provides a high level of availability, scalability, and low-cost computing. It provides very
high availability for applications by removing the single server as a single point of failure. If a
node in the cluster fails, the database continues running on the remaining nodes. Individual nodes
can be shut down for maintenance while application users continue to work. Fast application
notification enables end-to-end lights-out recovery of applications and load balancing when a cluster
configuration changes.
Oracle RAC provides flexibility for scaling applications. To lower costs, clusters can be built from
standardized processing, storage, and network components. When additional processing power is
needed, simply add another server without taking users offline, providing horizontal scalability.
Applications never have to modify their connections as nodes are added to or removed from the
cluster. Oracle RAC 11gR2 introduces the single client access name (SCAN), which allows clients to
connect to the RAC database through a single address that provides failover and load balancing.
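As an illustration of how a client uses SCAN, the hedged sketch below opens a JDBC connection through a single SCAN address rather than listing every RAC node. It assumes the Oracle JDBC driver is on the classpath; the host, port, service name and credentials are placeholders, not values from this POC.

// Minimal JDBC example connecting through a RAC SCAN address.
// Host, port, service name and credentials are hypothetical placeholders.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ScanConnectionExample {
    public static void main(String[] args) throws Exception {
        // A single SCAN name resolves to the cluster; the SCAN listener directs the
        // session to an available instance, giving failover and load balancing.
        String url = "jdbc:oracle:thin:@//racdb-scan.example.com:1521/orclsvc";
        try (Connection conn = DriverManager.getConnection(url, "app_user", "app_password");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT instance_name FROM v$instance")) {
            while (rs.next()) {
                System.out.println("Connected to instance: " + rs.getString(1));
            }
        }
    }
}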
POC site configuration
As per the hardware components listed in Table 1, each site has a set of blades for the database,
middleware and web hosts in a c7000 enclosure. A number of steps are required to arrive at a
fully configured site. The following list briefly explains each step. For detailed installation steps, refer
to the HP documents listed in Appendix A.
Step one. Create boot LUNs on storage.
On the EVA6400, create the boot LUNs required for each of the blade servers, namely the database
servers, middleware servers, and web hosts. In this setup, the database server LUNs were created at
200GB each and the middleware server and web host LUNs at 100GB each. The 11gR2 RAC
database is on the partition called "RACDB"; this includes the Cluster Ready Services (CRS),
Automatic Storage Management (ASM) and database partitions as per 11gR2 requirements. Figure 5
below shows the LUN partitions created on the EVA6400 for this setup.
Figure 5. LUN partitions on EVA6400
Step two. Create Virtual Connect profiles for each of the blade servers to be deployed. This
assigns a virtual MAC address for each of the network connections and a virtual WWID for each of
the SAN connections. Figure 6 below shows the configured Virtual Connect profiles for site 1 (the
primary site) and the blade servers they are assigned to. The bay number indicates which profile is
associated with which physical blade.
The blade servers are configured to boot from SAN; that is, local disks are not used to install the OS.
The OS is installed on LUNs on the EVA shared storage.
Figure 6. Site 1 configured VC profiles
Step three. Present LUNs from shared storage to the corresponding blade server VC profile. Install the
operating system (OS). Red Hat Linux is installed on each of the servers after the LUNs are presented.
Step four. Install Oracle Database 11gR2 and configure a two-node RAC database. At the end of the
installation, verify that both nodes are up and running. Figure 7 below shows two RAC nodes up
and running.
Figure 7. Verification that RAC nodes are up
Step five. Install the WebLogic middleware and SOA, following the Oracle Enterprise Deployment
Guide for the SOA configuration. This results in the WebLogic admin server and two SOA managed
servers up and running. Figure 8 below shows the WebLogic admin console with the status of all the
components up and running.
Figure 8. Fusion Middleware Components
Step six. Verify through Oracle Enterprise Manager that all the components are deployed. In this
configuration, the SOA domain named "irdomain" is created with the SOA managed servers and deployed.
Figure 9. Verification that all components are deployed
Step seven. Install web hosts.
Figure 10. Web host installation
Step eight. Configure the F5 load balancer (Local Traffic Manager) with a virtual IP and forward it to
the appropriate web hosts. Clients will connect using the F5 load balancer virtual IP.
Figure 11. F5 load balancer forwarding to the appropriate web hosts
Planning DR groups
Once the LUNs at the primary site are configured to be part of DR groups, CA creates matching LUNs
on the secondary (recovery) site storage and makes them available. Each of the RAC nodes can be
put in one DR group, and similarly one group each for the SOA servers and web hosts, as shown in
Figure 12. If your application requires I/O write order to be maintained between the database and
middleware nodes, then they can be placed in the same DR group. SOA itself does not have this
requirement, so they can be in separate DR groups. The RAC database LUNs can be in a separate
DR group or added to the RAC node DR group. The advantage of configuring the whole LUN, with
the OS and Oracle home, in the DR group is that all the kernel parameters and tuning done on the OS
and Oracle home (changes such as new patches, updates or newly deployed applications) are
replicated to the other site. The database is also replicated to the
secondary site.
Figure 12. Database replication to secondary site
Installation and configuration of Insight software and
Insight Recovery
A brief description is provided here; refer to Appendix A for links to the detailed installation
and configuration guides available on this subject. Install the Insight 6.0 software, which has several
components, on a separate server at each of the sites. On the primary site, configure each of
the VC profiles as logical servers. Configure Insight Recovery on both sites; this associates a logical
server with a boot LUN. On site 2 (the recovery site), create the corresponding logical server profiles
and leave them in a deactivated state.
Refer to the Insight Recovery user and configuration guide for details on installing the Insight software
and configuring Insight Recovery. When a disaster recovery or failover happens from the primary
to the recovery site, the Insight Recovery software activates these logical servers and associates the
corresponding LUNs. Since this is a cold-standby type of scenario, the physical servers are required
only at the time of recovery. Figure 14 shows the logical servers configured for the primary site and
Figure 15 shows Insight Recovery configured on site 2, signifying that it is a recovery site.
Figure 14. Logical servers configured for the primary site
Figure 15. Insight Recovery configured on site 2
Test case
Oracle's Fusion Order Demo is a middleware SOA composite application running on the SOA
managed servers. A number of services are also deployed to support the application, such as a
credit service. The database has the appropriate tables loaded for this application. Some of the
test cases are listed below; a minimal connectivity-check sketch follows the list. The test cases include:
Oracle database
Ping Oracle database
Connectivity test
Oracle WebLogic
Ping WLS admin console
Validate WebLogic Managed Server startup and log
Oracle SOA Suite components
Connectivity test for Fusion Middleware control console
Connectivity test to worklist App
Connectivity test to SOA-Infra
Validate deployed applications
Ping deployed web service end-points
Log in to OFMW control and navigate to test SOA order booking composite application
Submit two new orders
Validate application transactions
Log in to Oracle Worklist application
Validate new orders and orders submitted at primary site
Approve one new order and an order submitted on primary site
Validate that orders continue to be processed
Log in to OFMW control and validate the status of the orders
Validate application transactions in between failover
Submit a large order which requires approval
Do a failover before the order is approved
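The hedged sketch below shows how the simple ping and connectivity tests in the list above could be scripted: it checks that a host port is reachable and that an HTTP endpoint answers. The hostnames, ports and URLs are placeholders for illustration, not values mandated by the products.

// Minimal connectivity checks; hostnames, ports and URLs are hypothetical placeholders.
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.URL;

public class ConnectivityChecks {
    // TCP check, for example the database listener on port 1521.
    static boolean portOpen(String host, int port) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), 5000);
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    // HTTP check, for example the WLS admin console or the soa-infra application.
    static int httpStatus(String url) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setConnectTimeout(5000);
        conn.setReadTimeout(5000);
        return conn.getResponseCode();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("DB listener reachable: " + portOpen("racdb-scan.example.com", 1521));
        System.out.println("WLS admin console status: " + httpStatus("http://adminhost.example.com:7001/console"));
        System.out.println("soa-infra status: " + httpStatus("http://apphost1.example.com:8001/soa-infra"));
    }
}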
The application test environment
The Oracle Order Booking demo was used to validate the middleware disaster recovery
configuration. It provides a reliable and consistent way to test and validate a number of Oracle
Fusion Middleware components and their configurations, and it yields consolidated test results and
applicable best practices for disaster recovery. Tests are initiated at the primary site, followed by
manual fault injection, failover to the standby site, and then test execution at the standby site. An XA
transaction that writes messages to a JMS queue and to the database is used to verify that the
transaction logs generated by the JMS services can be consistently restored after a failure.
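To make the XA test concrete, here is a hedged sketch of the kind of transaction such a test drives: a single JTA transaction that both sends a JMS message and inserts a database row, so that either both actions survive the failover or neither does. It assumes it runs inside the WebLogic container with an XA-enabled connection factory and data source; the JNDI names, queue and table are placeholders, not the actual Fusion Order Demo artifacts.

// Sketch of an XA (JTA) transaction spanning a JMS send and a JDBC insert.
// Assumes it runs inside the WebLogic container; JNDI names and SQL are placeholders.
import javax.jms.ConnectionFactory;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import javax.naming.InitialContext;
import javax.sql.DataSource;
import javax.transaction.UserTransaction;
import java.sql.PreparedStatement;

public class XaOrderWriter {
    public void submitOrder(String orderId) throws Exception {
        InitialContext ctx = new InitialContext();
        UserTransaction ut = (UserTransaction) ctx.lookup("java:comp/UserTransaction");
        ConnectionFactory cf = (ConnectionFactory) ctx.lookup("jms/xaOrderConnectionFactory");
        Queue queue = (Queue) ctx.lookup("jms/orderQueue");
        DataSource ds = (DataSource) ctx.lookup("jdbc/orderXADataSource");

        ut.begin();
        javax.jms.Connection jmsConn = null;
        java.sql.Connection dbConn = null;
        try {
            // Resources obtained inside the transaction from XA-enabled factories are
            // enlisted by the container in the same global transaction.
            jmsConn = cf.createConnection();
            dbConn = ds.getConnection();

            Session session = jmsConn.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageProducer producer = session.createProducer(queue);
            producer.send(session.createTextMessage("order:" + orderId));

            PreparedStatement ps =
                    dbConn.prepareStatement("INSERT INTO orders (order_id) VALUES (?)");
            ps.setString(1, orderId);
            ps.executeUpdate();
            ps.close();

            ut.commit();   // the JMS message and the database row commit together
        } catch (Exception e) {
            ut.rollback(); // neither the message nor the row is persisted
            throw e;
        } finally {
            if (dbConn != null) dbConn.close();
            if (jmsConn != null) jmsConn.close();
        }
    }
}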
The test application also provides the SOA Order Booking composite application to drive the test
scenario. This is the main application for processing orders from Global Company. The composite
application demonstrates how services, both internal to an enterprise and external at other sites, can
be integrated using SOA to create one cohesive ordering system. Figure 16 below shows the different
engines in Oracle Fusion Middleware; the tests are designed to exercise these components.
Figure 16. OFMW components
The Order Booking Composite application utilizes the following Oracle SOA suite components:
Oracle Mediator
Oracle Business Process Execution Language (BPEL) Process Manager
Oracle Human Workflow (using a human task)
Oracle Business Rules
Oracle Messaging Service
The Order Booking Composite application uses BPEL to orchestrate all the existing services in the
enterprise for order fulfillment with the appropriate warehouse based on the business rules in the
process. The diagram in Figure 17 illustrates the workflow.
Figure 17. Order booking composite
The Order Booking Composite application uses Oracle SOA Suite components such as Mediator,
BPEL, Business Rules, Human Workflow and Adapters. It invokes web services in a defined flow
sequence. The web services are independent of each other and are generated in different ways.
The screenshot in Figure 18 from Oracle JDeveloper illustrates the component layout.
Figure 18. Fusion Order Booking component layout
The following table further elaborates these elements.
Table 4. Technology and techniques used
Project – Technology and techniques used
CreditService – "Top-down" implementation of web services: starting with a WSDL file, use JDeveloper to generate Java classes from the WSDL file.
RapidService – "Bottom-up" implementation of web services: starting with Java classes, use JDeveloper to generate a WSDL file and JSR-181 Web Services Metadata annotations in the Java files.
SelectManufacturer – Simple asynchronous BPEL process with receive and invoke activities.
FulfillmentOrder – Routing services that use filters to check input data; the filters then direct the requests to the appropriate targets, including a JMS adapter. Transformation rules transform data appropriately for writing to databases and files; database adapters and file adapters perform the writes.
FusionOrderBooking – BPEL process that orchestrates a flow sequence including: other BPEL flows (the SelectManufacturer BPEL project); a Mediator project (the FulfillmentOrder project); Oracle Business Rules with BPEL; a Decision Service; a flow activity to send requests to RapidService and SelectManufacturer; and a human task to set up a step that requires manual approval.
Network and DNS configuration for disaster recovery
Oracle requires that the hostnames on both sites be the same and resolve to the same IP addresses,
so that the various Oracle components come up smoothly after a failover. In addition, client requests
must be redirected to the recovery site when the primary site is down.
To have the same hostnames and IP addresses at both the primary and recovery sites, the servers can
be configured on a non-routable private network at each site. This ensures that the hostnames and IP
addresses used by Oracle are the same on both sites. Alternatively, a local DNS server at each site
can provide the same names across both sites.
The global DNS server will resolve the name clients connect to. The load balancer creates a virtual IP
which is available for the clients to connect. For example, the load balancer virtual IP could be
mapped to http://<MySoaCompany.com>>: <port number>/console. This can be a single load
balancer or a host of several load balancers. The name resolves into the virtual IP address exported
by the load balancer. The load balancer further forwards the messages to a pool of web hosts
running OHS which are on the private network.
This web configuration is duplicated on the other site, so that a private network for the servers is
maintained. This will ensure that hostnames and IP addresses are the same across the two sites.
The global DNS entry resolves to the load balancer virtual IPs of both the primary and recovery sites
as a list of addresses to forward to. When there is a failure at the primary site and its virtual IP goes
down, any new requests from the client are automatically forwarded to the secondary site.
Another way to maintain the same IP address structure at both sites is to configure the client with the
primary and alternate sites as its primary and alternate DNS servers. This also routes client requests
to the alternate site when the primary site is down. F5 also provides a software solution, Global
Traffic Manager (GTM), which transfers client connections to the new site seamlessly.
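The sketch below illustrates the client-side behavior described above, assuming the public name
resolves to both load balancer virtual IPs; the client simply uses the first address that answers. The
name MySoaCompany.com comes from the example above, while the port and URL path are
placeholders.

# Minimal sketch of DNS-based failover from the client's point of view.
import socket
import urllib.request

def reachable_vip(name="MySoaCompany.com", port=7777, timeout=5):
    """Return the first virtual IP behind `name` that accepts an HTTP request."""
    infos = socket.getaddrinfo(name, port, proto=socket.IPPROTO_TCP)
    vips = []
    for info in infos:
        ip = info[4][0]
        if ip not in vips:
            vips.append(ip)
    for ip in vips:                      # primary site VIP first, then recovery VIP
        try:
            urllib.request.urlopen(f"http://{ip}:{port}/console", timeout=timeout)
            return ip
        except OSError:
            continue                     # VIP not answering, try the next address
    return None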
Figure 19. Network configuration. The diagram shows a client on the Internet reaching the primary DNS
server (Example.soaCompany.com) and HP ProCurve 2910al-24G switches that connect to one or
multiple F5 BIG-IP Model 1500 load balancers. Two HP Flex-10 10Gb VC-Ethernet modules sit in the
backplane of the c7000 enclosure, and all blades are connected to these backplane switches; the
Insight software management station is also shown. A similar configuration exists at the standby site.
Switchover (planned failover)
Switchover (planned failover) is performed when there is an imminent threat to the primary site, or for
periodic validation or planned maintenance. Once a switchover occurs, the current primary (production)
site becomes the recovery site and the current recovery site becomes the new primary (production) site.
The following steps need to be performed in sequence to do a planned failover; a minimal orchestration
sketch follows the list. In the next section, an example of a planned failover is detailed step by step,
with an XA transaction spanning the failover.
1. Shut down all the Oracle components like web hosts, Oracle Fusion Middleware and Oracle
Database either manually or using the corresponding management software.
2. Shut down the operating systems.
3. At the current primary or production site, using the Insight Recovery software, select "convert
primary to recovery site". The Insight software will deactivate the logical servers on this site.
4. At the current standby or the recovery site ensure that sufficient server resources are available to
bring up the application and database.
5. At the current standby site, using the Insight Recovery software, select "change current site to
primary site." This action converts the EVA connected to this site (the new primary site) to be the
source and activates the logical servers.
6. The operating systems on the servers at the new primary site will come up.
7. Verify that the hostnames and IP addresses are the same as in the original primary site.
8. Re-enable the IP addresses on the appropriate Ethernet ports. For example, if the racnode1 server
on the first site has an eth0 address of 100.100.100.1, enable this address on eth0 of the racnode1
server on the newly migrated primary site. Repeat this on all the servers to make sure that the IP
addresses match those of the original site. HP provides the Portable Images Network Tool (PINT),
which maintains the IP addresses after a logical server move or migration. For more details on PINT
configuration, refer to the documentation listed in Appendix A.
9. Restart the Oracle database servers, Oracle Fusion Middleware components like WebLogic
admin, SOA-managed servers and OHS web hosts.
10. Enable the virtual IP of the load balancer on the new primary site.
11. Verify with Oracle Enterprise Manager that all components of the Oracle Fusion Middleware are
up and running.
12. The DNS server on the new primary site will now direct all requests to the newly enabled
virtual IP of the load balancer.
13. Use a browser from a client to test out the new site.
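The sketch below, referenced above the list, outlines how steps 1 through 13 could be driven from a
management host. The shutdown scripts and hostnames are hypothetical placeholders; in this POC the
Oracle components are stopped with the Oracle tools and the site conversion is performed
interactively in the Insight Recovery console.

# Minimal orchestration sketch for the planned switchover sequence above.
import subprocess

ORACLE_SHUTDOWN_ORDER = [
    ("webhost1", "/opt/scripts/stop_ohs.sh"),         # hypothetical OHS stop script
    ("soahost1", "/opt/scripts/stop_soa_servers.sh"),  # SOA managed servers + admin
    ("racnode1", "/opt/scripts/stop_rac_db.sh"),       # Oracle RAC database
]

def run_remote(host, command):
    """Run a command on a remote host over ssh (placeholder for real tooling)."""
    subprocess.run(["ssh", host, command], check=True)

def planned_switchover():
    # Steps 1-2: stop the Oracle components, then the operating systems.
    for host, command in ORACLE_SHUTDOWN_ORDER:
        run_remote(host, command)
    for host, _ in ORACLE_SHUTDOWN_ORDER:
        run_remote(host, "shutdown -h now")
    # Steps 3-5 are performed in the Insight Recovery console: convert the
    # current primary to a recovery site, then promote the standby site.
    print("Use Insight Recovery to deactivate the logical servers here and "
          "activate them on the standby site.")
    # Steps 6-13: verify hostnames/IPs (see PINT), restart the Oracle components,
    # enable the load balancer virtual IP and test from a client browser.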
Switch back procedures
Repeat all the steps above to perform a switch back to the original production or primary site.
Best practices
For Oracle Fusion Middleware disaster recovery, the enhanced asynchronous and synchronous
replication write modes are recommended; plain asynchronous mode is not recommended for Oracle
Fusion Middleware. The replication write mode can be set on specific DR group(s) depending upon
the frequency of data change. The TLogs and JMS log files can be replicated using enhanced
asynchronous mode so that performance is not impacted.
Synchronous replication prevents any loss of data; however, it also requires each write I/O to be
completed on the destination array before it is considered complete on the local array. Therefore,
in an environment with a high volume of write I/Os, synchronous replication is a potential drag
on performance.
Enhanced asynchronous replication is nearly as robust as synchronous replication. In enhanced
asynchronous replication, write I/Os do not have to be completed on the destination array before
they are marked as completed locally. At the same time, there is protection against data loss,
because each I/O is written to the local array and a DR group write history log before it is
considered to be complete. The write history log is written in the same order that the write I/Os are
written to the local array. As the I/Os are propagated to the destination array, they are removed
from the write history log, so the write history log is a sequential record of all write I/Os written to
the local array that have not yet been acknowledged to be completed on the destination array.
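A minimal model of this bookkeeping is sketched below: a write is acknowledged after it is committed
locally and appended to the write history log, and entries leave the log in order as the destination
array acknowledges them. It illustrates the concept only and is not EVA code.

# Conceptual model of an enhanced asynchronous DR group's write history log.
from collections import deque

class DrGroup:
    def __init__(self):
        self.local_array = []
        self.write_history_log = deque()   # ordered record of unreplicated writes

    def write(self, block):
        """Acknowledge a host write after local commit plus log append."""
        self.local_array.append(block)
        self.write_history_log.append(block)
        return "acknowledged"

    def destination_ack(self):
        """Oldest pending write reaches the destination array, in order."""
        if self.write_history_log:
            self.write_history_log.popleft()

    def pending_writes(self):
        """Writes that would be lost if failover happened right now."""
        return list(self.write_history_log)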
In the event of a failure while enhanced asynchronous write mode is being used, all pending write
I/Os are preserved in the write history log. In this scenario, one option is to simply wait until the
source array can be brought back up. If the failure is only temporary and can be corrected in a
short period of time, this is probably the best option, because it ensures that no data will be lost.
In the case where the failure is not temporary or the production environment needs to be brought
back online quickly, the customer will have to fail over the production site to the standby site. In
enhanced asynchronous write mode, this means that all pending write I/Os in the write history log
will be lost. The number of writes lost can be minimized if the writes are being processed quickly,
and therefore the number of pending writes is low. The rate of write processing should be
estimated by customers when they are setting their RPO. The RPO is dependent on the bandwidth of
the inter-site link, which is in turn dependent on the distance between the arrays, the type of
interconnect, and other factors. Careful analysis of the application's write profile and the replication
link speed can determine what the worst case RPO will be for the solution. For complete details on
RPOs, bandwidth, and inter-site links, see the HP StorageWorks Continuous Access EVA
implementation guide (see the For more information section for link).
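As a rough worked example (not an HP sizing formula), the worst-case RPO can be approximated from
the peak write rate, the replication link throughput and the duration of the peak; all figures below are
illustrative.

# Rough estimate: if writes arrive faster than the link can drain them, the
# backlog in the write history log grows, and the worst-case RPO is roughly
# that backlog divided by the replication link throughput.
def worst_case_rpo_seconds(peak_write_mb_s, link_mb_s, peak_duration_s):
    """Estimate seconds of data at risk after a peak write burst."""
    if peak_write_mb_s <= link_mb_s:
        return 0.0                       # the link keeps up; the log stays near empty
    backlog_mb = (peak_write_mb_s - link_mb_s) * peak_duration_s
    return backlog_mb / link_mb_s

# Example: 40 MB/s peak writes for 10 minutes over a 25 MB/s link
print(worst_case_rpo_seconds(40, 25, 600))   # ~360 seconds of pending writes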
While it is possible that a failover using enhanced asynchronous write mode could result in zero
data loss if the write log is empty, enhanced asynchronous replication can never guarantee that
outcome; synchronous replication is the only way to guarantee zero data loss. An Oracle Fusion
Middleware environment does not generally require synchronous replication, however: the JMS and
TLog files can be placed in enhanced asynchronous replication mode so that performance is not
affected.
In the event of a failure in the middle of a large block transfer, CA EVA synchronous replication
guarantees the write I/O order during replication to the other site, so recovery becomes the typical
case of Oracle handling a local site failure, whether due to power loss or any other reason, that
brings the database down. Synchronous replication also ensures that any message written to disk is
written at both sites. If the replication mode is enhanced asynchronous, write I/O order is still
protected because of the write history log.
In addition to CA EVA replication, it is recommended that the user maintain a daily backup at each
site using HP StorageWorks Business Copy. Additional features can be used, such as snapclone,
which takes a point-in-time physical copy of a virtual disk, and vsnap, which takes a snapshot of a
virtual disk. EVA CA is a highly reliable and highly available configuration; the user should define a
procedure that complements EVA CA replication with other backup features, based on the risk
profile of the business.
Choosing the size of your write history log
As noted, enhanced asynchronous replication is the preferred write mode for most parts of the Oracle
Fusion Middleware environment. For enhanced asynchronous write mode to work properly, the write
history log must be large enough to hold all the write I/Os for a system that is under peak load.
This is extremely important, since a full write log triggers a process called normalization, which will
force a synchronization of the source and destination arrays. Under peak load, a forced
normalization would have a very negative impact on performance. Another reason to set the size of
the write history log correctly from the point when the DR group is created is that changing the size
requires you to switch to synchronous, drain the write log, and then switch back to enhanced
asynchronous mode.
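A minimal sizing sketch along these lines is shown below, assuming the log must absorb the backlog
that accumulates during the worst tolerated peak or link outage. The inputs and safety factor are
assumptions to be replaced with measured values and the supported sizing method from the
Continuous Access EVA implementation guide.

# Illustrative write history log sizing, not the official HP sizing method.
def write_history_log_size_gb(peak_write_mb_s, link_mb_s, worst_outage_s,
                              safety_factor=1.5):
    """Return a write history log size (GB) intended to avoid normalization."""
    backlog_mb = max(peak_write_mb_s - link_mb_s, 0) * worst_outage_s
    outage_mb = peak_write_mb_s * worst_outage_s     # case: link fully down
    required_mb = max(backlog_mb, outage_mb) * safety_factor
    return required_mb / 1024

# Example: 40 MB/s peak writes, a 25 MB/s link and a 1-hour tolerated outage
print(round(write_history_log_size_gb(40, 25, 3600), 1))   # roughly 211 GB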
High Availability for persistent stores
The WebLogic application servers are usually clustered for high availability. For high availability of
the SOA Suite within a site, a persistent file-based store is used for the Java Message Service (JMS)
and the transaction logs (TLogs). These files can be replicated in enhanced asynchronous mode to
address latency concerns.
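A small sanity-check sketch is shown below: it flags any persistent store directory that does not live
under the CA-replicated volume. The mount point and store paths are placeholders, not the directories
used in this POC.

# Check that the JMS and TLog file stores sit on the replicated volume.
import os

REPLICATED_MOUNT = "/u02/oracle/stores"          # hypothetical CA-replicated volume
PERSISTENT_STORES = [
    "/u02/oracle/stores/soa_cluster/jms",        # hypothetical JMS file store directory
    "/u02/oracle/stores/soa_cluster/tlogs",      # hypothetical TLog directory
]

def stores_on_replicated_volume(mount=REPLICATED_MOUNT, stores=PERSISTENT_STORES):
    """Return any persistent store directory that is not under the replicated mount."""
    return [p for p in stores
            if os.path.commonpath([os.path.realpath(p), mount]) != mount]

if __name__ == "__main__":
    print(stores_on_replicated_volume())   # empty list means all stores are replicated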
Meeting SLA requirements and recovery time
The SLA requirements can consider a host of factors, such as:
Required application availability, ranging from 99.95% for highly critical applications to 99.5% for
lower criticality
Inter-site link bandwidth, and the type of interconnect used, whether dark fibre, Ethernet, etc.
Cost of the solution
The RTO defines the time taken to recover, and the RPO defines the point up to which data is recovered.
Failover at metro distances (typically less than 150 miles) can use synchronous replication.
With synchronous replication there is minimal data loss, less than 5 minutes, and recovery time ranges
from a few hours (2-5) to a few days depending on the criticality of the application and the SLA
requirements.
For long-distance recovery (greater than 150 miles), required to protect against major area-wide
disasters, asynchronous replication is used. The recovery time can be anywhere between 2 hours and
20 days depending on SLA requirements, and data loss can range from as little as 10 minutes up to
2 days depending on the criticality and the SLA.
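The following sketch simply encodes these rules of thumb; the 150-mile threshold and the RPO/RTO
ranges are taken from the text above, not from a formal sizing tool.

# Encode the distance-based guidance above as a simple lookup.
def replication_guidance(distance_miles):
    """Return (write mode, typical data loss, typical recovery time) for a site pair."""
    if distance_miles < 150:             # metro distance
        return ("synchronous", "under 5 minutes of data loss",
                "recovery in roughly 2-5 hours, up to a few days")
    return ("asynchronous", "10 minutes up to 2 days of data loss",
            "recovery in roughly 2 hours up to 20 days")

print(replication_guidance(60))
print(replication_guidance(400))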
Example of switchover using Insight Recovery during SOA
transaction (XA transaction)
To verify a SOA transaction across two sites, the following test is run. An order is entered from the
primary site and when the order is awaiting approval, site 1 goes down. The setup is now recovered
at the recovery site, where the order is approved and the transaction is completed.
Step one. Verify that the Oracle RAC database, Oracle WebLogic Server and web hosts are up and
running and the Fusion order entry application is deployed. Initiate a small order and verify that it
completes successfully. Oracle Enterprise Manager shows the instance created for each of the orders
(see Figure 20).
Figure 20. Fusion Order booking application
Step two. Initiate an order that requires Human Workflow, that is, one that requires manual approval
before it can be processed. Figure 21 shows initiating the order with the appropriate values as stored
in the database.
Figure 21. Initiating the order
Step three. The fedexshipment table in the database is empty. An entry will be made once the order
is approved.
Figure 22. Entry created upon approval
Step four. The new order is shown in the Oracle Enterprise Manager with the new instance ID.
Figure 23. New order and instance ID
Step five. The order goes into the Human Workflow Oracle BPM worklist application for approval.
Here a human will have to approve the order, before it can be processed further.
Figure 24. Human Workflow worklist
Step six. At this time, the primary site is shut down without approving the order, so that the recovery
site can process the remaining portion of the order. The web hosts and load balancers are shut down
first, then the SOA-managed servers and the WebLogic admin server, and finally the database.
Figure 25. Primary site shutdown
Step seven. Using the Insight Recovery software, convert the primary site to a recovery site. The
Insight software deactivates the logical servers on the primary site, as shown in Figure 26 below.
This job completes within 10 minutes of starting.
Figure 26. Deactivation of logical server
Figure 27. Deactivation completed
Step eight. Using the Insight Recovery software, change the site to which the applications have failed
over to be the primary site. After the action is accepted, this triggers the failover job. The job
activates the logical servers on the new site, and all the LUNs associated with the logical servers are
made the new source in the shared storage. The failover of the LUNs from the primary to the recovery
site is done automatically by the Insight software. After the logical servers on the recovery site are
activated, the servers are started up.
Figure 28. Activation of new site logical server
Figure 29. Confirm the failover to initiate
Step nine. Insight Recovery starts up the new primary site servers. Before the applications are started,
it is important to configure the IP addresses to be the same as at the original primary site. This
configuration can be done using the PINT tool provided by HP, or it can be scripted or done manually.
Verify that the hostnames and IP addresses are the same as at the original primary site, and that all
the shared disks, such as the Oracle RAC database disks, are visible. The systems on the new primary
site are now ready for the applications to be started up.
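A minimal sketch of what step nine automates is shown below, assuming a Linux host where addresses
can be re-applied with the standard ip command. The eth0 entry mirrors the racnode1 example from
the planned-failover steps (the /24 prefix is assumed); the second entry is illustrative. In this POC the
HP PINT tool performs this task.

# Re-apply the original site's IP addresses on the new primary site.
import subprocess

ADDRESS_MAP = {
    "eth0": "100.100.100.1/24",   # racnode1 example from the text (prefix assumed)
    "eth1": "100.100.101.1/24",   # hypothetical interconnect address
}

def restore_addresses(address_map=ADDRESS_MAP):
    """Assign each address to its interface and bring the interface up."""
    for device, cidr in address_map.items():
        subprocess.run(["ip", "addr", "add", cidr, "dev", device], check=True)
        subprocess.run(["ip", "link", "set", device, "up"], check=True)

if __name__ == "__main__":
    restore_addresses()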
Figure 30. Servers started up on Recovery site
Step ten. Start up the applications: the database, WebLogic admin server, SOA-managed servers, and
web hosts. The hostnames and IP addresses are the same on both sites. Verify that all the composites
and services that are deployed are up and running. The JMS and TLog files are intact, and
transactions will continue where they left off. The CA feature of the EVA ensures that the two sites are
up to date and in sync.
Step eleven. Start up the F5 load balancer on the new primary site and verify that the virtual IP
address is active and load balancing between the two web hosts. This indicates that clients can now
connect to this new site.
Step twelve. Validate basic functionality such as the web services, and confirm that all the OFMW
components have started correctly and the applications are deployed. Note that the process of
bringing up the servers, OS, Oracle RAC database and OFMW applications may take up to
30 minutes. Validate in the middleware Enterprise Manager that all applications are started before
opening access to the clients.
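A minimal post-failover check along the lines of steps ten through twelve is sketched below; it polls a
set of HTTP endpoints until they answer or a 30-minute window expires. The URLs are placeholders,
and in this POC the verification is done through Oracle Enterprise Manager.

# Poll tier endpoints until they respond, mirroring the 30-minute window above.
import time
import urllib.request

CHECK_URLS = [
    "http://soahost1:7001/console",        # WebLogic admin console (hypothetical host)
    "http://webhost1:7777/",               # OHS web host (hypothetical host and port)
    "http://MySoaCompany.com/console",     # load balancer VIP name from the example above
]

def _responds(url):
    try:
        urllib.request.urlopen(url, timeout=10)
        return True
    except OSError:
        return False

def wait_until_up(urls=CHECK_URLS, timeout_s=1800, poll_s=30):
    """Poll each URL until it responds, or give up after timeout_s (default 30 min)."""
    deadline = time.time() + timeout_s
    remaining = list(urls)
    while remaining and time.time() < deadline:
        remaining = [u for u in remaining if not _responds(u)]
        if remaining:
            time.sleep(poll_s)
    return remaining                        # any URLs still not answering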
Figure 31. Verify that applications are deployed
Step thirteen. The clients can reconnect to the new virtual IP address at the secondary site.
Step fourteen. Approve the previously pending order to complete the transaction.
Figure 32. Transaction completion
Step fifteen. The instance processing the order completes, and the trace flow of the order is shown
in Figure 33 below. An entry is made in the table indicating that the order has been processed.
Figure 33. Trace flow of order
Figure 34. Confirmation of order processing
Unplanned failover
Power down on primary site
A power failure or other abrupt or catastrophic interruption of service at the primary site results in an
unplanned failover. In this case, using the Insight Recovery software at the recovery site, the
administrator has to manually change the local site to the primary site. To start up on the recovery
site, the following steps need to be done (a minimal pre-flight sketch appears after these steps):
Step 1: Ensure that site 1 has failed, so that there is no possibility of a split-brain configuration.
Step 2: Ensure that there are enough server resources to start up the logical servers.
Step 3: Verify that the local Command View server at the recovery site is up and running.
Step 4: Ensure that the recovery site is not in maintenance mode, that is, the maintenance mode is
set to false.
Step 5: Select "Change local to Primary site" and confirm the change.
This makes the storage connected to the recovery site (the current site) active, and the logical servers
are activated. The servers are booted and are ready for the Oracle software to be started.
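The pre-flight sketch referenced above is shown below. Each input is a placeholder for a condition the
administrator confirms through the Insight Recovery console and Command View before promoting the
recovery site.

# Pre-flight checks for an unplanned failover, mirroring steps 1 through 4 above.
def preflight(site1_confirmed_down, free_blades, blades_needed,
              command_view_up, maintenance_mode):
    """Return a list of reasons the recovery site must not yet be promoted."""
    blockers = []
    if not site1_confirmed_down:
        blockers.append("Site 1 not confirmed down: risk of split-brain")
    if free_blades < blades_needed:
        blockers.append("Not enough server resources for the logical servers")
    if not command_view_up:
        blockers.append("Local Command View server is not running")
    if maintenance_mode:
        blockers.append("Recovery site is still in maintenance mode")
    return blockers

# Example: everything in order, so no blockers are reported
print(preflight(True, free_blades=8, blades_needed=6,
                command_view_up=True, maintenance_mode=False))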
Figure 35. Unplanned Failover
Since synchronous updates are done between the two sites, all transactions are protected and all the
JMS and TLog entries are preserved. Once the previous primary site infrastructure is restored, the
logical servers on that site have to be deactivated, and the EVA CA has to show that site as a
destination site. Once CA is re-established, it ensures that the storage between the two sites is
synchronized again.
Network down on the primary site and the inter-site link between the two sites
is down
When the primary site is alive but unreachable due to a network service interruption, an administrator
has to determine whether a failover to the other site is required; if so, an unplanned failover, as
described above, can be executed. Once failover to the recovery site is done, ensure that the primary
site is deactivated by switching it to a recovery site. The EVA storage on the primary site may still
show its LUNs as source, so the proper procedure needs to be followed to make sure that only one
site is the source before turning the connectivity between the two sites back on.
Comparison with Oracle Data Guard solution for
database replication
Oracle Data Guard provides a disaster recovery solution for databases. The database can be
replicated using an Oracle Data Guard solution; however, EVA CA still needs to be used for
replication of the OS, the middleware ORACLE_HOME and the web hosts. Oracle Data Guard
requires the standby site to be up and in passive mode, since the log files have to be applied at the
secondary site. A performance comparison is beyond the scope of this document.
Summary
This paper highlights some of the key HP technologies used to build a disaster recovery configuration
with HP blade infrastructure, Virtual Connect technology, Continuous Access EVA and HP Insight
Dynamics software. Together these components provide a highly reliable solution, and it can be made
highly available by adding sufficient duplication of components at each site where required. The
Insight Recovery component of the Insight Dynamics software provides an easy methodology for
automating the disaster recovery solution by combining the logical server concept with the CA
technology of the EVA.
Oracle Database and Oracle Fusion Middleware (OFMW) are a complex set of software and
configuration, and this POC demonstrates the ability to configure and build such a complex solution
using HP technology.
Appendix A: Documents
EVA documentation
HP StorageWorks SAN Design Reference Guide (distance configuration, required parts, topology, etc.),
http://h20000.www2.hp.com/bizsupport/TechSupport/DocumentIndex.jsp?contentType=SupportManual&locale=en_US&docIndexId=179911&taskId=101&prodTypeId=12169&prodSeriesId=406734
HP StorageWorks SAN manuals,
http://bizsupport2.austin.hp.com/bc/docs/support/SupportManual/c00403562/c00403562.pdf
HP Insight software and Insight Recovery documentation
HP Insight Dynamics manuals,
http://h20000.www2.hp.com/bizsupport/TechSupport/DocumentIndex.jsp?lang=en&cc=us&taskId=101&prodClassId=10008&contentType=SupportManual&docIndexId=64255&prodTypeId=18964&prodSeriesId=4146132
HP Reference Architectures for Oracle Grid on the HP BladeSystem,
http://h71028.www7.hp.com/enterprise/cache/494866-0-0-0-121.html
HP Insight Software 6.0 Installation and Upgrade Release Notes,
http://bizsupport1.austin.hp.com/bc/docs/support/SupportManual/c02054357/c02054357.pdf
HP Insight Software 6.0 Installation and Configuration Guide,
http://bizsupport1.austin.hp.com/bc/docs/support/SupportManual/c02048569/c02048569.pdf
HP Insight Recovery 6.0 User Guide,
http://bizsupport1.austin.hp.com/bc/docs/support/SupportManual/c02044078/c02044078.pdf
HP Insight Recovery Online Help,
http://bizsupport1.austin.hp.com/bc/docs/support/SupportManual/c02017145/c02017145.pdf
HP Portable Images Network Tool (PINT) details (last section of the Insight Control server migration User Guide),
http://h20000.www2.hp.com/bc/docs/support/SupportManual/c02048550/c02048550.pdf
Oracle documentation
Oracle's Middleware Disaster Recovery Guide and Disaster Recovery Terminology,
http://download.oracle.com/docs/cd/E10291_01/core.1013/e12297/intro.htm#CHDHDAJF
Oracle Fusion Middleware Enterprise Deployment Guide for Oracle SOA Suite 11g Release 1,
http://download.oracle.com/docs/cd/E12839_01/core.1111/e12036/toc.htm
Oracle Fusion Middleware Documentation Library,
http://download.oracle.com/docs/cd/E12839_01/index.htm
For more information
HP StorageWorks 6400/8400 Enterprise Virtual Array manuals,
http://h20000.www2.hp.com/bizsupport/TechSupport/DocumentIndex.jsp?lang=en&cc=us&taskId=101&contentType=SupportManual&docIndexId=64179&prodSeriesId=3900918&prodTypeId=12169
Configure DR Solution using HP EVA Continuous Access,
http://docs.hp.com/en/B7660-90019/ch04.html
Best Practices for HP Continuous Access and HP Cluster Extension with the HP EVA4400 in an
Exchange 2007 environment,
http://h20195.www2.hp.com/V2/getdocument.aspx?docname=4AA2-3244ENW.pdf
To help us improve our documents, please provide feedback at
http://h20219.www2.hp.com/ActiveAnswers/us/en/solutions/technical_tools_feedback.html.
© Copyright 2010 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. The only warranties for HP products and services are set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty. HP shall not be liable for technical or editorial errors or omissions contained herein.
Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Java is a US trademark of Sun Microsystems, Inc.
4AA2-1244ENW, Created June 2010; Updated August 2010, Rev. 1