Reduce risk with disaster recovery for
Oracle Fusion Middleware 11g
architectures using HP Continuous Access
EVA and HP Insight Recovery
HP Reference Architectures for Oracle
Technical white paper
Table of contents
Executive summary
Proof-of-concept description (POC)
  Disclaimer
Terminology
  Oracle disaster recovery terminology
  HP Continuous Access EVA terminology
  HP Insight software terminology
HP BladeSystem reference architecture with Insight Recovery extension overview
Overview of Insight software: Insight Dynamics, logical servers and Insight Recovery
  The logical server
  HP Insight Recovery
  HP Continuous Access EVA
Architecture summary
  Solution attributes
  HP Insight Recovery solution highlights
  Functionality
SOA disaster recovery logical topology diagram
  Why disaster recovery for middleware SOA
  High Availability (HA)
POC hardware components
POC software components
POC version details of BladeSystem infrastructure
Infrastructure diagram
Oracle database and middleware
  Oracle Fusion Middleware
  Oracle RAC database
POC site configuration
Planning DR groups
Installation and configuration of Insight software and Insight Recovery
Test case
  The application test environment
Network and DNS configuration for disaster recovery
Switchover (planned failover)
  Switch back procedures
Best practices
Example of switchover using Insight Recovery during SOA transaction (XA transaction)
Unplanned failover
Comparison with Oracle Data Guard solution for database replication
Summary
Appendix A: Documents
  EVA documentation
  HP Insight software and Insight Recovery documentation
  Oracle documentation
For more information
Executive summary
This document describes a disaster recovery solution for Oracle® environments based on synchronous
storage replication (across town) and HP server-edge virtualization that dramatically simplifies remote
site recovery. The solution uses HP Continuous Access EVA software for high-performance array-based
replication of the Oracle database as well as the server OS, Java™ Message Service (JMS) data,
transaction log files (TLogs), other metadata required for disaster recovery, and the Oracle home. In the
event of a complete site failure, HP Insight Recovery (also called Insight Dynamics Recovery
management), when directed to do so by the system administrator, automates the transition of the
production environment from the primary site to the standby recovery site, leveraging HP Virtual
Connect technology to minimize system reconfiguration and time-to-recovery.
Although virtual machines are supported in this architecture, this paper is focused on a multi-tiered
physical server architecture typical of an Oracle environment that includes the key elements of Oracle
Fusion Middleware (OFMW) architecture: an Oracle Real Application Cluster (RAC) database,
Oracle WebLogic Server middleware, Oracle SOA Suite and Oracle HTTP Server (OHS) web hosts.
The Oracle SOA Suite consists of: Oracle Business Process Execution Language (BPEL) Process
Manager (PM), Mediator, Rules, B2B (Business-to-Business), Human Workflow, and Oracle Business
Activity Monitoring (BAM).
Oracle BPEL is an XML-based language that enables task sharing across multiple enterprises using a
combination of Web services. BPEL is based on XML Schema, the Simple Object Access Protocol
(SOAP), and the Web Services Description Language (WSDL).
Oracle BPEL Process Manager provides a framework for easily designing, deploying, monitoring, and
administering processes based on BPEL standards.
Oracle BAM provides a framework for creating dashboards that display real-time data inflow and
creating rules to send alerts under specified conditions.
This Oracle foundation will support custom and packaged applications or Service-Oriented
Architecture (SOA) composite applications in a highly available and flexible way. The exact Oracle
environment is not critical to implementing the recovery scenario; a single-instance database or a
different middleware configuration would work as well. The basic concepts and design of the
recovery environment would be the same. The point is that there is more than just database content
that needs to be replicated and safeguarded to allow seamless disaster recovery.
This paper discusses how to set up and configure this recovery architecture and provides an
introduction to the various hardware and software components needed. References are provided for
more detailed information.
This white paper is part of a portfolio of information focused on the optimal integration of HP
Converged Infrastructure technologies with Oracle software technologies. As a collection, we refer to
this documentation as the HP reference architectures for Oracle. Additional reference architecture
documentation can be accessed through the HP and Oracle Alliance home page at
www.hp.com/go/oracle.
Target audience: This paper is written for system architects, managers, and others involved in, or with
a need to understand, the definition and deployment of highly available and disaster-tolerant Service-
Oriented Architecture (SOA) computer solutions.
This white paper describes testing performed in March to May 2010.
Proof-of-concept description (POC)
Oracle applications tend to be among the most mission-critical IT services. Customers require a full
range of availability options for these environments. The recommendation here is for a full three-tiered
architecture with an Oracle RAC 11gR2 database at the back end, Oracle WebLogic Server and SOA
in the middle tier, and web host processing at the front end. A demonstration application designed by
Oracle is used to validate the data consistency and recovery in our configuration. The purpose of this
proof-of-concept testing was to validate one disaster recovery option that:
1. Leverages the same HP BladeSystem infrastructure and management console used for the single-site
HP Reference Architecture for Oracle Grid.
2. Provides a cost-effective alternative to geographic cluster architectures for the majority of customers
who do not require online access at the secondary site.
3. Provides an "add-on" to the single-site HP Reference Architecture for Oracle Grid without
re-architecting.
The following document link provides more details about the above concepts:
http://h20195.www2.hp.com/V2/GetDocument.aspx?docname=4AA2-4214ENW
To validate the concepts and integration details, a small-scale POC configuration was deployed in the
HP Alliances, Performance and Solutions labs to verify the expected functionality of the key
components and identify any Oracle-specific best practices. Testing was completed for:
System level compatibility of technologies, version requirements, etc.
Site failover functionality specific to all tiers of the Oracle environment and business process
workflows
Virtual Connect (VC) functionality of the single-site HP Reference Architecture for Oracle Grid to
add, remove, or replace a server after fail-over
To provide context for the reader, the details of the POC configuration are used throughout this white
paper. It is important to understand that this particular implementation is simply an example used for
expedience. The concepts described here can be scaled to much larger systems and capacity.
Disclaimer
The performance of a disaster recovery system has to be evaluated case by case according to the
needs of the customer. This white paper only provides template demonstration examples. This
proof-of-concept does not include performance characterization because performance issues and
requirements are unique to every application. This white paper is not a step-by-step "how to" or
"install guide"; rather, it describes a recommended architecture and the key integration points with
Oracle software.
In our work we made every effort to ensure that what we implemented is fully supported by both HP
and Oracle. This white paper is not intended to imply functionality, compatibility or supportability
beyond what is documented in the individual product specifications.
Terminology
Oracle disaster recovery terminology
Disaster Recovery – The ability to safeguard against natural disasters or unplanned outages at a
production site by having a recovery strategy for failing over applications and data to a
geographically separate standby site.
Oracle Fusion Middleware (OFMW) – A collection of standards-based software products that spans a
range of tools and services from Java EE (enterprise edition) and developer tools, to integration
services, business intelligence, and collaboration. Oracle Fusion Middleware offers complete support
for development, deployment, and management.
Oracle Service-Oriented Architecture (SOA) Suite – A suite with infrastructure components, such as
Oracle Business Process Execution Language (BPEL) Process Manager (PM), with Mediator, Rules,
B2B, and Human Workflow.
Recovery Point Objective (RPO) – The maximum age of the data to be restored in the event of a
disaster. For example, if the RPO is six hours, systems must be restored to the state they were in no
more than six hours before the disaster.
Recovery Time Objective (RTO) – The time needed to recover from a disaster, usually determined by
how long you could afford to be without your systems.
Site failover – The process of making the current recovery (standby) site the new primary site
(production), after the primary or production site becomes unexpectedly unavailable (for example,
due to a disaster at the production site).
Site switchover – The process of reversing the roles of the primary site and recovery site. Switchovers
are planned operations performed for periodic validation or planned maintenance on the current
production site. During a switchover, the current standby site becomes the new production site, and
the current production site becomes the new standby site.
Site switchback – The process of reversing the roles of the new production site (old standby) and new
standby site (old production). Switchback is applicable after a previous switchover.
WebLogic Server transaction logs – Each WebLogic Server instance has a transaction log that
captures information about committed transactions that may not yet have completed. These transaction
logs enable WebLogic Server to recover transactions that could not be completed before the
server failed.
HP Continuous Access EVA terminology
Disk Group – A named group of disks selected from all available disks in an array. One or more
virtual disks can be created from a disk group.
Data replication group (DR group) – A logical group of virtual disks in a remote replication
relationship with a corresponding group on another array.
Destination – The virtual disk, DR group, or virtual array disk (at the recovery site) to which I/O is
replicated. See also Source.
Enterprise Virtual Array (EVA) – An HP StorageWorks product that consists of one or more virtual
arrays. See also virtual array.
Fabric – A network of Fibre Channel switches or hubs and other devices.
HP Continuous Access EVA (CA) – A storage-based HP StorageWorks software product that enables
two or more arrays to perform disk-to-disk replication, along with the management user interfaces that
facilitate configuring, monitoring, and maintaining the replicating capabilities of the arrays.
Management server – A server on which HP StorageWorks Enterprise Virtual Array (EVA)
management software is installed, including HP StorageWorks Command View EVA and HP
StorageWorks Replication Solutions Manager, if used. A dedicated management server runs EVA
management software exclusively.
Present LUN – The process by which the storage management console presents (makes visible) a LUN
or virtual disk to the World Wide ID (WWID) of the QLogic or Emulex HBA in the host (database or
middleware) server.
Source – The virtual disk, DR group, or virtual array (at the primary site) from which I/O is replicated
to the recovery site. See also destination.
XCS – The HP Enterprise Virtual Array software on specific EVA controller models. Controller software
manages all aspects of array operation, including communication with HP StorageWorks Command
View EVA.
HP Insight software terminology
HP Insight Recovery software (IR) – Software (also called Insight Dynamics Recovery management)
that executes the commands required to perform disaster recovery for logical servers.
HP Virtual Connect Enterprise Manager (VCEM) – Management tool that provides management of
multiple HP BladeSystem enclosures equipped with Virtual Connect modules.
Logical Server Profile – A logical server profile is composed of system services and resources whether
these are virtual, physical, shared or unshared – everything that the OS and application stack for a
given workload requires to operate.
Virtual Connect (VC) – HP VC switches plug into the backplane of an HP BladeSystem c7000
enclosure, providing Ethernet and FC switching capabilities with a virtual MAC and WWID.
HP BladeSystem reference architecture with Insight Recovery
extension overview
The single-site HP Reference Architecture for Oracle Grid is the foundation for the solution described
here. This architecture is fully redundant with no single point-of-failure. As previously described, it is a
three-tiered architecture of web hosts, Fusion Middleware and database servers. It is designed to
support an Oracle Real Application Cluster database of two or more nodes with multiple Oracle
middleware and web hosts in a scale-out configuration. Server and LAN connections, including the
dedicated RAC interconnect, are 10Gb Ethernet. Storage is a shared HP StorageWorks Enterprise
Virtual Array which hosts the database, system images and all stored elements including the HP
logical server profiles we explain below. All servers in the environment boot from this shared storage.
Ethernet and Fibre Channel storage connections are made through HP Virtual Connect (VC).
With this environment, server identities (profiles) are abstracted from the physical hardware (see The
logical server section below), making servers completely interchangeable. This means server
replacement can be handled with little to no human intervention. It also means that Oracle servers can
be pre-defined (saved to shared storage) and provisioned in minutes versus days. This model provides
opportunities for dramatic utilization improvement. For example, additional RAC nodes and
application servers could be pre-defined and used to increase capacity at month end for financial
systems or to spin up a test environment on-demand. The single-site reference architecture documents
this model in detail. For further information, see the references at the end of this paper.
Our purpose in this paper is to describe an extension to the single-site reference architecture that
provides for disaster recovery from a complete site outage. The solution assumes mirrored single-site
reference architectures that are geographically separated, with latency low enough to allow a
synchronous Fibre Channel connection between the two EVAs located at the two sites. One site is the
primary production site; the other is a recovery environment that can take over production processing
in a matter of minutes. HP Continuous Access EVA software is used to replicate the shared storage,
which includes everything needed to make the recovery. HP Insight Recovery automates the migration
of services from the production to the recovery site when an administrator determines it is necessary.
Overview of Insight software: Insight Dynamics, logical
servers and Insight Recovery
In our disaster recovery architecture, HP Insight Dynamics software provides the resource
management framework. A core concept of Insight Dynamics is the "logical server." In the context of
our solution, a logical server is a management abstraction that simplifies and optimizes the
provisioning and re-provisioning of servers. Because a logical server is abstracted from the underlying
platform, it makes those underlying resources anonymous to the application/OS stack. A logical
server can be created from a discrete physical server, from within a pool of physical resources, or
from a virtual machine. HP Insight Dynamics software uses the concept of logical servers to deliver a
common framework for planning, deploying and managing both physical and virtual servers
seamlessly.
The logical server
Logical servers bring the freedom and flexibility of virtualization to physical servers. The logical server
is a server profile that is easily created and freely moved across physical and virtual machines. By
detaching the logical identity from the physical resource, you can create or move logical servers on
any suitable virtual or physical machine—on demand.
With a logical server approach, you can even create templates for your frequently used applications
with specific configurations. These templates can be stored and reactivated in minutes, when needed.
A logical server profile describes an abstracted system image (including the system services and
resources), whether these are virtual, physical, shared, or unshared. The system image includes
everything that the OS and application stack require to operate on a particular workload. For
example, a logical server profile would include attributes describing entitlements such as power
allocation, processor and memory requirements, PCI Express devices (local I/O), network connections
(distributed I/O), and storage. The logical server is managed in software, either locally on the
platform (as firmware integrated into the hardware) or on a centralized management
server (CMS).
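To make the idea concrete, the sketch below models the kind of attributes a logical server profile carries. This is purely an illustration in Java; the names, types and values are hypothetical and do not reflect the actual Insight Dynamics profile format.

// Hypothetical illustration of the attributes a logical server profile carries.
// Names and values are invented for clarity; this is not the Insight Dynamics schema.
import java.util.List;

public record LogicalServerProfile(
        String name,                 // e.g. "oracle-rac-node1"
        int powerAllocationWatts,    // power entitlement
        int cpuCores,                // processor requirement
        int memoryGb,                // memory requirement
        List<String> pciDevices,     // local I/O (PCI Express devices)
        List<String> networkMacs,    // virtual MAC addresses assigned by Virtual Connect
        List<String> sanWwids,       // virtual WWIDs for SAN connections
        String bootLunId) {          // boot LUN on the shared EVA storage

    public static void main(String[] args) {
        // A profile like this can be activated on any compatible blade,
        // which is what makes servers interchangeable in this architecture.
        LogicalServerProfile racNode = new LogicalServerProfile(
                "oracle-rac-node1", 450, 16, 64,
                List.of("QMH2462-FC-mezzanine"),
                List.of("00-17-A4-77-00-10"),
                List.of("50:06:0B:00:00:C2:62:00"),
                "EVA6400-LUN-12");
        System.out.println("Profile ready to activate: " + racNode.name());
    }
}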
For our solution, logical server profiles will be created for all the servers used in the Oracle
environment. Not all applications are well-suited for virtual machines today, particularly those with
high I/O or deterministic latency requirements, such as an Oracle database and especially a RAC database.
Therefore, in this disaster recovery solution we will use only logical server profiles of dedicated
physical servers. These profiles enable rip and replace server recovery in minutes with no network or
operating system reconfiguration; this functionality is fully documented in the single-site HP Reference
Architecture for Oracle Grid. For our disaster recovery solution, we will extend these concepts to
remote site fail-over.
HP Insight Recovery
HP Insight Recovery (IR), also called Insight Dynamics Recovery management, provides disaster
recovery protection for logical servers configured and managed by Insight Dynamics. Logical servers
that are managed by Insight Recovery are referred to as Disaster Recovery Protected (DR Protected)
logical servers. Each DR Protected logical server is configured to run on an HP ProLiant server, either
on a c-Class blade equipped with HP VC or on a virtual machine.
An IR configuration consists of two sites, each running Insight Dynamics software and IR. At any
point in time, one site is configured with the primary site role and the other is configured with the
recovery site role. IR pairs symmetrically configured logical servers across the two sites. The DR
Protected logical servers at the primary site are in an activated state, providing services to the end-
user. The peer logical servers at the recovery site are in a deactivated state. For our Oracle solution,
this means that the database and Oracle middleware are only running at the primary site. The
physical servers standing by at the recovery site cannot be running the same Oracle logical servers
but could be running another workload, such as a test environment for the Oracle application. In the
event of a failure at the primary site, the recovery physical servers would need to be de-provisioned of
their test environment before failover could be initiated. This is a simple process handled routinely by
Insight software.
At the primary site, the boot images of the DR Protected logical servers (including the operating
system, applications code, and data) reside on HP StorageWorks EVA array volumes. The primary
site volumes are replicated to an EVA array at the recovery site. The primary and recovery site arrays
are synchronized with Continuous Access EVA. Each replicated recovery site volume is associated
with a DR Protected peer logical server at the recovery site. The combination of a DR Protected logical
server and its associated storage volume is referred to as a recovery group.
If a disaster occurs at the primary site, the administrator at the recovery site can trigger a site failover
via a push-button provided by IR. This action will fail all of the recovery groups over to the recovery
site. For each recovery group, this involves preparing its storage volume for read-write access and
activating its associated logical server. After all of the recovery groups are failed over, the role at the
recovery site is changed to the primary site.
For complete details on the Insight Recovery product (also called Insight Dynamics Recovery
management), its requirements and design considerations, see the home page at
www.hp.com/go/insightrecovery.
Figure 1 below shows the conceptual diagram of the IR solution, where the green blocks ("A", "B",
etc.) represent the OS and database LUNs on the EVA shared storage. The blade servers are part of
the VC domain group, which can be failed over.
Figure 1. Insight Recovery Solution
HP Continuous Access EVA
At the core of the Insight Recovery solution is the storage replication between two sites using the HP
StorageWorks Continuous Access EVA (CA) software, which is an array-based application that uses
advanced replication technologies to replicate data over distances between EVAs. CA utilizes a
simple graphical user interface (GUI) to create, manage and configure remote replication on the
entire EVA family of storage arrays. Furthermore, Continuous Access EVA software provides the
necessary components to achieve an enterprise's business continuity objectives in a cost-effective and
easily deployable package.
In our disaster recovery architecture, we will use CA EVA in synchronous replication mode so that
every update is posted to both the local and remote arrays before it is acknowledged, ensuring
complete recovery in the event of a site failure. In synchronous write mode, the source array
acknowledges I/O completion to the host only after replicating the data on the destination array;
synchronous replication prioritizes data currency over response time. The write sequence is as follows
(a minimal sketch of this ordering appears after the list):
1. A source array controller receives data from a host and stores it in cache.
2. The source array controller replicates the data to the destination array controller.
3. The destination array controller stores the data in cache and acknowledges I/O completion
to the source controller.
4. The source array controller acknowledges I/O completion to the host.
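The short sketch below models this ordering in code: the host's write call does not return until the destination controller has stored the data and, by returning, acknowledged it back to the source. It is a toy model for illustration only, not EVA controller behavior or an HP API.

// Toy model of synchronous replication ordering (illustration only; not an HP API).
import java.util.ArrayList;
import java.util.List;

class ArrayController {
    final String name;
    final List<byte[]> cache = new ArrayList<>();
    ArrayController remote; // destination controller, if this is the source

    ArrayController(String name) { this.name = name; }

    // Synchronous write: completion is acknowledged to the caller only after
    // the remote (destination) controller has stored the data in its cache.
    void write(byte[] data) {
        cache.add(data);          // 1. controller stores the data in cache
        if (remote != null) {
            remote.write(data);   // 2. source replicates to the destination
                                  // 3. destination stores the data and, by returning, acknowledges
        }
        // 4. returning here acknowledges I/O completion to the host
    }
}

public class SyncReplicationDemo {
    public static void main(String[] args) {
        ArrayController source = new ArrayController("primary EVA");
        ArrayController destination = new ArrayController("recovery EVA");
        source.remote = destination;
        source.write("redo block".getBytes());
        System.out.println("Host sees completion; destination holds "
                + destination.cache.size() + " block(s).");
    }
}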
Replication will be maintained for the Oracle database as well as the server boot LUNs stored on
the EVA.
Planning for inter-site latency and bandwidth is critical to ensure service levels to end users. For
complete details on planning and implementing remote replication, see the Continuous Access home
page at www.hp.com/go/caeva. The following link provides best practice white papers
and latency measurements for EVA: http://www.hp.com/go/hpcft.
Figure 2 below shows the SAN connectivity from the VC switches to the EVA storage and a disaster
recovery connection to the recovery site.
Figure 2. EVA CA diagram with HA
[Diagram: dual EVA6400 arrays with dual controllers, each managed by an EVA management server (with an alternate management server for HA), connected through redundant HP StorageWorks SAN Switch 4/16 fabrics. Flex-10 VC Ethernet and VC Fibre Channel modules in the backplane of the c7000 enclosure connect the blades, and an inter-site link (ISL) carries EVA CA replication to the remote site, which mirrors the primary site.]
Architecture summary
Solution attributes
All servers boot from LUNs on the shared EVA storage.
Logical server profiles for each physical server are defined identically at both sites.
Synchronous storage-array-based replication over a standard Fibre Channel SAN (site separation limited to Fibre Channel distances).
Standby site with an identical hardware configuration as the primary site.
HP Insight Recovery solution highlights
The OS and Oracle homes for RAC, WebLogic, SOA and the web hosts reside on the EVA storage as
a separate LUN for each host. The main advantage of this is that any changes to the OS, such as
kernel parameters, tuning, OS package updates, and version changes, are automatically copied to the
secondary site via CA. Likewise, Oracle configuration changes, log files such as JMS and TLog files,
and any patches or Java tuning applied to the WebLogic admin or managed servers are automatically
copied to the secondary site. This helps maintain a pristine production environment ready to come up
at any time on the secondary site.
Linking these LUNs to logical servers using Insight Recovery makes the solution seamless, because all
the parameters of the servers, such as CPU, memory, and both SAN and LAN networks, are
maintained in a logical server. Whenever disaster recovery is activated, IR automatically scans for
hardware with the same server attributes and then boots it from the LUN that has already been
replicated to the local storage.
The combination of these two factors ensures high reliability and availability of a disaster recovery
solution ready to be activated when necessary. It provides everything required to ensure that a
complex setup such as the Oracle database and middleware comes up smoothly on the recovery site.
This is demonstrated in the planned failover section of this POC.
Functionality
Servers at the standby site can be used for other purposes and quickly reconfigured for failover using
logical server profiles and the replicated boot LUNs from the primary site EVA.
Continuous Access EVA replicates the Oracle database and all metadata needed for complete system recovery.
Continuous Access EVA replicates the JMS data, TLogs and any Oracle software, patches or changes in the middleware.
In the event of a disaster, the system administrator initiates re-configuration of the standby site via
Insight Recovery. The standby recovery site servers have the Oracle environment server profiles
applied and boot up in their new roles as the production system.
SOA disaster recovery logical topology diagram
Figure 3 shows the logical diagram for a SOA-based disaster recovery configuration; it
represents clustering at the SOA level, WebLogic Server (WLS) and Oracle Web Services Manager
(OWSM). There are multiple web hosts for the OHS servers. The back end is replicated at both the
database and middleware tiers. The next section maps this topology onto the HP hardware and
software components that enable a working disaster recovery solution.
Why disaster recovery for middleware SOA
Critical business services may need 24/7 availability. SOA applications in particular have unique
availability requirements, and SLA compliance presents many hardware and software challenges.
Data availability is as important as service availability, and service availability depends on the
entire infrastructure. A scalable, highly available infrastructure is required to run your services.
Oracle Fusion Middleware processes XA transactions, which are stateful and write to the JMS queue
and the database. As shown later in this paper, in a test case where site failover occurs with SOA
transactions in flight, the solution ensures that any number of in-flight transactions are preserved and processed.
In addition to this state persistence requirement, the following is also necessary for a successful
disaster recovery between two sites: the Oracle home for the middleware has to be maintained in the
same state across the two sites for the SOA middleware to come up smoothly. Newly installed
composite applications, newly deployed web services, OS kernel changes, and patches to Oracle or
the OS are all kept in sync by CA. Overall, the disaster recovery solution is an extension of the
HA capability of the solution. Customers determine the level of HA and disaster recovery solution
necessary for their business.
High Availability (HA)
The disaster recovery topology shows the HA aspect of the solution by having two RAC nodes, two
SOA-managed servers and at least two web hosts. HA includes the clustering, state replication and
routing, failover, load balancing, server migration and server load balancing components. Redundant
network and SAN paths and corresponding switches are also required.
To tailor the solution to your needs, the exact set of HA and disaster recovery components should be
determined based on the SLA and other requirements of your business.
Figure 3 below shows the logical diagram representation.
Figure 3. Logical SOA disaster recovery solution
[Diagram: production site and standby site with identical topology. At each site a load balancer fronts two web hosts running OHS (Oracle HTTP Server); an admin host runs the WLS Admin Server and FMW Control; a SOA cluster of two app hosts runs WLS, SOA and SOA-Infra; and a RAC database stores its data on an HP StorageWorks EVA disk array. Clients connect through a DNS server. Continuous Access synchronizes the middleware tier and the database tier from the production site to the standby site over the WAN.]
POC hardware components
Table 1. POC hardware components
Quantity – Description
2 – HP BladeSystem c7000 enclosures; each holds up to 8 HP ProLiant BL685c blade servers. One enclosure per site for disaster recovery, here called the primary and recovery sites.
2 per enclosure – HP ProLiant G6 blade servers used for the database.
1 spare per enclosure (local database failover) – HP ProLiant G6 blade server used only if one of the database blades fails (spare DB blade).
4 per enclosure – HP ProLiant G5 blade servers used for the middleware (WebLogic) and web hosts.
1 spare per enclosure (local middleware failover) – HP ProLiant G5 blade server used for middleware in case of failover.
1 per blade server (8 per site in this setup) – QLogic QMH2462 4Gb dual-port Fibre Channel mezzanine card for HP c-Class BladeSystem. If you use Emulex cards on one site, use the same type of cards on the other site.
2 per enclosure – VC Flex-10 Ethernet module.
2 per enclosure – HP 4Gb VC-FC module.
2 per site (for HA) – HP StorageWorks SAN Switch 4/16.
2 – HP StorageWorks Enterprise Virtual Array EVA6400 (one per site); includes a 4U controller assembly with 2 HSV400 controllers and 1 DL380 EVA management station.
2 per site – ProCurve switches supporting Flex-10 (Fibre and Ethernet): HP ProCurve 2910al-24G Switch (J9145A).
GBICs – Flex-10-supported GBICs.
As needed – Shortwave fibre cables and Cat5 network cables.
2 – DL380 G5 or later with at least 8GB memory and a 146GB hard disk; one for each site, required for Insight software.
2 – DNS servers required for the primary and recovery sites (assumed to be in place in the existing infrastructure).
2 – F5 BIG-IP Load Balancer (Local Traffic Manager).
1 – DL380 G5 as a client access machine.
2 – F5 Global Traffic Manager; facilitates the smooth transfer of client connections from the primary to the recovery site. This is optional and can alternatively be done using DNS.
These are only the representative components we chose to use in this POC. Most currently shipping
ProLiant G5 or later servers and EVA models would work as well. Also, much larger configurations
with up to 50 servers and terabytes of storage are possible. For details, limitations, and requirements,
see the Continuous Access EVA and Insight Recovery support specifications.
POC software components
Table 2. POC software components
Software – Description
Red Hat Enterprise Linux – Version 5.3 on the HP ProLiant blade servers used for database and middleware.
Oracle Database – Version 11gR2 on HP ProLiant G6 blades.
Oracle Middleware – Version 11gR2 on HP ProLiant G5 blades, including the WebLogic admin server, managed servers and SOA configuration as described in the Oracle Enterprise Deployment Guide.
Insight Software – Version 6.0 on a DL380 G5 server. This is comprehensive management software that includes the following, accessible under one web interface:
  HP Version Control 6.0
  HP Insight Control 6.0, which includes:
    HP Insight Control licensing and reports 6.0 (new)
    HP Insight Control performance management 6.0 (updated)
    HP Insight Control power management 6.0 (updated)
    HP Insight Control server deployment 6.0 and 6.0.2 patch (updated)
    HP Insight Control server migration 6.0 (updated; new in Insight Control)
    HP Insight Control virtual machine management 6.0 (updated)
  HP Insight managed system setup wizard 6.0.1
  HP Insight Software Advisor 6.0
  HP Virtual Connect Enterprise Manager 6.0
  HP Insight Dynamics 6.0.1, which includes:
    Capacity planning
    Configuration management
    Recovery management (Insight Recovery)
    Infrastructure orchestration
  HP Insight Capacity Advisor Consolidation software 6.0
EVA Command View – Version 9.2
EVA Controller firmware – XCS v9.5 or later
EVA Continuous Access – CA license for each EVA6400; required for replication
POC version details of BladeSystem infrastructure
The items in Table 3 typically come along with the c-Class enclosure, or you can update with the latest
firmware available on the HP website. Table 3 lists the versions used in this setup.
Table 3. POC version details
Software – Version
Active Onboard Administrator – Version 2.60
HP Integrated Lights-Out 2 (iLO 2) for each HP ProLiant G6 blade – Version 1.78
iLO 2 for each HP ProLiant G5 blade – Version 1.30
HP VC Flex-10 Ethernet Module – Version 2.31
HP 4Gb VC-FC Module – Version 1.40
Infrastructure diagram
Table 1 listed the hardware components used for the POC. Figure 4 below maps these
components into an actual infrastructure diagram of the POC setup. Note that in the interest of
simplifying our proof of concept, we eliminated certain redundant components that would be
necessary to ensure high availability at both the primary and recovery sites. For high availability,
two load balancers would be required at each site; similarly, two ProCurve switches and two SAN
switches would be required at each site, as indicated in the POC hardware components. This diagram
provides a general guideline to the hardware infrastructure.
Evaluate your business requirements to determine an appropriate combination of HA and disaster
recovery for your enterprise.
Figure 4. Hardware representation of disaster recovery solution
[Diagram: at each site, a c7000 enclosure with ProLiant BL685c G6 blades for the database and ProLiant BL685c blades for WebLogic, the SOA server and web hosts, connected through Virtual Connect Flex-10 Ethernet modules (for seamless failover, addition or rip-and-replace) and VC Fibre Channel modules (boot from SAN). Each site also has an EVA6400 behind redundant HP StorageWorks 4/16 SAN switches (2 for HA per site), HP ProCurve 2910al-24G switches (2 for HA), a BIG-IP Model 1500 load balancer, a local DNS server, and a server running Insight software with Insight Recovery. EVA Continuous Access synchronizes the middleware, database and OS LUNs between the sites; the client runs on a DL380.]
Oracle database and middleware
Oracle Fusion Middleware
Oracle Fusion Middleware 11g is a comprehensive family of products that are seamlessly integrated
to help create, run, and manage agile and intelligent business applications. Fusion Middleware SOA Suite
provides a complete set of service infrastructure components for designing, deploying, and managing
composite applications. The suite enables services to be created, managed, and orchestrated into
composite applications and business processes. The components of the suite benefit from common
capabilities including a single deployment and management model and tooling, end-to-end security,
and unified metadata management. Oracle SOA Suite is unique in that it provides the following set of
integrated capabilities: messaging, service discovery, orchestration, activity monitoring, business
rules, events framework, web services management and security.
A few of the products from Oracle Fusion Middleware that are part of the POC configuration for
disaster recovery are:
Oracle WebLogic Server
Oracle JRockit JVM
Oracle SOA Suite
Oracle HTTP Server (OHS)
A few of the key components of the Oracle SOA Suite 11g are:
Oracle Service Bus
Oracle Complex Event Processing
Oracle Business Rules
Oracle Adapters
Oracle Business Activity Monitoring
Oracle B2B
Oracle BPEL Process Manager
Oracle Service Registry
Oracle User Messaging Service
Oracle Human Workflow
Oracle Mediator
Oracle RAC database
Oracle Real Application Cluster (RAC) supports the transparent deployment of a single database
across a cluster of servers, providing fault tolerance from hardware failures or planned outages.
RAC provides a high level of availability, scalability, and low-cost computing. It provides very
high availability for applications by removing the single server as a single point of failure. If a
node in the cluster fails, the database continues running on the remaining nodes. Individual nodes
can be shut down for maintenance while application users continue to work. Fast application
notification enables end-to-end lights-out recovery of applications and load balancing when a cluster
configuration changes.
Oracle RAC provides flexibility for scaling applications. To lower costs, clusters can be built from
standardized processing, storage, and network components. When additional processing power is
needed, simply add another server without taking users offline, providing horizontal scalability.
Applications never have to modify their connections as nodes are added to or removed from the
cluster. Oracle RAC 11gR2 introduces the single client access name (SCAN), which allows clients to
connect to the RAC database through a single address that provides failover and load balancing.
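As an illustration of how a client uses SCAN, the hedged sketch below opens a JDBC connection through a single SCAN address rather than listing every RAC node. It assumes the Oracle JDBC driver is on the classpath; the host, port, service name and credentials are placeholders, not values from this POC.

// Minimal JDBC example connecting through a RAC SCAN address.
// Host, port, service name and credentials are hypothetical placeholders.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ScanConnectionExample {
    public static void main(String[] args) throws Exception {
        // A single SCAN name resolves to the cluster; the SCAN listener directs the
        // session to an available instance, giving failover and load balancing.
        String url = "jdbc:oracle:thin:@//racdb-scan.example.com:1521/orclsvc";
        try (Connection conn = DriverManager.getConnection(url, "app_user", "app_password");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT instance_name FROM v$instance")) {
            while (rs.next()) {
                System.out.println("Connected to instance: " + rs.getString(1));
            }
        }
    }
}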
POC site configuration
As per the hardware components listed in Table 1, each site has a set of blades for the database,
middleware and web hosts in a c7000 enclosure. A number of steps are required to arrive at a
fully configured site. The following list briefly explains each step. For detailed installation steps, refer
to the HP documents listed in Appendix A.
Step one. Create boot LUNs on storage.
On the EVA6400, create the boot LUNs required for each of the blade servers, namely the database
servers, middleware servers, and web hosts. In this setup, the database server LUNs were created at
200GB each and the middleware server and web host LUNs at 100GB each. The 11gR2 RAC
database is on the partition called "RACDB"; this includes the Cluster Ready Services (CRS),
Automatic Storage Management (ASM) and database partitions as per 11gR2 requirements. Figure 5
below shows the LUN partitions created on the EVA6400 for this setup.
Figure 5. LUN partitions on EVA6400
Step two. Create Virtual Connect profiles for each of the blade servers to be deployed. This
assigns a virtual MAC address for each of the network connections and a virtual WWID for each of
the SAN connections. Figure 6 below shows the configured Virtual Connect profiles for site 1 (the
primary site) and the blade servers they are assigned to. The bay number indicates which profile is
associated with which physical blade.
The blade servers are configured to boot from SAN; that is, local disks are not used to install the OS.
The OS is installed on LUNs on the EVA shared storage.
Figure 6. Site 1 configured VC profiles
Step three. Present LUNs from shared storage to the corresponding blade server VC profile. Install the
operating system (OS). Red Hat Linux is installed on each of the servers after the LUNs are presented.
Step four. Install Oracle Database 11gR2 and configure a two-node RAC database. At the end of the
installation, verify that both nodes are up and running. Figure 7 below shows two RAC nodes up
and running.
Figure 7. Verification that RAC nodes are up
Step five. Install the WebLogic middleware and SOA, following the Oracle Enterprise Deployment
Guide for the SOA configuration. This results in the WebLogic admin server and two SOA managed
servers up and running. Figure 8 below shows the WebLogic admin console with the status of all the
components up and running.
Figure 8. Fusion Middleware Components
Step six. Verify through Oracle Enterprise Manager that all the components are deployed. In this
configuration, the SOA domain named "irdomain" is created with the SOA managed servers and deployed.
Figure 9. Verification that all components are deployed
Step seven. Install web hosts.
Figure 10. Web host installation
Step eight. Configure the F5 load balancer (Local Traffic Manager) with a virtual IP and forward it to
the appropriate web hosts. Clients will connect using the F5 load balancer virtual IP.
Figure 11. F5 load balancer forwarding to the appropriate web hosts
Planning DR groups
Once the LUNs at the primary site are configured to be part of DR groups, CA creates matching LUNs
on the secondary (recovery) site storage and makes them available. Each of the RAC nodes can be
put in one DR group, and similarly one group each for the SOA servers and web hosts, as shown in
Figure 12. If your application requires I/O write order to be maintained between the database and
middleware nodes, then they can be placed in the same DR group. SOA itself does not have this
requirement, so they can be in separate DR groups. The RAC database LUNs can be in a separate
DR group or added to the RAC node DR group. The advantage of configuring the whole LUN, with
the OS and Oracle home, in the DR group is that all the kernel parameters and tuning done on the OS
and Oracle home (changes such as new patches, updates or newly deployed applications) are
replicated to the other site. The database is also replicated to the
secondary site.
Figure 12. Database replication to secondary site
Installation and configuration of Insight software and
Insight Recovery
A brief description is provided here; refer to Appendix A for links to the detailed installation
and configuration guides available on this subject. Install the Insight 6.0 software, which has several
components, on a separate server at each of the sites. On the primary site, configure each of
the VC profiles as logical servers. Configure Insight Recovery on both sites; this associates a logical
server with a boot LUN. On site 2 (the recovery site), create the corresponding logical server profiles
and leave them in a deactivated state.
Refer to the Insight Recovery user and configuration guide for details on installing the Insight software
and configuring Insight Recovery. When a disaster recovery or failover happens from the primary
to the recovery site, the Insight Recovery software activates these logical servers and associates the
corresponding LUNs. Since this is a cold-standby type of scenario, the physical servers are required
only at the time of recovery. Figure 14 shows the logical servers configured for the primary site and
Figure 15 shows Insight Recovery configured on site 2, signifying that it is a recovery site.
Figure 14. Logical servers configured for the primary site
Figure 15. Insight Recovery configured on site 2
Test case
Oracle's Fusion Order Demo is a middleware SOA composite application running on the SOA
managed servers. A number of services are also deployed to support the application, such as a
credit service. The database has the appropriate tables loaded for this application. Some of the
test cases are listed below; a minimal connectivity-check sketch follows the list. The test cases include:
Oracle database
Ping Oracle database
Connectivity test
Oracle WebLogic
Ping WLS admin console
Validate WebLogic Managed Server startup and log
Oracle SOA Suite components
Connectivity test for Fusion Middleware control console
Connectivity test to worklist App
Connectivity test to SOA-Infra
Validate deployed applications
Ping deployed web service end-points
Log in to OFMW control and navigate to test SOA order booking composite application
Submit two new orders
Validate application transactions
Log in to Oracle Worklist application
Validate new orders and orders submitted at primary site
Approve one new order and an order submitted on primary site
Validate that orders continue to be processed
Log in to OFMW control and validate the status of the orders
Validate application transactions in between failover
Submit a large order which requires approval
Do a failover before the order is approved
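The hedged sketch below shows how the simple ping and connectivity tests in the list above could be scripted: it checks that a host port is reachable and that an HTTP endpoint answers. The hostnames, ports and URLs are placeholders for illustration, not values mandated by the products.

// Minimal connectivity checks; hostnames, ports and URLs are hypothetical placeholders.
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.URL;

public class ConnectivityChecks {
    // TCP check, for example the database listener on port 1521.
    static boolean portOpen(String host, int port) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), 5000);
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    // HTTP check, for example the WLS admin console or the soa-infra application.
    static int httpStatus(String url) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setConnectTimeout(5000);
        conn.setReadTimeout(5000);
        return conn.getResponseCode();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("DB listener reachable: " + portOpen("racdb-scan.example.com", 1521));
        System.out.println("WLS admin console status: " + httpStatus("http://adminhost.example.com:7001/console"));
        System.out.println("soa-infra status: " + httpStatus("http://apphost1.example.com:8001/soa-infra"));
    }
}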
The application test environment
The Oracle Order Booking demo was used to validate the middleware disaster recovery
configuration. It provides a reliable and consistent way to test and validate a number of Oracle
Fusion Middleware components and their configurations, and it yields consolidated test results and
applicable best practices for disaster recovery. Tests are initiated at the primary site, followed by
manual fault injection, failover to the standby site, and then test execution at the standby site. An XA
transaction that writes messages to a JMS queue and to the database is used to verify that the
transaction logs generated by the JMS services can be consistently restored after a failure.
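To make the XA test concrete, here is a hedged sketch of the kind of transaction such a test drives: a single JTA transaction that both sends a JMS message and inserts a database row, so that either both actions survive the failover or neither does. It assumes it runs inside the WebLogic container with an XA-enabled connection factory and data source; the JNDI names, queue and table are placeholders, not the actual Fusion Order Demo artifacts.

// Sketch of an XA (JTA) transaction spanning a JMS send and a JDBC insert.
// Assumes it runs inside the WebLogic container; JNDI names and SQL are placeholders.
import javax.jms.ConnectionFactory;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import javax.naming.InitialContext;
import javax.sql.DataSource;
import javax.transaction.UserTransaction;
import java.sql.PreparedStatement;

public class XaOrderWriter {
    public void submitOrder(String orderId) throws Exception {
        InitialContext ctx = new InitialContext();
        UserTransaction ut = (UserTransaction) ctx.lookup("java:comp/UserTransaction");
        ConnectionFactory cf = (ConnectionFactory) ctx.lookup("jms/xaOrderConnectionFactory");
        Queue queue = (Queue) ctx.lookup("jms/orderQueue");
        DataSource ds = (DataSource) ctx.lookup("jdbc/orderXADataSource");

        ut.begin();
        javax.jms.Connection jmsConn = null;
        java.sql.Connection dbConn = null;
        try {
            // Resources obtained inside the transaction from XA-enabled factories are
            // enlisted by the container in the same global transaction.
            jmsConn = cf.createConnection();
            dbConn = ds.getConnection();

            Session session = jmsConn.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageProducer producer = session.createProducer(queue);
            producer.send(session.createTextMessage("order:" + orderId));

            PreparedStatement ps =
                    dbConn.prepareStatement("INSERT INTO orders (order_id) VALUES (?)");
            ps.setString(1, orderId);
            ps.executeUpdate();
            ps.close();

            ut.commit();   // the JMS message and the database row commit together
        } catch (Exception e) {
            ut.rollback(); // neither the message nor the row is persisted
            throw e;
        } finally {
            if (dbConn != null) dbConn.close();
            if (jmsConn != null) jmsConn.close();
        }
    }
}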
The test application also provides the SOA Order Booking composite application to drive the test
scenario. This is the main application for processing orders from Global Company. The composite
application demonstrates how services, both internal to an enterprise and external at other sites, can
be integrated using SOA to create one cohesive ordering system. Figure 16 below shows the different
engines in Oracle Fusion Middleware; the tests are designed to exercise these components.
Figure 16. OFMW components
The Order Booking Composite application utilizes the following Oracle SOA suite components:
Oracle Mediator
Oracle Business Process Execution Language (BPEL) Process Manager
Oracle Human Workflow (using a human task)
Oracle Business Rules
Oracle Messaging Service
The Order Booking Composite application uses BPEL to orchestrate all the existing services in the
enterprise for order fulfillment with the appropriate warehouse based on the business rules in the
process. The diagram in Figure 17 illustrates the workflow.
Figure 17. Order booking composite
The Order Booking Composite application uses Oracle SOA Suite components such as Mediator,
BPEL, Business Rules, Human Workflow and Adapters. It invokes web services in a defined flow
sequence. The web services are independent of each other and are generated in different ways.
The screenshot in Figure 18 from Oracle JDeveloper illustrates the component layout.
Figure 18. Fusion Order Booking component layout
The following table further elaborates these elements.
Table 4. Technology and techniques used
Project – Technology and techniques used
CreditService – "Top-down" implementation of web services: starting with a WSDL file, use JDeveloper to generate Java classes from the WSDL file.
RapidService – "Bottom-up" implementation of web services: starting with Java classes, use JDeveloper to generate a WSDL file and JSR-181 Web Services Metadata annotations in the Java files.
SelectManufacturer – Simple asynchronous BPEL process with receive and invoke activities.
FulfillmentOrder – Routing services that use filters to check input data; the filters then direct the requests to the appropriate targets, including a JMS adapter. Transformation rules transform data appropriately for writing to databases and files; database adapters and file adapters perform the writes.
FusionOrderBooking – BPEL process that orchestrates a flow sequence including: other BPEL flows (the SelectManufacturer BPEL project); a Mediator project (the FulfillmentOrder project); Oracle Business Rules with BPEL; a Decision Service; a flow activity to send requests to RapidService and SelectManufacturer; and a human task to set up a step that requires manual approval.
Network and DNS configuration for disaster recovery
Oracle requires that the hostnames on both sites be the same and resolve to the same IP addresses,
so that the various Oracle components come up smoothly after a failover. In addition, client requests
must be redirected to the recovery site when the primary site is down.
To have the same hostnames and IP addresses at both the primary and recovery sites, the servers can
be configured on a non-routable private network at each site. This ensures that the hostnames and IP
addresses used by Oracle are the same on both sites. Alternatively, a local DNS server at each site
can provide the same names across both sites.
The global DNS server will resolve the name clients connect to. The load balancer creates a virtual IP
which is available for the clients to connect. For example, the load balancer virtual IP could be
mapped to http://<MySoaCompany.com>>: <port number>/console. This can be a single load
balancer or a host of several load balancers. The name resolves into the virtual IP address exported
by the load balancer. The load balancer further forwards the messages to a pool of web hosts
running OHS which are on the private network.
This web configuration is duplicated on the other site, so that a private network for the servers is
maintained. This will ensure that hostnames and IP addresses are the same across the two sites.
The global DNS entry resolves to the load balancer virtual IPs of both the primary and recovery sites
as a list of addresses to forward to. When there is a failure at the primary site and its virtual IP goes
down, any new requests from the client are automatically forwarded to the secondary site.
Another way to maintain the same IP address structure at both sites is to configure the client with the
primary and alternate sites as its primary and alternate DNS servers. This also routes client requests
to the alternate site when the primary site is down. F5 also provides a software solution, Global
Traffic Manager (GTM), which transfers client connections to the new site seamlessly.
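The sketch below illustrates the client-side behavior described above, assuming the public name
resolves to both load balancer virtual IPs; the client simply uses the first address that answers. The
name MySoaCompany.com comes from the example above, while the port and URL path are
placeholders.

# Minimal sketch of DNS-based failover from the client's point of view.
import socket
import urllib.request

def reachable_vip(name="MySoaCompany.com", port=7777, timeout=5):
    """Return the first virtual IP behind `name` that accepts an HTTP request."""
    infos = socket.getaddrinfo(name, port, proto=socket.IPPROTO_TCP)
    vips = []
    for info in infos:
        ip = info[4][0]
        if ip not in vips:
            vips.append(ip)
    for ip in vips:                      # primary site VIP first, then recovery VIP
        try:
            urllib.request.urlopen(f"http://{ip}:{port}/console", timeout=timeout)
            return ip
        except OSError:
            continue                     # VIP not answering, try the next address
    return None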
Figure 19. Network configuration. The diagram shows a client on the Internet reaching the primary DNS
server (Example.soaCompany.com) and HP ProCurve 2910al-24G switches that connect to one or
multiple F5 BIG-IP Model 1500 load balancers. Two HP Flex-10 10Gb VC-Ethernet modules sit in the
backplane of the c7000 enclosure, and all blades are connected to these backplane switches; the
Insight software management station is also shown. A similar configuration exists at the standby site.
Switchover (planned failover)
Switchover (planned failover) is performed when there is an imminent threat to the primary site, or for
periodic validation or planned maintenance. Once a switchover occurs, the current primary (production)
site becomes the recovery site and the current recovery site becomes the new primary (production) site.
The following steps need to be performed in sequence to do a planned failover; a minimal orchestration
sketch follows the list. In the next section, an example of a planned failover is detailed step by step,
with an XA transaction spanning the failover.
1. Shut down all the Oracle components like web hosts, Oracle Fusion Middleware and Oracle
Database either manually or using the corresponding management software.
2. Shut down the operating systems.
3. At the current primary or production site, using the Insight Recovery software, select "convert
primary to recovery site". The Insight software will deactivate the logical servers on this site.
4. At the current standby or the recovery site ensure that sufficient server resources are available to
bring up the application and database.
5. At the current standby site, using the Insight Recovery software, select "change current site to
primary site." This action converts the EVA connected to this site (the new primary site) to be the
source and activates the logical servers.
6. The operating systems on the servers at the new primary site will come up.
7. Verify that the hostnames and IP addresses are the same as in the original primary site.
8. Re-enable the IP addresses on the appropriate Ethernet ports. For example, if the racnode1 server
on the first site has an eth0 address of 100.100.100.1, enable this address on eth0 of the racnode1
server on the newly migrated primary site. Repeat this on all the servers to make sure that the IP
addresses match those of the original site. HP provides the Portable Images Network Tool (PINT),
which maintains the IP addresses after a logical server move or migration. For more details on PINT
configuration, refer to the documentation listed in Appendix A.
9. Restart the Oracle database servers, Oracle Fusion Middleware components like WebLogic
admin, SOA-managed servers and OHS web hosts.
10. Enable the virtual IP of the load balancer on the new primary site.
11. Verify with Oracle Enterprise Manager that all components of the Oracle Fusion Middleware are
up and running.
12. The DNS server on the new primary site will now direct all requests to the newly enabled
virtual IP of the load balancer.
13. Use a browser from a client to test out the new site.
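The sketch below, referenced above the list, outlines how steps 1 through 13 could be driven from a
management host. The shutdown scripts and hostnames are hypothetical placeholders; in this POC the
Oracle components are stopped with the Oracle tools and the site conversion is performed
interactively in the Insight Recovery console.

# Minimal orchestration sketch for the planned switchover sequence above.
import subprocess

ORACLE_SHUTDOWN_ORDER = [
    ("webhost1", "/opt/scripts/stop_ohs.sh"),         # hypothetical OHS stop script
    ("soahost1", "/opt/scripts/stop_soa_servers.sh"),  # SOA managed servers + admin
    ("racnode1", "/opt/scripts/stop_rac_db.sh"),       # Oracle RAC database
]

def run_remote(host, command):
    """Run a command on a remote host over ssh (placeholder for real tooling)."""
    subprocess.run(["ssh", host, command], check=True)

def planned_switchover():
    # Steps 1-2: stop the Oracle components, then the operating systems.
    for host, command in ORACLE_SHUTDOWN_ORDER:
        run_remote(host, command)
    for host, _ in ORACLE_SHUTDOWN_ORDER:
        run_remote(host, "shutdown -h now")
    # Steps 3-5 are performed in the Insight Recovery console: convert the
    # current primary to a recovery site, then promote the standby site.
    print("Use Insight Recovery to deactivate the logical servers here and "
          "activate them on the standby site.")
    # Steps 6-13: verify hostnames/IPs (see PINT), restart the Oracle components,
    # enable the load balancer virtual IP and test from a client browser.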
Switch back procedures
Repeat all the steps above to perform a switch back to the original production or primary site.
Best practices
For Oracle Fusion Middleware disaster recovery, the enhanced asynchronous and synchronous
replication write modes are recommended; plain asynchronous mode is not recommended for Oracle
Fusion Middleware. The replication write mode can be set on specific DR group(s) depending upon
the frequency of data change. The TLogs and JMS log files can be replicated using enhanced
asynchronous mode so that performance is not impacted.
Synchronous replication prevents any loss of data; however, it also requires each write I/O to be
completed on the destination array before it is considered complete on the local array. Therefore,
in an environment with a high volume of write I/Os, synchronous replication is a potential drag
on performance.
Enhanced asynchronous replication is nearly as robust as synchronous replication. In enhanced
asynchronous replication, write I/Os do not have to be completed on the destination array before
they are marked as completed locally. At the same time, there is protection against data loss,
because each I/O is written to the local array and a DR group write history log before it is
considered to be complete. The write history log is written in the same order that the write I/Os are
written to the local array. As the I/Os are propagated to the destination array, they are removed
from the write history log, so the write history log is a sequential record of all write I/Os written to
the local array that have not yet been acknowledged to be completed on the destination array.
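A minimal model of this bookkeeping is sketched below: a write is acknowledged after it is committed
locally and appended to the write history log, and entries leave the log in order as the destination
array acknowledges them. It illustrates the concept only and is not EVA code.

# Conceptual model of an enhanced asynchronous DR group's write history log.
from collections import deque

class DrGroup:
    def __init__(self):
        self.local_array = []
        self.write_history_log = deque()   # ordered record of unreplicated writes

    def write(self, block):
        """Acknowledge a host write after local commit plus log append."""
        self.local_array.append(block)
        self.write_history_log.append(block)
        return "acknowledged"

    def destination_ack(self):
        """Oldest pending write reaches the destination array, in order."""
        if self.write_history_log:
            self.write_history_log.popleft()

    def pending_writes(self):
        """Writes that would be lost if failover happened right now."""
        return list(self.write_history_log)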
In the event of a failure while enhanced asynchronous write mode is being used, all pending write
I/Os are preserved in the write history log. In this scenario, one option is to simply wait until the
source array can be brought back up. If the failure is only temporary and can be corrected in a
short period of time, this is probably the best option, because it ensures that no data will be lost.
In the case where the failure is not temporary or the production environment needs to be brought
back online quickly, the customer will have to fail over the production site to the standby site. In
enhanced asynchronous write mode, this means that all pending write I/Os in the write history log
will be lost. The number of writes lost can be minimized if the writes are being processed quickly,
and therefore the number of pending writes is low. The rate of write processing should be
estimated by customers when they are setting their RPO. The RPO is dependent on the bandwidth of
the inter-site link, which is in turn dependent on the distance between the arrays, the type of
interconnect, and other factors. Careful analysis of the application's write profile and the replication
link speed can determine what the worst case RPO will be for the solution. For complete details on
RPOs, bandwidth, and inter-site links, see the HP StorageWorks Continuous Access EVA
implementation guide (see the For more information section for link).
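As a rough worked example (not an HP sizing formula), the worst-case RPO can be approximated from
the peak write rate, the replication link throughput and the duration of the peak; all figures below are
illustrative.

# Rough estimate: if writes arrive faster than the link can drain them, the
# backlog in the write history log grows, and the worst-case RPO is roughly
# that backlog divided by the replication link throughput.
def worst_case_rpo_seconds(peak_write_mb_s, link_mb_s, peak_duration_s):
    """Estimate seconds of data at risk after a peak write burst."""
    if peak_write_mb_s <= link_mb_s:
        return 0.0                       # the link keeps up; the log stays near empty
    backlog_mb = (peak_write_mb_s - link_mb_s) * peak_duration_s
    return backlog_mb / link_mb_s

# Example: 40 MB/s peak writes for 10 minutes over a 25 MB/s link
print(worst_case_rpo_seconds(40, 25, 600))   # ~360 seconds of pending writes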
While it is possible that a failover using enhanced asynchronous write mode could result in zero
data loss if the write log is empty, enhanced asynchronous replication can never guarantee that
outcome; synchronous replication is the only way to guarantee zero data loss. An Oracle Fusion
Middleware environment does not generally require synchronous replication, however: the JMS and
TLog files can be placed in enhanced asynchronous replication mode so that performance is not
affected.
In the event of a failure in the middle of a large block transfer, CA EVA synchronous replication
guarantees the write I/O order during replication to the other site, so recovery becomes the typical
case of Oracle handling a local site failure, whether due to power loss or any other reason, that
brings the database down. Synchronous replication also ensures that any message written to disk is
written at both sites. If the replication mode is enhanced asynchronous, write I/O order is still
protected because of the write history log.
In addition to CA EVA replication, it is recommended that the user maintain a daily backup at each
site using HP StorageWorks Business Copy. Additional features can be used, such as snapclone,
which takes a point-in-time physical copy of a virtual disk, and vsnap, which takes a snapshot of a
virtual disk. EVA CA is a highly reliable and highly available configuration; the user should define a
procedure that complements EVA CA replication with other backup features, based on the risk
profile of the business.
Choosing the size of your write history log
As noted, enhanced asynchronous replication is the preferred write mode for most parts of the Oracle
Fusion Middleware environment. For enhanced asynchronous write mode to work properly, the write
history log must be large enough to hold all the write I/Os for a system that is under peak load.
This is extremely important, since a full write log triggers a process called normalization, which will
force a synchronization of the source and destination arrays. Under peak load, a forced
normalization would have a very negative impact on performance. Another reason to set the size of
the write history log correctly from the point when the DR group is created is that changing the size
requires you to switch to synchronous, drain the write log, and then switch back to enhanced
asynchronous mode.
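A minimal sizing sketch along these lines is shown below, assuming the log must absorb the backlog
that accumulates during the worst tolerated peak or link outage. The inputs and safety factor are
assumptions to be replaced with measured values and the supported sizing method from the
Continuous Access EVA implementation guide.

# Illustrative write history log sizing, not the official HP sizing method.
def write_history_log_size_gb(peak_write_mb_s, link_mb_s, worst_outage_s,
                              safety_factor=1.5):
    """Return a write history log size (GB) intended to avoid normalization."""
    backlog_mb = max(peak_write_mb_s - link_mb_s, 0) * worst_outage_s
    outage_mb = peak_write_mb_s * worst_outage_s     # case: link fully down
    required_mb = max(backlog_mb, outage_mb) * safety_factor
    return required_mb / 1024

# Example: 40 MB/s peak writes, a 25 MB/s link and a 1-hour tolerated outage
print(round(write_history_log_size_gb(40, 25, 3600), 1))   # roughly 211 GB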
High Availability for persistent stores
The WebLogic application servers are usually clustered for high availability. For high availability of
the SOA Suite within a site, a persistent file-based store is used for the Java Message Service (JMS)
and the transaction logs (TLogs). These files can be replicated in enhanced asynchronous mode to
address latency concerns.
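A small sanity-check sketch is shown below: it flags any persistent store directory that does not live
under the CA-replicated volume. The mount point and store paths are placeholders, not the directories
used in this POC.

# Check that the JMS and TLog file stores sit on the replicated volume.
import os

REPLICATED_MOUNT = "/u02/oracle/stores"          # hypothetical CA-replicated volume
PERSISTENT_STORES = [
    "/u02/oracle/stores/soa_cluster/jms",        # hypothetical JMS file store directory
    "/u02/oracle/stores/soa_cluster/tlogs",      # hypothetical TLog directory
]

def stores_on_replicated_volume(mount=REPLICATED_MOUNT, stores=PERSISTENT_STORES):
    """Return any persistent store directory that is not under the replicated mount."""
    return [p for p in stores
            if os.path.commonpath([os.path.realpath(p), mount]) != mount]

if __name__ == "__main__":
    print(stores_on_replicated_volume())   # empty list means all stores are replicated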
Meeting SLA requirements and recovery time
The SLA requirements can consider a host of factors, such as:
Required application availability, ranging from 99.95% for highly critical applications to 99.5% for
lower criticality
Inter-site link bandwidth, and the type of interconnect used, whether dark fibre, Ethernet, etc.
Cost of the solution
The RTO defines the time taken to recover, and the RPO defines the point up to which data is recovered.
Failover at metro distances (typically less than 150 miles) can use synchronous replication.
With synchronous replication there is minimal data loss, less than 5 minutes, and recovery time ranges
from a few hours (2-5) to a few days depending on the criticality of the application and the SLA
requirements.
For long-distance recovery (greater than 150 miles), required to protect against major area-wide
disasters, asynchronous replication is used. The recovery time can be anywhere between 2 hours and
20 days depending on SLA requirements, and data loss can range from as little as 10 minutes up to
2 days depending on the criticality and the SLA.
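The following sketch simply encodes these rules of thumb; the 150-mile threshold and the RPO/RTO
ranges are taken from the text above, not from a formal sizing tool.

# Encode the distance-based guidance above as a simple lookup.
def replication_guidance(distance_miles):
    """Return (write mode, typical data loss, typical recovery time) for a site pair."""
    if distance_miles < 150:             # metro distance
        return ("synchronous", "under 5 minutes of data loss",
                "recovery in roughly 2-5 hours, up to a few days")
    return ("asynchronous", "10 minutes up to 2 days of data loss",
            "recovery in roughly 2 hours up to 20 days")

print(replication_guidance(60))
print(replication_guidance(400))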
Example of switchover using Insight Recovery during SOA
transaction (XA transaction)
To verify a SOA transaction across two sites, the following test is run. An order is entered from the
primary site and when the order is awaiting approval, site 1 goes down. The setup is now recovered
at the recovery site, where the order is approved and the transaction is completed.
Step one. Verify that the Oracle RAC database, Oracle WebLogic Server and web hosts are up and
running and the Fusion order entry application is deployed. Initiate a small order and verify that it
completes successfully. Oracle Enterprise Manager shows the instance created for each of the orders
(see Figure 20).
Figure 20. Fusion Order booking application
Step two. Initiate an order that requires Human Workflow, that is, one that requires manual approval
before it can be processed. Figure 21 shows initiating the order with the appropriate values as stored
in the database.
Figure 21. Initiating the order
Step three. The fedexshipment table in the database is empty. An entry will be made once the order
is approved.
Figure 22. Entry created upon approval
Step four. The new order is shown in the Oracle Enterprise Manager with the new instance ID.
Figure 23. New order and instance ID
Step five. The order goes into the Human Workflow Oracle BPM worklist application for approval.
Here a human will have to approve the order, before it can be processed further.
Figure 24. Human Workflow worklist
Step six. At this time, the primary site is shut down without approving the order, so that the recovery
site can process the remaining portion of the order. The web hosts and load balancers are shut down
first, then the SOA-managed servers and the WebLogic admin server, and finally the database.
Figure 25. Primary site shutdown
Step seven. Using the Insight Recovery software, convert the primary site to a recovery site. The
Insight software deactivates the logical servers on the primary site, as shown in Figure 26 below.
This job completes within 10 minutes of starting.
Figure 26. Deactivation of logical server
Figure 27. Deactivation completed
Step eight. Using the Insight Recovery software, change the site to which the applications have failed
over to be the primary site. After the action is accepted, this triggers the failover job. The job
activates the logical servers on the new site, and all the LUNs associated with the logical servers are
made the new source in the shared storage. The failover of the LUNs from the primary to the recovery
site is done automatically by the Insight software. After the logical servers on the recovery site are
activated, the servers are started up.
Figure 28. Activation of new site logical server
Figure 29. Confirm the failover to initiate
Step nine. Insight Recovery starts up the new primary site servers. Before the applications are started,
it is important to configure the IP addresses to be the same as at the original primary site. This
configuration can be done using the PINT tool provided by HP, or it can be scripted or done manually.
Verify that the hostnames and IP addresses are the same as at the original primary site, and that all
the shared disks, such as the Oracle RAC database disks, are visible. The systems on the new primary
site are now ready for the applications to be started up.
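A minimal sketch of what step nine automates is shown below, assuming a Linux host where addresses
can be re-applied with the standard ip command. The eth0 entry mirrors the racnode1 example from
the planned-failover steps (the /24 prefix is assumed); the second entry is illustrative. In this POC the
HP PINT tool performs this task.

# Re-apply the original site's IP addresses on the new primary site.
import subprocess

ADDRESS_MAP = {
    "eth0": "100.100.100.1/24",   # racnode1 example from the text (prefix assumed)
    "eth1": "100.100.101.1/24",   # hypothetical interconnect address
}

def restore_addresses(address_map=ADDRESS_MAP):
    """Assign each address to its interface and bring the interface up."""
    for device, cidr in address_map.items():
        subprocess.run(["ip", "addr", "add", cidr, "dev", device], check=True)
        subprocess.run(["ip", "link", "set", device, "up"], check=True)

if __name__ == "__main__":
    restore_addresses()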
Figure 30. Servers started up on Recovery site
Step ten. Start up the applications: the database, WebLogic admin server, SOA-managed servers, and
web hosts. The hostnames and IP addresses are the same on both sites. Verify that all the composites
and services that are deployed are up and running. The JMS and TLog files are intact, and
transactions will continue where they left off. The CA feature of the EVA ensures that the two sites are
up to date and in sync.
Step eleven. Start up the F5 load balancer on the new primary site and verify that the virtual IP
address is active and load balancing between the two web hosts. This indicates that clients can now
connect to this new site.
Step twelve. Validate basic functionality such as the web services, and confirm that all the OFMW
components have started correctly and the applications are deployed. Note that the process of
bringing up the servers, OS, Oracle RAC database and OFMW applications may take up to
30 minutes. Validate in the middleware Enterprise Manager that all applications are started before
opening access to the clients.
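A minimal post-failover check along the lines of steps ten through twelve is sketched below; it polls a
set of HTTP endpoints until they answer or a 30-minute window expires. The URLs are placeholders,
and in this POC the verification is done through Oracle Enterprise Manager.

# Poll tier endpoints until they respond, mirroring the 30-minute window above.
import time
import urllib.request

CHECK_URLS = [
    "http://soahost1:7001/console",        # WebLogic admin console (hypothetical host)
    "http://webhost1:7777/",               # OHS web host (hypothetical host and port)
    "http://MySoaCompany.com/console",     # load balancer VIP name from the example above
]

def _responds(url):
    try:
        urllib.request.urlopen(url, timeout=10)
        return True
    except OSError:
        return False

def wait_until_up(urls=CHECK_URLS, timeout_s=1800, poll_s=30):
    """Poll each URL until it responds, or give up after timeout_s (default 30 min)."""
    deadline = time.time() + timeout_s
    remaining = list(urls)
    while remaining and time.time() < deadline:
        remaining = [u for u in remaining if not _responds(u)]
        if remaining:
            time.sleep(poll_s)
    return remaining                        # any URLs still not answering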
Figure 31. Verify that applications are deployed
Step thirteen. The clients can reconnect to the new virtual IP address at the secondary site.
Step fourteen. Approve the previously pending order to complete the transaction.
Figure 32. Transaction completion
Step fifteen. The instance processing the order completes, and the trace flow of the order is shown
in Figure 33 below. An entry is made in the table indicating that the order has been processed.
Figure 33. Trace flow of order
Figure 34. Confirmation of order processing
Unplanned failover
Power down on primary site
A power failure or other abrupt or catastrophic interruption of service at the primary site results in an
unplanned failover. In this case, using the Insight Recovery software at the recovery site, the
administrator has to manually change the local site to the primary site. To start up on the recovery
site, the following steps need to be done (a minimal pre-flight sketch appears after these steps):
Step 1: Ensure that site 1 has failed, so that there is no possibility of a split-brain configuration.
Step 2: Ensure that there are enough server resources to start up the logical servers.
Step 3: Verify that the local Command View server at the recovery site is up and running.
Step 4: Ensure that the recovery site is not in maintenance mode, that is, the maintenance mode is
set to false.
Step 5: Select "Change local to Primary site" and confirm the change.
This makes the storage connected to the recovery site (the current site) active, and the logical servers
are activated. The servers are booted and are ready for the Oracle software to be started.
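The pre-flight sketch referenced above is shown below. Each input is a placeholder for a condition the
administrator confirms through the Insight Recovery console and Command View before promoting the
recovery site.

# Pre-flight checks for an unplanned failover, mirroring steps 1 through 4 above.
def preflight(site1_confirmed_down, free_blades, blades_needed,
              command_view_up, maintenance_mode):
    """Return a list of reasons the recovery site must not yet be promoted."""
    blockers = []
    if not site1_confirmed_down:
        blockers.append("Site 1 not confirmed down: risk of split-brain")
    if free_blades < blades_needed:
        blockers.append("Not enough server resources for the logical servers")
    if not command_view_up:
        blockers.append("Local Command View server is not running")
    if maintenance_mode:
        blockers.append("Recovery site is still in maintenance mode")
    return blockers

# Example: everything in order, so no blockers are reported
print(preflight(True, free_blades=8, blades_needed=6,
                command_view_up=True, maintenance_mode=False))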
Figure 35. Unplanned Failover
Since synchronous updates are done between the two sites, all transactions are protected and all the
JMS and TLog entries are preserved. Once the previous primary site infrastructure is restored, the
logical servers on that site have to be deactivated, and the EVA CA has to show that site as a
destination site. Once CA is re-established, it ensures that the storage between the two sites is
synchronized again.
Network down on the primary site and the inter-site link between the two sites
is down
When the primary site is alive but unreachable due to a network service interruption, an administrator
has to determine whether a failover to the other site is required; if so, an unplanned failover, as
described above, can be executed. Once failover to the recovery site is done, ensure that the primary
site is deactivated by switching it to a recovery site. The EVA storage on the primary site may still
show its LUNs as source, so the proper procedure needs to be followed to make sure that only one
site is the source before turning the connectivity between the two sites back on.
Comparison with Oracle Data Guard solution for
database replication
Oracle Data Guard provides a disaster recovery solution for databases. The database can be
replicated using an Oracle Data Guard solution; however, EVA CA still needs to be used for
replication of the OS, the middleware ORACLE_HOME and the web hosts. Oracle Data Guard
requires the standby site to be up and in passive mode, since the log files have to be applied at the
secondary site. A performance comparison is beyond the scope of this document.
Summary
This paper highlights some of the key HP technologies used to build a disaster recovery configuration
with HP blade infrastructure, Virtual Connect technology, Continuous Access EVA and HP Insight
Dynamics software. Together these components provide a highly reliable solution, and it can be made
highly available by adding sufficient duplication of components at each site where required. The
Insight Recovery component of the Insight Dynamics software provides an easy methodology for
automating the disaster recovery solution by combining the logical server concept with the CA
technology of the EVA.
Oracle Database and Oracle Fusion Middleware (OFMW) are a complex set of software and
configuration, and this POC demonstrates the ability to configure and build such a complex solution
using HP technology.
Appendix A: Documents
EVA documentation
HP StorageWorks SAN Design Reference Guide (distance configuration, required parts, topology, etc.),
http://h20000.www2.hp.com/bizsupport/TechSupport/DocumentIndex.jsp?contentType=SupportManual&locale=en_US&docIndexId=179911&taskId=101&prodTypeId=12169&prodSeriesId=406734
HP StorageWorks SAN manuals,
http://bizsupport2.austin.hp.com/bc/docs/support/SupportManual/c00403562/c00403562.pdf
HP Insight software and Insight Recovery documentation
HP Insight Dynamics manuals,
http://h20000.www2.hp.com/bizsupport/TechSupport/DocumentIndex.jsp?lang=en&cc=us&taskId=101&prodClassId=10008&contentType=SupportManual&docIndexId=64255&prodTypeId=18964&prodSeriesId=4146132
HP Reference Architectures for Oracle Grid on the HP BladeSystem,
http://h71028.www7.hp.com/enterprise/cache/494866-0-0-0-121.html
HP Insight Software 6.0 Installation and Upgrade Release Notes,
http://bizsupport1.austin.hp.com/bc/docs/support/SupportManual/c02054357/c02054357.pdf
HP Insight Software 6.0 Installation and Configuration Guide,
http://bizsupport1.austin.hp.com/bc/docs/support/SupportManual/c02048569/c02048569.pdf
HP Insight Recovery 6.0 User Guide,
http://bizsupport1.austin.hp.com/bc/docs/support/SupportManual/c02044078/c02044078.pdf
HP Insight Recovery Online Help,
http://bizsupport1.austin.hp.com/bc/docs/support/SupportManual/c02017145/c02017145.pdf
HP Portable Images Network Tool (PINT) details (last section of the Insight Control server migration User Guide),
http://h20000.www2.hp.com/bc/docs/support/SupportManual/c02048550/c02048550.pdf
Oracle documentation
Oracle's Middleware Disaster Recovery Guide and Disaster Recovery Terminology,
http://download.oracle.com/docs/cd/E10291_01/core.1013/e12297/intro.htm#CHDHDAJF
Oracle Fusion Middleware Enterprise Deployment Guide for Oracle SOA Suite 11g Release 1,
http://download.oracle.com/docs/cd/E12839_01/core.1111/e12036/toc.htm
Oracle Fusion Middleware Documentation Library,
http://download.oracle.com/docs/cd/E12839_01/index.htm
For more information
HP StorageWorks 6400/8400 Enterprise Virtual Array manuals,
http://h20000.www2.hp.com/bizsupport/TechSupport/DocumentIndex.jsp?lang=en&cc=us&taskId=101&contentType=SupportManual&docIndexId=64179&prodSeriesId=3900918&prodTypeId=12169
Configure DR Solution using HP EVA Continuous Access,
http://docs.hp.com/en/B7660-90019/ch04.html
Best Practices for HP Continuous Access and HP Cluster Extension with the HP EVA4400 in an
Exchange 2007 environment,
http://h20195.www2.hp.com/V2/getdocument.aspx?docname=4AA2-3244ENW.pdf
To help us improve our documents, please provide feedback at
http://h20219.www2.hp.com/ActiveAnswers/us/en/solutions/technical_tools_feedback.html.
© Copyright 2010 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. The only warranties for HP products and services are set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty. HP shall not be liable for technical or editorial errors or omissions contained herein.
Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Java is a US trademark of Sun Microsystems, Inc.
4AA2-1244ENW, Created June 2010; Updated August 2010, Rev. 1