Borland® JDataStore™ high availability

Fail-safe with increased scalability

A Borland White Paper

May 2005

Contents

Introduction

Architectural overview
    Mirror types
    Leveraged solution
    Synchronization
    Failover

The webbench benchmark
    Platform configuration
    Webbench configuration
        Installing JDataStore 7.03
        Configuring the benchmark
        Creating and loading the database
        Running the benchmark
        Creating the mirrors
        Configuring the benchmark with mirroring
    Benchmark results

Scaling higher

Mirror status monitoring

Mirror performance monitoring

Summary

Introduction

Borland® JDataStore™ 7 introduced a new high availability feature (JDataStore HA) that

provides support for incremental backup, manual/automatic failover, and increased

scalability.

This document provides a wealth of information about the architecture of JDataStore HA and

how to configure and monitor a JDataStore HA system using ordinary, low-cost components.

The webbench sample database is used to show how to configure a JDataStore high-

availability system. In order to show the fault tolerance and scalability benefits of the

JDataStore HA solution, the webbench sample is executed with and without JDataStore HA

enabled.

Many performance and reliability improvements were made to JDataStore HA in the

JDataStore 7.03 maintenance release. The discussion and performance results of this white

paper are based on the usage of JDataStore version 7.03.

The key benefits of JDataStore HA covered in this document include:

• Ease of configuration. This configuration can be created with the execution of four SQL

statements.

• Ease of monitoring. The system can be monitored and tuned with relatively simple tools

provided with the JDataStore product and the operating systems used.

• Fault tolerance. Either manual or automatic failover can be employed.

• Scalability. The simple example in this paper shows that the transaction throughput can

be increased manifold by moving from a one-server system to a three-server system.

• Low-cost solution. Ordinary, low-cost hardware and software components are used to

build a system that is extremely fault tolerant while also providing a manifold increase in

transaction throughput.

Architectural overview

One of the most important areas of concern for any database application is eliminating single

points of failure. The High Availability Server uses database mirroring to ensure database

access in the face of either software or hardware failure. A secondary benefit of mirroring is

increased scalability for read-only transactions.

Mirror types

The three mirror types that can be used by an application are primary, read-only, and directory.

• The primary mirror is the only mirror type that can accept both read and write

transactions. Only one primary mirror at a time is allowed.

• There can be any number of read-only mirrors. Connections to these databases can

perform only read transactions. Read-only mirrors always provide a transactionally

consistent view of the primary mirror database. However, a read-only mirror database

might not reflect the most recent write transactions made against the primary mirror

database. Read-only mirrors can be synchronized with changes to the primary mirror

instantly, on a scheduled basis, or manually. Instant synchronization is required for

automatic failover. Scheduled and manual synchronization can be used for incremental

synchronization or backup.

• A directory mirror mirrors only the mirror configuration table and the other system tables needed for security definition. It does not mirror the actual application tables in the primary mirror. There can be any number of directory mirrors. The storage requirements for a directory mirror database are very small, because it contains only the mirror table and system security tables. Directory mirrors redirect read-only connection requests to read-only mirrors. Writable connection requests are redirected to the primary mirror. Another important benefit of directory mirrors is that they provide load balancing for read-only connection requests across all of the available read-only mirrors.

Leveraged solution

JDataStore HA technology heavily leverages existing subsystems. This contributes

significantly to the simplicity, reliability, and performance of the JDataStore HA solution.

(Competitive solutions often have completely separate database storage engines and

connectivity solutions for their HA product offerings.)

• The JDataStore database kernel uses the same log files used for transaction rollback and

crash recovery to incrementally update read-only mirror images of a database.

• The existing support in JDataStore for read-only transactions provides a transactionally

consistent view of the mirrored data while the mirror is being synchronized with contents

of more recent write transactions from the primary mirror.

• JDataStore also uses the same TCP/IP database connections used for general database

access to synchronize mirrored databases.

Synchronization

There are two steps to synchronizing a read-only mirror:

• Synchronizing the log files. This is the most important step of synchronization. The log

files contain every change made to a database. Once all of a transaction’s log records

have been transmitted to a read-only mirror, that transaction has been made durable for

both the primary and read-only mirror.

• Replaying the log files against the read-only mirror database. The primary benefit of this

action is to bring the transactional state of the read-only mirror to a more recent state

relative to the primary mirror. A secondary benefit is to allow log files to be dropped.

Once a log file has been replayed against all read-only mirrors, it can be dropped to free

disk space.
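The log-retention rule in the second step can be sketched as follows. This is only an illustrative Python model, not JDataStore code: the function name, the integer log file numbers, and the per-mirror "replayed through" positions are all assumptions made for the example.

```python
def droppable_log_files(log_files, replayed_through):
    """Return the log files that every read-only mirror has already replayed.

    log_files: iterable of log file sequence numbers (hypothetical numbering).
    replayed_through: dict mapping mirror name -> highest log file number
        that the mirror has fully replayed.
    """
    if not replayed_through:
        # With no read-only mirrors registered, this model drops nothing.
        return []
    # A log file is safe to drop only once the slowest mirror has replayed it.
    slowest = min(replayed_through.values())
    return sorted(n for n in log_files if n <= slowest)


mirrors = {"Mirror2": 7, "Mirror3": 5}
print(droppable_log_files(range(1, 9), mirrors))
```

In this model the slowest mirror decides how much disk space can be reclaimed, which is one reason to replay regularly on every read-only mirror.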

There are configuration settings that can be used to automatically synchronize log files with

read-only mirrors and replay the synchronized log records against the read-only mirror’s

database.

• INSTANT_SYNCH mirror property. The primary mirror will not return from a commit

operation until a majority of read-only mirrors with INSTANT_SYNCH set to true

confirm that they have made all log records for the committed transaction durable.

• AUTO_FAIL_OVER mirror property. This is a superset of INSTANT_SYNCH. If true,

automatic failover can be initiated when the primary mirror fails.

• Synch mirror operation. This can be performed on request, or it can be scheduled as a

periodic operation. If the mirror has INSTANT_SYNCH or AUTO_FAIL_OVER set,

then this operation incrementally replays log records received since the last replay

operation. If INSTANT_SYNCH or AUTO_FAIL_OVER are not set, then the synch

operation will first need to incrementally copy over any of the more recent log records it

does not already have before replaying log records against the database.
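The INSTANT_SYNCH commit rule can be illustrated with a small Python model. It is a sketch under stated assumptions, not the server's implementation: the function name is hypothetical, "majority" is taken to mean a strict majority of the read-only mirrors that have INSTANT_SYNCH set, and a configuration with no such mirrors is assumed to have nothing to wait for.

```python
def commit_can_return(instant_synch_mirrors, acknowledged):
    """True once a strict majority of INSTANT_SYNCH mirrors confirmed durability.

    instant_synch_mirrors: set of mirror names with INSTANT_SYNCH=true.
    acknowledged: set of mirror names that confirmed the transaction's
        log records are durable.
    """
    if not instant_synch_mirrors:
        # Assumption: with no INSTANT_SYNCH mirrors there is nothing to wait for.
        return True
    acks = len(instant_synch_mirrors & acknowledged)
    return acks > len(instant_synch_mirrors) / 2


synch = {"Mirror2", "Mirror3", "Mirror5"}
print(commit_can_return(synch, {"Mirror2"}))             # one of three confirmed
print(commit_can_return(synch, {"Mirror2", "Mirror3"}))  # two of three confirmed
```

Waiting for a majority rather than for every mirror means a single slow or unreachable read-only mirror does not block commits in this model.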

Synchronizing directory mirrors is simpler. A directory mirror mirrors only a small set of system tables, not the application tables. INSTANT_SYNCH and AUTO_FAIL_OVER settings are not allowed for directory mirrors.

Failover

Both manual and automatic failover are supported.

Two actions can trigger an automatic failover after the primary mirror fails. If a connection to the primary was made before it failed, this connection can trigger an automatic failover by calling the rollback method on the connection object. Using rollback to trigger the failover operation is identical to how online transaction processing (OLTP) applications deal with other failures such as lock manager deadlocks and timeouts.
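The rollback-and-retry pattern described above can be sketched in Python. Everything here is a stand-in for illustration: `PrimaryUnavailable`, `FakeConnection`, `run_with_retry`, and `new_order` are hypothetical names, and the fake connection simply pretends that calling rollback completes a failover. A real application would use its JDBC connection instead.

```python
class PrimaryUnavailable(Exception):
    """Hypothetical error raised when the primary mirror cannot be reached."""


def run_with_retry(conn, transaction, max_attempts=3):
    """Run transaction(conn) and commit; on failure roll back and retry.

    conn is any object with commit() and rollback() methods. In the scenario
    described above, the rollback() call is what gives the connection the
    chance to fail over before the next attempt.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            result = transaction(conn)
            conn.commit()
            return result
        except PrimaryUnavailable:
            conn.rollback()
            if attempt == max_attempts:
                raise


class FakeConnection:
    """Stand-in connection: work fails until rollback() has been called once."""

    def __init__(self):
        self.failed_over = False
        self.commits = 0

    def commit(self):
        self.commits += 1

    def rollback(self):
        self.failed_over = True  # models the failover triggered by rollback


def new_order(conn):
    if not conn.failed_over:
        raise PrimaryUnavailable("primary mirror is down")
    return "order stored"


conn = FakeConnection()
print(run_with_retry(conn, new_order))
```

The same loop also covers deadlocks and timeouts if those are mapped to the retried exception type, which is the point of treating failover like any other transient OLTP failure.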

The second action that can trigger a failover is connecting to a directory mirror. If the

connection request is not for a read-only connection, and the current primary is not accessible,

the directory mirror will automatically trigger the failover operation to satisfy the request for a

writable connection.

Note that for any automatic failover request to succeed, a majority of AUTO_FAIL_OVER

mirrors must be accessible by the new primary candidate, and they must all agree that the old

primary mirror is no longer accessible. This prevents write transactions from being performed

against two primary mirrors, because a majority of AUTO_FAIL_OVER mirrors is required

for failover and transaction commit operations.
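Why a majority requirement rules out two primaries can be seen from a short Python model of a network partition. The function names are hypothetical, and how JDataStore counts the mirrors exactly (for instance, whether the old primary is included in the total) is an assumption of this sketch.

```python
def has_majority(group_size, total_auto_failover_mirrors):
    """True if a group holds a strict majority of AUTO_FAIL_OVER mirrors."""
    return group_size > total_auto_failover_mirrors / 2


def groups_allowed_to_write(partition_sizes):
    """Given the mirror count on each side of a network partition, return
    the indexes of the sides that could elect or keep a primary."""
    total = sum(partition_sizes)
    return [i for i, size in enumerate(partition_sizes)
            if has_majority(size, total)]


print(groups_allowed_to_write([1, 2]))  # old primary cut off from two mirrors
print(groups_allowed_to_write([2, 2]))  # even split of four mirrors
```

Because two disjoint groups cannot both hold more than half of the mirrors, at most one side of any partition can accept write transactions. An even split leaves no side with a majority, which in this model favors an odd number of AUTO_FAIL_OVER mirrors.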

Unlike automatic failover, manual failover is performed only on request. Any read-only

mirror can become the primary mirror. This is useful when the computer the primary server is

executing on needs to be taken offline for system maintenance.

The webbench benchmark

The webbench benchmark is delivered as a JDataStore sample in the <JdataStore Install

directory>/samples/JdataStore/WebBench/src/com/borland/webbench subdirectory. There is a

JBuilder project in this directory that can be used to build and launch the sample.

The webbench benchmark is ideal for testing the capabilities of JDataStore HA. It uses a

database that has a prototypical order entry schema. The database schema and generated

contents are very similar to the industry-standard TPC-C database. The size of the database

created during the load operation is configurable in the Options dialog by specifying the Load

multiplier value. The default is 10, which generates a database with a size of about 92 MB. A

multiplier of 45 should generate a 1 GB database. Two transaction types can be run against

the database: ReadWrite and ReadOnly. The ReadWrite transaction is very similar to the

TPC-C new order transaction. Each transaction is comprised of about 50 select, insert, and

update SQL statements. The ReadOnly transaction is just a modified version of the ReadWrite

transaction that performs only select operations. There are two options for running these

transactions:

• Number of threads. Each thread uses a separate connection.

• Duration in minutes. Each thread repetitively executes the specified transaction for this

duration.
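The two run options map onto a simple driver loop. The following Python sketch is not the webbench source: `run_benchmark` and `fake_transaction` are hypothetical, the duration is given in seconds rather than minutes to keep the example quick, and the transaction is a placeholder instead of real SQL over a per-thread connection.

```python
import threading
import time


def run_benchmark(transaction, threads, duration_seconds):
    """Run `transaction` repeatedly on `threads` threads; return tx per second."""
    counts = [0] * threads
    deadline = time.monotonic() + duration_seconds

    def worker(index):
        # In webbench each thread has its own connection; here each thread
        # only owns its own counter slot.
        while time.monotonic() < deadline:
            transaction()
            counts[index] += 1

    workers = [threading.Thread(target=worker, args=(i,)) for i in range(threads)]
    start = time.monotonic()
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    elapsed = time.monotonic() - start
    return sum(counts) / elapsed


def fake_transaction():
    time.sleep(0.001)  # placeholder for the roughly 50 SQL statements


print(run_benchmark(fake_transaction, threads=4, duration_seconds=0.2))
```

Throughput is reported as total completed transactions divided by elapsed wall-clock time, which is the same unit (transactions per second) used in the results later in this paper.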

The multiplier option allows for very large database testing. The ReadWrite transactions

allow for testing the performance impact on write transactions when mirroring is enabled. The

ReadOnly transactions can be used to show the scalability benefits of read-only mirrors as

they offload the primary mirror for read-only query requests.

Platform configuration

For all tests run, one computer (the bench4 host) will execute the benchmark against a

JDataStore Server running on another computer (bench1 host). When the benchmark is run

with mirroring enabled, the JDataStore Server on bench1 will be mirrored by two read-only

mirrors running on bench2 and bench3. To optimize network throughput, all four computers

are connected to the same 100 megabit network switch. This minimizes the impact of network

traffic external to the switch.

Ideally, the primary mirror and the two read-only mirrors would have identical performance

characteristics. If the primary mirror goes down, you want the same performance throughput

when one of the read-only mirrors becomes the primary after a failover. The two read-only

mirrors are identical. However, the primary mirror has a faster clock rate (1200 MHz vs. 930

MHz for the read-only mirrors).

All four computers have JDataStore version 7.03 installed.

Following is a summary of the configuration:

Purpose | CPU | RAM | OS | NIC
Primary mirror, hostname=bench1 | 1200 MHz Intel® Pentium® III | 500 MB | RedHat® Linux® release 2.4.21-27.0.2.EL #1 | 3Com® 3c905-tx 100 megabit
Read-only mirror, hostname=bench2 | 930 MHz Pentium III | 500 MB | RedHat Linux release 2.4.21-20.0.1.EL #1 | 3Com 3c905-tx 100 megabit
Read-only mirror, hostname=bench3 | 930 MHz Pentium III | 500 MB | RedHat Linux release 2.4.21-27.0.2.EL #1 | 3Com 3c905-tx 100 megabit
Execute benchmark, hostname=bench4 | 1700 MHz Pentium M | 2 GB | Microsoft® Windows® XP Pro version 2002 | Broadcom® 570x gigabit

Webbench configuration

Installing JDataStore 7.03

JDataStore 7.03 is installed on all servers used for the benchmark.

Because JBuilder is being used to run the webbench benchmark, it also must be updated to

use JDataStore 7.03. One of the following approaches can be used to make sure JDataStore

7.03 is used for the benchmark:

1) Copy beandt.jar, dbtools.jar, dx.jar, jds.jar, jdsremote.jar, and jdsserver.jar to the JBuilder lib directory.

2) Create a JBuilder library that references the JDataStore 7.03 jars and include it in the webbench project.

Before any tests can be run, all four servers must be launched on bench1, bench2, bench3, and

bench4 computers. If the JDataStore server is installed to run as a service on all the different

computers, it is already running. Otherwise, JdsServer must be launched on each machine

from the JdataStore bin directory (execute \JdataStore7\bin\JdsServer).

The timings provided for benchmark runs that use JDataStore HA use a separate computer for

each mirror. A JDataStore server is executing on each computer using the default port setting

of 2508. One mirror per computer maximizes fault tolerance and transaction throughput.

However, this same test can be performed with a single computer by launching multiple

JdsServers with different port assignments. JdsServer in the bin directory can be instructed to

use a different port by specifying the port command line option. For example, four servers

could be launched on the same computer with the following commands:

\JdataStore7\bin\JdsServer -port=2511

\JdataStore7\bin\JdsServer -port=2512

\JdataStore7\bin\JdsServer -port=2513

\JdataStore7\bin\JdsServer -port=2514

Configuring the benchmark

After all four servers have been launched, the webbench benchmark needs to be configured.

Select the Bench|Options… menu option. You should see a dialog box like this:

Figure 1: Configuring the benchmark: Example

This benchmark has already been configured to use the bench1 server. The initial database

is created as “webbenchtest” in the “/tmp/webbench” directory. It is a good practice to locate

your database in a separate directory. To improve performance, log files can be located on a

different disk drive. However, for this test, the database and its log files are in the same

directory.

The minimum cache size has been set to a value larger than the default. This is not a required

change but does increase the performance of the benchmark. All three mirrors have 512 MB

of memory and are dedicated database servers. It makes sense to take advantage of the

available memory for the database cache. Note that if you reserve too much memory for the

cache, performance can degrade. The Linux operating system manages memory allocations

between resident applications and its own disk cache. If the database server process becomes

too large, the Linux operating system might start paging out process memory, causing

performance to degrade. The Linux vmstat command displays blocks that are swapped in and

out. This is a simple way to check if the database server process size is too large. The default minimum cache size is 512 blocks, which results in a 2 MB cache (512 * 4 KB block size = 2 MB). The minimum cache size has been set to 32768 blocks for this benchmark, which results in a 128 MB cache. To have such a large cache, the maximum heap size for the JVM also must be increased. This can be accomplished by changing the following two lines in <JDataStore Install directory>/bin/JdsServer.config from:

vmparam -Xms128m

vmparam -Xmx128m

To:

vmparam -Xms256m

vmparam -Xmx256m
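The cache arithmetic behind these settings can be checked with a few lines of Python. The helper names are hypothetical, and the 64 MB of headroom used in the heap check is an arbitrary illustrative figure rather than a JDataStore rule.

```python
BLOCK_SIZE_KB = 4  # block size used in the calculation above


def cache_size_mb(min_cache_blocks, block_size_kb=BLOCK_SIZE_KB):
    """Convert a minimum cache size given in blocks to megabytes."""
    return min_cache_blocks * block_size_kb / 1024


def heap_fits_cache(heap_mb, min_cache_blocks, headroom_mb=64):
    """Rough check that the JVM heap can hold the cache plus some headroom.

    headroom_mb is an arbitrary illustrative figure, not a JDataStore rule.
    """
    return heap_mb >= cache_size_mb(min_cache_blocks) + headroom_mb


print(cache_size_mb(512))            # default setting
print(cache_size_mb(32768))          # benchmark setting
print(heap_fits_cache(128, 32768))   # original -Xmx128m
print(heap_fits_cache(256, 32768))   # raised -Xmx256m
```

A 128 MB cache would consume the entire original 128 MB heap, which is why the heap limit is raised to 256 MB before the larger cache setting is used.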

The load multiplier has been increased from a default of 10 to 45. As mentioned earlier, a load

factor of 45 produces a database of about 1 GB. Larger databases take longer to load, so if

you want to try things out, use the default load size of 10.

Use a remote connection

Because multiple computers are being used for this test, a remote connection must be used. Local

connections perform faster because they allow the application and database server to execute in

the same process. Use remote connections to connect to mirrored databases.

Remote database connections typically should be used for systems configured for failover. You can

still use local connections (in process), but there are significant benefits to using a remote

connection instead:

1) If the primary mirror fails, applications executing on separate computers can fail over to

another mirror.

2) Directory mirrors can load balance read-only connection requests across multiple read-only

mirrors.

Creating and loading the database

After configuring the initial database connection, the next step is to create the database. Do

this by selecting the Bench | Create Database menu. Once the database is created, it needs to

be populated with some data. Do this by selecting the Bench | Load Data menu.

Running the benchmark

Two benchmark runs are made against both mirrored and unmirrored configurations. The first

run executes only read-write transactions. The second run executes both read-write and read-

only transactions.

Both benchmark configuration runs start in a similar state. The 1 GB database is saved off

after the initial load. Before the benchmark runs for each configuration, a fresh copy of this

saved-off database is restored. This way, both benchmark configurations start off with the

same database size and state. The servers and benchmark test also are restarted for both

configurations.

To run the read-write transactions, select the Bench | Run menu. This launches the following

dialog box:

Figure 2: Running the benchmark: Example

This run puts 16 threads in a loop for five minutes, each executing ReadWrite transactions.

Each thread uses a separate database connection.

The second benchmark run adds four more threads running ReadOnly transactions:

Figure 3: Running the benchmark: Example

Creating the mirrors

After the nonmirror benchmarks have been run, a fresh copy of the preloaded database is

restored, and then the database is configured for mirroring so that the same benchmark runs

can be run against a mirrored configuration. Mirrors can be created using the ServerConsole

GUI, isql SQL scripts, or programmatically using JDBC™. The steps to configure the mirrors

with the ServerConsole are presented first, followed by a four-line SQL script that can be

used to complete the same mirror configuration task.

To start, a data source must be defined for the server running on bench1. In the screen shot

below, this data source is called webbench1.

Figure 4: Webbench data source view: Example

After the data source is created, it can be connected to by right-clicking on the webbench1

data source and selecting the connect menu option.

After connecting, click on the mirror node in the structure pane and select the Add Mirror menu.

Figure 5: Webbench mirror configuration: Example

You can see that the first mirror created is the primary. The Host, Database File name, and

Auto failover properties are set. All other properties are left with their default values. Note

that after these changes are saved, the Instant Synchronization property is always forced to

true if Auto failover is set to true. Instant Synchronization causes changes to the primary to be

sent to any read-only mirror that also has Instant Synchronization set before the transaction

commits. This is important behavior for Auto failover, because you do not want to lose any

committed transactions if the primary encounters a permanent failure.

Some applications prefer to control failover manually but still need transactions to be

persistent across all read-only mirrors. These applications can set Instant Synchronization

without setting Auto failover.

Although the Host property defaults to localhost, make sure to set the Host property to the

host name of the computer the server is running on. The reason for doing so is that the mirrors

need to be able to attach to each other. The only time localhost should be used is when all

mirrors are on the same computer. This can make sense for test-case scenarios but usually

does not make sense for deployment scenarios.

Editing properties in the Server console

1) Grid views in ServerConsole are typically not editable. To edit an item in the ServerConsole,

you must select it in the structure pane. This will bring up the properties for that item in the

property inspector.

2) Connection properties cannot be edited while connected. You must disconnect to edit these

properties.

3) Edit operations are “cached”. They are not applied to the underlying database(s) until the save

changes button is pressed.

Two read-only mirrors with Auto failover set to true and one directory mirror are then created. When this mirror configuration was saved for a database loaded with a load multiplier of 45, it took 200 seconds to complete the creation of the mirrors. The database and log files are well over 1 GB, and they have to be copied to two read-only mirrors. The operation was bound by network bandwidth: roughly 11 MB per second was transmitted over a 100 megabit line.
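The figures quoted above can be cross-checked with a little arithmetic. The Python sketch below assumes decimal megabytes, ignores protocol overhead, and assumes that both read-only copies were sent over the same 100 megabit link; the helper names are hypothetical.

```python
def link_capacity_mb_per_s(megabits_per_second):
    """Theoretical payload ceiling of a link, ignoring protocol overhead."""
    return megabits_per_second / 8


def data_moved_mb(rate_mb_per_s, seconds):
    """Total data moved at a sustained rate."""
    return rate_mb_per_s * seconds


capacity = link_capacity_mb_per_s(100)   # ceiling of the 100 megabit line
moved = data_moved_mb(11, 200)           # total sent during mirror creation
per_mirror = moved / 2                   # two read-only mirrors
utilization = 11 / capacity

print(capacity, moved, per_mirror, round(utilization, 2))
```

Under these assumptions about 2,200 MB crossed the wire, or roughly 1.1 GB per read-only mirror, at close to 90% of the link's theoretical ceiling. That is consistent with a database "well over 1 GB" and with an operation bound by network bandwidth.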

After these four mirrors are created in the ServerConsole, the ServerConsole isql content pane

or isql command can be executed to generate the SQL statements needed to create these

mirrors. This is achieved by executing the “show ddl” command. At the end of the output

from the show ddl command the following statements are generated for the mirrors I created:

CALL DB_ADMIN.CREATE_MIRROR('NAME=Mirror1,TYPE=PRIMARY,HOST=bench1,PORT=2508,FILE_NAME=/tmp/webbench/webbenchtest.jds,AUTO_FAIL_OVER=true,FAIL_OVER_PRIORITY=1,INSTANT_SYNCH=true,LAST_KNOWN_REPLAY=8589934601');

CALL DB_ADMIN.CREATE_MIRROR('NAME=Mirror2,TYPE=READONLY,HOST=bench2,PORT=2508,FILE_NAME=/tmp/webbench/webbenchtest_Mirror2,AUTO_FAIL_OVER=true,FAIL_OVER_PRIORITY=1,INSTANT_SYNCH=true,LAST_KNOWN_REPLAY=8589934601');

CALL DB_ADMIN.CREATE_MIRROR('NAME=Mirror3,TYPE=READONLY,HOST=bench3,PORT=2508,FILE_NAME=/tmp/webbench/webbenchtest_Mirror3,AUTO_FAIL_OVER=true,FAIL_OVER_PRIORITY=1,INSTANT_SYNCH=true,LAST_KNOWN_REPLAY=8589934601');

CALL DB_ADMIN.CREATE_MIRROR('NAME=Mirror4,TYPE=DIRECTORY,HOST=bench4,PORT=2508,FILE_NAME=/test/webbench/webbenchtest_Mirror4,FAIL_OVER_PRIORITY=1,LAST_KNOWN_REPLAY=9');

CALL DB_ADMIN.CREATE_MIRROR_SCHEDULE('Mirror1','ID=0,REF=Mirror1,PERIOD=MILLIS,DAY=1,TIME=21:00:00,MILLIS=10000');

CALL DB_ADMIN.CREATE_MIRROR_SCHEDULE('Mirror2','ID=1,REF=Mirror2,PERIOD=MILLIS,DAY=1,TIME=21:00:00,MILLIS=10000');

CALL DB_ADMIN.CREATE_MIRROR_SCHEDULE('Mirror3','ID=2,REF=Mirror3,PERIOD=MILLIS,DAY=1,TIME=21:00:00,MILLIS=10000');
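When many mirrors have to be scripted, the comma-separated property string can be assembled by a small helper instead of being typed by hand. The Python sketch below only builds the statement text in the format shown above; `create_mirror_sql` is a hypothetical helper, it does not talk to a server, and it leaves out LAST_KNOWN_REPLAY. Whether the server accepts a statement without that property is not established by the text above.

```python
def create_mirror_sql(**properties):
    """Format a DB_ADMIN.CREATE_MIRROR call from keyword properties.

    Property names are passed through unchanged; booleans are written as
    lowercase true/false to match the generated script.
    """
    def fmt(value):
        if isinstance(value, bool):
            return "true" if value else "false"
        return str(value)

    body = ",".join(f"{name}={fmt(value)}" for name, value in properties.items())
    if "'" in body:
        raise ValueError("single quotes are not handled by this sketch")
    return f"CALL DB_ADMIN.CREATE_MIRROR('{body}');"


print(create_mirror_sql(
    NAME="Mirror2", TYPE="READONLY", HOST="bench2", PORT=2508,
    FILE_NAME="/tmp/webbench/webbenchtest_Mirror2",
    AUTO_FAIL_OVER=True, FAIL_OVER_PRIORITY=1, INSTANT_SYNCH=True,
))
```

Keeping each statement on a single line avoids the accidental line breaks inside the quoted property string that wrapped listings tend to introduce.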

The last three statements are interesting in that I did not create any mirror schedules for any of

the mirrors in ServerConsole. These were created automatically for all of the mirrors that have

the AUTO_FAIL_OVER or INSTANT_SYNCH property set to true. The mirror schedules

can be dropped or altered if necessary. When INSTANT_SYNCH is set, the log records for

all committed transactions are guaranteed to be durable across a majority of read-only

mirrors. However, these log records must be replayed against the read-only mirror before you

will be able to see the changes when querying the read-only mirror’s database. You can manually

cause log records to be replayed by right-clicking on the mirror in ServerConsole and

selecting the Synch Mirror menu item. Note that for a mirror without INSTANT_SYNCH set

to true, this action would first bring the read-only mirror’s log files up to date with the

primary mirror and then play the log records against the read-only mirror. For

INSTANT_SYNCH mirrors, the log records are already there. They just need to be replayed

against the read-only mirror.

Mirror schedules will automatically synchronize the read-only mirrors using the time interval

you specify. For INSTANT_SYNCH mirrors, the default replay operation is scheduled for

every 10 seconds. Replaying log records against a read-only mirror is quite fast, even when

the primary database is heavily updated. Running a performance monitor on one of the read-

only mirrors used in this white paper shows that each replay takes only about one second. So

about 90% of the time, the read-only mirror is using little or none of the CPU bandwidth.

Most of the time it is just receiving sequential I/O from the primary for log records of

committed transactions.
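The 90% figure follows directly from the schedule interval and the observed replay time. As a quick check in Python (the function name is hypothetical, and the one-second replay time is the measurement quoted above, not a guaranteed value):

```python
def idle_fraction(replay_millis, schedule_interval_millis):
    """Fraction of each schedule interval not spent replaying log records."""
    busy = min(replay_millis, schedule_interval_millis)
    return 1 - busy / schedule_interval_millis


# MILLIS=10000 is the default schedule shown earlier; 1000 ms is the
# replay time observed in the performance monitor.
print(idle_fraction(1000, 10000))
```

A heavier write load or a longer schedule interval would change the replay time per cycle, so the idle fraction is worth re-measuring for a different workload.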

Configuring the benchmark with mirroring

After configuring all four mirrors, the webbench options must be slightly modified. The

options dialog box below shows that the server has been changed to bench4, which is where

both the webbench benchmark process and the directory mirror are located. The database to

which to connect is now the directory mirror. This causes read/write connections to be

redirected to the primary mirror on bench1. Read-only connections will be redirected to either

bench2 or bench3, which is where the read-only mirrors are located. As mentioned, the

directory mirror will use a round-robin approach to distributing read-only connection requests

across the two read-only mirrors. This provides a simple load-balancing mechanism.
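The redirection behavior described above can be sketched in a few lines of Java. This is only an illustration of round-robin selection, not the JDataStore directory mirror implementation; the class name `RoundRobinDirectory` and the `route` method are hypothetical, and the host names are the ones used in this benchmark.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of a directory that redirects connections (not JDataStore code). */
final class RoundRobinDirectory {
    private final String primary;
    private final List<String> readOnlyMirrors;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinDirectory(String primary, List<String> readOnlyMirrors) {
        if (readOnlyMirrors.isEmpty()) {
            throw new IllegalArgumentException("at least one read-only mirror is required");
        }
        this.primary = primary;
        this.readOnlyMirrors = new ArrayList<>(readOnlyMirrors);
    }

    /** Read/write requests go to the primary; read-only requests rotate across the mirrors. */
    String route(boolean readOnly) {
        if (!readOnly) {
            return primary;
        }
        // floorMod keeps the index non-negative even if the counter overflows.
        int index = Math.floorMod(next.getAndIncrement(), readOnlyMirrors.size());
        return readOnlyMirrors.get(index);
    }
}
```

With `bench1` as the primary and `bench2` and `bench3` as the read-only mirrors, successive read-only requests alternate between `bench2` and `bench3`, while every read/write request is sent to `bench1`.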


Figure 6: Configuring the benchmark: Example

Benchmark results

Below is the output from running the benchmark with two read-only mirrors and one

directory:

Mirrored   ReadWrite Threads   ReadOnly Threads   ReadWrite Tx per second   ReadOnly Tx per second
No         16                  0                  69.66
No         16                  4                  47.26                     21.73
Yes        16                  0                  48.96
Yes        16                  4                  47.12                     140.42


Write transaction throughput is about 30% slower when mirrors are used. However, this is

still a very good transaction throughput rate for a system that keeps three database images in

synch. The mirrored configuration’s 48.96 transactions per second sustained for a single day

results in more than 4.2 million orders per day. This is a very healthy business to operate

using such a modest hardware investment.
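The two figures in this paragraph follow directly from the results table. The short Java sketch below reproduces the arithmetic; the class and method names are illustrative only.

```java
/** Illustrative arithmetic for the write-throughput figures quoted above. */
final class WriteThroughputMath {
    private static final int SECONDS_PER_DAY = 24 * 60 * 60;

    /** Percentage by which the mirrored rate falls below the nonmirrored rate. */
    static double slowdownPercent(double nonmirroredTxPerSecond, double mirroredTxPerSecond) {
        return (nonmirroredTxPerSecond - mirroredTxPerSecond) / nonmirroredTxPerSecond * 100.0;
    }

    /** Transactions completed in one day at a sustained per-second rate. */
    static long transactionsPerDay(double txPerSecond) {
        return Math.round(txPerSecond * SECONDS_PER_DAY);
    }

    public static void main(String[] args) {
        System.out.printf("Slowdown: %.1f%%%n", slowdownPercent(69.66, 48.96));
        System.out.printf("Orders per day: %,d%n", transactionsPerDay(48.96));
    }
}
```

The slowdown works out to a little under 30%, and 48.96 transactions per second sustained for 86,400 seconds gives just over 4.2 million transactions.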

As read-only transactions are added to the mix, the transaction throughput of the mirrored

configuration far exceeds the nonmirrored configuration. This is because the read-only

connections made to the directory mirror are automatically redirected to the read-only mirrors.

The primary mirror receives only ReadWrite transactions, and the read-only mirrors receive

all of the read-only transaction requests. The result is that the write transaction throughput is

about the same for both the mirrored and nonmirrored configurations. However, the mirrored

configuration completes about 6.5 times as many read-only transactions (140.42 versus 21.73 per

second) as the nonmirrored configuration. This is the increased scalability that JDataStore HA provides. Most

applications generate far more read transactions than read/write transactions, so this is a

significant win.
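The scalability gain can be checked against the rows of the results table that use 16 ReadWrite threads and 4 ReadOnly threads. The following Java sketch is illustrative arithmetic only; the class and method names are not part of JDataStore.

```java
/** Illustrative arithmetic for the mixed read/write rows of the results table. */
final class ReadScalingMath {
    /** Ratio of the mirrored rate to the nonmirrored rate. */
    static double speedup(double mirroredTxPerSecond, double nonmirroredTxPerSecond) {
        return mirroredTxPerSecond / nonmirroredTxPerSecond;
    }

    /** Total transactions per second across both transaction types. */
    static double combined(double readWriteTxPerSecond, double readOnlyTxPerSecond) {
        return readWriteTxPerSecond + readOnlyTxPerSecond;
    }

    public static void main(String[] args) {
        double readOnlySpeedup = speedup(140.42, 21.73);
        double totalSpeedup = speedup(combined(47.12, 140.42), combined(47.26, 21.73));
        System.out.printf("Read-only speedup: %.2fx%n", readOnlySpeedup);
        System.out.printf("Combined speedup: %.2fx%n", totalSpeedup);
    }
}
```

The read-only ratio is roughly 6.5, and the combined throughput of both transaction types is roughly 2.7 times that of the nonmirrored run.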

It is also possible to narrow the 30% gap between mirrored and unmirrored executions of the

ReadWrite transaction. The primary reason for this gap is that when log records are replayed

on the read-only mirrors, many write operations are applied to the database file itself. With

such a heavy transaction load, this makes for many disk writes to the database file. The

problem is that to commit transactions from the primary mirror, the read-only mirrors also

must constantly commit log records to the database log files. So there is much file write

competition between the database file and the database log files. A very simple solution is to

have a disk drive for the database file and a separate disk drive for the log files. A RAID

striping configuration is a more expensive alternative.

Another approach to narrowing this gap is to configure the database to use “soft commit”.

When the benchmark was run with soft commit, the gap narrowed from 30% to 10%. Soft

commit still guarantees that no database blocks will be written to disk before the log records

for the changes to these blocks have been durably written to disk. However, soft commit does

not make this guarantee for transaction commit operations. For transaction commit operations,

the soft commit guarantees only that the committed transactions have been written into the


operating system cache. However, these writes are not guaranteed to be durable. The net

effect of soft commit is that:

1) The database will not be corrupt, because log changes are made durable before database

blocks are written.

2) All transactions will be durable as long as the operating system does not fail.

3) The most recent committed transactions might not be durable if the operating system

fails.

Soft commit might be an acceptable solution for mirrored configurations that have

AUTO_FAILOVER or INSTANT_SYNCH set to true, because committed log records are

written to all mirrors before a commit operation completes.
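The difference between a durable commit and a soft commit can be illustrated with plain Java file I/O. The sketch below is not JDataStore code and does not show how JDataStore writes its logs; `CommitLog` is a hypothetical class that only demonstrates the general idea of writing to the operating system cache with or without forcing the data to disk.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical append-only log illustrating soft versus durable commit (not JDataStore code). */
final class CommitLog implements AutoCloseable {
    private final FileChannel channel;
    private final boolean softCommit;

    CommitLog(Path file, boolean softCommit) throws IOException {
        this.channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND);
        this.softCommit = softCommit;
    }

    /** Appends one commit record; a durable commit also forces it to the storage device. */
    void commit(String record) throws IOException {
        ByteBuffer buffer = ByteBuffer.wrap((record + "\n").getBytes(StandardCharsets.UTF_8));
        while (buffer.hasRemaining()) {
            channel.write(buffer); // data reaches the operating system cache
        }
        if (!softCommit) {
            channel.force(false);  // durable commit: wait until the file content is on disk
        }
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```

In the soft commit case the record survives a crash of the Java process, because the operating system still holds it, but it can be lost if the operating system itself fails before flushing its cache. Skipping the forced write is what makes soft commit faster.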

Scaling higher

As mentioned, the mirrored configuration can service about 6.5 times as many read-only transactions

and about the same number of ReadWrite transactions when read-only and ReadWrite

transactions are executed together. It is not uncommon for 70% to 90% of all transactions to

be read-only. So it makes sense that scalability could be improved by using more than just

two read-only mirrors.

Improving the hardware also could help. Some suggestions:

• If the primary mirror is CPU bound, faster CPUs, increased RAM, and SMP

configurations will help. If you add RAM to improve performance, you can increase the

JDataStore minCacheSize property. If you set the minCacheSize property, you might need

to increase the JVM heap settings, as discussed earlier. Keep in mind that if failover is

important, you will want all mirrors configured for auto failover to have similar

configurations. Otherwise, your transaction throughput will change after a failover.


• In this configuration, the peak network traffic from the benchmark application

computer was between 1 MB and 2 MB per second. During bulk data transfers using

FTP, or when large mirrors for large databases are created, the maximum throughput of

this 100 megabit network was between 10 MB and 11 MB per second. The bulk

throughput rate is always going to be better than the throughput of the smaller-sized

packets generated from executing SQL statements of the benchmark transactions.

However, in this test run, the CPU bandwidth of the primary mirror was maximized, so

the network throughput did not appear to be an issue. With faster computers, though,

the network could become the bottleneck. The network throughput could be improved by going to a

gigabit network or by installing an extra network interface card (NIC) in each mirror.

With two NICs, one network can be used for mirror synching and the other for

transaction requests.

• Improving the I/O system definitely will help. A separate disk drive for log files might be

an inexpensive solution. Note that the log file disk drive does not need to be large,

because unused log files are constantly dropped. A 10 GB drive is probably more than

enough. The important point is to have a separate drive that is very fast with sequential

I/O. A RAID I/O system configured for striping also would be a significant benefit.
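The network figures in the list above can be put in perspective with a little arithmetic. The Java sketch below assumes decimal units (1 megabit = 1,000,000 bits, 1 MB = 1,000,000 bytes) and ignores protocol overhead; the class and method names are illustrative.

```java
/** Illustrative arithmetic for the network throughput figures (decimal units assumed). */
final class NetworkHeadroom {
    /** Converts a link speed in megabits per second to megabytes per second. */
    static double linkMegabytesPerSecond(double linkMegabitsPerSecond) {
        return linkMegabitsPerSecond / 8.0;
    }

    /** Observed throughput as a percentage of the theoretical link capacity. */
    static double utilizationPercent(double observedMegabytesPerSecond, double linkMegabitsPerSecond) {
        return observedMegabytesPerSecond / linkMegabytesPerSecond(linkMegabitsPerSecond) * 100.0;
    }

    public static void main(String[] args) {
        System.out.printf("100 megabit link: %.1f MB/s theoretical%n", linkMegabytesPerSecond(100));
        System.out.printf("Benchmark peak (2 MB/s): %.0f%% of capacity%n", utilizationPercent(2, 100));
        System.out.printf("Bulk transfer (11 MB/s): %.0f%% of capacity%n", utilizationPercent(11, 100));
    }
}
```

Under these assumptions a 100 megabit link tops out at 12.5 MB per second, so the benchmark traffic of 1 MB to 2 MB per second used well under a fifth of the nominal capacity, while the bulk transfers came close to saturating it.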

Mirror status monitoring

The status of the mirrors can be monitored using the server console or the DB_ADMIN built-

in stored procedures. If you right-click on “Mirror Status”, you will see a grid with a row for

each mirror. If a mirror cannot be accessed, you will see an exception in the “Connection

Exception” column. The “Validated Primary” column is also important. A validated primary

is a primary that has been able to successfully attach to a majority of read-only mirrors. A

validated primary will accept write transactions. A primary mirror that is not validated will

accept only read-only transactions. You can recheck the status of these mirrors by pressing the

refresh button above the grid view. You can retrieve this status information programmatically

by calling the DB_ADMIN.GET_MIRRORS() stored procedure. There is Javadoc for all

DB_ADMIN stored procedures. The “Database log” node will display the contents of the


textual status log files. This shows only the most recent status log file. These status log files

are located in the same directory as the binary transaction log files.
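The validated-primary rule described above amounts to a majority test. The Java sketch below reads "a majority of read-only mirrors" literally as a strict majority; that reading is an assumption, and JDataStore may count mirrors differently. The class and method names are hypothetical and are not part of the DB_ADMIN API.

```java
/** Hypothetical majority check for a validated primary (assumes a strict majority). */
final class PrimaryValidation {
    /**
     * Returns true when the primary has attached to more than half of the
     * read-only mirrors. Assumption: ties do not count as a majority.
     */
    static boolean isValidated(int attachedReadOnlyMirrors, int totalReadOnlyMirrors) {
        if (attachedReadOnlyMirrors < 0 || attachedReadOnlyMirrors > totalReadOnlyMirrors) {
            throw new IllegalArgumentException("attached count must be between 0 and total");
        }
        return attachedReadOnlyMirrors * 2 > totalReadOnlyMirrors;
    }
}
```

Under this reading, a primary with three read-only mirrors would remain validated after losing contact with one of them, but not after losing two.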

Mirror performance monitoring

As previously discussed, the likely bottlenecks are CPU, network, and disk I/O bandwidth. On

Windows, perfmon can provide a nice high-level view of CPU, network, and physical disk

I/O activity. On Linux, the vmstat command is often a quick way to get a feel for CPU,

application I/O, and swap I/O activity.

Summary

The JDataStore HA solution provides one of the simplest possible solutions to the complex

problems of fault tolerance and scalability for database applications. The system is kept

simple in part by leveraging existing subsystems for transaction management, transaction

versioning, and network I/O. The webbench sample benchmark provides a simulation of real-

world read-write and read-only transactions. The webbench results show that mirrored

configurations using modest hardware can support a throughput rate for the complex new

order transaction of more than 4.2 million transactions a day. At the same time, the

three-mirror configuration can provide more than six times the throughput of a nonmirrored

configuration for read-only transactions.

Made in Borland® Copyright © 2005 Borland Software Corporation. All rights reserved. All Borland brand and product names are trademarks or registered trademarks of Borland Software Corporation in the United States and other countries. All other marks are the property of their respective owners. Corporate Headquarters: 100 Enterprise Way, Scotts Valley, CA 95066-3249 • 831-431-1000 • www.borland.com • Offices in: Australia, Brazil, Canada, China, Czech Republic, Finland, France, Germany, Hong Kong, Hungary, India, Ireland, Italy, Japan, Korea, Mexico, the Netherlands, New Zealand, Russia, Singapore, Spain, Sweden, Taiwan, the United Kingdom, and the United States. • 23423
