Achieving Scalability, Performance and Availability on Linux with Oracle 9iR2-RAC
Grant McAlister, Senior Database Engineer, Amazon.com
Paper 32110
Agenda
Why Oracle on Linux and RAC
The Tests
Scaling
Performance
Availability
Choice of Interconnect
Conclusion
Why Linux
Lower Total Cost of Ownership
Near commodity hardware and support
Multiple O/S and hardware vendors
Common platform (IA-32) for entire enterprise
Unix look and feel
New enterprise kernel
No database conversions when changing Linux hardware or O/S
Why RAC on Linux

Cost
Ability to use near commodity systems (2-4 processors)
Lower level of support needed on system units
The need for availability
Young and rapidly evolving O/S
Near commodity hardware and support
The need to scale the database beyond 8 processors
The need for large amounts of memory (> 32 GBytes)
The Tests
Real life workloads
Not modified or partitioned to support RAC
Used automatic space management

Workload #1: a simple workload of small queries with little locking.
Workload #2: a typical nasty workload with many inserts, updates and select for updates, causing a lot of locking and blocking.
Workload #1 Single Instance Profile

Load Profile                  Per Second   Per Transaction
--------------------------  ------------  ---------------
Redo size:                     77,516.28         1,460.05
Logical reads:                  4,134.57            77.88
Block changes:                    462.54             8.71
Physical reads:                   155.70             2.93
Physical writes:                   27.14             0.51
User calls:                    11,012.73           207.43
Parses:                           432.50             8.15
Sorts:                            187.32             3.53
Executes:                         432.89             8.15
Transactions:                      53.09

% Blocks changed per Read: 11.19    Recursive Call %: 0.68
Rollback per transaction %: 0.82    Rows per Sort: 353.26
Top 5 Wait Events on a Single Instance

                                                   Total Wait  Avg wait  Waits
Event                        Waits     Timeouts      Time (s)      (ms)   /txn
-------------------------  ---------  ----------  -----------  --------  -----
db file sequential read      560,060           0        1,249         2    2.9
log file sync                180,813         494          676         4    0.9
log file parallel write      188,017     181,946          143         1    1.0
latch free                    87,584       6,309          141         2    0.5
db file parallel write         5,794       2,895           14         2    0.0
Workload #2 Single Instance Profile

Load Profile                  Per Second   Per Transaction
--------------------------  ------------  ---------------
Redo size:                    244,988.60         5,306.31
Logical reads:                 14,562.36           315.41
Block changes:                  1,802.47            39.04
Physical reads:                   319.45             6.92
Physical writes:                   91.52             1.98
User calls:                     2,877.06            62.32
Parses:                           457.06             9.90
Sorts:                            290.13             6.28
Executes:                         456.73             9.89
Transactions:                      46.17

% Blocks changed per Read: 12.38    Recursive Call %: 4.16
Rollback per transaction %: 0.96    Rows per Sort: 13.09

Top 5 Wait Events on a Single Instance

                                                   Total Wait  Avg wait  Waits
Event                        Waits     Timeouts      Time (s)      (ms)   /txn
-------------------------  ---------  ----------  -----------  --------  -----
db file sequential read      346,048           0        1,412         4    1.7
enqueue                          177         119          369      2087    0.0
free buffer waits                752          32          348       463    0.0
db file scattered read       141,564           0          325         2    0.7
log file sync                207,109          37          306         1    1.0
The Hardware and Software
Software
Oracle 9.2.0.1
Red Hat Advanced Server 2.1 (2.4.9-e.3)

Hardware
3 types of clusters, each with 4 nodes:
2 Pentium III Xeon processors @ 1.126GHz & 5 GBytes of RAM
2 Pentium 4 Xeon DP processors @ 2.4GHz & 4 GBytes of RAM
4 Pentium 4 Xeon MP processors @ 1.6GHz & 10 GBytes of RAM

Database files were on raw partitions
Scaling
The ability to produce higher transactional volumes when adding additional processors or additional nodes.
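Scaling can be expressed as per-node efficiency: relative transactional volume divided by node count. A minimal sketch, using the workload #1 volumes reported later in this paper (the helper function itself is illustrative, not part of the test harness):

```python
# Per-node scaling efficiency from relative transactional volumes.
# volumes[0] is the single-instance baseline (1.0).

def scaling_efficiency(volumes):
    """Return speedup divided by node count for each cluster size."""
    return [v / n for n, v in enumerate(volumes, start=1)]

workload1 = [1.0, 1.9, 2.6, 3.6]  # single instance through four nodes
for nodes, eff in enumerate(scaling_efficiency(workload1), start=1):
    print(f"{nodes} node(s): {eff:.0%} efficiency")
```

At four nodes this works out to 3.6 / 4 = 90%, the figure quoted in the conclusions.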
Scaling of Workload #1

[Chart: Relative transactional volume on 2-processor 1.126GHz nodes — Single Instance: 1.0, Two Nodes: 1.9, Three Nodes: 2.6, Four Nodes: 3.6]
Scaling of Workload #2

[Chart: Relative transactional volume for Single Instance through Four Nodes — 2 Proc @ 2.4GHz: 1.0, 1.6, 2.1, 2.7; 4 Proc @ 1.6GHz: 1.8, 3.0, 3.9, 4.9]
Some workloads scale better

[Chart: Relative transactional volume for Single Instance through Four Nodes — Workload #1: 1.0, 1.9, 2.6, 3.6; Workload #2: 1.0, 1.6, 2.1, 2.7]
Some of the differences

Top 5 Workload #1 Timed Events

Event                         Waits     Time (s)  % Total Elapsed Time
-------------------------  ----------  ---------  --------------------
CPU time                                   2,386                 33.15
global cache null to x         62,646      2,067                 28.71
db file sequential read       391,474      1,063                 14.76
buffer busy global cache       15,125        560                  7.78
log file sync                 158,560        347                  4.82

Top 5 Workload #2 Timed Events

Event                         Waits     Time (s)  % Total Elapsed Time
-------------------------  ----------  ---------  --------------------
global cache cr request     1,324,756     19,080                 27.28
buffer busy global cache       53,411     11,531                 16.49
enqueue                        38,795     11,084                 15.85
global cache null to x         88,908      6,449                  9.22
CPU time                                   5,085                  7.27
Performance
The time taken to perform a query is important
Execution time influences transactional volume
Can cause dramatic changes in the end user response time
Stock Exchange
Internet Retailer
Bank
Only you know what is reasonable for your database and application
Execution Times for Workload #1
2 Processors @ 2.4GHz

Percent increase in execution time vs. single instance:

               Update   Select   Select for Update   Insert
Two Nodes        44%      48%           59%           109%
Three Nodes      52%      49%           58%           193%
Four Nodes       53%      48%           61%           312%
Execution Times for Workload #2
4 Processors @ 1.6GHz

[Chart: Percent increase in execution time vs. single instance for Update, Select, Select for Update and Insert at Two/Three/Four Nodes; plotted values: 17%, 34%, 38%, 59%, 75%, 84%, 94%, 134%, 138%, 147%, 233%, 316%]
Some ways to improve

Make sure your database is well tuned for single instance operation
Consider using different block sizes for hot indexes
Hash partition hot tables and indexes
Partition the workload
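The block-size and hash-partitioning tips above can be sketched in SQL. All object, tablespace and file names here are hypothetical, and the partition count and sizes are illustrative only:

```sql
-- A nonstandard block size for a hot index. Requires a matching buffer
-- cache (e.g. db_16k_cache_size) to be configured on every instance.
CREATE TABLESPACE idx_16k
  DATAFILE '/dev/raw/raw42' SIZE 500M   -- hypothetical raw partition
  BLOCKSIZE 16K;

-- Hash partitioning a hot table and a local index to spread block
-- contention across partitions (and so across RAC instances).
CREATE TABLE orders (
  order_id  NUMBER,
  customer  NUMBER,
  status    VARCHAR2(10)
)
PARTITION BY HASH (order_id) PARTITIONS 8;

CREATE INDEX orders_id_ix ON orders (order_id)
  TABLESPACE idx_16k LOCAL;
```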
Availability
Minimize failures by building clusters with as few single points of failure as possible.
Setup your RAC cluster to recover from node and instance failure as quickly as possible.
Redundant RAC Configuration
Instance recovery time (seconds)

                       MTTR Target=120   MTTR Target=240   MTTR Target Not Set
Cluster Reconfigured          2                 2                   2
Recovery Started              9                10                  12
Redo Log First Pass           1                 1                  13
Redo Log Second Pass         23                56                 329
Total Time                   35                69                 356
fast_start_mttr_target is the key
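A minimal sketch of setting the parameter, using the 120-second value from the timings above (treat the value as a starting point for your own testing, not a recommendation):

```sql
-- Target instance-recovery time in seconds; Oracle adjusts checkpoint
-- activity so crash recovery should finish within roughly this bound.
ALTER SYSTEM SET fast_start_mttr_target = 120;
```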
Node failure recovery time
Recovery Time = Failure detection + Instance recovery
Failure detection = MissCount * 1 second
The MissCount parameter is found in cmcfg.ora

When MissCount = 20 and fast_start_mttr_target = 120, all workload #2 processing resumed in less than 1 minute after crashing a node.
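The formula above can be checked with a quick sketch, plugging in MissCount = 20 and the 35-second instance recovery measured for MTTR Target=120:

```python
# Rough node-failure recovery estimate, per the formula in this paper:
#   recovery time = failure detection + instance recovery
# where failure detection = MissCount * 1 second (MissCount from cmcfg.ora).

def node_recovery_seconds(miss_count, instance_recovery):
    """Estimated seconds until surviving nodes resume full processing."""
    detection = miss_count * 1  # heartbeat misses, one second apart
    return detection + instance_recovery

total = node_recovery_seconds(20, 35)
print(total)  # 55 seconds -> under one minute, as observed for workload #2
```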
Impact of a single node failure

[Chart: Transactions per second (0-3000) over time (-20 to 85 seconds), annotated with: Node failed, CM ejects node, Cluster Reconfigured, Recovery Complete]
Choice of Interconnect
1000Mbit (Gigabit) Ethernet
Latency ~ 0.07 ms
Transfer rate: 30+ MBytes per second
More expensive, but becoming common with the advent of gigabit over copper

100Mbit Ethernet
Latency ~ 0.20 ms
Transfer rate: 10 MBytes per second
Common and inexpensive
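The latency and throughput figures above can be combined into a rough per-block transfer time. A sketch, assuming an 8KB database block (the block size is my assumption, not stated in the paper):

```python
# One-way latency plus serialization time for shipping a single
# database block across the interconnect.

def block_transfer_ms(latency_ms, mbytes_per_sec, block_bytes=8192):
    """Rough time (ms) to move one block, ignoring protocol overhead."""
    serialize_ms = block_bytes / (mbytes_per_sec * 1024 * 1024) * 1000
    return latency_ms + serialize_ms

gigabit = block_transfer_ms(0.07, 30)        # ~0.33 ms
fast_ethernet = block_transfer_ms(0.20, 10)  # ~0.98 ms
```

On these assumptions Gigabit moves a block roughly three times faster, which is consistent with preferring it for the interconnect.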
100Mbit vs. Gigabit

Oracle Interconnect Latency (ms)

                                          100Mbit   Gigabit
Average receive time for CR block            15.5      10.5
Average receive time for current block      144.3     108.4
Conclusions

RAC scaled at 90% on a simple workload
RAC scaled consistently at 55+% on a complex workload
There is an impact to query performance depending on your workload
You can recover from failures in less than 1 minute
When configured correctly, a RAC cluster can scale, perform and be highly available.
Q U E S T I O N S  &  A N S W E R S