8/3/2019 RAC Capacity Planning-1
Oracle Real Application Clusters: Sizing and Capacity Planning Then and Now
Su Tang
Sri Subramaniam
RACPACK
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle's products remains at the sole discretion of Oracle.
Agenda
Capacity Planning in GRID/RAC Environment
Scalable Infrastructure Design
On Demand Capacity Addition and Utilization
Criteria to add more Capacity
Real World Customer Example
Questions
Capacity Planning
RAC Capacity Planning Advantages
All current practices still apply
Network Storage sizing
Interconnect Network capacity
Servers capacity
Application Service design
RAC flexibility ensures
Good initial estimate is sufficient
Easily accommodates Growth
Emphasis shifts to capacity utilization
Storage Network
Networked Storage
RAC works with both SAN and NAS Storage
Optimal storage selection depends on:
Estimated I/O Response Time
Typically single block I/O requests
Common characteristic of most OLTP applications
IOPS measure used
Estimated I/O Bandwidth
Large multi-block I/Os
Data Warehouse and mixed workload environments
Occurs during backup/recovery operations
Estimation should include requirements for both normal and backup I/Os
Storage Capacity Planning
Estimate initial data size and growth rate for all the applications
(E.g., 500GB initial, double over two years, 1TB total)
Add the fault tolerance requirements
(E.g., 2TB with RAID1, 1.2TB with RAID5)
Add the backup requirements to the size
(E.g., an additional 1TB for a full backup, another 1TB for 5 incremental backups)
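The three sizing steps above can be sketched as a short calculation. This is only an illustration using the slide's example figures: the function name is hypothetical, 1TB is taken as 1000GB, the RAID5 factor of 1.2 assumes a 5+1 layout (which is what the 1.2TB example implies), and backup space is added without RAID overhead, as on the slide.

```python
# Hypothetical helper; names and overhead factors are illustrative assumptions.
RAID_OVERHEAD = {"RAID1": 2.0, "RAID5": 1.2}  # RAID5 factor assumes a 5+1 layout


def storage_estimate_gb(initial_gb, growth_factor, raid_level,
                        full_backup_gb, incremental_backup_gb):
    """Return data, fault-tolerant, backup and total sizes in GB."""
    data_gb = initial_gb * growth_factor                # size after planned growth
    protected_gb = data_gb * RAID_OVERHEAD[raid_level]  # add fault tolerance
    backup_gb = full_backup_gb + incremental_backup_gb  # add backup space
    return {
        "data_gb": data_gb,
        "protected_gb": protected_gb,
        "backup_gb": backup_gb,
        "total_gb": protected_gb + backup_gb,
    }


# Slide example: 500GB doubling over two years, RAID1, 1TB full + 1TB incrementals
estimate = storage_estimate_gb(500, 2, "RAID1", 1000, 1000)
print(estimate)
```

Swapping "RAID1" for "RAID5" in the call reproduces the 1.2TB figure from the example.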
Storage Capacity Planning
Estimate aggregated throughput and IOPS
(E.g., 2GB/sec, or 300,000 IOPS)
Calculate the total bandwidth requirement per node
(E.g., 2GB/sec for 16 nodes = 128MB/node/sec or 300,000/16 = 18,750 IOPS/node)
Choose the appropriate storage class and build the configuration
(E.g., 1,200 IOPS per spindle, 16-way striped = 19,200 IOPS per LUN)
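The per-node arithmetic above can be checked with a small sketch. The function names are hypothetical, and 2GB/sec is treated as 2048MB/sec because that is what yields the slide's 128MB/node/sec.

```python
# Hypothetical helpers for the per-node and per-LUN arithmetic on this slide.
def per_node_requirements(total_mb_per_sec, total_iops, nodes):
    """Split the aggregate throughput and IOPS evenly across the nodes."""
    return total_mb_per_sec / nodes, total_iops / nodes


def lun_iops(iops_per_spindle, stripe_width):
    """IOPS one LUN can deliver when striped across stripe_width spindles."""
    return iops_per_spindle * stripe_width


mb_per_node, iops_per_node = per_node_requirements(2 * 1024, 300_000, 16)
lun_capacity = lun_iops(1_200, 16)

print(mb_per_node, iops_per_node, lun_capacity)
# With these figures a single 16-way striped LUN covers one node's IOPS share
print(lun_capacity >= iops_per_node)
```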
Interconnect Network
Interconnect Capacity Planning
RAC interconnect usage
Oracle Clusterware
Very small messages exchanged periodically
Response time/load critical, not a big bandwidth consumer
Oracle RAC Database
Primary user of interconnect capacity
Exchanges both small and large messages between nodes
Key driver in deciding the network configuration
RAC Messages
Small 256 byte messages
Used by GES and GCS
Cache Fusion block messages
Db_block_size
Parallel Query
Parallel_execution_message_size
default 8k
Interconnect Bandwidth
Messages received (M) per second
M = #GES messages + #GCS messages
Blocks received (B) per second
B = (db_block_size * (#cr blocks received + #current blocks received)) / mtu size
PQ messages received (P) per second
P = (PQ_message_size * #PX remote messages recv'd) / mtu size
Total bandwidth required per second
(Messages received + Blocks received + PQ messages received) / max network transmit capacity
= (M + B + P) / 85000
Similar equation applies to send side
Example from AWR Report
Global Cache blocks received: 2,534
GCS/GES messages received: 811
PX remote messages recv'd 65
Db_block_size 8192
Parallel_execution_message_size 8192
Mtu_size 1500
One Gigabit ethernet interface for interconnect
Total bandwidth required = (M + B + P) / 85000
= (2534 + ((811 * 8192) / 1500) + ((65 * 8192) / 1500)) / 85000
= approximately 8.6% capacity utilization
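The bandwidth formula and this worked example can be reproduced with a short sketch. The function and parameter names are hypothetical; the 85,000 figure is taken from the slide as the maximum transmit capacity of one Gigabit Ethernet interface. The call follows the slide's worked expression as written, which places 2,534 in the message term and 811 in the block term.

```python
# Hypothetical helper implementing (M + B + P) / max network transmit capacity.
def interconnect_utilization(messages_per_sec, blocks_per_sec, px_messages_per_sec,
                             db_block_size=8192, px_message_size=8192,
                             mtu=1500, max_transmit_capacity=85_000):
    """Fraction of one interface's transmit capacity used on the receive side."""
    m = messages_per_sec                             # small GES/GCS messages
    b = db_block_size * blocks_per_sec / mtu         # Cache Fusion blocks
    p = px_message_size * px_messages_per_sec / mtu  # Parallel Query messages
    return (m + b + p) / max_transmit_capacity


# Values placed exactly as in the slide's worked expression
utilization = interconnect_utilization(2534, 811, 65)
print(f"{utilization:.1%}")
```

The same function can be applied to the send-side statistics, since a similar equation applies there.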
Interconnect Bandwidth
Available interconnect bandwidth in an IP-based network
Depends on the network packets transmitted
Comparing against theoretical bandwidth using total bytes transmitted is not accurate
Available Network Bandwidth
[Chart: available bandwidth in MB/sec (axis 0 to 120) by message size in bytes (256, 512, 1024, 2048, 8192)]
RAC Interconnect
Experience shows that for most applications a single Gigabit Ethernet interface is adequate
In planning, 70% utilization is a reasonable point at which to add additional interfaces
Server Capacity
Server Capacity Planning
To size the server optimally
Consider the total number of concurrent processes
Estimate the CPU utilization of critical queries
Grid Control / SQL Trace should provide this data
Plan for a maximum run-queue length of 2 * number of CPUs
During high-utilization periods, never exceed 70% overall CPU in the box
Factor in the percentage of capacity each server adds
This helps you attain your High Availability goals
In planned outage situations it helps to determine whether the surviving nodes can support the workload
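The surviving-node check can be sketched as follows. This is a simplified illustration, not a sizing tool: the function names are hypothetical, and it assumes identically sized nodes with the displaced workload spread evenly over the survivors.

```python
# Hypothetical check: can the remaining nodes carry the load under the 70% ceiling?
def surviving_utilization(node_cpu_utilization, nodes_down=1):
    """Average CPU utilization per surviving node after nodes_down nodes leave.

    Assumes identical nodes and an even redistribution of the workload.
    """
    survivors = len(node_cpu_utilization) - nodes_down
    if survivors <= 0:
        raise ValueError("no surviving nodes")
    return sum(node_cpu_utilization) / survivors


def can_absorb_outage(node_cpu_utilization, nodes_down=1, cpu_ceiling=0.70):
    """True if the survivors stay at or below the planning ceiling."""
    return surviving_utilization(node_cpu_utilization, nodes_down) <= cpu_ceiling


four_nodes_at_50 = [0.50, 0.50, 0.50, 0.50]
four_nodes_at_60 = [0.60, 0.60, 0.60, 0.60]

print(can_absorb_outage(four_nodes_at_50))  # survivors run at about 67%
print(can_absorb_outage(four_nodes_at_60))  # survivors would run at about 80%
```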
Server capacity Planning
Ensure an optimal number of HBAs is available
To get the desired I/O response time and bandwidth
Plan for 50-70% capacity utilization
Ensure an optimal number of NICs is available
For both public and cluster interconnects
And for NAS Storage if used
Infrastructure Design
Scalable Infrastructure Design
A very critical aspect of any new capacity planning exercise
Critical elements of scalable infrastructure design consist of
Networked Storage
Interconnect Network
Optimally sized servers
Software and Application Service
Infrastructure Design
[Diagram: Storage Farm (Storage 01, Storage 02, Storage NN); SAN Fabric 1, SAN Fabric 2]
2 SAN Switches
Low-end SAN Storage
2 ports from each Storage Processor (SP) connected to each SAN switch
Equal-size RAID5 LUNs are distributed among all SPs
On a Storage Processor failure in the array, LUNs would fail over
Infrastructure Design
[Diagram: Server Farm (a001, a002, a003, aNNN, b001, b002, b003, bNNN); SAN Fabric 1, SAN Fabric 2; Storage Farm (Storage 01, Storage 02, Storage NN)]
Server and storage farms horizontally scalable (scaling out)
2 CPU and 4 CPU boxes
2 port HBA connecting to each server
LUNS are load-balanced on both ports
Protects from failure of an SP, an array port, a single HBA, or a single SAN switch
Infrastructure Design
[Diagram: Server Farm (a001, a002, a003, aNNN, b001, b002, b003, bNNN); IP Network (Public/App-DB, Private Interconnect, NAS/iSCSI, Management); NAS NN; LAN; WAN; SAN Fabric 1, SAN Fabric 2; Storage Farm (Storage 01, Storage 02, Storage NN)]
Server and storage farms horizontally scalable (scaling out)
Infrastructure Design
Separate switches for the Public, Private, NAS (if used) and Management networks
Redundant networks for Public, Private and NAS
- For most configurations active/failover should be sufficient
- Where load balancing is used, ensure the correct network redundancy option is used to provide both send-side and receive-side load balancing
- 802.3ad is used to aggregate switch ports
- 802.3ad is used in the host to bond the interfaces
Storage Network
Implement zoning / masking using
A simple scheme where all LUNs are visible across all nodes, if the cluster infrastructure is used by multiple databases
Create equal-sized LUNs that meet the planned I/O characteristics
Ensure each LUN can support the combined throughput of all concurrent RAC node access
Avoid ISLs in the SAN switch design by sizing the SAN switch appropriately
In an ASM diskgroup, add disks with similar storage characteristics and capacity
Interconnect Network
Ensure a proper VLAN for the cluster interconnect network
Avoid cascading switches
If NIC bonding is used, ensure switch ports are appropriately configured to provide both send-side and receive-side load balancing
Ensure NICs from the same vendor are teamed in the host
Server Design
Ensure similar sized servers are clustered together
Ensure Remote Administration has been correctly set up
Use automated procedures to check consistency of the OS, firmware and application software version and revision levels
Cluster Verification Tool
Verifies infrastructure, Clusterware and RAC configurations
ORION
Measures available I/O bandwidth and Response Time
IPERF
Measures & reports network performance
Software Considerations
Cluster Software Design
If multiple databases are using a common cluster infrastructure
Ensure similar sized nodes are clustered together
Install separate single CLUSTER_HOME
Install separate single ASM_HOME
DB_HOMEs could be installed/expanded as required
Adding Capacity
When to Add More Capacity
These guidelines assume
All configuration and Best Practices are followed
And all necessary SQL, DB tuning is performed
Key threshold to monitor for disk I/O
Db_file_sequential_read > 25 msec
Db_file_scattered_read > 30 msec
Log_file_parallel_write > 3 msec
Determine the source of the bottleneck
HOST, HBA, SAN Switch or Storage Array
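The disk I/O thresholds above can be expressed as a simple check. This is a sketch only: the function name and the sample values are hypothetical, and the average wait times in milliseconds are assumed to have been collected beforehand (for example from an AWR report).

```python
# Thresholds in milliseconds, keyed by the names used on this slide.
DISK_IO_THRESHOLDS_MS = {
    "db_file_sequential_read": 25,
    "db_file_scattered_read": 30,
    "log_file_parallel_write": 3,
}


def io_events_over_threshold(avg_wait_ms):
    """Return the wait events whose average latency exceeds its threshold."""
    return sorted(
        event
        for event, limit in DISK_IO_THRESHOLDS_MS.items()
        if avg_wait_ms.get(event, 0.0) > limit
    )


# Made-up sample values for illustration only
sample_waits = {
    "db_file_sequential_read": 12.0,
    "db_file_scattered_read": 41.5,
    "log_file_parallel_write": 4.2,
}
flagged = io_events_over_threshold(sample_waits)
print(flagged)
```

A non-empty result only says that a threshold was crossed; locating the bottleneck (host, HBA, SAN switch or storage array) is still a separate step.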
When to Add More Capacity
Thresholds to monitor for the Interconnect Network
Assumes the following prerequisites
Host CPUs in any RAC instance node are not maxed out
Correct network configuration and best practices are followed
Log_file_parallel_write not > 3 msec
If Cache Fusion message latencies exceed the following limits

AWR Report Latency Name                            Lower Bound   Typical   Upper Bound
Average time to process cr block request           0.1           1         10
Avg global cache cr block receive time (ms)        0.3           4         12
Average time to process current block request      0.1           3         23
Avg global cache current block receive time (ms)   0.3           8         30
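The interconnect criteria can be sketched as a check against the upper bounds (10, 12, 23 and 30 ms) from the table. The function names and sample values are hypothetical, and the measured latencies are assumed to be read from the AWR report's RAC statistics.

```python
# Upper bounds in milliseconds, taken from the latency table on this slide.
UPPER_BOUND_MS = {
    "Average time to process cr block request": 10,
    "Avg global cache cr block receive time (ms)": 12,
    "Average time to process current block request": 23,
    "Avg global cache current block receive time (ms)": 30,
}


def latencies_over_upper_bound(measured_ms):
    """Return the latency names whose measured value exceeds the upper bound."""
    return sorted(
        name
        for name, limit in UPPER_BOUND_MS.items()
        if measured_ms.get(name, 0.0) > limit
    )


def interconnect_needs_capacity(measured_ms, cpu_maxed_out, log_file_parallel_write_ms):
    """Apply the prerequisites first, then the latency limits."""
    prerequisites_met = (not cpu_maxed_out) and log_file_parallel_write_ms <= 3
    return prerequisites_met and bool(latencies_over_upper_bound(measured_ms))


# Made-up sample values for illustration only
sample = {
    "Avg global cache cr block receive time (ms)": 14.2,
    "Avg global cache current block receive time (ms)": 9.0,
}
print(latencies_over_upper_bound(sample))
print(interconnect_needs_capacity(sample, cpu_maxed_out=False,
                                  log_file_parallel_write_ms=1.5))
```

When a prerequisite is not met, the function returns False because high latencies may then come from CPU or log-write pressure rather than from the interconnect.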
AWR Report RAC Statistics
When to Add Capacity
Server
Overall CPU utilization constantly exceeds 70%
Run-queue length is > 2*CPU for long periods of time
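The two server criteria can be sketched over a series of monitoring samples. The slide does not quantify "constantly" or "long periods of time", so the 90% sustained fraction below is purely an assumption, as are the function names and sample data.

```python
# Hypothetical check over periodic samples of CPU utilization and run-queue length.
def fraction_over(samples, limit):
    """Fraction of samples strictly above the limit."""
    return sum(1 for s in samples if s > limit) / len(samples)


def server_needs_capacity(cpu_utilization, run_queue_lengths, cpu_count,
                          cpu_limit=0.70, sustained_fraction=0.9):
    """True if either criterion holds for most of the sampled period.

    sustained_fraction is an assumed stand-in for "constantly" / "long periods".
    """
    cpu_hot = fraction_over(cpu_utilization, cpu_limit) >= sustained_fraction
    queue_hot = fraction_over(run_queue_lengths, 2 * cpu_count) >= sustained_fraction
    return cpu_hot or queue_hot


# Made-up samples for a 4-CPU server
busy = server_needs_capacity([0.75, 0.80, 0.90, 0.72], [3, 4, 2, 5], cpu_count=4)
calm = server_needs_capacity([0.40, 0.50, 0.75, 0.30], [3, 9, 2, 5], cpu_count=4)
print(busy, calm)
```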
Real World Example
Mercado Libre
eBay in Latin America
Runs marketplace from search to Bid
In 2004 moved from mid-range SMP to 4*4 node Itanium2 Linux RAC Cluster
16 Gig RAM each Node
NFS filer storage
Initially estimated 400,000 TP/hour, good for 2 years
Mercado Libre
Scaled incrementally as marketplace grew
[Chart: Business Volume (axis 0 to 1,600,000) and Nodes by year, 2004 to 2006]
Mercado Libre Performance Characteristics
MercadoLibre's 13-node Linux Itanium cluster
460 GB RAM clusterwide
286 GB SGA
14,500 URLs/second
47 GB redo/day
Uses a maximum of 40% of the capacity of a single Gigabit Ethernet interconnect
Summary
Plan initial sizing with good estimate
Design a Scalable infrastructure
Grow capacity with business volume
Resource utilization is the key driver
For More Information
http://search.oracle.com
or
otn.oracle.com/rac
REAL APPLICATION CLUSTERS