
RAC Capacity Planning-1

Apr 06, 2018


jrshaik
  • 8/3/2019 RAC Capacity Planning-1



    Oracle Real Application Clusters: Sizing and Capacity Planning Then and Now

    Su Tang

    Sri Subramaniam

    RACPACK


    The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle's products remains at the sole discretion of Oracle.


    Agenda

    Capacity Planning in GRID/RAC Environment

    Scalable Infrastructure Design

    On Demand Capacity Addition and Utilization

    Criteria to add more Capacity

    Real World Customer Example

    Questions


    Capacity Planning


    RAC Capacity Planning Advantages

    All current practices still apply

    Network Storage sizing

    Interconnect Network capacity

    Servers capacity

    Application Service design

    RAC flexibility ensures a good initial estimate is sufficient

    Easily accommodates Growth

    Emphasis shifts to capacity utilization


    Storage Network


    Networked Storage

    RAC works with both SAN and NAS Storage

    Optimal Storage selection depends on:

      Estimated I/O Response Time

        Typically single block I/O requests

        Common characteristic of most OLTP applications

        IOPS measure used

      Estimated I/O Bandwidth

        Large multi-block I/Os

        Data Warehouse and mixed workload environments

        Occurs during backup/recovery operations

    Estimation should include requirements for both normal and backup I/Os


    Storage Capacity Planning

    Estimate initial data size and growth rate for all the applications

    (E.g., 500GB initial, double over two years, 1TB total)

    Add the fault tolerance requirements

    (E.g., 2TB with RAID1, 1.2TB with RAID5)

    Add the backup requirements to the size

    (E.g., an additional 1TB for a full backup, another 1TB for 5 incrementals)
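The three sizing steps on this slide can be sketched as a small calculation. This is only an illustration under stated assumptions: the function name and parameters are hypothetical, the RAID5 factor of 1.2 is taken from the slide's example (it corresponds to a 5 data + 1 parity layout, which the slide does not state), and backup space is simply added on top of the protected data size.

```python
def storage_requirement_tb(initial_tb, growth_factor, raid_overhead, backup_tb):
    """Return (data, protected, total) sizes in TB.

    raid_overhead is raw capacity needed per usable TB: 2.0 for RAID1,
    1.2 for the slide's RAID5 example (assumed 5 data + 1 parity).
    backup_tb is added as-is; protecting the backup area is not modelled.
    """
    data_tb = initial_tb * growth_factor      # step 1: initial size plus growth
    protected_tb = data_tb * raid_overhead    # step 2: fault tolerance
    total_tb = protected_tb + backup_tb       # step 3: backup space
    return data_tb, protected_tb, total_tb


# Slide example: 500GB doubling to 1TB, RAID1, 1TB full + 1TB incrementals.
print(storage_requirement_tb(0.5, 2, 2.0, 2.0))  # prints (1.0, 2.0, 4.0)
```

With the RAID5 factor of 1.2 the protected size becomes 1.2TB, matching the slide.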


    Storage Capacity Planning

    Estimate aggregated throughput and IOPS

    (E.g., 2GB/sec, or 300,000 IOPS)

    Calculate the total bandwidth requirement per node

    (E.g., 2GB/sec for 16 nodes = 128MB/sec/node, or 300,000/16 = 18,750 IOPS/node)

    Choose the appropriate storage class and build the configuration

    (E.g., 1,200 IOPS per spindle, 16-way striped = 19,200 IOPS per LUN)
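The arithmetic in these examples can be reproduced with a short sketch. The helper names are hypothetical, and the final step (how many such LUNs would cover the aggregate IOPS) is an extension that the slide does not state.

```python
import math


def per_node_share(total, nodes):
    """Split an aggregate throughput or IOPS figure evenly across nodes."""
    return total / nodes


def iops_per_lun(iops_per_spindle, stripe_width):
    """IOPS of a LUN striped across stripe_width spindles."""
    return iops_per_spindle * stripe_width


def luns_needed(total_iops, lun_iops):
    """Assumed extension: number of LUNs needed to cover the aggregate IOPS."""
    return math.ceil(total_iops / lun_iops)


print(per_node_share(2 * 1024, 16))     # 2GB/sec in MB over 16 nodes: 128.0
print(per_node_share(300_000, 16))      # 18750.0
lun = iops_per_lun(1_200, 16)
print(lun)                              # 19200
print(luns_needed(300_000, lun))        # 16
```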


    Interconnect Network


    Interconnect Capacity Planning

    RAC interconnect usage

      Oracle Clusterware

        Very small messages exchanged periodically

        Response time and load are critical; not a big bandwidth consumer

      Oracle RAC Database

        Primary user of interconnect capacity

        Exchanges both small and large messages between nodes

        Key driver in deciding the network configuration


    RAC Messages

    Small 256 byte messages

      Used by GES and GCS

    Cache Fusion block messages

      Db_block_size

    Parallel Query

      Parallel_execution_message_size

      default 8k


    Interconnect Bandwidth

    Messages received (M) per second

    M = #GES messages + #GCS messages

    Blocks received (B) per second

    B = (db_block_size * (#cr blocks received + #current blocks received)) / mtu size

    PQ messages received (P) per second

    P = (PQ_message_size * #PX remote messages recv'd) / mtu size

    Total bandwidth required per second

    (Messages received + Blocks received + PQ messages received) / max network transmit capacity

    = (M + B + P) / 85000

    A similar equation applies to the send side


    Example from AWR Report

    GCS/GES messages received: 2,534

    Global Cache blocks received: 811

    PX remote messages recv'd: 65

    Db_block_size: 8192

    Parallel_execution_message_size: 8192

    Mtu_size: 1500

    One Gigabit Ethernet interface for interconnect

    Total bandwidth required = (M + B + P) / 85000 = (2534 + ((811 * 8192) / 1500) + ((65 * 8192) / 1500)) / 85000

    Approximately 8.6% capacity utilization
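The estimate can be checked with a minimal sketch of the (M + B + P) / 85000 formula. The function and parameter names are hypothetical; the input values are taken as they are substituted into the slide's formula (2,534 for M, 811 blocks, 65 PX messages), and 85000 is the slide's figure for the maximum transmit capacity of one Gigabit Ethernet interface.

```python
def interconnect_utilization(messages, blocks, px_messages,
                             db_block_size=8192, px_message_size=8192,
                             mtu=1500, max_packets_per_sec=85000):
    """Fraction of interconnect capacity used on the receive side."""
    m = messages                              # small GES/GCS messages
    b = db_block_size * blocks / mtu          # cache fusion blocks, in MTU-sized packets
    p = px_message_size * px_messages / mtu   # parallel query messages, in MTU-sized packets
    return (m + b + p) / max_packets_per_sec


util = interconnect_utilization(messages=2534, blocks=811, px_messages=65)
print(f"{util:.1%}")  # prints 8.6%
```

The same function can be applied to the send-side counters, since the slides note that a similar equation holds there.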


    Interconnect Bandwidth

    Available interconnect bandwidth in an IP-based network

    Depends on the network packets transmitted

    Comparing against theoretical bandwidth using total bytes transmitted is not accurate


    Available Network Bandwidth

    [Chart: available network bandwidth in MB/sec (axis 0 to 120) by message size in bytes: 256, 512, 1024, 2048, 8192]


    RAC Interconnect

    Experience shows that for most applications a single Gigabit Ethernet is adequate

    In planning, 70% utilization should be a reasonable point to add additional interfaces


    Server Capacity


    Server Capacity Planning

    To size the server optimally

    Consider the total number of concurrent processes

    Estimated CPU utilization of critical queries

    Grid Control / SQL Trace should give this data

    Plan for a maximum run-queue length of 2 * number of CPUs

    During high utilization periods, never exceed 70% overall CPU in the box

    Factor in the percentage of capacity each server adds

    This helps to attain your High Availability goals

    In planned outage situations, it helps to determine whether surviving nodes can support the workload
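One way to check whether surviving nodes can support the workload is sketched below. It assumes equally sized nodes and a workload that redistributes evenly across the survivors, neither of which the slides guarantee; the function names are hypothetical and the 70% ceiling is the CPU limit given above.

```python
def surviving_utilization(nodes, per_node_utilization, failed=1):
    """CPU utilization on each surviving node after `failed` nodes leave.

    Assumes identical nodes and an evenly redistributed workload.
    """
    if failed >= nodes:
        raise ValueError("at least one node must survive")
    return per_node_utilization * nodes / (nodes - failed)


def can_absorb_outage(nodes, per_node_utilization, failed=1, ceiling=0.70):
    """True if survivors stay at or below the CPU ceiling."""
    return surviving_utilization(nodes, per_node_utilization, failed) <= ceiling


print(can_absorb_outage(4, 0.50))  # 4 nodes at 50%: survivors at about 67% -> True
print(can_absorb_outage(4, 0.60))  # 4 nodes at 60%: survivors at 80% -> False
```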


    Server capacity Planning

    Ensure an optimal number of HBAs is available

    To get desired I/O response time & bandwidth

    Plan for 50-70% capacity utilization

    Ensure an optimal number of NICs is available

    For both public and cluster interconnects

    And for NAS Storage if used


    Infrastructure Design


    Scalable Infrastructure Design

    Very critical aspect in new capacity planning exercise

    Critical elements of scalable infrastructure design consist of

    Networked Storage

    Interconnect Network

    Optimally sized servers

    Software and Application Service


    Infrastructure Design

    [Diagram: storage farm (Storage 01, Storage 02, ... Storage NN) connected to SAN Fabric 1 and SAN Fabric 2]

    2 SAN Switches

    Low-end SAN Storage

    2 ports from each Storage Processor (SP) connected to each SAN switch

    Equal-size RAID5 LUNs are distributed among all SPs

    On Storage Processor failure in an array, LUNs would fail over


    Infrastructure Design

    [Diagram: server farm (a001, a002, a003, ... aNNN; b001, b002, b003, ... bNNN) connected through SAN Fabric 1 and SAN Fabric 2 to the storage farm (Storage 01, Storage 02, ... Storage NN)]

    Server and storage farms are horizontally scalable (scaling out)

    2 CPU and 4 CPU boxes

    2 port HBA connecting to each server

    LUNs are load-balanced on both ports

    Protects against failure of an SP, array port, single HBA, or single SAN switch


    Infrastructure Design

    [Diagram: server farm (a001, a002, a003, ... aNNN; b001, b002, b003, ... bNNN) attached to the IP network (Public/App-DB, Private Interconnect, NAS/iSCSI, Management; LAN/WAN; NAS NN) and, through SAN Fabric 1 and SAN Fabric 2, to the storage farm (Storage 01, Storage 02, ... Storage NN)]

    Server and storage farms are horizontally scalable (scaling out)


    Infrastructure Design

    Separate switches for Public, Private, NAS (if used), and Management networks

    Redundant networks for Public, Private, and NAS

    - For most configurations, active/failover should be sufficient

    - Where load balancing is used, ensure the correct network redundancy option is used to provide both send-side and receive-side load balancing

    - 802.3ad is used to aggregate switch ports

    - 802.3ad is used in the host to bond the interfaces


    Storage Network

    Implement zoning / masking using

    A simple scheme where all LUNs are visible across all nodes, if the cluster infrastructure is used by multiple databases

    Create equi-sized LUNs that meet planned I/O characteristics

    Ensure each LUN can support the combined throughput of all concurrent RAC node access

    Avoid ISLs in the SAN switch design by sizing the SAN switch appropriately

    In an ASM diskgroup, add disks with similar storage characteristics and capacity


    Interconnect Network

    Ensure a proper VLAN for the cluster-interconnect network

    Avoid cascading switches

    If NIC bonding is used, ensure switch ports are appropriately configured to provide both send-side and receive-side load balancing

    Ensure similar vendors' NICs are teamed in the host


    Server Design

    Ensure similar sized servers are clustered together

    Ensure Remote Administration has been correctly set up

    Use automated procedures to check consistency of correct OS, firmware and application software version and revision levels

    Cluster Verification Tool

    Verifies infrastructure, Clusterware and RAC configurations

    ORION

    Measures available I/O bandwidth and Response Time

    IPERF

    Measures & reports network performance


    Software Considerations


    Cluster Software Design

    If multiple databases are using a common cluster infrastructure

    Ensure similar sized nodes are clustered together

    Install a single, separate CLUSTER_HOME

    Install a single, separate ASM_HOME

    DB_HOMEs could be installed/expanded as required


    Adding Capacity


    When to Add More Capacity

    These guidelines assume

    All configuration and best practices are followed

    And all necessary SQL and DB tuning is performed

    Key thresholds to monitor for disk I/O

    Db_file_sequential_read > 25 msec

    Db_file_scattered_read > 30 msec

    Log_file_parallel_write > 3 msec

    Determine the source of the bottleneck

    Host, HBA, SAN switch or storage array
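A monitoring script could compare average wait times against these thresholds along the following lines. This is a sketch only: the dictionary keys reuse the names as written on the slide, the function name is hypothetical, and the sample values are made up for illustration rather than taken from a real AWR report.

```python
# Thresholds in milliseconds, as listed on the slide.
DISK_IO_THRESHOLDS_MS = {
    "Db_file_sequential_read": 25.0,
    "Db_file_scattered_read": 30.0,
    "Log_file_parallel_write": 3.0,
}


def events_over_threshold(avg_wait_ms, thresholds=DISK_IO_THRESHOLDS_MS):
    """Return the sorted names of events whose average wait exceeds its threshold."""
    return sorted(
        name for name, limit in thresholds.items()
        if avg_wait_ms.get(name, 0.0) > limit
    )


# Hypothetical sample values, for illustration only.
sample = {
    "Db_file_sequential_read": 31.0,
    "Db_file_scattered_read": 12.0,
    "Log_file_parallel_write": 3.0,
}
print(events_over_threshold(sample))  # prints ['Db_file_sequential_read']
```

A non-empty result only says that a threshold was crossed; locating the bottleneck among host, HBA, SAN switch and storage array remains a separate step.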


    When to Add More Capacity

    Thresholds to monitor for the interconnect network

    Assumes the following prerequisites

    Host CPUs in any RAC instance node are not maxed out

    Correct network configuration and best practices followed

    Log_file_parallel_write not > 3 msec

    If cache fusion message latencies exceed the following limits

    AWR Report Latency Name                             Lower Bound   Typical   Upper Bound
    Average time to process cr block request            0.1           1         10
    Avg global cache cr block receive time (ms)         0.3           4         12
    Average time to process current block request       0.1           3         23
    Avg global cache current block receive time (ms)    0.3           8         30
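The bounds in this table lend themselves to a simple lookup. The sketch below is an illustration, not an Oracle-provided check: the function name is hypothetical, the observed values are invented, and a latency is flagged only when it exceeds the upper bound.

```python
# (lower bound, typical, upper bound) in milliseconds, from the slide's table.
LATENCY_BOUNDS_MS = {
    "Average time to process cr block request": (0.1, 1.0, 10.0),
    "Avg global cache cr block receive time (ms)": (0.3, 4.0, 12.0),
    "Average time to process current block request": (0.1, 3.0, 23.0),
    "Avg global cache current block receive time (ms)": (0.3, 8.0, 30.0),
}


def latencies_over_upper_bound(observed_ms, bounds=LATENCY_BOUNDS_MS):
    """Return the latency names whose observed value exceeds the upper bound."""
    return [
        name for name, (_, _, upper) in bounds.items()
        if observed_ms.get(name, 0.0) > upper
    ]


# Hypothetical observations, for illustration only.
observed = {
    "Avg global cache cr block receive time (ms)": 15.2,
    "Avg global cache current block receive time (ms)": 9.0,
}
print(latencies_over_upper_bound(observed))
# prints ['Avg global cache cr block receive time (ms)']
```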


    AWR Report RAC Statistics


    When to Add Capacity

    Server

    Overall CPU utilization constantly exceeds 70%

    Run-queue length is > 2 * number of CPUs for long periods of time
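These two server criteria can be evaluated over a series of monitoring samples, as in the sketch below. The slides do not define "constantly" or "long periods"; the `sustained_fraction` parameter (90% of samples) is an assumption introduced here to make the idea concrete, and the function name is hypothetical.

```python
def server_needs_capacity(cpu_samples, runq_samples, cpu_count,
                          cpu_limit=0.70, sustained_fraction=0.9):
    """True if either server criterion holds for most of the samples.

    cpu_samples: CPU utilization values between 0.0 and 1.0.
    runq_samples: run-queue lengths taken over the same period.
    sustained_fraction: assumed share of samples that must be over the limit.
    """
    def sustained(samples, limit):
        if not samples:
            return False
        over = sum(1 for s in samples if s > limit)
        return over / len(samples) >= sustained_fraction

    return (sustained(cpu_samples, cpu_limit)
            or sustained(runq_samples, 2 * cpu_count))


print(server_needs_capacity([0.8] * 10, [1] * 10, cpu_count=4))  # True
print(server_needs_capacity([0.5] * 10, [3] * 10, cpu_count=4))  # False
```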


    Real World Example


    Mercado Libre

    eBay in Latin America

    Runs marketplace from search to Bid

    In 2004 moved from mid-range SMP to 4*4 node Itanium2 Linux RAC Cluster

    16 Gig RAM each Node

    NFS filer storage

    Initially estimated 400,000 TP/hour, good for 2 years


    Mercado Libre

    Scaled incrementally as marketplace grew

    [Chart: business volume (axis 0 to 1,600,000) and number of nodes by year, 2004 to 2006]


    Mercado Libre Performance Characteristics

    MercadoLibre's 13-node Linux Itanium cluster

    460 GB RAM clusterwide

    286 GB SGA

    14,500 URLs/second

    47 GB redo/day

    Uses a maximum of only 40% of the capacity of a single Gigabit Ethernet interconnect


    Summary

    Plan initial sizing with good estimate

    Design a Scalable infrastructure

    Grow capacity with business volume

    Resource utilization is the key driver


    For More Information

    http://search.oracle.com

    or

    otn.oracle.com/rac

    REAL APPLICATION CLUSTERS
