Top Banner
Oracle Grid Computing: Trending for Capacity Planning Ashish Rege SEI Session # S307772
22

Oracle Grid Computing: Trending for Capacity Planning Ashish Rege SEI Session # S307772.

Mar 26, 2015

Download

Documents

Evan Howard
Welcome message from author
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
Transcript
Page 1: Oracle Grid Computing: Trending for Capacity Planning Ashish Rege SEI Session # S307772.

Oracle Grid Computing: Trending for Capacity Planning

Ashish RegeSEI

Session # S307772

Page 2: Oracle Grid Computing: Trending for Capacity Planning Ashish Rege SEI Session # S307772.

• Database Administrators, Sys Administrators are called on everyday to make decisions on capacity planning.

– Which database has trended high for CPU, Memory, I/O etc. over the last year, month, week, day versus other databases on the same server?

– Do these trends have some unique cyclical patterns based on year, month, week, day for different applications?

– Do we see new functionality, additional users, increased concurrency with time?

– Do we have the capacity to add one or more database to existing servers?

– Where do we re-allocate databases at the next opportunity and where do we see capacity?

– Do we need to buy new servers and factor this into next year’s budget?

What DBA’s, SYS admin’s need

Page 3: Oracle Grid Computing: Trending for Capacity Planning Ashish Rege SEI Session # S307772.

Approach and Toolset

• Top down and Bottom up approach & edge at the this sideways too ..

• SA-DBA-Middleware cross functional analysis

• Toolset that can help trend CPU/Memory/Elapsed Time– OS extended stats

– OEM repository DB/Host performance metrics

– Oracle Services stats

– App-DB-Integration-elapsed time/time-out metrics

– Apache Logs fact-dimensional model

– Batch Job-stream runtime metrics

– Business measures e.g. positions, txn counts, tax lots, fees etc

• Fill in the gaps with data points to construct a complete picture

Page 4: Oracle Grid Computing: Trending for Capacity Planning Ashish Rege SEI Session # S307772.

Requirements summaryDatabase administrators and sys administrators need

consolidated data points to provide holistic

"Capacity Trend Analysis"

from an intraday to multiyear via a web interface for the databases and their host servers from CPU, Memory, I/O, wait bottlenecks, throughput, and efficiencies perspective.

• Stack up trends for different database per server on the same graph to enable side-by-side comparison and a better awareness of percentage usage by database/application and its alignment with the business cycle.

• Help trend uptick or downward spiral of business measures and find correlation of those with database and server statistics to model what if analysis

• end-to-end trending for

• UI response times

• Batch jobs

• Processed Business measures

Page 5: Oracle Grid Computing: Trending for Capacity Planning Ashish Rege SEI Session # S307772.

Now to the details

• OEM grid captures at a 15 minutes interval, write PERL scripts to transfer this to a trending utility that graphs those on the web with reporting

• At the grain of intraday to multiyear for performance counters like consumption of CPU, memory, I/O, throughput, efficiency, wait stats etc. for the different databases.

• Capture metadata into flat files

(1) Details per database

(2) Comparative numbers across different databases on the same or different servers

(3) Supplement the above with details from stats pack

Page 6: Oracle Grid Computing: Trending for Capacity Planning Ashish Rege SEI Session # S307772.

Putting it all together

OEM Repository

PERL Scripts

Flat File statistics

RRDTool graph generator

trending hourly, weekly, monthly, yearly

DBI:Oracle SQL Queires

Extract to formatted files

Feed to

Page 7: Oracle Grid Computing: Trending for Capacity Planning Ashish Rege SEI Session # S307772.

Reporting Classifications ..• Thus the reporting that comes out of the OEM is at two levels

– Database statistics

– Host statistics, this also includes side-by-side Oracle comparison for databases on that host

• The above statistics have associated flat files extracted out of OEM to feed this data to Orca/RRDTool to generate the graphs/trend.

• Have found these graphs very useful in

1. Re-alignment decisions

2. Budget discussions around shared infrastructure resources for different cost centers

Page 8: Oracle Grid Computing: Trending for Capacity Planning Ashish Rege SEI Session # S307772.

Orca/RRD ToolOrca/RRD Tool is a tool useful for plotting arbitrary data from text files

onto a directory on a Web server. It has the following features:1. files into the same or different plots. Creates an HTML tree of HTML and image (PNG or

GIF) files.

2. Creates an index of URL links listing all available targets.

3. Creates an index of URL links listing all different plot types.

4. No separate CGI set up required.

5. Can be run under cron or it can sleep itself waiting for file updates based on when the file was last updated.

6. Configuration file based.

7. Reads arbitrarily formatted text or binary data files.

8. Watches data files for updates and sleeps between reads.

9. Finds new files at specified times.

10. Remembers the last modification times for files so they do not have to be reread continuously.

11. Allows arbitrary grouping of data from different sources

12. Allows arbitrary math performed on data read from one file

Page 9: Oracle Grid Computing: Trending for Capacity Planning Ashish Rege SEI Session # S307772.

Orca architecture

• Out of the box Orca statistics for the host

– On clients, orcallator.se, a component of the SE Toolkit, collects data every 5 minutes and dumps the data in orca's home directory. Orcallator.se has a startup script in /etc/init.d. This data is dumped in /home/orca/data/<hostname>

– A crontab entry for the orca user polls each client every 5 minutes, grabs the current data, and prunes old data. This is done via an SCP from orca's crontab on the server, and is driven by a list of hosts on the orca server at /apps/orca/xfer/hostlist.

–  scp -r -p -v orca@$host:/home/orca/data/* /apps/orca/orcallator 

– Orca's crontab on the orca server calls /apps/orca/xfer/pull.sh which processes all files in /apps/orca/xfer/hostlist

Page 10: Oracle Grid Computing: Trending for Capacity Planning Ashish Rege SEI Session # S307772.

Out of box OS performance Statistics• The Orca server app which simply checks a directory tree

(/apps/orca/orcallator) every five minutes and graphs any new data which has appeared there

• Then, as orca on the orca server does an ssh to the new client ssh <client>" and when prompted, adds to known hosts, unless this is done for each client added, no data transfer can succeed.

• Then, add the client to the file /apps/orca/xfer/hostlist. This file contains a list of all clients and is used by the script 'pull.sh', driven by orca's crontab, included here. Orca's crontab also prunes some files on each client. Clients are completely passive, orcallator.se dumps performance data in orca's home directory on each client, the server collects it, processes it.

• Orca produces HTML files, along with "index.html". A standard Apache server is required to publish the pages.

• The url for viewing orca is: http://<orac server IP>/orca/

Page 11: Oracle Grid Computing: Trending for Capacity Planning Ashish Rege SEI Session # S307772.

OEM Custom statistics• Extract performance data out of OEM repository into flat files, via a custom

script running at an hourly frequency. These flat files are dropped into the same directory which Orca checks and processes data from to plot graphs (/apps/orca/orcallator). Thus data is generated right off the Orca server connecting remotely to the OEM repository.

• Thus any data can be plotted including business measures. It is that simple

• Using the DBI::Oracle library to query the OEM GRID Repository; joining tables like sysman.MGMT_TARGETS, sysman.MGMT_METRICS, sysman. MGMT_METRICS _1HOUR extracting data out of into files.

• A timestamp is written to a snapfile that tracks time intervals already queried and reported on; the next run is based on querying the sysman repository table to find snapshots with rollup_timestamp greater than the last run

e.g. 2008-11-12_19:00:00

Page 12: Oracle Grid Computing: Trending for Capacity Planning Ashish Rege SEI Session # S307772.

Quarterly Instance Efficiency Group seidevdb34

HourlyDailyWeeklyMonthlyQuarterlyYearlyAll

(unix Server Name)

Reporting Time interval

Page 13: Oracle Grid Computing: Trending for Capacity Planning Ashish Rege SEI Session # S307772.

Comparative Quarterly Instance CPU Utilization

The sample compares Waits – User CPU, User I/O, Other between different databases GWMPE02, E2L1, E2L2, EC02, ER02 across the DB server seidevdb34 E02 is the max consumer, at times taking 50% or more of CPU compare to other databases which have negligible CPU consumption.

Page 14: Oracle Grid Computing: Trending for Capacity Planning Ashish Rege SEI Session # S307772.

SNAPFILESnapfile helps to track last extraction time of performance data.

 pwd /export/home/oracle/orca_oracle/snapfile

 oracle@seieaas52z1# tail -f <DBNAME>.seic.com_Oem_snapid

2009-03-10_12:00:00

2009-03-10_14:00:00

2009-03-10_16:00:00

2009-03-10_18:00:00

2009-03-10_20:00:00

 oracle@seieaas52z1# tail -f <SERVERNAME>_OemHost_snapid

2009-03-10_12:00:00

2009-03-10_14:00:00

2009-03-10_16:00:00

2009-03-10_18:00:00

2009-03-10_20:00:00

 

Page 15: Oracle Grid Computing: Trending for Capacity Planning Ashish Rege SEI Session # S307772.

OEM SQL QUERY HOST & DB PERFORMANCE DATA

# Part 1: Decide on beginning and ending snapshots OEM repository query

"select min(to_char (d.rollup_timestamp ,'YYYY-MM-DD_HH24:MI:SS'))

FROM

sysman.mgmt_targets tgt

, sysman.mgmt_metrics met

, sysman.mgmt_metrics_1hour d

WHERE

lower(tgt.target_name) = lower(?)

AND tgt.target_type ='oracle_database'

AND tgt.target_guid = d.target_guid

AND met.metric_guid = d.metric_guid

AND d.rollup_timestamp > to_date(?,'YYYY-MM-DD_HH24:MI:SS')

"

 

Page 16: Oracle Grid Computing: Trending for Capacity Planning Ashish Rege SEI Session # S307772.

OEM SQL QUERY HOST & DB PERFORMANCE DATA

# Part 2: Get the latest snapshot and it's time

"select min(to_char ( d.rollup_timestamp ,'YYYY-MM-DD_HH24:MI:SS' ))

,max(to_char ( d.rollup_timestamp ,'YYYY-MM-DD_HH24:MI:SS' ))

FROM

sysman.mgmt_targets tgt

, sysman.mgmt_metrics met

, sysman.mgmt_metrics_1hour d

WHERE

lower(tgt.target_name) = lower(?)

AND tgt.target_type ='oracle_database'

AND tgt.target_guid = d.target_guid

AND met.metric_guid = d.metric_guid

"

Page 17: Oracle Grid Computing: Trending for Capacity Planning Ashish Rege SEI Session # S307772.

OEM SQL QUERY HOST & DB PERFORMANCE DATA

# Part 3: Collect the raw values from OEM for DB stats

 "SELECT DISTINCT

met.metric_column

,d.key_value

,d.value_average

:

FROM

sysman.mgmt_targets tgt

, sysman.mgmt_metrics met

, sysman.mgmt_metrics_1hour d

WHERE

lower(tgt.target_name) = lower(?) AND tgt.target_type ='oracle_database'

AND tgt.target_guid = d.target_guid AND met.metric_guid = d.metric_guid

AND d.rollup_timestamp = to_date(?,'YYYY-MM-DD_HH24:MI:SS')

ORDER BY d.rollup_timestamp,met.metric_column"

Page 18: Oracle Grid Computing: Trending for Capacity Planning Ashish Rege SEI Session # S307772.

OEM SQL QUERY HOST & DB PERFORMANCE DATA

# Part 4: Collect the raw values from OEM for HOSTS stats

 "SELECT DISTINCT

met.metric_column

,d.key_value

,d.value_average

. ..FROM sysman.mgmt_targets tgt, sysman.mgmt_metrics met, sysman.mgmt_metrics_1hour d

WHERE lower(tgt.target_name) = lower(?)

AND tgt.target_type ='host’ AND tgt.target_guid = d.target_guid

AND met.metric_guid = d.metric_guid AND met.metric_column IN

('cpuLoad’,'cpuLoad_15min’,'cpuLoad_1min’,'longestServ’,'cpuIOWait’,'cpuKernel'

,'cpuUser’,'cpuUtil’,'memUsedPct’,'memfreePct’,'swapUtil’,'noOfProcs’,'noOfUsers’,'totIO’,'pgScanRate’) AND

d.rollup_timestamp = to_date(?,'YYYY-MM-DD_HH24:MI:SS')

Page 19: Oracle Grid Computing: Trending for Capacity Planning Ashish Rege SEI Session # S307772.

Orcallator.cfg OEM groupgroup oem {

find_files /apps/orca/oem/(.*)/(?:oracle)-\d{4}-\d{2}-\d{2}(?:-\d{3,})?(?:\.(?:Z|gz|bz2))?

column_description first_line

date_source column_name timestamp

interval 3600

filename_compare sub {

my ($ay, $am, $ad) = $a =~ /-(\d{4})-(\d\d)-(\d\d)/;

my ($by, $bm, $bd) = $b =~ /-(\d{4})-(\d\d)-(\d\d)/;

if (my $c = (( $ay <=> $by) ||

( $am <=> $bm) ||

(($ad >> 3) <=> ($bd >> 3)))) {

return 2*$c;

}

$ad <=> $bd;

}

}

 

Page 20: Oracle Grid Computing: Trending for Capacity Planning Ashish Rege SEI Session # S307772.

Orcallator.cfg SERVER groupinggroup <SERVER1> {

find_files /apps/orca/oemhost/( SERVER1)/(?:(?: SERVER1)|(?:percol))-\d{4}-\d{2}-\d{2}-\d{3}

column_description first_line

date_source column_name timestamp

interval 3600

filename_compare sub {

my ($ay, $am, $ad) = $a =~ /-(\d{4})-(\d\d)-(\d\d)/;

my ($by, $bm, $bd) = $b =~ /-(\d{4})-(\d\d)-(\d\d)/;

if (my $c = (( $ay <=> $by) ||

( $am <=> $bm) ||

(($ad >> 3) <=> ($bd >> 3)))) {

return 2*$c;

}

$ad <=> $bd;

}

}

Page 21: Oracle Grid Computing: Trending for Capacity Planning Ashish Rege SEI Session # S307772.

Services trending

Service performance in:V$SERVICE_STATSV$SERVICE_EVENTV$SERVICE_WAIT_CLASSV$SERVICEMETRICV$SERVICEMETRIC_HISTORY

Page 22: Oracle Grid Computing: Trending for Capacity Planning Ashish Rege SEI Session # S307772.

Items Learned in this Session

• Database administrators and sys administrators need consolidated data points to provide holistic "Capacity analysis" for the databases and their host servers from CPU, Memory, I/O, wait bottlenecks, throughput, and efficiencies perspective.

• This presentation outlined a methodology that helps with "Capacity Trend Analysis" from an intraday to multiyear via a web interface; the key theme here is the ability to stack up trends for different databases per server on the same graph to enable side-by-side comparison and a better awareness of percentage usage by database/application and its alignment with the business cycle.