DOUG – Technology Day Tuning Seminar by Nitin Vengurlekar
Page 1: DOUG – Technology Day Tuning Seminar by Nitin Vengurlekar.

DOUG – Technology Day

Tuning Seminar by Nitin Vengurlekar

Page 2: DOUG – Technology Day Tuning Seminar by Nitin Vengurlekar.

Nitin Vengurlekar

• 18 years with Oracle
– 6 years with Oracle Support
– 9 years with RAC Product Management
– 3 years as a “Private Database Cloud” evangelist
• Worked with numerous customers on consolidation and rationalization planning
• Taking these key customers to reference-ability
• Developed white papers and best practices for Application/Database High Availability and Consolidation
• Follow me on Twitter: dbcloudshifu


Page 3: DOUG – Technology Day Tuning Seminar by Nitin Vengurlekar.

RAC FUNDAMENTALS AND INFRASTRUCTURE SUPPORT

Page 4: DOUG – Technology Day Tuning Seminar by Nitin Vengurlekar.

Content Contributors
Thanks to all the past content contributors:

• Michael Zoll
• Markus Michalewicz
• Barb Lundhild
• Saar Moaz
• John McHugh
• My “former past” Nitin Vengurlekar (circa 2009)

Page 5: DOUG – Technology Day Tuning Seminar by Nitin Vengurlekar.

Objectives

• Not gonna give you scripts or queries
– You can find those on the InterWeb
• Gonna cover the basics of buffer/block management in RAC
– So you know what is happening when it happens
• Review key metrics/waits and their dependencies
– So you know [starting point] causality
• Check out the next session on RAC Buffer Cache Internals for a deeper dive

Page 6: DOUG – Technology Day Tuning Seminar by Nitin Vengurlekar.

Agenda
Understanding RAC Cache Fusion for Practical RAC Performance Analysis

• RAC Fundamentals and Infrastructure
• Common Problems and Symptoms
• Application and Database Design
• Diagnostics and Problem Determination
• Summary: Practical Performance Analysis

Page 7: DOUG – Technology Day Tuning Seminar by Nitin Vengurlekar.

RAC Cluster 11gR2 Architecture

[Diagram: Node 1, Node 2 ... Node n on a public network. Each node runs an operating system, Oracle Clusterware, ASM, a DB instance, a VIP, a listener and services. Shared storage, managed by ASM, holds the redo/archive logs of all instances, the database and control files, and the OCR and voting disks.]

Page 8: DOUG – Technology Day Tuning Seminar by Nitin Vengurlekar.

Under the Covers

[Diagram: Instance 1, Instance 2 ... Instance n on Node 1, Node 2 ... Node n, connected by the cluster private high-speed network. Each instance has an SGA (dictionary cache, library cache, log buffer, buffer cache, Global Resource Directory) and the background processes LMON, LMD0, LMS0, LCK0, LMHB, DIAG, LGWR, DBW0, SMON and PMON. Each instance has its own redo log files; the data files and control files are shared.]

Page 9: DOUG – Technology Day Tuning Seminar by Nitin Vengurlekar.

Key Components – Cache Fusion

• Global Cache Service (GCS)

• Global Enqueue Service (GES)

• Global Resource Directory (GRD)

Page 10: DOUG – Technology Day Tuning Seminar by Nitin Vengurlekar.

Global Enqueue Service (GES)

• GES synchronizes the dictionary cache, library cache, transaction locks, and DDL locks
– Easier just to say “GES manages enqueues other than data blocks”
– LCK and LMD are the processes that manage GES

• Maintains local and global enqueues
– V$ENQUEUE_STATISTICS displays enqueues with the highest impact
– GV$LOCK: global view of local locks
– GV$GES_ENQUEUES: global view of global locks that are blocking or being blocked
– TX, TM, SQ, TA, US are typical enqueues

Page 11: DOUG – Technology Day Tuning Seminar by Nitin Vengurlekar.

Global Enqueue Service (GES) - Example

• A process is trying to acquire a HW enqueue
• It sends a BAST (Blocking Asynchronous System Trap) message to the LCK process
• LCK constructs the message
– The message includes the lock pointer, resource pointer, and resource name
• If the resource is not available, the LCK process sends a message to the lock holder asking for a lock downgrade
– Can be seen as ‘DFS lock handle’ waits

Page 12: DOUG – Technology Day Tuning Seminar by Nitin Vengurlekar.

Global Cache Services (GCS)

• Guarantees cache coherency
– Ensures that instances acquire a resource cluster-wide before modifying or reading a database block
• Minimizes access time to data which is not in the local cache and would otherwise be read from disk or rolled back
– Synchronizes global cache access – PCM ;-)
• Implements direct memory access over the interconnect
• Uses an efficient and scalable messaging protocol
– skgxp

Page 13: DOUG – Technology Day Tuning Seminar by Nitin Vengurlekar.

Global Resource Directory (GRD)

• GRD records information about the current status of the data blocks, resources and enqueues
• GRD is managed and maintained by GES and GCS
• Each running instance stores a portion of the directory
• LMON recovers the GRD during instance recovery

Page 14: DOUG – Technology Day Tuning Seminar by Nitin Vengurlekar.

Global Cache Resource Relationship

8k on-disk block

8k buffer header (x$bh) 200 bytes

Lock Element (LE) – x$le

DLM Lock (x$kjbl)

DLM Resource(x$kjbr)

Page 15: DOUG – Technology Day Tuning Seminar by Nitin Vengurlekar.

Three Players in this Chess[RAC]-Match

• Requestor
– Session [from an instance] that is making the request for the buffer/block
• Master (kjblmaster)
– Instance that has that buffer mastered
– Maintains grant and convert queues
– Buffer [ranges] are mastered by different instances, which provides even distribution of mastered locks
– Block mastering can change for various reasons (gms changes, DRM, manually using an event)
• Holder (kjblowner)
– Instance that has the buffer cached

Page 16: DOUG – Technology Day Tuning Seminar by Nitin Vengurlekar.

Buffer Blocks Basics
A word on Current and CR blocks

• Oracle has a block multi-versioning architecture
– RAC extends that to multi-nodal multi-versioning
• A block can be either a current data block or a consistent read (CR) version of a block
– The current block contains changes for all committed and uncommitted transactions. All DML gets on a block are made in this mode
– A consistent read (CR) version of a block represents a consistent snapshot of the data at a previous point in time. Select read requests are made in this mode
• Oracle applies undo to current blocks to produce the appropriate CR versions of a block. Not RAC specific!
• Both the current and consistent read blocks are managed by the GCS

Page 17: DOUG – Technology Day Tuning Seminar by Nitin Vengurlekar.

Cache Fusion – 1, 2, and 3 ways

• 1-way block transfer: No block transfer at all. Requestor, Holder, and Master are all the local instance. Close to single-instance performance.

• 2-way block transfer: Node A performs a hash/directory lookup and finds that another node, e.g. node B, is the master of this block. Node A sends a message to node B for that block. Node B finds that no other node holds the block and messages back to node A.

• 3-way block transfer: Node A requests a block and, via lookup, determines that node B is the master, but node B finds that node C is currently holding that block. Node B sends a message to node C, and node C then sends the block to node A. That’s 3-way.
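The three cases above differ only in how many distinct instances take part in the request. A minimal sketch of that idea follows; the function name and the "count the distinct participants" rule are assumptions made for illustration, not Oracle's internal logic.

```python
from typing import Optional


def classify_transfer(requestor: str, master: str, holder: Optional[str]) -> str:
    """Label a block request with the slide's 1-way / 2-way / 3-way terms.

    holder is None when no instance currently has the block cached.
    Simplifying assumption: the label equals the number of distinct
    instances involved (requestor, master and, if any, holder).
    """
    participants = {requestor, master}
    if holder is not None:
        participants.add(holder)
    return f"{len(participants)}-way"


# Requestor, master and holder are all the local instance.
print(classify_transfer("A", "A", "A"))
# Node B masters the block and nobody else holds it.
print(classify_transfer("A", "B", None))
# Node B masters the block, node C holds it.
print(classify_transfer("A", "B", "C"))
```

The sketch ignores everything that makes real requests slower or faster (contention, log flushes, LMS load); it only names the message path.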

Page 18: DOUG – Technology Day Tuning Seminar by Nitin Vengurlekar.

GCS Coordination Example 1

– Assume data block C has been read and dirtied by InstanceA.

– Only one copy of the block exists cluster-wide, and it is represented by its SCN.

• 1. InstanceB, attempting to modify the block, submits a request to LMS.

• 2. LMS transmits the request to the holder, InstanceA.

• 3. InstanceA receives the message, flushes the redo associated with the dirtied buffer, and sends the block to InstanceB.

• 4. InstanceA retains the dirty buffer for recovery purposes. This dirty image of the block is called a past image (PI) of the block. A PI block cannot be modified further.

• 5. On receipt of the block, InstanceB informs GCS that it holds the block.

– Note: The data block is not written to disk before the resource is granted to InstanceB.

Page 19: DOUG – Technology Day Tuning Seminar by Nitin Vengurlekar.

GCS Coordination Example 2
Write to Disk Coordination

– In this scenario, assume that InstanceA, which is holding a past image buffer, requests that Oracle write the buffer to disk:

• 1. InstanceA sends a write request to the GCS.

• 2. The GCS forwards the request to InstanceB, the holder of the current version of the block.

• 3. InstanceB receives the write request and writes the block to disk.

• 4. InstanceB records the completion of the write operation with the GCS.

• 5. After receipt of the notification, the GCS orders all past image holders to discard their past images. These past images are no longer needed for recovery.

– Note: In this case, only one I/O is performed to write the most current version of the block to disk.

Page 20: DOUG – Technology Day Tuning Seminar by Nitin Vengurlekar.

GCS Coordination Example 2
Write to Disk Coordination

– This scenario illustrates what happens when an instance invokes a checkpoint or has to clean buffers from its cache because of free buffer requests.

– Because multiple versions of the same data block with different changes can exist in the caches of instances in the cluster, a write protocol managed by the GCS ensures that only the most current version of the data is written to disk.

– Disk block writes are only required for cache replacement. If a block is dirty, a past image (PI) of it is kept in memory before the block is sent. In the event of failure, Oracle reconstructs the current version of the block by reading the PI blocks.

Page 21: DOUG – Technology Day Tuning Seminar by Nitin Vengurlekar.

Key Layers that Affect RAC Performance

• Local disk block or buffer access is ‘same-ol-same-ol’
• A remote cache access is driven by round-trip time
• Latency variation (and CPU cost) correlates with
– Block transfer (“wire time”)
– Block contention
– Block access cost (block preparation)
– Delayed log flushes
– CPU saturation/LMS scheduling
– IO latency

Page 22: DOUG – Technology Day Tuning Seminar by Nitin Vengurlekar.

Network Path

• Wire latency is very small
– ~50% of fixed overhead is in the kernel
– Protocol (e.g. UDP, RDS) dependent
• IPC queue lengths are variable
– Depends on incoming rate and service time
• Context switch and scheduling delay (CPU queue) are variable
– Depends on process concurrency and CPU load
• Hence: time in queues can vary under load
• Performance of immediate message transfers depends in practice on minimizing queue and context switch time
• Transfer path length

[Diagram: transfer path over time. Global Cache (RDBMS) and IPC in user space, context switch into sys, driver and IP stack, NIC, wire; annotated with CPU, socket, packets, packet headers and queues.]

Page 23: DOUG – Technology Day Tuning Seminar by Nitin Vengurlekar.

Block Transfer Time
Interconnect and IPC processing – “Wire Time”

[Diagram labels: Initiate send and wait; Message: ~200 bytes; LMS: Receive, Process block, Send; Block: e.g. 8K; Receive; 8192 bytes/(1 Gb/sec)]

Total access time: e.g. ~360 microseconds (UDP over 10GbE)
Network propagation delay (“wire time”) is a minor factor in round-trip time (approx. 6%, vs. 52% in OS and network stack)
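The "8192 bytes/(1 Gb/sec)" term on the slide is the time needed to put one block on the wire. A small sketch of that arithmetic, assuming an 8K block, the ~200-byte message from the slide and nominal link speeds; the function name is made up for illustration, and the calculation ignores packet headers, switch latency and the OS/network stack, so it does not try to reproduce the 6% figure.

```python
def serialization_time_us(payload_bytes: int, link_bits_per_sec: float) -> float:
    """Time in microseconds to clock payload_bytes onto a link of the given speed."""
    return payload_bytes * 8 / link_bits_per_sec * 1e6


# 8K block on a 1 Gb/sec link, as written on the slide.
print(round(serialization_time_us(8192, 1e9), 1))
# Same block on a 10 Gb/sec link.
print(round(serialization_time_us(8192, 1e10), 2))
# The ~200-byte request message on a 10 Gb/sec link.
print(round(serialization_time_us(200, 1e10), 2))
```

Even at 1 Gb/sec the block itself needs only tens of microseconds on the wire, which is why the rest of the round trip (queues, context switches, LMS scheduling) deserves most of the attention.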

Page 24: DOUG – Technology Day Tuning Seminar by Nitin Vengurlekar.

Block Access Cost

Cost determined by
• Block server process load
• Message propagation delay
• Operating system scheduling
• IPC CPU

Block Access Cost = message propagation delay + IPC CPU + operating system scheduling + block server load

Page 25: DOUG – Technology Day Tuning Seminar by Nitin Vengurlekar.

Block Contention

• The contention-oriented waits occur when a session attempts to read or modify a globally cached buffer that cannot be shipped immediately
• The following are possibilities:
– The buffer was pinned by a session on another node
– A change to the buffer had not been flushed to disk
– Too many other waiters on the grantor's list, caused by frequent concurrent read and write accesses to the same data
• Block contention wait events
– gc current block busy
– gc cr block busy
– gc buffer busy acquire
– gc buffer busy release

Page 26: DOUG – Technology Day Tuning Seminar by Nitin Vengurlekar.

Block Access Cost
Prepare, Build to Ship

• Two key factors of Cache Fusion latency
• CR block request time = build time + flush time + send time
• Current block request time = pin time + flush time + send time
• Always check the send times from the other instances too

Page 27: DOUG – Technology Day Tuning Seminar by Nitin Vengurlekar.

Infrastructure: Cache Fusion Latency

• Average Prepare Latency = Blocks Served Time / Blocks Served

• Blocks Served Time =
– gc cr block build time +
– gc cr block flush time +
– gc current block pin time +
– gc current block flush time +
– gc current send time +
– gc cr send time

• Blocks Served = gc cr blocks served + gc current blocks served
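The formula above can be sketched as a few lines of code. This is an illustration only: the statistic names are copied from the slide as written, the sample values are made up, and the result is in whatever time unit the time statistics are reported in on your system (an assumption you should check before comparing against milliseconds).

```python
# Statistic names exactly as they appear on the slide.
TIME_STATS = (
    "gc cr block build time",
    "gc cr block flush time",
    "gc current block pin time",
    "gc current block flush time",
    "gc current send time",
    "gc cr send time",
)
SERVED_STATS = ("gc cr blocks served", "gc current blocks served")


def average_prepare_latency(stats: dict) -> float:
    """Blocks Served Time / Blocks Served, in the unit of the time statistics."""
    blocks_served = sum(stats[name] for name in SERVED_STATS)
    if blocks_served == 0:
        return 0.0  # nothing served in this interval
    blocks_served_time = sum(stats[name] for name in TIME_STATS)
    return blocks_served_time / blocks_served


# Hypothetical interval deltas, for illustration only.
sample = {
    "gc cr block build time": 10,
    "gc cr block flush time": 20,
    "gc current block pin time": 30,
    "gc current block flush time": 40,
    "gc current send time": 50,
    "gc cr send time": 50,
    "gc cr blocks served": 400,
    "gc current blocks served": 600,
}
print(average_prepare_latency(sample))
```

Feed it deltas between two snapshots rather than cumulative values since instance startup, otherwise a recent change in latency is averaged away.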

Page 28: DOUG – Technology Day Tuning Seminar by Nitin Vengurlekar.

Identifying Issues

• Factors Affecting Performance of Immediate Global Cache Access

– Machine Load

• Process concurrency for CPU

• Scheduling

• CPU utilization

– Interconnect Bandwidth

• Total bandwidth utilization for the database(s)

– LMS processes

• Real time

• CPU busy

• No application tuning required

No contention Global Cache Access – how it looks in AWR

Accurate average: 100 µsecs

• Latency in Cluster has small impact

• Average Performance is good

Page 29: DOUG – Technology Day Tuning Seminar by Nitin Vengurlekar.

Identifying Issues

• Factors Affecting Performance with Application Contention on Data

– Log File IO latency

– LGWR responsiveness

• Schema tuning may be required

– If the application response time or throughput do not meet objectives

Global Cache Access with application contention – how it looks in AWR

Impact of Application Contention

[AWR screenshot callouts:]
– Index contention
– Block on the way from another instance
– Transfer delayed by log flush on other node(s)

Page 30: DOUG – Technology Day Tuning Seminar by Nitin Vengurlekar.

Identifying Issues
SQL and Schema Optimization: Identifying SQL Incurring Highest Cluster Wait Time

Indexes with high contention, one accounting for 84%

Page 31: DOUG – Technology Day Tuning Seminar by Nitin Vengurlekar.

Identifying Issues
Cause and Effect Are Distributed – How to Read the Global Impact

[AWR screenshots labeled ISHAN / racdb1_1A and NISHA / racdb1_3B, with callouts:]
– Local sessions waiting for transfer
– Block pinged out; sessions waiting for its return
– Transfer delayed by log flush on other node(s)
– Global cache wait events: 35%, significantly higher than expected
– Variance and outliers indicate that IO to the log file disk group affects performance in the cluster (26.1 / 73.1)

Page 32: DOUG – Technology Day Tuning Seminar by Nitin Vengurlekar.

Cluster Cache Efficiency - AWR

Page 33: DOUG – Technology Day Tuning Seminar by Nitin Vengurlekar.

Cluster Cache Efficiency - AWR

Page 34: DOUG – Technology Day Tuning Seminar by Nitin Vengurlekar.

Cluster Cache Efficiency - AWR

Page 35: DOUG – Technology Day Tuning Seminar by Nitin Vengurlekar.

COMMON PROBLEMS AND SYMPTOMS

Page 36: DOUG – Technology Day Tuning Seminar by Nitin Vengurlekar.

Misconfigured or Faulty Interconnect Can Cause:

• Dropped packets/fragments
• Buffer overflows
• Packet reassembly failures or timeouts
• Ethernet flow control kicking in
• TX/RX errors

“Lost blocks” at the RDBMS level, responsible for 64% of escalations

Page 37: DOUG – Technology Day Tuning Seminar by Nitin Vengurlekar.

“Lost Blocks”: NIC Receive Errors

ifconfig -a:

eth0  Link encap:Ethernet  HWaddr 00:0B:DB:4B:A2:04
      inet addr:130.35.25.110  Bcast:130.35.27.255  Mask:255.255.252.0
      UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
      RX packets:21721236 errors:135 dropped:0 overruns:0 frame:95
      TX packets:273120 errors:0 dropped:1105 overruns:0 carrier:0
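Checking these counters on every node is easy to script. The sketch below pulls the RX/TX counters out of output in the older ifconfig format shown on this slide and lists the non-zero error counters; the function names are hypothetical, and newer ifconfig versions print the counters in a different layout that this pattern will not match.

```python
import re

# RX/TX counter lines copied from the slide.
SAMPLE = (
    "RX packets:21721236 errors:135 dropped:0 overruns:0 frame:95\n"
    "TX packets:273120 errors:0 dropped:1105 overruns:0 carrier:0\n"
)

PATTERN = re.compile(
    r"(RX|TX) packets:(\d+) errors:(\d+) dropped:(\d+) overruns:(\d+)"
)


def nic_counters(text: str) -> dict:
    """Return {'RX': {...}, 'TX': {...}} with integer counters."""
    counters = {}
    for direction, packets, errors, dropped, overruns in PATTERN.findall(text):
        counters[direction] = {
            "packets": int(packets),
            "errors": int(errors),
            "dropped": int(dropped),
            "overruns": int(overruns),
        }
    return counters


def suspicious(counters: dict) -> list:
    """List every non-zero error, drop or overrun counter."""
    return [
        f"{direction} {name}"
        for direction, values in counters.items()
        for name in ("errors", "dropped", "overruns")
        if values[name] > 0
    ]


print(suspicious(nic_counters(SAMPLE)))
```

A single reading only shows that errors happened at some point since the interface came up; comparing two readings taken a few minutes apart shows whether the counters are still climbing.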

Page 38: DOUG – Technology Day Tuning Seminar by Nitin Vengurlekar.

“Lost Blocks”: IP Packet Reassembly Failures

netstat -s

Ip:
    84884742 total packets received
    …
    1201 fragments dropped after timeout
    …
    3384 packet reassembles failed

Page 39: DOUG – Technology Day Tuning Seminar by Nitin Vengurlekar.

Top 5 Timed Events

Event               Waits     Time (s)   Avg wait (ms)   % Total Call Time   Wait Class
log file sync       286,038   49,872     174             41.7                Commit
gc buffer busy      177,315   29,021     164             24.3                Cluster
gc cr block busy    110,348   5,703      52              4.8                 Cluster
gc cr block lost    4,272     4,953      1159            4.1                 Cluster
cr request retry    6,316     4,668      739             3.9                 Other

Finding a Problem with the Interconnect or IPC

Should never be here
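The average wait column in the Top 5 Timed Events figures is simply total wait time divided by the number of waits. A short sketch that recomputes it from the Waits and Time(s) values on this slide; the helper name is made up for illustration.

```python
# (event, waits, total wait time in seconds) as shown on the slide.
ROWS = [
    ("log file sync", 286_038, 49_872),
    ("gc buffer busy", 177_315, 29_021),
    ("gc cr block busy", 110_348, 5_703),
    ("gc cr block lost", 4_272, 4_953),
    ("cr request retry", 6_316, 4_668),
]


def avg_wait_ms(waits: int, time_s: float) -> float:
    """Average wait in milliseconds."""
    return time_s / waits * 1000


for event, waits, time_s in ROWS:
    print(f"{event:<18} {avg_wait_ms(waits, time_s):8.0f} ms")
```

The recomputed values match the report: the lost-block and retry events have far fewer waits than the others, but each one costs on the order of a second.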

Page 40: DOUG – Technology Day Tuning Seminar by Nitin Vengurlekar.

CPU Saturation or Memory Depletion

Top 5 Timed Events

Event                        Waits       Time (s)   Avg wait (ms)   % Total Call Time   Wait Class
db file sequential read      1,312,840   21,590     16              21.8                User I/O
gc current block congested   275,004     21,054     77              21.3                Cluster
gc cr grant congested        177,044     13,495     76              13.6                Cluster
gc current block 2-way       1,192,113   9,931      8               10.0                Cluster
gc cr block congested        85,975      8,917      104             9.0                 Cluster

“Congested”: LMS could not dequeue messages fast enough
Cause: Long run queues and paging on the cluster nodes

Page 41: DOUG – Technology Day Tuning Seminar by Nitin Vengurlekar.

Impact of IO capacity issues or bad SQL execution on RAC

• Log flush IO delays can cause “busy” buffers
• “Bad” queries on one node can saturate the link
• IO is issued from ALL nodes to shared storage (beware of one-node “myopia”)

Cluster-wide impact of IO or query plan issues, responsible for 23% of escalations

Page 42: DOUG – Technology Day Tuning Seminar by Nitin Vengurlekar.

Summary
Look for:
• High impact of “lost blocks”, e.g. gc cr block lost
• IO capacity saturation, e.g. gc cr block busy
• Overload and memory depletion, e.g. gc current block congested

All events with these tags are potential issues if their % of DB time is significant. Compare with the lowest measured latency (target, cf. SESSION HISTORY reports or SESSION HISTOGRAM view)

Page 43: DOUG – Technology Day Tuning Seminar by Nitin Vengurlekar.

INFRASTRUCTURE BEST PRACTICES AND CONFIGURATION

Page 44: DOUG – Technology Day Tuning Seminar by Nitin Vengurlekar.

Infrastructure: Interconnect Bandwidth

• Interconnect must be private (can be a VLAN)
– Should not roll up to the distribution switch (keep it in the layer 2 domain)
• Network cards
– Use a fast interconnect: 10GbE (with jumbo frames) or InfiniBand
– Multiple NICs are generally not required for performance and scalability
• Bandwidth requirements depend on
– CPU power per cluster node
– Application-driven data access frequency
– Number of nodes and size of the working set
– Data distribution between PQ slaves

Page 45: DOUG – Technology Day Tuning Seminar by Nitin Vengurlekar.

Infrastructure: IPC configuration

• Settings:
– Socket receive buffers (256 KB – 4 MB)
– Negotiated top bit rate and full duplex mode
– NIC ring buffers
– Ethernet flow control settings
– CPU(s) receiving network interrupts
• Verify your setup:
– CVU does checking
– Load testing eliminates potential for problems

Page 46: DOUG – Technology Day Tuning Seminar by Nitin Vengurlekar.

Infrastructure: IO capacity

• Disk storage is shared by all nodes, i.e. the aggregate IO rate is important

• Log file IO latency can be important for block transfers
– Log file syncs that exceed 500 ms are logged

• Parallel Execution across cluster nodes requires a well-scalable IO subsystem

– Disk configuration needs to be responsive and scalable

– Test with Calibrate I/O or Orion or SLOB2

Page 47: DOUG – Technology Day Tuning Seminar by Nitin Vengurlekar.


APPLICATION AND DATABASE DESIGN

Page 48: DOUG – Technology Day Tuning Seminar by Nitin Vengurlekar.

Application Considerations

• Scheduling delays on high context switch rates on busy systems may increase the variation in the cluster traffic times

• Latch and mutex contention can cause priority inversion issues for critical background processes

• More processes imply higher memory utilization and higher risk of paging

• Control the number of concurrent processes

– Use connection pooling

– Avoid connection storms (pool and process limits )

• Ensure that load is well-balanced over nodes

How to Avoid Resource Contention in Applications

[Diagram: a connection pool in front of two nodes, vixen and comet, running instances racdb1_3 and racdb1_4; each node runs Oracle GI and Oracle RAC.]

Page 49: DOUG – Technology Day Tuning Seminar by Nitin Vengurlekar.

Services

• Application workloads can be defined as Services
– Workload management
– Do not ever, ever use instance names
• Individually managed and controlled
• On instance failure, automatic re-assignment
• Service performance is individually tracked
• Finer-grained control with Resource Manager
• Integrated with other tools, e.g. Scheduler, Streams
• Managed by Oracle Clusterware
• Several services are created and managed by the database

Page 50: DOUG – Technology Day Tuning Seminar by Nitin Vengurlekar.

Scalability Pitfalls

• Serializing contention on a small set of data/index blocks
– Monotonically increasing key
– Frequent updates of small cached tables
– Segment without ASSM or Free List Group (FLG)
• Full table scans
• Frequent hard parsing
• Concurrent DDL (e.g. truncate/drop)

Page 51: DOUG – Technology Day Tuning Seminar by Nitin Vengurlekar.

Index Block Contention: Optimal Design

• Monotonically increasing sequence numbers
• Large sequence number caches

select sequence_owner, sequence_name, increment_by, cache_size, order_flag, last_number
from dba_sequences
where sequence_owner not in (&OLIST1)
order by sequence_owner, cache_size, last_number;

• Hash or range partitioning
– Local indexes
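Why partitioning helps with a right-growing index can be shown with a toy model. In the sketch below, consecutive sequence values are spread over four partitions, so concurrent inserts land in different local index partitions instead of all hitting the same right-most leaf block. The modulo is only a stand-in for the database's real hash function, and the names are hypothetical.

```python
from collections import Counter


def partition_of(key: int, partitions: int) -> int:
    """Stand-in for the hash function used by hash partitioning (assumption)."""
    return key % partitions


def spread(keys, partitions: int) -> dict:
    """Count how many keys land in each partition."""
    counts = Counter(partition_of(key, partitions) for key in keys)
    return dict(sorted(counts.items()))


# Eight consecutive sequence values, four hash partitions.
print(spread(range(1000, 1008), 4))
```

Without partitioning, all eight inserts would compete for the same index leaf block; here each partition receives two of them.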

Page 52: DOUG – Technology Day Tuning Seminar by Nitin Vengurlekar.

Data Block Contention
Optimal Design

• Small tables with high row density and frequent updates and reads can become “globally hot” with serialization, e.g.
– Queue tables
– Session/job status tables
– Last trade lookup tables
• A higher PCTFREE for the table reduces the number of rows per block
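A rough feel for the PCTFREE effect: fewer rows per block means fewer sessions competing for the same block. The estimate below is a deliberate simplification that ignores block header and row overhead; the function name, the 8K block size and the 100-byte average row length are assumptions for illustration.

```python
def approx_rows_per_block(block_size: int, pctfree: int, avg_row_len: int) -> int:
    """Rough number of rows inserted into a block before PCTFREE stops inserts.

    Ignores block and row overhead, so real numbers will be somewhat lower.
    """
    usable_bytes_x100 = block_size * (100 - pctfree)
    return usable_bytes_x100 // (100 * avg_row_len)


# 8K block, 100-byte rows: default-like PCTFREE 10 versus PCTFREE 90.
print(approx_rows_per_block(8192, 10, 100))
print(approx_rows_per_block(8192, 90, 100))
```

The trade-off is space: the same table occupies many more blocks, which is usually acceptable only for the small, hot tables this slide is about.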

Page 53: DOUG – Technology Day Tuning Seminar by Nitin Vengurlekar.

Summary

Look for:
• Indexes with right-growing characteristics
– Eliminate indexes which are not needed
• Frequent updates and reads of “small” tables
– “small” = fits into a single buffer cache
• SQL which scans large amounts of data
– Bad execution plan
– More efficient when parallelized

Page 54: DOUG – Technology Day Tuning Seminar by Nitin Vengurlekar.


SUMMARY: PRACTICAL PERFORMANCE ANALYSIS

Page 55: DOUG – Technology Day Tuning Seminar by Nitin Vengurlekar.

Global Cache Event Semantics

All Global Cache Events follow this format: GC …
• CR, current
– Buffer requested and received for read or write
• block, grant
– Received block or grant to read from disk
• 2-way, 3-way
– Immediate response to remote request after N hops
• busy
– Block or grant was held up because of contention
• congested
– Block or grant was delayed because LMS was busy or could not get the CPU

Page 56: DOUG – Technology Day Tuning Seminar by Nitin Vengurlekar.

What to Look For

Look for:
• gc [current/cr] [2/3]-way – Monitor if the average is > 1 ms or close to disk I/O latency. Look at reducing latency
• gc [current/cr] grant 2-way – Permission to read from disk. Monitor disk I/O
• gc [current/cr] [block/grant] congested – Long block access cost. Review CPU/memory utilization
• gc [current/cr] block busy – Review block contention
• gc [current/cr] [failure/retry] – Review private interconnect, network errors or hardware problems
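The checklist above amounts to a small lookup on the keywords in the event name. A minimal sketch of such a lookup follows; the function name and the exact advice strings are illustrative, and treating "lost" like failure/retry is an assumption based on the lost-blocks slides.

```python
def advice_for(event: str) -> str:
    """Map a global cache wait event name to the review area from the checklist."""
    name = event.lower()
    if not name.startswith("gc "):
        return "not a global cache event"
    # Order matters: 'gc cr grant congested' must match 'congested' first.
    if any(tag in name for tag in ("failure", "retry", "lost")):
        return "review private interconnect, network errors or hardware"
    if "congested" in name:
        return "review CPU/memory utilization"
    if "busy" in name:
        return "review block contention"
    if "grant" in name:
        return "monitor disk I/O"
    if "2-way" in name or "3-way" in name:
        return "monitor average latency"
    return "no rule on the checklist"


for event in ("gc cr block lost", "gc current block congested",
              "gc buffer busy acquire", "gc cr grant 2-way",
              "gc current block 3-way"):
    print(f"{event:<28} -> {advice_for(event)}")
```

As the summary slides stress, an event only matters if its share of DB time is significant, so a lookup like this is a starting point for where to look, not a verdict.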

Page 57: DOUG – Technology Day Tuning Seminar by Nitin Vengurlekar.

Cluster Cache Coherency - EM

Page 58: DOUG – Technology Day Tuning Seminar by Nitin Vengurlekar.

Cluster Database - EM

Page 59: DOUG – Technology Day Tuning Seminar by Nitin Vengurlekar.

General RAC Principles

• Performance monitoring and diagnosis tools for RAC
– AWR captures data from all active instances of RAC
– ADDM presents data in a cluster-wide perspective
– ASH reports statistics for all active sessions of all active instances
– Enterprise Manager is RAC aware
• No fundamentally different design and coding practices for RAC
• Badly tuned SQL and schema will not run better
• Serializing contention makes applications less scalable
• Standard SQL and schema tuning solves > 80% of performance problems

Page 60: DOUG – Technology Day Tuning Seminar by Nitin Vengurlekar.

More detailed information is available at viscosityna.com or by talking to a real person at 469.444.1380
