CHAPTER 22 Database Troubleshooting
Feb 23, 2016
CHAPTER 22Database Troubleshooting
Introduction to Database Troubleshooting
• DBAs need to be experts at database troubleshooting.• There are a wide variety of topics covered in this chapter:
• Assessing database availability issues quickly.• Identifying system performance issues with operating system
utilities.• Querying data dictionary views to display resource intensive SQL
statements.• Using Oracle performance tools to identify resource consuming
SQL statements.• Identifying and resolving locking issues.• Troubleshooting open cursor issues.• Investigating issues with the undo and temporary tablespaces.• Auditing database activities.
Checking Server and Database Availability
• The following command establishes:• Is the database up and available• Is the listener up• Is the network up
$ sqlplus barts/l1sa@'dwdb1:1521/dwrep1'
• Use ping to see if the box can be connected to over the network• Use telnet to establish server availability on a given port• Use tnsping to troubleshoot Oracle Net connectivity
Investigating Disk Fullness• When first logging onto a box, one issue that will cause a
database to hang or have problems is a full mount point.• The df command with the human readable -h switch
assists with verifying disk fullness:• $ df -h
• The du command is also useful for investigating where space is being consumed:
• $ du -sh
Locating the Alert Log and Trace Files
• Troubleshooting an unavailable database oftentimes begin with inspecting the alert.log.
• Here’s a OS function that can help you navigate to the alert.log location:
function bdump { if [ "$ORACLE_SID" = "O12C" ]; then cd /orahome/app/oracle/diag/rdbms/o12c/O12C/trace elif [ "$ORACLE_SID" = "O11R2" ]; then cd /orahome/app/oracle/diag/rdbms/o11r2/O11R2/trace fi}
Removing Files• Linux/Unix environments provide a wide variety of ways to
identify the size and age of files.• When troubleshooting, oftentimes it’s useful to find files of
over a certain size or age. $ find . -type f -mtime +2 -name "*.trc"
• Then to delete files identified by the find command (be very careful when doing this):
$ find . -type f -mtime +2 -name "*.trc" | xargs rm
Viewing the Alert Log via OS Tools• After navigating to the directory that contains the alert.log, you can view the
most current messages by viewing the end (furthest down) in the file (in other words, the most current messages are written to the end of the file).
• To view the last 50 lines, use the tail command: $ tail -50 alert_<SID>.log • You can continuously view the most current entries by using the -f switch:• • $ tail -f alert_<SID>.log • You can also directly open the alert.log with an operating system editor
(such as vi):• $ vi alert_<SID>.log
Viewing the alert.log Using the ADRCI Utility• If you're using Oracle Database 11g or higher, you can use the ADRCI utility to
view the contents of the alert.log file.• Run the following command from the operating system to start the ADRCI utility: $ adrci • You should be presented with a prompt: adrci> • Use the SHOW ALERT command to view the alert.log file: adrci> show alert• • If there are multiple Oracle homes on the server, then you will be prompted to
choose which alert.log you want to view.
Identifying System Bottlenecks Using vmstat
• Linux/Unix environments provide a wide variety of command line tools for troubleshooting system bottlenecks.
• The vmstat utility displays real-time performance information about processes, memory, paging, disk I/O, and CPU usage.
$ vmstat 2 10
Identifying System Bottlenecks Using top
• Another tool for identifying resource-intensive processes is the top command. Use this utility to quickly identify which processes are the highest consumers of resources on the server.
• By default, top will repetitively refresh (every few seconds) information regarding the most CPU-intensive processes.
• $ top
Mapping an Operating System Process to a SQL Statement• Oftentimes when you have multiple databases running on
a single server, it’s difficult to pinpoint which database and process may be consuming inordinate amounts of system resources.
• In these situations, use ps to identify the process and its associated database.
• Then use a SQL statement to identify what the process is doing within the database.
Mapping an Operating System Process to a SQL Statement$ ps -e -o pcpu,pid,user,tty,args | sort -n -k 1 -r | head
• Then once the process and associated database are identified, then run a query similar to this one to identify details about the process:
select 'USERNAME : ' || s.username|| chr(10) || 'OSUSER : ' || s.osuser || chr(10) || 'PROGRAM : ' || s.program || chr(10) || 'SPID : ' || p.spid || chr(10) || 'SID : ' || s.sid || chr(10) || 'SERIAL# : ' || s.serial# || chr(10) || 'MACHINE : ' || s.machine || chr(10) || 'TERMINAL : ' || s.terminalfrom v$session s, v$process pwhere s.paddr = p.addrand p.spid = '&PID_FROM_OS';
Monitoring Real-Time SQL Execution Statistics• The V$SQL_MONITOR view provides near real-time statistics on resource intenisve
SQL queries:
select * from (select a.sid session_id,a.sql_id,a.status,a.cpu_time/1000000 cpu_sec,a.buffer_gets,a.disk_reads,b.sql_text sql_textfrom v$sql_monitor a ,v$sql bwhere a.sql_id = b.sql_idorder by a.cpu_time desc)where rownum <=20;
Monitoring Real-Time SQL Execution Statistics• The V$SQL_MONITOR view provides near real-time statistics on resource intenisve
SQL queries:
select * from (select a.sid session_id,a.sql_id,a.status,a.cpu_time/1000000 cpu_sec,a.buffer_gets,a.disk_reads,b.sql_text sql_textfrom v$sql_monitor a ,v$sql bwhere a.sql_id = b.sql_idorder by a.cpu_time desc)where rownum <=20;
Displaying Resource Intensive SQL• To display queries currently executing:
select * from (select a.sid session_id,a.sql_id,a.status,a.cpu_time/1000000 cpu_sec,a.buffer_gets,a.disk_reads,substr(b.sql_text,1,15) sql_textfrom v$sql_monitor a ,v$sql bwhere a.sql_id = b.sql_idand a.status='EXECUTING'order by a.disk_reads desc)where rownum <=20;
Oracle provides several utilities for diagnosing database performance issues• Automatic Workload Repository (AWR)• Automatic Database Diagnostic Monitor (ADDM)• Active Session History (ASH)• Statspack
Using AWR• An AWR report is good for viewing the entire system
performance and identifying the top resource-consuming SQL queries.
• Run the following script to generate an AWR report:
SQL> @?/rdbms/admin/awrrpt
• You can also generate an AWR report for a specific SQL statement by running the awrsqrpt.sql report.
• You will be prompted for the SQL_ID of the query of interest: SQL> @?/rdbms/admin/awrsqrpt.sql
Using ADDM• ADDM useful as a first place to look for database
performance issues.• Use DBMS_ADDM PL/SQL package or Oracle provided
SQL script to view the output of ADDM.• You need to provide as input (you’ll be prompted) a begin
and end snapshot ID:
SQL> @?/rdbms/admin/addmrpt
Using ASH• Useful for identifying short lived performance problems.• Use the Oracle provided script to manually run this utility:
SQL> @?/rdbms/admin/ashrpt
Using Statspack• Statspack does not require an extra license.• Provides much of the same type of information as the AWR reports.• First create the repository and the PERFSTAT user:
SQL> @?/rdbms/admin/spcreate.sql • To enable the automatic gathering of Statspack statistics, run this script: • SQL> @?/rdbms/admin/spauto.sql • After some snapshots have been gathered, you can run the following
script as the PERFSTAT user to create a Statspack report: • SQL> @?/rdbms/admin/spreport.sql
Detecting and Resolving Locking Issues
• Oracle provides several data dictionary views to help diagnose locking issues:
V$LOCKV$LOCKED_OBJECTV$SQLAREAV$SESSION
Resolving Open Cursor Issues• Run a query such as the following to determine the number of open cursors each
session has opened:• select a.value, c.username, c.machine, c.sid, c.serial#from v$sesstat a ,v$statname b ,v$session cwhere a.statistic# = b.statistic#and c.sid = a.sidand b.name = 'opened cursors current'and a.value != 0and c.username IS NOT NULLorder by 1,2;
• You may need to increase the value of OPEN_CURSORS to resolve issues. • Another common issue is code that doesn’t correctly close a cursor.
Determining if Undo is Correctly Sized• Undersized online redo log files can cause excessive switching which in
turn causes modified database blocks to be more frequently flushed from memory.
• To get an idea of switching frequency:
select to_char(begin_time,'MM-DD-YYYY HH24:MI') begin_time,ssolderrcnt ORA_01555_cnt,nospaceerrcnt no_space_cnt,txncount max_num_txns,maxquerylen max_query_len,expiredblks blck_in_expiredfrom v$undostatwhere begin_time > sysdate - 1order by begin_time;
Viewing SQL that is Consuming Undo Space• Sometimes you may face a situation where you keep running out of
space in the UNDO tablespace. • In these situations it's helpful to identify which users are consuming
space in the undo tablespace.• Run this query to report on basic information regarding space
allocated on a per user basis:• select s.sid, s.serial# ,s.osuser, s.logon_time ,s.status, s.machine ,t.used_ublk ,t.used_ublk*16384/1024/1024 undo_usage_mb from v$session s ,v$transaction t where t.addr = s.taddr;
Determining if Temporary Tablespace is Sized Correctly• There is no exact formula to determine the optimal size of the
temporary tablespace.• You’ll have to monitor your temporary tablespace while there is a load
on your database to determine its usage patterns.• If you are using Oracle Database 11g or higher, run the following
query to show both the allocated and free space within the temporary tablespace:
• select tablespace_name,tablespace_size/1024/1024 mb_size,allocated_space/1024/1024 mb_alloc,free_space/1024/1024 mb_freefrom dba_temp_free_space;
Viewing SQL that is Consuming Temporary Space
• When troubleshooting temporary tablespace issues, run a query such as this to show how resources are being consumed:
SELECT s.sid, s.serial# ,s.username, p.spid ,s.module, p.program ,SUM(su.blocks) * tbsp.block_size/1024/1024 mb_used ,su.tablespace FROM v$sort_usage su ,v$session s ,dba_tablespaces tbsp ,v$process p WHERE su.session_addr = s.saddr AND su.tablespace = tbsp.tablespace_name AND s.paddr = p.addr GROUP BY s.sid, s.serial#, s.username, s.osuser, p.spid, s.module, p.program, tbsp.block_size, su.tablespace ORDER BY s.sid;
Summary• Database troubleshooting covers a wide variety of topics
and subject areas.• This chapter focused on some common database
problems you’ll encounter.• As a DBA you should be aware of what kinds of problems
can happen, how to diagnose issues, and how to resolve them in a timely manner.