Lecture – Performance Performance management on UNIX
Jan 14, 2016
Lecture –Performance
Performance management on UNIX
21/04/23 2
Performance Analysis
Performance analysis involves identifying various system bottlenecks
This involves a number of steps We must ask a number of questions
Is there a performance Problem? Is the problem CPU or I/O related?
21/04/23 3
Performance Analysis CPU Related?
What is the current load on the CPU? What is the average load on the CPU?
I/O Related Is it normal disk I/O?
Would more/faster disks help? Is it paging I/O?
Would more physical memory help?
21/04/23 4
Related to a Particular User or Program?
Identify the user / program Identify what they are doing to cause the problem Revise their operating procedures Consider removing them from the system
21/04/23 5
Determining CPU Usage Determining the CPU usage is the first thing we should
do There are a number of tools to do this
vmstat gives several pieces of useful information including CPU usage vmstat [interval] [count] Interval is the number of seconds between reports and count is
the number of reports to generate
21/04/23 6
vmstat 2 10[rbradley@aisling]$ vmstat 2 10 procs memory swap io system cpu r b w swpd free buff cache si so bi bo in cs us sy id 1 0 0 5484 27240 136584 198840 0 1 5 8 8 8 4 7 4 0 0 0 5484 27240 136584 198840 0 0 0 96 155 100 0 0 100 0 0 0 5484 27232 136584 198844 0 0 0 0 159 112 2 0 98 0 0 0 5484 27216 136584 198844 0 0 0 0 130 51 0 2 98 0 0 0 5484 27216 136588 198848 0 0 0 86 157 63 0 0 100 0 0 0 5484 27216 136588 198848 0 0 0 0 139 46 0 0 100 0 0 0 5484 27224 136588 198836 0 0 0 30 153 47 0 0 100 0 0 0 5484 27712 136588 198824 0 0 0 8 166 107 1 0 99 0 0 0 5484 26876 136588 198828 0 0 0 0 139 92 6 2 91 0 0 0 5484 26876 136592 198824 0 0 0 144 137 69 0 0 100
7
vmstat
The first line gives the average values since the system was booted and should be ignored
To determine the CPU usage, we are interested in the last three columns, us, sy, id
us: % of CPU dedicated to User tasks sy: % of CPU dedicated to System tasks. Including I/O performing general O/S
functions etc. id: % of CPU idle
procs memory swap io system cpu r b w swpd free buff cache si so bi bo in cs us sy id 1 0 0 5484 27240 136584 198840 0 1 5 8 8 8 4 7 4 0 0 0 5484 27240 136584 198840 0 0 0 96 155 100 0 0 100 0 0 0 5484 27232 136584 198844 0 0 0 0 159 112 2 0 98
21/04/23 8
Analysing vmstat output (CPU) Just because CPU time is high or idle time is low does not
indicate a system problem It may simply indicate that a number of batch jobs are
scheduled to run at the same time and might benefit from being rearranged
In order to establish if there is a genuine problem it is necessary to monitor the system over an extended period
If average CPU% remain high, there is a problem
21/04/23 9
Analysing vmstat output(Process States)
There are three states in which a process may be at any point in time
Runtime, uninterrupted sleep, swapped out Process Statistics:
r: Number of processes waiting for runtime b: Number of processes in uninterrupted sleep w: Number of processes swapped out, but otherwise able to run
A high r suggests there is a bottle neck.
procs memory swap io system cpu r b w swpd free buff cache si so bi bo in cs us sy id 1 0 0 5484 27240 136584 198840 0 1 5 8 8 8 4 7 4 0 0 0 5484 27240 136584 198840 0 0 0 96 155 100 0 0 100 0 0 0 5484 27232 136584 198844 0 0 0 0 159 112 2 0 98
21/04/23 10
Analysing vmstat output (Memory)
Memory Statistics swapd: Amount of virtual memory used (KB) free: Amount of idle memory (KB) buff: Ammount of memory used in buffers cache:amount of memory left in cache
procs memory swap io system cpu r b w swpd free buff cache si so bi bo in cs us sy id 1 0 0 5484 27240 136584 198840 0 1 5 8 8 8 4 7 4 0 0 0 5484 27240 136584 198840 0 0 0 96 155 100 0 0 100 0 0 0 5484 27232 136584 198844 0 0 0 0 159 112 2 0 98
21/04/23 11
Analysing vmstat output (Swap)
Swap Statistics si: Amount of memory swapped in from disk (KB/s) so: Amount of memory swapped out to disk (KB/s)
Swap statistics are arguably the most important statistic to monitor, and of these, the so field
This field indicates the pages that have been swapped out, even if done before vmstat was started
procs memory swap io system cpu r b w swpd free buff cache si so bi bo in cs us sy id 1 0 0 5484 27240 136584 198840 0 1 5 8 8 8 4 7 4 0 0 0 5484 27240 136584 198840 0 0 0 96 155 100 0 0 100 0 0 0 5484 27232 136584 198844 0 0 0 0 159 112 2 0 98
21/04/23 12
Analysing vmstat output (I/O)
I/O Statistics bi: Blocks received from a block device (blocks/sec) bo: Blocks sent to a block device (blocks/sec)
If there are a large number of block transfers, the problem with your system may lie here (i.e. device access is high)
A single reading, however is not indicative of the system as a whole, simply a snapshot
All Linux blocks are 1KB except for CDRom blocks (2KB)
procs memory swap io system cpu r b w swpd free buff cache si so bi bo in cs us sy id 1 0 0 5484 27240 136584 198840 0 1 5 8 8 8 4 7 4 0 0 0 5484 27240 136584 198840 0 0 0 96 155 100 0 0 100 0 0 0 5484 27232 136584 198844 0 0 0 0 159 112 2 0 98
21/04/23 13
Analysing vmstat output (System)
System Statistics in: The number of interrupts per second, including the
system clock cs: The number of context switches per second
procs memory swap io system cpu r b w swpd free buff cache si so bi bo in cs us sy id 1 0 0 5484 27240 136584 198840 0 1 5 8 8 8 4 7 4 0 0 0 5484 27240 136584 198840 0 0 0 96 155 100 0 0 100 0 0 0 5484 27232 136584 198844 0 0 0 0 159 112 2 0 98
21/04/23 14
Analysing vmstat output (CPU usage)
System Statistics us: % of CPU dedicated to user tasks sy: % of CPU dedicated to system tasks id: % of CPU idle
procs memory swap io system cpu r b w swpd free buff cache si so bi bo in cs us sy id 1 0 0 5484 27240 136584 198840 0 1 5 8 8 8 4 7 4 0 0 0 5484 27240 136584 198840 0 0 0 96 155 100 0 0 100 0 0 0 5484 27232 136584 198844 0 0 0 0 159 112 2 0 98
21/04/23 15
top top is another tool for identifying problems with a LINUX system Displays the top CPU processes Displays a listing of the most CPU intensive tasks on the system Can provide an interactive interface for manipulating the processes Default is to update every 5 seconds
top operates by examining files in the /proc pseudo file system This pseudo file system is used as an interface to kernel data
structures man proc
21/04/23 16
[rbradley@aisling rbradley]$ top 17:14:41 up 47 days, 2:27, 8 users, load average: 0.06, 0.03, 0.0761 processes: 59 sleeping, 2 running, 0 zombie, 0 stoppedCPU states: 0.0% user 0.2% system 0.0% nice 0.0% iowait 99.8% idleMem: 513316k av, 200052k used, 313264k free, 0k shrd, 44976k buff 57692k actv, 11208k in_d, 1024k in_cSwap: 1052248k av, 9096k used, 1043152k free 34656k cached
PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU COMMAND 1 root 15 0 108 76 56 S 0.0 0.0 0:15 0 init 2 root 15 0 0 0 0 SW 0.0 0.0 0:00 0 keventd 3 root 15 0 0 0 0 SW 0.0 0.0 0:01 0 kapmd 4 root 34 19 0 0 0 SWN 0.0 0.0 0:00 0 ksoftirqd_CPU0 9 root 15 0 0 0 0 SW 0.0 0.0 0:00 0 bdflush 226 root 15 0 0 0 0 SW 0.0 0.0 0:00 0 kjournald 586 root 15 0 200 160 116 S 0.0 0.0 0:08 0 syslogd 590 root 15 0 180 168 120 S 0.0 0.0 0:03 0 klogd 666 root 15 0 480 348 232 S 0.0 0.0 1:09 0 sshd 719 root 15 0 52 4 0 S 0.0 0.0 0:00 0 gpm 728 root 15 0 176 148 88 S 0.0 0.0 0:05 0 crond 785 xfs 15 0 1836 60 32 S 0.0 0.0 0:00 0 xfs 803 daemon 15 0 180 164 116 S 0.0 0.0 0:00 0 atd 812 root 23 0 52 4 0 S 0.0 0.0 0:00 0 mingetty 813 root 23 0 52 4 0 S 0.0 0.0 0:00 0 mingetty
top
17
Analysing top output
Up: The time the system has been up and the three load averages Average number of processes ready to run in the last 1,5 and 15
minutes Same as the output of uptime
Processes: The total number of processes running at the time of the last update Broken down into running, sleeping, stopped and zombied (A zombie process is a finished process where the parent has not read it
exit state – which causes the process to be cleaned up)
17:14:41 up 47 days, 2:27, 8 users, load average: 0.06, 0.03, 0.0761 processes: 59 sleeping, 2 running, 0 zombie, 0 stoppedCPU states: 0.0% user 0.2% system 0.0% nice 0.0% iowait 99.8% idleMem: 513316k av, 200052k used, 313264k free, 0k shrd, 44976k buff 57692k actv, 11208k in_d, 1024k in_cSwap: 1052248k av, 9096k used, 1043152k free 34656k cached
21/04/23 18
Analysing top output CPU States: The percentage of CPU time in user mode, system
mode, niced tasks (negative nice tasks) and idle Time spent in niced tasks will also be counted system and user time, so
the total will be more than 100%
Mem: Statistics on memory usage, including total available memory, free memory, used memory, shared memory, memory used for buffers
17:14:41 up 47 days, 2:27, 8 users, load average: 0.06, 0.03, 0.0761 processes: 59 sleeping, 2 running, 0 zombie, 0 stoppedCPU states: 0.0% user 0.2% system 0.0% nice 0.0% iowait 99.8% idleMem: 513316k av, 200052k used, 313264k free, 0k shrd, 44976k buff 57692k actv, 11208k in_d, 1024k in_cSwap: 1052248k av, 9096k used, 1043152k free 34656k cached
21/04/23 19
Analysing top output Swap: Statistics on swap space including total swap space and
used swap space This and the Mem section together are the same as the output of free*
PID: The process ID of each task USER: The username pf the task’s owner PRI: The priority of the task NI: The nice value of the task. Negative values are lower priority
17:14:41 up 47 days, 2:27, 8 users, load average: 0.06, 0.03, 0.0761 processes: 59 sleeping, 2 running, 0 zombie, 0 stoppedCPU states: 0.0% user 0.2% system 0.0% nice 0.0% iowait 99.8% idleMem: 513316k av, 200052k used, 313264k free, 0k shrd, 44976k buff 57692k actv, 11208k in_d, 1024k in_cSwap: 1052248k av, 9096k used, 1043152k free 34656k cached
PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU COMMAND 1 root 15 0 108 76 56 S 0.0 0.0 0:15 0 init
21/04/23 20
Analysing top output
SIZE: The size of the task’s code plus data stack space, in kilobytes RSS: The total amount of physical memory used by the task in
kilobytes SHARE: The amount of shared memory used by the task STATE: The state of the task, S: sleeping, D: uninterrupted sleep,
R: running, Z: zombies, T: stopped or traced %CPU: The task’s share of the CPU since the last screen update as
a a percentage of total CPU time %MEM: The task’s percentage of physical memory Time: Total CPU time used by process since it started COMMAND: The task’s command name
PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU COMMAND 1 root 15 0 108 76 56 S 0.0 0.0 0:15 0 init
21/04/23 21
Using top to control processes
In addition to command-line options for controlling the appearance of top (not covered here) there are a number of commands that can be issued to top while running Space: immediately updates the display ^L: Erases and redraws the screen k: kill a process You will be prompted for the pid and a signal to
send to the process (normally 15)
21/04/23 22
Using top to control processes
i: ignore zombie processes n: change the number of processes to view r: renice a process P: sort tasks by CPU usage M: sort tasks by Memory usage
21/04/23 23
Renice
The renice command is used to alter the priority of running processes
The default nice value is 0 The range in Linux is -20 to +20 The lower the value the faster the process runs Can examine the nice value of a process using ps –l
21/04/23 24
Renice
The owner of and root can change the nice value of aprocess using renice
Changes apply to all child processes renice priority [[-p] pid ...] [[-g] pgrp ...] [[-u] user ...]
[rbradley@aisling]$ ps -lF S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD0 S 1634 24496 24495 0 75 0 - 1091 wait4 pts/1 00:00:00 bash0 R 1634 26361 24496 0 75 0 - 778 - pts/1 00:00:00 ps
[rbradley@aisling]$ renice 5 2449624496: old priority 0, new priority 5
[rbradley@aisling]$ ps -l
F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD0 S 1634 24496 24495 0 80 5 - 1091 wait4 pts/1 00:00:00 bash0 R 1634 26363 24496 0 80 5 - 777 - pts/1 00:00:00 ps
21/04/23 25
Renice Once a nice value has been increased, only the root user can
reduce it again, not even to the default value
[rbradley@aisling]$ renice 19 2449624496: old priority 5, new priority 19
[rbradley@aisling]$ ps -l
F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
0 S 1634 24496 24495 0 94 19 - 1091 wait4 pts/1 00:00:00 bash
0 R 1634 26390 24496 0 94 19 - 778 - pts/1 00:00:00 ps
[rbradley@aisling]$ renice 1 24496
renice: 24496: setpriority: Permission denied
21/04/23 26
How Much Swap Space? A quick rule of thumb often used is twice as much as you have
physical memory
This approach is a bit simplistic and does not scale well
1. Estimate total memory requirements
2. Add some megabytes as a spare
3. Subtract the amount of physical memory available
4. If the value from 3 is > 3 times the available physical memory, you need more memory
21/04/23 27
How Much Swap Space
Sometimes the above formula will show that you don’t need swap space at all
It is a good policy to create some anyway Linux uses the swap space so that as much physical
memory as possible is kept free It swaps out pages that have not been used for a while When memory is needed, it is available
21/04/23 28
How Much Swap Space?
If swap space is removed (using the swapoff command) the system will attempt to move any swapped pages into other swap space or physical memory
If there is not enough space elsewhere the system may become unavailable for a time, while it sorts itself out, but it will come back