Dec 19, 2015
Process Lifetime

[Diagram: the process lifetime — the parent calls fork() (Creation); the child runs exec() (Execution) and exit() (Termination); the parent collects status with wait() (Status Collection).]
Process Performance Issues
The maximum number of processes allowed is 30,000.
Executables that use shared libraries take fewer system resources (disk space, memory, I/O, and so on).
Threads are more efficient than multiple processes.
Zombie processes cause no performance problems.
Multithreading
A thread is a logical sequence of program instructions.
The kernel is multithreaded.
Multiple tasks may be running in the kernel simultaneously and independently.
A user process can have many application threads that execute independently of each other.
Fewer system resources are used than with multiple processes.
Special programming techniques are required.
Process Thread Examples

[Diagram: five processes (proc1-proc5) across the user, kernel, and hardware layers — application threads map onto LWPs and kernel threads, which are assigned to CPUs.]
Performance Issues
Multithreading an application allows it to:
Be broken into separate tasks that can be scheduled and executed independently.
Take advantage of multiprocessors with less overhead than multiple processes.
Share memory without going through the overhead and complexity of IPC mechanisms.
Use a cleaner programming model for certain types of applications.
Extend the program more easily.
Locking
Locks are used to synchronize threads by serialization.
They protect critical data from simultaneous write access.
Locks must be used when threads share writable data.
SunOS provides four types of locks.
Which type is used depends on the requirements.
A bad locking design can cause performance problems.
Locking problems usually require significant reprogramming.
Locking Problems
Lock contention
Granularity
Inappropriate lock type
Deadlock
"Lost" locks
Race conditions
Incomplete implementation
The lockstat Command
Lock use in the kernel is identified.
Unidentified delays may be caused by lock contention.
Excessive counts may indicate a problem.
# lockstat lpstat

Adaptive mutex block: 2 events

Count indv cuml rcnt     nsec Lock       Caller
-------------------------------------------------------------------
    1  50%  50% 1.00    87500 0xf5ae43d0 esp_poll_loop+0xcc
    1  50% 100% 1.00   151000 0xf5ae3ab6 esp_poll_loop+0x8c
-------------------------------------------------------------------
The clock Routine
The clock executes at interrupt level 10.
Most system timing is run off this clock.
Each time the clock routine executes is a tick.
For most processors, there are 100 ticks per second.
Ticks per second can be set to 1000 for real-time processing.
This is the limit of normal timing resolution.
The time-of-day clock will run slow.
Process Monitoring Using ps
• You need to identify active processes before determining:
• Which process is causing a delay
• Which resource is bottlenecking the process
• The ps command enables you to check the status of processes.
• The ps command helps determine how to set process priorities.
• The BSD version, /usr/ucb/ps -aux, provides the best performance-related data.
Scheduling States

[Diagram: scheduling states — a Ready thread is Scheduled and becomes Running; a Running thread that is Blocked goes to Sleeping; a Sleeping thread that is Awakened returns to Ready.]
Scheduling Classes
Unix provides four scheduling classes by default. These are:
TS - The timesharing class, for normal user work. Priorities are adjusted based on CPU usage.
IA - The interactive class, derived from the timesharing class, provides better performance for the task in the active window in OpenWindows or CDE.
SYS - The system class, also called the kernel priorities, is used for system threads such as the page daemon and clock thread.
RT - The real-time class has the highest priority in the system, except for interrupt handling; it is even higher than the system class.
Dispatch Parameter Table Issues
For time-sharing class processes:
• Reducing time quanta favors interactive processes.
• Raising time quanta favors compute-bound and large processes.
• Using the ts_maxwait and ts_lwait fields controls CPU starvation.
• Slightly raising the values of ts_tqexp causes the priority of compute-bound processes to drop more slowly.
• Changing the table can be done to fit your workload.
The dispadmin Command
• Displays or changes scheduler parameters
• Uses options:
• -l - Lists available scheduling classes
• -c class - Specifies the class whose parameters are to be displayed or changed
• -g - Displays configured parameters and provides a simple way of formatting a control file
• -s file - Sets parameters from a file
The interactive Scheduling Class
• Is used to enhance interactive performance
• Is the default scheduling class for processes in Common Desktop Environment and OpenWindows sessions
• Uses most of the time-sharing class facilities
• Boosts the priority of the task in the active window by 10 points
• Priority is reset when the window is no longer active.
• Does not boost processes changed using nice or other commands
Processor Sets
Allows exclusive use of groups of processors by certain processes
Also known as CPU fencing
Is very different from pbind(1M) processor binding
Is managed by the psrset(1M) command
Is controlled by the root user
Has system-defined processor sets, which can be used by any user
Forces DR (dynamic reconfiguration) to release bindings if necessary
The Run Queue
A count of kernel threads waiting to run is kept.
It contains a total of all system dispatch queues.
It is not scaled by the number of CPUs.
A depth of 3-5 per CPU is usually OK.
This depends on the type of work being run.
There is no way to tell what is waiting or how long it has been waiting.
CPU Activity
User - A user process is running.
System - The kernel is running.
This includes system thread time and user system calls.
Wait I/O - The CPUs are idle, but a disk device is active.
Idle - The system is waiting with nothing to run.
Some reports add Wait I/O and Idle.
This is usually reported as Idle time.
You can check the tool's man page.
CPU Control and Monitoring
Processor control and information is reported by:
mpstat - Displays CPU usage statistics
psradm - Enables and disables individual CPUs
psrinfo - Determines which CPUs are enabled
prtconf - Shows device configuration
prtdiag - Prints system configuration and diagnostic information
Process Manager - Shows current activity
psrset - Manages processor sets
What is Cache?
• Keeps accessed data near its user
• Must provide high-speed data access
• Holds a small subset of the available data
• Is used by hardware and software
• Has many different types around the system
• Is critical for performance
• Can be managed in many different ways
SRAM and DRAM
Hardware caches are usually SRAM.
Static RAM does not need a refresh.
DRAM data fades after about 10 cycles.
Refresh rewrites the data, but this delays access.
SRAM does not fade, so no refresh is needed.
SRAM takes four times as many transistors as DRAM.
SRAM will always be more costly and take more power.
CPU Caches
There are usually two levels of cache in a CPU.
Level one (internal) - Usually 4-40 Kbytes in size
Level two (external) - Usually 0.5-8 Mbytes in size
A level one cache operates at CPU speeds.
Data must be in the level one cache before the CPU can use it.
Cache Hit Rate
The performance of a cache depends on the hit rate.
The cache hit rate is how often requested data is found in the cache.
The cache hit rate depends on:
Size of the cache
Fetch rate
Locality of data references
Cache structure
Cache Hit Rate
Misses are very expensive
For example:
Hit cost is 20 units, miss cost is 600 units.
At a 100 percent hit rate, cost = 100 x 20 = 2000.
At a 99 percent hit rate, cost = (99 x 20) + (1 x 600) = 1980 + 600 = 2580.
A one percent miss rate adds 580 to the base cost of 2000, a 29 percent degradation.
The exact numbers depend on the system.
System Cache Hierarchies

[Diagram: the cache hierarchy — CPU registers, L1 cache, L2 cache, main memory, disk, network, and people; each level is larger and slower than the one before it.]
The Memory Free Queue
• Main memory is a fully associative disk cache managed by the OS.
• Move ins and move outs occur as in any cache.
• For memory, this is called paging.
• Moves involving disk are very expensive.
• A move out followed by a move in is too costly.
• Free memory pages are kept available to avoid the need for move outs before a move in.
• You always need to have "enough" pages on the free queue.
The paging Mechanism
• Every quarter second, a check is made to see if the amount of free memory is less than the quantity specified by the lotsfree parameter.
• If it is, the page daemon is run to replenish the free memory queue.
• Pages are stolen from their users if they have not been used recently.
• Pages are scanned to determine which can be stolen.
• The more pages needed, the faster the scan rate.
• If the page daemon cannot keep up, swapping may occur.
Priority Paging
Is included with the Solaris 7 OS but is disabled by default
A kernel patch is required for Solaris 2.6 and 2.5.1 OS
Is activated by setting priority_paging to 1 in /etc/system
Has a new tuning parameter, cachefree, whose default value is twice lotsfree
Starts the page daemon running at cachefree
When enabled, steals only file system pages (data files) until lotsfree is reached
Is very good for large users of random I/O
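The activation described above amounts to a short /etc/system fragment (a sketch; the optional cachefree override is commented out, and changes take effect at the next boot):

```
* /etc/system -- enable priority paging (Solaris 7; kernel patch
* required on 2.6 and 2.5.1)
set priority_paging=1
* Optional: override cachefree (defaults to twice lotsfree)
* set cachefree=<pages>
```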
Swapping
Swapping is a last resort (desperation swapping). If the page daemon consistently cannot keep up with the demand for memory, memory use must be cut.
The number of swaps is not important; the fact that there are swaps is.
A process is considered swapped when its last LWP is swapped out.
Its memory will not be freed until then.
Do not try to tune swaps, try to eliminate them.
"tmpfs"
It allows the creation of a virtual memory ramdisk, such as /tmp.
It uses virtual memory like any other user.
It uses real memory and swap space.
If used heavily, /tmp can fill main memory.
You can limit the use of /tmp.
The size option in the vfstab entry is used.
You must not move it to a hard drive partition.
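The size option is set in the /etc/vfstab entry for /tmp; a sketch (the 512-Mbyte limit is illustrative):

```
#device   device   mount  FS     fsck  mount    mount
#to mount to fsck  point  type   pass  at boot  options
swap      -        /tmp   tmpfs  -     yes      size=512m
```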