Dec 19, 2015
Process Lifetime

[Diagram: the process lifetime — the parent calls fork() (Creation); the child runs exec() (Execution) and exit() (Termination); the parent collects status with wait() (Status Collection).]
Process Performance Issues
The maximum number of processes allowed is 30,000.
Executables that use shared libraries take fewer system resources (disk space, memory, I/O, and so on).
Threads are more efficient than multiple processes.
Zombie processes cause no performance problems.
Multithreading
A thread is a logical sequence of program instructions.
The kernel is multithreaded.
Multiple tasks may be running in the kernel simultaneously and independently.
A user process can have many application threads that execute independently of each other.
Fewer system resources are used than with multiple processes.
Special programming techniques are required.
Process Thread Examples

[Diagram: five processes (proc1-proc5) across the user, kernel, and hardware layers — application threads map onto LWPs and kernel threads, which are assigned to CPUs.]
Performance Issues
Multithreading an application allows it to:
Be broken into separate tasks that can be scheduled and executed independently.
Take advantage of multiprocessors with less overhead than multiple processes.
Share memory without going through the overhead and complexity of IPC mechanisms.
Use a cleaner programming model for certain types of applications.
Extend the program more easily.
Locking
Locks are used to synchronize threads by serialization.
They protect critical data from simultaneous write access.
Locks must be used when threads share writable data.
SunOS provides four types of locks.
Which type is used depends on the requirements.
A bad locking design can cause performance problems.
Locking problems usually require significant reprogramming.
Locking Problems
Lock contention
Granularity
Inappropriate lock type
Deadlock
"Lost" locks
Race conditions
Incomplete implementation
The lockstat Command
Lock use in the kernel is identified.
Unidentified delays may be caused by lock contention.
Excessive counts may indicate a problem.
# lockstat lpstat

Adaptive mutex block: 2 events

Count indv cuml rcnt     nsec Lock       Caller
-------------------------------------------------------------------
    1  50%  50% 1.00    87500 0xf5ae43d0 esp_poll_loop+0xcc
    1  50% 100% 1.00   151000 0xf5ae3ab6 esp_poll_loop+0x8c
-------------------------------------------------------------------
The clock Routine
The clock executes at interrupt level 10.
Most system timing is run off this clock.
Each time the clock routine executes is a tick.
For most processors, there are 100 ticks per second.
Ticks per second can be set to 1000 for real-time processing.
This is the limit of normal timing resolution.
The time-of-day clock will run slow.
Process Monitoring Using ps
• You need to identify active processes before determining:
• Which process is causing a delay
• Which resource is bottlenecking the process
• The ps command enables you to check the status of processes.
• The ps command helps determine how to set process priorities.
• The BSD version, /usr/ucb/ps -aux, provides the best performance-related data.
Scheduling States

[Diagram: scheduling states — a Ready thread is Scheduled and becomes Running; a Running thread that is Blocked goes to Sleeping; a Sleeping thread that is Awakened returns to Ready.]
Scheduling Classes
Unix provides four scheduling classes by default. These are:
TS - The timesharing class, for normal user work. Priorities are adjusted based on CPU usage.
IA - The interactive class, derived from the timesharing class, provides better performance for the task in the active window in OpenWindows or CDE.
SYS - The system class, also called the kernel priorities, is used for system threads such as the page daemon and clock thread.
RT - The real-time class has the highest priority in the system, except for interrupt handling; it is even higher than the system class.
Dispatch Parameter Table Issues
For time-sharing class processes:
• Reducing time quanta favors interactive processes.
• Raising time quanta favors compute-bound and large processes.
• Using the ts_maxwait and ts_lwait fields controls CPU starvation.
• Slightly raising the values of ts_tqexp causes the priority of compute-bound processes to drop more slowly.
• Changing the table can be done to fit your workload.
The dispadmin Command
• Displays or changes scheduler parameters
• Uses options:
• -l - Lists available scheduling classes
• -c class - Specifies the class whose parameters are to be displayed or changed
• -g - Displays configured parameters and provides a simple way of formatting a control file
• -s file - Sets parameters from a file
The interactive Scheduling Class
• Is used to enhance interactive performance
• Is the default scheduling class for processes in Common Desktop Environment and OpenWindows sessions
• Uses most of the time-sharing class facilities
• Boosts the priority of the task in the active window by 10 points
• Priority is reset when the window is no longer active.
• Does not boost processes changed using nice or other commands
Processor Sets
Allows exclusive use of groups of processors by certain processes
Also known as CPU fencing
Is very different from pbind(1M) processor binding
Is managed by the psrset(1M) command
Is controlled by the root user
Has system-defined processor sets, which can be used by any user
Forces DR (dynamic reconfiguration) to release bindings if necessary
The Run Queue
A count of kernel threads waiting to run is kept.
It contains a total of all system dispatch queues.
It is not scaled by the number of CPUs.
A depth of 3-5 per CPU is usually OK.
This depends on the type of work being run.
There is no way to tell what is waiting or how long it has been waiting.
CPU Activity
User - A user process is running.
System - The kernel is running.
This includes system thread time and user system calls.
Wait I/O - The CPUs are idle, but a disk device is active.
Idle - The system is waiting with nothing to run.
Some reports add Wait I/O and Idle.
This is usually reported as Idle time.
You can check the tool's man page.
CPU Control and Monitoring
Processor control and information is reported by:
mpstat - Displays CPU usage statistics
psradm - Enables and disables individual CPUs
psrinfo - Determines which CPUs are enabled
prtconf - Shows device configuration
prtdiag - Prints system configuration and diagnostic information
Process Manager - Shows current activity
psrset - Manages processor sets
What is Cache?
• Keeps accessed data near its user
• Must provide high-speed data access
• Holds a small subset of the available data
• Is used by hardware and software
• Has many different types around the system
• Is critical for performance
• Can be managed in many different ways
SRAM and DRAM
Hardware caches are usually SRAM.
Static RAM does not need a refresh.
DRAM data fades after about 10 cycles.
Refresh rewrites the data, but this delays access.
SRAM does not fade, so no refresh is needed.
SRAM takes four times as many transistors as DRAM.
SRAM will always be more costly and take more power.
CPU Caches
There are usually two levels of cache in a CPU.
Level one (internal) - Usually 4-40 Kbytes in size
Level two (external) - Usually 0.5-8 Mbytes in size
A level one cache operates at CPU speeds.
Data must be in the level one cache before the CPU can use it.
Cache Hit Rate
The performance of a cache depends on the hit rate.
The cache hit rate is how often requested data is found in the cache.
The cache hit rate depends on:
Size of the cache
Fetch rate
Locality of data references
Cache structure
Cache Hit Rate
Misses are very expensive
For example:
Hit cost is 20 units, miss cost is 600 units.
At a 100 percent hit rate, cost = 100 x 20 = 2000.
At a 99 percent hit rate, cost = (99 x 20) + (1 x 600) = 1980 + 600 = 2580.
A one percent miss rate adds 580 to the base cost of 2000, a 29 percent degradation.
The exact numbers depend on the system.
System Cache Hierarchies

[Diagram: the cache hierarchy — CPU registers, L1 cache, L2 cache, main memory, disk, network, and people; each level is larger and slower than the one before it.]
The Memory Free Queue
• Main memory is a fully associative disk cache managed by the OS.
• Move ins and move outs occur as in any cache.
• For memory, this is called paging.
• Moves involving disk are very expensive.
• A move out followed by a move in is too costly.
• Free memory pages are kept available to avoid the need for move outs before a move in.
• You always need to have "enough" pages on the free queue.
The paging Mechanism
• Every quarter second, a check is made to see if the amount of free memory is less than the quantity specified by the lotsfree parameter.
• If it is, the page daemon is run to replenish the free memory queue.
• Pages are stolen from their users if they have not been used recently.
• Pages are scanned to determine which can be stolen.
• The more pages needed, the faster the scan rate.
• If the page daemon cannot keep up, swapping may occur.
Priority Paging
Is included with the Solaris 7 OS but is disabled by default
A kernel patch is required for Solaris 2.6 and 2.5.1 OS
Is activated by setting priority_paging to 1 in /etc/system
Has a new tuning parameter, cachefree, whose default value is twice lotsfree
Starts the page daemon running at cachefree
When enabled, steals only file system pages (data files) until lotsfree is reached
Is very good for large users of random I/O
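The activation described above amounts to a short /etc/system fragment (a sketch; the optional cachefree override is commented out, and changes take effect at the next boot):

```
* /etc/system -- enable priority paging (Solaris 7; kernel patch
* required on 2.6 and 2.5.1)
set priority_paging=1
* Optional: override cachefree (defaults to twice lotsfree)
* set cachefree=<pages>
```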
Swapping
Swapping is a last resort (desperation swapping). If the page daemon consistently cannot keep up with the demand for memory, memory use must be cut.
The number of swaps is not important; the fact that there are swaps is.
A process is considered swapped when its last LWP is swapped out.
Its memory will not be freed until then.
Do not try to tune swaps, try to eliminate them.
"tmpfs"
It allows the creation of a virtual memory ramdisk, such as /tmp.
It uses virtual memory like any other user.
It uses real memory and swap space.
If used heavily, /tmp can fill main memory.
You can limit the use of /tmp.
The size option in the vfstab entry is used.
You must not move it to a hard drive partition.
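The size option is set in the /etc/vfstab entry for /tmp; a sketch (the 512-Mbyte limit is illustrative):

```
#device   device   mount  FS     fsck  mount    mount
#to mount to fsck  point  type   pass  at boot  options
swap      -        /tmp   tmpfs  -     yes      size=512m
```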