The Evolving Solaris Kernel
Past, Present & Future

Jim Mauro
Senior Staff Engineer - Performance & Availability Engineering
Sun Microsystems, Inc.
400 Atrium Drive, Somerset, NJ 08812
[email protected]

Richard McDougall
Senior Staff Engineer - Performance & Availability Engineering
Sun Microsystems, Inc.
[email protected]

copyright (c) 2002 Jim Mauro and Richard McDougall Nov 2002

Agenda
• Introduction
• Solaris Overview
• Distribution
• Releases
• System Overview & Kernel Features
• 64-bits
• The Evolution
• Things added, things changed
• Tips and tidbits along the way...
• Major Features Review
• Solaris 7
• Solaris 8
• Solaris 9
Releases
• Base release, followed by quarterly update releases
  • Solaris 8 - released 2/00
  • Solaris 8, 6/00 (update 1)
  • Solaris 8, 10/00 (update 2)
  • Solaris 8, 1/01 (update 3)
  • Solaris 8, 4/01 (update 4)
  • Solaris 8, 7/01 (update 5)
  • Solaris 8, 10/01 (update 6)
  • Solaris 8, 2/02 (update 7)
  • Solaris 9 - base release, May, 2002
• The model is designed to
  • Provide predictability for planning
  • Provide a vehicle for getting new features, functionality and patches out in a regular and timely fashion
Releases (cont)
• So, which release am I running?
  sunsys> cat /etc/release
  Solaris 8 6/00 s28s_u1wos_08 SPARC
  Copyright 2000 Sun Microsystems, Inc. All Rights Reserved.
  Assembled 26 April 2000
  sunsys>
• Check out http://docs.sun.com, “What’s New” document for a specific release
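The same check can be scripted. The following is a minimal sketch in Python, assuming the first line of /etc/release has the form "Solaris <version> <month>/<year> ..." for update releases, as in the sample output. The function name `parse_release` and the lookup table are illustrative; the table holds only the Solaris 8 update dates listed on the Releases slide.

```python
import re

# Solaris 8 update dates taken from the Releases slide (date stamp -> update number).
SOLARIS8_UPDATES = {
    "6/00": 1, "10/00": 2, "1/01": 3, "4/01": 4,
    "7/01": 5, "10/01": 6, "2/02": 7,
}

def parse_release(first_line):
    """Return (version, date, update) from the first line of /etc/release.

    date is None when no month/year stamp is present; update is None when
    the stamp is missing or is not one of the known Solaris 8 update dates.
    """
    match = re.search(r"Solaris\s+(\S+)(?:\s+(\d{1,2}/\d{2}))?", first_line)
    if match is None:
        raise ValueError("unrecognized release line: %r" % first_line)
    version, date = match.group(1), match.group(2)
    update = SOLARIS8_UPDATES.get(date) if version == "8" and date else None
    return version, date, update

sample = "Solaris 8 6/00 s28s_u1wos_08 SPARC"
print(parse_release(sample))  # prints ('8', '6/00', 1)
```

On a live system the input would be the first line read from /etc/release rather than a literal string.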
Kernel Features
System Overview

[Figure: kernel block diagram. Labels: System Call Interface; Scheduling and Process Management (Thread; TS/IA, RT, FX, FSS); Virtual File System Framework (UFS, NFS, SPECFS); Virtual Memory System; Hardware Address Translation (HAT); Kernel Services (Clocks & Timers, Callouts); Networking (TCP, IP, Sockets); Bus and Device Drivers (SD, SSD); HARDWARE]
Solaris Kernel Features
• Dynamic Kernel
  • Small core unix modules
  • Major subsystems implemented as dynamically loadable modules
Solaris Kernel Features
• 32-bit and 64-bit kernel
  • 64-bit kernel required for UltraSPARC-III based systems (SunBlade, SunFire)
  • 32-bit apps run just fine...
• Solaris DDI/DKI Implementation
  • Device driver interfaces
  • Includes interfaces for dynamic attach/detach/pwr
• Rich set of standards-compliant interfaces
  • POSIX, UNIX International
Solaris Kernel Features
• Integrated networking facilities
  • TCP/IP
    IPv4, IPSec, IPv6
  • Name services - DNS, NIS, NIS+, LDAP
  • NFS - de facto standard distributed file system, NFS-V2 & NFS-V3
  • Remote Procedure Call/External Data Representation (RPC/XDR) facilities
  • Sockets, TLI, Federated Naming APIs
64-Bits
64-bit Solaris
• Since Solaris 7, full 32-bit binary compatibility
• A simple directory namespace rule provides for the support and co-existence of 32-bit binaries on a 64-bit Solaris 8 system:
  For every directory on the system that contains binary object files (executables, shared object libraries, etc.), there is a sparcv9 subdirectory containing the 64-bit versions
• All kernel modules must be of the same data model: ILP32 (32-bit data model) or LP64 (64-bit data model)
• 64-bit kernel required to run 64-bit apps
32 bit limits
• Solaris 2.5
  • Heap is limited to 2GB, malloc will fail beyond 2GB
• Solaris 2.5.1
  • Heap limited to 2GB by default
  • Can go beyond 2GB with kernel patch 103640-08+
  • Can raise limit to 3.75GB by using ulimit or rlimit() if uid=root
  • Do not need to be root with 103640-23+
• Solaris 2.6
  • Heap limited to 2GB by default
  • Can raise limit to 3.75GB by using ulimit or rlimit()
• Solaris 7 & 8
  • Limits are raised by default
  • 32 bit program can malloc 3.99GB
Solaris/SPARC V8/V9 Data Model
• Defines the width of integral data types
  • 32-bit Solaris - ILP32
  • 64-bit Solaris - LP64

  ’C’ data type    ILP32 (bits)    LP64 (bits)
  char                  8               8
  short                16              16
  int                  32              32
  long                 32              64
  long long            64              64
  pointer              32              64
  enum                 32              32
  float                32              32
  double               64              64
  quad                128             128
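The practical consequence of the table is that only long and pointer change width, so code that stores either of them in an int stops being safe under LP64. A minimal sketch in Python that encodes the table and derives that conclusion; the names `DATA_MODELS`, `changed_types` and `overflows` are illustrative only.

```python
# Type widths in bits, copied from the data model table.
DATA_MODELS = {
    "ILP32": {"char": 8, "short": 16, "int": 32, "long": 32, "long long": 64,
              "pointer": 32, "enum": 32, "float": 32, "double": 64, "quad": 128},
    "LP64":  {"char": 8, "short": 16, "int": 32, "long": 64, "long long": 64,
              "pointer": 64, "enum": 32, "float": 32, "double": 64, "quad": 128},
}

def changed_types(src="ILP32", dst="LP64"):
    """Return the C types whose width differs between two data models."""
    return [name for name, bits in DATA_MODELS[src].items()
            if DATA_MODELS[dst][name] != bits]

def overflows(value, bits):
    """True if an unsigned value cannot be represented in the given width."""
    return value >= (1 << bits)

print(changed_types())  # prints ['long', 'pointer']

# An address just past the 4GB line fits an LP64 pointer but not an int.
address = 0x100000000
print(overflows(address, DATA_MODELS["LP64"]["pointer"]))  # prints False
print(overflows(address, DATA_MODELS["LP64"]["int"]))      # prints True
```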
64-bit Performance
• 64 Bit Virtual Address Space
  • (+) Free from the 3.9GB barrier
  • (+) Memory map large files
• 64 Bit data types
  • (+) 64 Bit Arithmetic, 64 Bit Registers
  • (-) Pointers/Longs require moving 8 bytes
    • Typically ~5% delta
    • Larger cache footprint
  • (-) Larger Stack
• psrset (1M) - creation and management of processor sets
• pbind (1M) - original processor bind command. Does not provide exclusive binding
• processor_bind (2), processor_info (2), pset_bind (2), pset_info (2), pset_create (2), p_online (2): system calls to do things programmatically
Solaris 9 Resource Management
• Tasks, Projects & Extended Accounting
  • Task - A collection of processes
  • Project - A collection of tasks

[Figure: a project containing three tasks, each task containing several processes]
Solaris 9 Resource Management
• Tasks & Projects provide abstractions for binding together related processes, for the purpose of
  • Resource management. Tasks and Projects can be bound to processor sets, have scheduler changes applied to them, etc.
  • Resource controls. Resource limits can be applied at the Project or Task level.
  • Resource monitoring. Tools have been enhanced to monitor utilization at the Project or Task level.
    • “prstat -J” - Display statistics for processes and projects
    • “prstat -T” - Display statistics for processes and tasks
  • Extended accounting. The accounting facility has been enhanced to provide project and task level accounting data.
Solaris 9 Resource Controls
• The following resource controls are available
project.cpu-shares: Number of CPU shares (FSS) available to this project
task.max-cpu-time: Maximum CPU time available to the processes in this task (milliseconds)
task.max-lwps: Maximum number of LWPs available to the processes in this task
process.max-cpu-time: Max CPU time available to this process
process.max-file-descriptor: Max number of open files for this process
process.max-file-size: Max file size
process.max-core-size: Max core file size
process.max-data-size: Max size of the process’s data segment
process.max-stack-size: Max size of the process’s stack
Solaris 9 Fair Share Scheduler
• Share based (versus priority based) process scheduling
  • Designed to provide a guaranteed minimum amount of CPU resources to a specific application (project/task)
  • Defining a maximum, or ceiling, not currently available
• Shares are allocated to projects
  • Shares are not percentages
  • Shares allocated are relative to shares allocated to other projects
  • The total number of shares allocated also matters
• FSS can be used in conjunction with processor sets
  • Finer grained management and control
FSS & Processor Sets

  Processor Set 1 (2 CPUs, 25% of system): Project A 16.66% (1/6), Project B 33.33% (2/6), Project C 50% (3/6)
  Processor Set 2 (4 CPUs, 50% of system): Project B 40% (2/5), Project C 60% (3/5)
  Processor Set 3 (2 CPUs, 25% of system): Project C 100% (3/3)
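The fractions in this figure follow from dividing each project's shares by the total shares of the projects competing in the same processor set. A minimal sketch of that arithmetic in Python, assuming share allocations of A=1, B=2, C=3 (inferred from the fractions shown, not stated on the slide); the names `SHARES`, `PSETS` and `entitlements` are illustrative and this is not the scheduler's actual code.

```python
from fractions import Fraction

# Share allocations inferred from the figure (assumption): A=1, B=2, C=3.
SHARES = {"A": 1, "B": 2, "C": 3}

# Projects running in each processor set, as drawn in the figure.
PSETS = {
    "Processor Set 1": ["A", "B", "C"],
    "Processor Set 2": ["B", "C"],
    "Processor Set 3": ["C"],
}

def entitlements(projects, shares):
    """Return each project's fraction of a processor set's CPU.

    A project's entitlement is its shares divided by the sum of the shares
    of all projects competing in the same set, which is why the same three
    shares are worth 3/6, 3/5 or 3/3 depending on the competition.
    """
    total = sum(shares[p] for p in projects)
    return {p: Fraction(shares[p], total) for p in projects}

for name, projects in PSETS.items():
    parts = entitlements(projects, SHARES)
    print(name, ", ".join("%s=%s" % (p, f) for p, f in parts.items()))
```

Using exact fractions keeps the result comparable to the 1/6, 2/5 and 3/3 labels in the figure without rounding.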
Resource Pools
• Provides a facility for stateful (persistent) processor sets and project binding, as well as scheduling class assignment
• Resource pool management is done via pooladm (1M), poolbind (1M), and poolcfg (1M).
• /etc/pooladm.conf provides persistence across reboots (managed via poolcfg (1M))
• poolbind (1M) provides for binding of projects or tasks to a resource pool
• /etc/project can define a resource pool for a project or task
Solaris Release Features Summary
Solaris 7 - New Features
• 64-Bits
  • Kernel
  • 64-bit binary support
  • Full binary compatibility for 32-bit executables
• UFS logging
  • mount -o logging
  • Logs to spare blocks in cylinder group
  • No fsck
• UFS noatime
  • Disable access time update to inodes
• pgrep & pkill
  • Ends ps -ef | grep proc_name | awk '{ print $2 }'
Solaris 7 - New Features
• traceroute bundled
• dumpadm(1M)
  • Configure a separate raw partition for dumps
  • Dump running systems
• LDAP Client Library
• TCP with SACK
  • Selective Acknowledgement - RFC 2018
• libdevinfo (3)
  • Device configuration information APIs
• truss (1) Enhanced
  • User level function tracing. “-u”, “-U”
Solaris 8 - New Features
• Cyclic Page Cache
  • Enhanced VM page management functionality
  • Priority for page allocation given to process segments
  • freemem is real!
• System Message IDs
  • Numeric ID generated for syslog messages
• devfsadm(1M)
  • One tool for device configuration/management
  • DR events managed through devfsadmd
Shared Memory
• System V Intimate Shared Memory (ISM)
  • Shared translation data structures
  • 4MB TLB Page Size
  • Locked pages
  • Invoke with an additional flag to shmat () - SHM_SHARE_MMU
  • Default shared memory mode for Oracle RDBMS
• System V Dynamic Intimate Shared Memory (DISM)
  • Solaris 8 U3
  • Pageable variant of ISM
  • Integrated with Oracle 9i (dynamic SGA)
  • 8k TLB Page Size for Solaris 8
  • 4MB TLB Page Size for Solaris 9 U1
Multiple Page Size Support
• Solaris 8
  • Large (4MB) pages with ISM/DISM for shared memory
• Solaris 9
  • "Multiple Page Size Support"
  • Optional large pages for heap/stack
  • A wrapper for unchanged programs (ppgsz)
  • Programmatically via memcntl(3C)
  • Shared library for existing binaries (LD_PRELOAD) (/usr/lib/libmpss.so)
  • pmap enhancements to observe page sizes (pmap -sx)
  • Tool to observe potential gains (trapstat -T)
Kernel Process Model
• Processes
  • All processes begin life as a program
  • All processes begin life as a disk file (ELF object)
  • All processes have “state” or context that defines their execution environment - hardware & software context
  • Context can be further divided into “hardware” context and “software” context
• Hardware context
  • The processor state, which is CPU architecture dependent.
  • In general, the state of the hardware registers (general registers, privileged registers)
  • Maintained in the LWP
• Software context
  • Address space, credentials, open files, resource limits, etc - stuff shared by all the threads in a process
File system Caching
• Solaris file systems use the VM system to cache and move data
• Regular reads are page ins, delayed writes are page outs
• VM parameters and load dramatically affect file system performance
  • Solaris 8 gives executable, stack and heap pages priority over file system pages
File System Caching

[Figure: file system cache hierarchy. Labels: Binary (Text), Binary (Data), Stack, Heap, mmap(), STDIO Buffers; read()/write(), fread()/fwrite(); Level 1 Page Cache: segmap page cache (256MB on Ultra); Level 2 Page Cache: Dynamic Page Cache; Buffer Cache (BUFHWM); Inode Cache (ufs_ninode); Directory Name Cache (ncsize); File name lookups; direct.blocks; Storage Devices]

• The cache hit ratio of the segmap cache can be measured with netstat -k segmap
• Files mapped with mmap() bypass the segmap cache
• The DNLC cache hit ratio can be observed with vmstat -s
• The buffer cache hit ratio can be observed with sar -b
UFS
• Block based allocation
• 2TB Max file system size
• A file can grow to the max file system size
  • triple indirect is implemented
• Prior to 2.6, max file size is 2GB
UFS Block Allocation

  # filestat /home/bigfile
  Inodes per cyl group:     64
  Inodes per block:         64
  Cylinder Group no:        0
  Cylinder Group blk:       64
  File System Block Size:   8192
  Device block size:        512
  Number of device blocks:  204928
Direct I/O Checklist
• Must be aligned
  • sector aligned (512 byte boundary)
• Must not be mapped
• Buffer must be word aligned
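The checklist can be read as a predicate over an I/O request. The sketch below is one hedged interpretation in Python, not the kernel's actual test: it assumes "sector aligned" applies to both the file offset and the transfer size, and it takes the word size as a parameter defaulting to 4 bytes. The function name `directio_problems` and all of its parameters are hypothetical.

```python
SECTOR_SIZE = 512  # bytes, from the checklist

def directio_problems(offset, length, buffer_addr, is_mapped, word_size=4):
    """Return the checklist items a request violates (empty list means none).

    offset, length  -- file offset and transfer size in bytes (assumed to be
                       what "sector aligned" refers to)
    buffer_addr     -- address of the user buffer
    is_mapped       -- True if the file is currently mapped with mmap()
    word_size       -- assumed machine word size in bytes
    """
    problems = []
    if offset % SECTOR_SIZE != 0:
        problems.append("file offset is not sector aligned")
    if length % SECTOR_SIZE != 0:
        problems.append("transfer size is not a multiple of the sector size")
    if is_mapped:
        problems.append("file is mapped")
    if buffer_addr % word_size != 0:
        problems.append("buffer is not word aligned")
    return problems

# An 8KB request at offset 0 from an aligned buffer passes every check.
print(directio_problems(0, 8192, 0x20000, False))  # prints []
```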
UFS Write Throttle
• A throttle exists in UFS to limit the amount of memory UFS can saturate, per file
  • Controlled by three parameters
  • ufs_WRITES (1 = enabled)
  • ufs_HW = 393216 bytes (high water mark to suspend IO)
  • ufs_LW = 262144 bytes (low water mark to start IO)
• Almost always need to set this higher to get maximum sequential write performance
  • set ufs_LW=4194304
  • set ufs_HW=67108864
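The two water marks form a hysteresis loop: writers are held off once outstanding bytes pass the high mark and are only released once the backlog drains below the low mark. A toy model of that behaviour in Python, using the default values from the slide; the class name, method names and the exact comparisons (strictly greater than, strictly less than) are assumptions of this sketch and not taken from the UFS source.

```python
class WriteThrottle:
    """Toy model of a per-file high/low water mark write throttle."""

    def __init__(self, low_water=262144, high_water=393216):
        if low_water > high_water:
            raise ValueError("low water mark must not exceed high water mark")
        self.low_water = low_water
        self.high_water = high_water
        self.outstanding = 0    # bytes queued but not yet on disk
        self.suspended = False  # True while writers are held off

    def queue_write(self, nbytes):
        """Add queued bytes; suspend writers above the high water mark."""
        self.outstanding += nbytes
        if self.outstanding > self.high_water:
            self.suspended = True
        return self.suspended

    def complete_write(self, nbytes):
        """Retire written bytes; resume writers below the low water mark."""
        self.outstanding = max(0, self.outstanding - nbytes)
        if self.outstanding < self.low_water:
            self.suspended = False
        return self.suspended

throttle = WriteThrottle()
for _ in range(8):
    throttle.queue_write(65536)      # eight 64KB writes, 512KB outstanding
print(throttle.suspended)            # prints True
throttle.complete_write(65536)       # 448KB left, still above the low mark
print(throttle.suspended)            # prints True
```

With the default 384KB high mark a sequential writer stalls after only a few 64KB writes, which is the behaviour the larger settings on the slide are meant to avoid.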
UFS Performance
• Adjacent blocks are grouped and written together or read ahead
  • Controlled by the maxcontig parameter
  • Defaults to 128k on most platforms, 1MB on SPARCstorage array 100, 200
  • Must be set higher to achieve adequate write performance
  • maxphys must be raised beyond 128k also