© Demand Technology, Inc.
Windows Server 2003: An Update
Demand Technology Software
1020 Eighth Avenue South, Suite 6, Naples, FL 34102
phone: (239) 261-8945 fax: (239) 261-5456 e-mail: [email protected]
http://www.demandtech.com
2©Demand Technology, Inc. Windows 2003 Performance and Tuning: Introduction
Outline
Windows Server 2003 Overview
Support for big iron
  64-bit processors
  ccNUMA architecture multiprocessors
LRU-based memory management
WSRM policy-based performance manager
New web services application processing architecture (IIS 6.0 Web Gardens)
Assessment
Big Iron support
[Figure: Processor Utilization Breakdown (NT), 11/19/2003 09:45 - 11/19/2003 12:51. Y-axis: percent (0-100); x-axis: time of day (10:00-12:30); series: privileged, user, interrupt, DPC.]
Windows 2003 architecture
No major architectural changes
  processr.sys support for processor-specific optimizations
  ccNUMA support
  New LRU-based memory management
Identical 32-bit and 64-bit versions
Multiple service host (svchost) process address spaces
  For purposes of extended security granularity
Windows 2003 architecture
Multiple service host (svchost) process address spaces
  How to identify them:
C:\>tasklist /svc /fi "Modules eq srvsvc.dll"
Windows 2003 architecture
Multiple service host (svchost) process address spaces
  How to identify them
  Module identification function:
[Figure: Windows 2003 architecture block diagram. Mode labels: User, Privileged. Components: User Process; Services (svchost.exe); Winlogon; Service Controller; Win32 Subsystem; Security Subsystem; Encryption Subsystem; Executive (Object Manager, Process & Threads, Virtual Memory Manager, Security, Local Procedure Call, PnP/Power Mgmt); WIN32K.SYS; Video Driver; ntoskrnl.sys; hal.sys; processr.sys; I/O Manager stack (I/O Manager, File System Drivers, Device Drivers); Network protocol stack (Server/Redirector, NBT, TCP/UDP, IP, NDIS Driver); HARDWARE.]
Windows evolution
Common NT code base now in use across all Microsoft platforms
  Windows XP for workstations
  Windows Server 2003
  Windows CE
  Embedded Windows
  Xbox???
Which allows .NET Framework programs to execute everywhere
.NET Framework (for applications)
[Figure: .NET Framework layer diagram. Labels: .NET Languages; ASP.NET; Forms; XML, SOAP; Common Language Runtime (CLR); Enterprise Services (Clustering); ADO.NET; XML Support; Other services; Win32; COM+; MSMQ; AD; WMI.]
Windows 2003 evolution: new in WinXP
New Processor Object Counters
  Support Intel SpeedStep processors that run at different speeds to save power
Disk prefetching
  Designed to speed up the boot process and program image file loading
System virtual memory changes
  Overcome a variety of “large system” deficiencies, most noticeable in large Terminal Services environments: Registry size limits, accessing very large mapped files, very large device drivers
Volume shadow copy service
  Supports file cache flush and freeze so that volume snapshots can be created easily and with integrity
Windows 2003 evolution: new in Win2003
64-bit support
  IA-64 and AMD-64 (SP1) support; much work remains to port all the server applications
LRU memory management
  Page trimming performed based on the age of a working set page
System virtual memory changes
  Generic X’ffff ffff’ setting (-1) for PagedPoolSize, NonPagedPoolSize
WSRM
  Policy-based performance management: dispatching priority, processing, working set
Disk Counters always on
  Both Logical and Physical Disk performance counters are always enabled
Windows 2003 evolution: new in Win2003
Processr.sys driver
  Hardware-dependent logic moved to a special kernel-mode driver: IA-32, hyperthreading, power management, IA-64, AMD-64
Transition Pages Re-purposed/sec
  New memory management counter fills a gap in the measurement data
Kernel-mode web services cache
  New IIS 6.0 architecture also features Web Gardens support for ASP and ASPX applications
TCP/IP v6 support
  TCP/IP v4 and v6 can run side by side
Security enhancements
Beyond Windows 2003
Longhorn Preview
  Production versions not slated to ship until 2006 or 2007
  Workstation-oriented UI changes
  Designed around 64-bit machine requirements
  New WinFS file system
    Comprehensive “flat” view, alongside traditional tree view
  New performance monitoring interface will gradually replace Performance Library DLLs
Windows Server 2003 Extended Processor Support
Hardware Abstraction Layer (HAL)
  Process & Thread context switching
  Interrupt processing
  Synchronization primitives
  SMP processor signaling
  Virtual memory addressing
Processr.sys
  CPU-specific optimizations, e.g.,
    ccNUMA
    Hyperthreaded processors
    Power-conserving portables
[Figure: block diagram of Ntoskrnl.exe, HAL, and Processr.sys.]
Windows Server 2003 Extended Processor Support
32-bit Intel
  Currently @ 3 GHz
  Power-conserving Pentium 4 M
  Both conventional & HT multiprocessors
  New, improved measurement interface
    Support is currently available only in Intel vTune
64-bit Intel
  ccNUMA support
AMD-64
  Expected in Service Pack 1
Intel 686 microarchitecture
Increased execution parallelism; remove some of the instruction sequence dependencies
  Instructions are translated into RISC micro-instructions
  Pool of 40 GP pseudo-registers
  Micro-ops can be executed out of sequence
  Executed instructions are retired
[Figure: Intel 686 microarchitecture (single threaded, superscalar). Labels: Instruction Fetch; Instruction Decode; Decoded Instruction Queue; Register Allocation Table; Execution Trace Queue; Micro-operation Reservation Station; Reorder Buffer; Execution Unit (x5); In-Order Retirement; Retired Instructions; Translation Lookaside Buffer; Instruction Cache; Data Cache.]
Pentium 4 Hyperthreading
Two instruction streams scheduled to execute concurrently in the same pipeline.
When one instruction stream stalls, instructions from the other thread can be executed.
Implementation:
  External interface is replicated
  Internal resources are partitioned and/or shared
[Figure: the single-threaded pipeline diagram with resources labeled CPU 0 and CPU 1. Labels: Instruction Fetch; Instruction Decode; Decoded Instruction Queue; Register Allocation Table; Execution Trace Queue; Micro-operation Reservation Station; Reorder Buffer; Execution Unit (x5); In-Order Retirement; Retired Instructions; Translation Lookaside Buffer; Instruction Cache; Data Cache.]
Pentium 4 Hyperthreading
[Figure: hyperthreaded pipeline diagram with resources annotated “Partitioned” or “Duplicated” between CPU 0 and CPU 1. Labels: Instruction Fetch; Instruction Decode; Decoded Instruction Queue; Register Allocation Table; Execution Trace Queue; Micro-operation Reservation Station; Reorder Buffer; Execution Unit (x5); In-Order Retirement; Retired Instructions; Translation Lookaside Buffer; Instruction Cache; Data Cache.]
Pentium 4 Hyperthreading
Two simultaneously executing threads can saturate internal execution pipeline resources!
Windows 2003 support
  A new Win32 GetLogicalProcessorInformation API call retrieves information about logical processors and related hardware.
  Scheduler support: spread the load across physical processors first, then schedule logical processors when physical processors are all “busy”
  HALT the processor in Idle mode
    processr.sys function
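The scheduler preference described on this slide can be illustrated with a small model. This is a minimal sketch, not the Windows dispatcher: the function name `pick_logical_cpu`, the flat list of busy flags, and the assumption that sibling logical processors occupy adjacent indexes are all hypothetical choices made for the example.

```python
def pick_logical_cpu(busy, threads_per_core=2):
    """Return the index of the logical CPU to use, or None if all are busy.

    busy: one boolean per logical CPU. Logical CPUs k*threads_per_core ..
    (k+1)*threads_per_core - 1 are assumed (for this sketch) to share core k.
    """
    cores = [busy[i:i + threads_per_core]
             for i in range(0, len(busy), threads_per_core)]
    # Pass 1: prefer a logical CPU on a physical core that is completely idle.
    for k, siblings in enumerate(cores):
        if not any(siblings):
            return k * threads_per_core
    # Pass 2: every physical core has work, so accept any idle sibling.
    for i, is_busy in enumerate(busy):
        if not is_busy:
            return i
    return None


print(pick_logical_cpu([True, False, False, False]))  # idle core 1 is preferred
print(pick_logical_cpu([True, False, True, False]))   # falls back to a sibling
```

With two hyperthreaded cores and only logical CPU 0 busy, the sketch selects logical CPU 2 on the idle core rather than CPU 1, which would share a pipeline with the running thread.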
Intel 786 IA-64 architecture
Itanium 2: 1.6 GHz and higher
EPIC: Explicitly Parallel Instruction Computing
  Explicit parallelism (up to 3 instructions in a bundle)
  Predication
  Speculation
  Massive resources
    Extended register sets
New instruction set!
  Requires compilers that can take advantage of the architecture
AMD-64
AMD’s 64-bit extension to IA-32
  8 additional GPRs in Long Mode
  Microarchitecture similar to the IA-32
  Native IA-32 instruction execution in Short Mode
[Figure: AMD-64 pipeline diagram. Labels: Instruction Fetch; Branch Prediction; Instruction Cache; Translation Lookaside Buffer; Pick; Decode 1 and Decode 2 (x3); Pack and Decode (x3); Dispatch; 8 Entry Scheduler (x3); 36 Entry Scheduler; ALU and AGU (x3); FADD; FMUL; FMISC; Level 1 Data Cache.]
Symmetric Multiprocessing (SMP)
[Figure: SMP block diagram. Labels: Processor with Cache (x4); RAM; Memory Bus; Peripheral Bus.]
Multiprocessing scalability
Queued spin lock support
  During spin lock execution, micro-ops are discharged into the instruction execution pipeline faster than they can be executed
    Leads to resource shortages
  When the spin lock test finally succeeds, large portions of the pipeline have to be flushed
Multiprocessing scalability
Queued spin lock support
  Uses the PAUSE instruction
    Slows down loop execution slightly, to a rate that is synchronized with memory bus access
    Allows the processor to detect immediately a change in the value of the loop synchronization variable
  KeAcquireInStackQueuedSpinLockAtDpcLevel, KeReleaseInStackQueuedSpinLockFromDpcLevel
[Figure: Actual vs. Ideal Multiprocessor Scalability. Y-axis: TPC-C transactions/sec (0-70,000); x-axis: # of processors (0-8); series: Vendor A, Ideal, Vendor B, Ideal, Log. (Vendor A); data labels: 16,263; 24,925; 30,231; 57,015.]
[Figure: Actual vs. Ideal Multiprocessor Scalability. Y-axis: TPC-C transactions/sec (0-150,000); x-axis: # of processors (0-32); series: Vendor A, Ideal, Nonlinear trendline; data labels: 16,263; 24,925; 30,231; 57,015.]
ccNUMA Support
Cache-coherent Non-uniform Memory Access
Consists of multiprocessor nodes
  Processors (usually 4)
  Local memory (shared memory bus)
  Memory controller (for remote access)
Introduces another level of cache coherence
Memory accesses are non-uniform when comparing Next Level Cache hits to main memory references
Remote memory access is 3-5 times slower than local memory access
Overcomes the congestion problems of a single bus architecture if there is sufficient node affinity in the workload
ccNUMA Support
Threads scheduled on their ideal node
Real memory allocated locally, whenever possible
  Maintains multiple Pools & Available Bytes queues
[Figure: ccNUMA machine. Labels: Node 0, Node 1, Node 2, Node 3, each containing CPU 0 - CPU 3 with a Cache per CPU; Local Memory; Remote Memory Controller; Crossbar Switch.]
ccNUMA Support
Traditionally, the drawback to NUMA was that applications had to be re-coded to take full advantage of the architecture
  Performance is sensitive to the long latency associated with remote memory access
API Function                  Description
GetNumaAvailableMemoryNode    Retrieves the amount of memory available in the specified node.
GetNumaHighestNodeNumber      Retrieves the node that currently has the highest number.
GetNumaNodeProcessorMask      Retrieves the processor mask for the specified node.
GetNumaProcessorNode          Retrieves the node number for the specified processor.
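A processor mask, such as the one GetNumaNodeProcessorMask reports, is a bit field with one bit per processor. The sketch below shows how such masks can be decoded; it does not call the Win32 API. The helper names and the four example masks (four nodes of four processors each) are hypothetical values chosen to match the diagram on the previous slide.

```python
def cpus_in_mask(mask):
    """Return the processor numbers whose bits are set in an affinity mask."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


def node_of_processor(node_masks, cpu):
    """Return the node whose mask contains the given processor, or None."""
    for node, mask in enumerate(node_masks):
        if (mask >> cpu) & 1:
            return node
    return None


# Hypothetical masks for a 4-node machine with 4 processors per node.
NODE_MASKS = [0x000F, 0x00F0, 0x0F00, 0xF000]

for node, mask in enumerate(NODE_MASKS):
    print(f"node {node}: processors {cpus_in_mask(mask)}")
print("processor 9 belongs to node", node_of_processor(NODE_MASKS, 9))
```

An application that partitions its own work could use this kind of lookup to keep a thread and the memory it touches on the same node.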
Multiprocessor partitioning
May be the only way to optimize large, n-way multiprocessors
  e.g., HP Superdome, Unisys ES7000 16-64 processor machines
But it requires commitment!
  Understanding your current workload CPU processing requirements
  Continuous monitoring of the workload on a per processor basis
  Ensure that excessive numbers of threads from “loved ones” are not in the Ready Queue
  Periodic review of the partitioning scheme
Multiprocessor partitioning
The workload must be concentrated enough on its dedicated CPUs that it benefits from cache warm starts, but not so concentrated that it causes excessive processor queuing.
[Figure: bar chart. Y-axis: 0-100; x-axis: CPU 1, CPU 2, ... CPU n.]
Multiprocessor partitioning
The interrupt workload should not be so concentrated on its dedicated CPUs that interrupt processing is subject to excessive interrupt pending time delays
  A concentration of between 5% and 30% for % Interrupt Time is probably ideal
[Figure: bar chart. Y-axis: 0-100; x-axis: CPU 1, CPU 2, ... CPU n.]
Multiprocessor partitioning
Reskit Interrupt Filter tool:
Multiprocessor partitioning
WSRM policies:
Multiprocessor partitioning
WSRM policies:
  Caution:
    Not designed to be used with any application that performs its own partitioning
    If an app sets a Processor Affinity mask, WSRM will honor it!
Multiprocessor partitioning
Win64 Virtual Memory
Architectural component            64-bit    32-bit
Virtual Memory                     16 TB     4 GB
Paging File Size                   512 TB    64 GB
Hyperspace                         8 GB      4 MB
Paged Pool                         128 GB    470 MB
Non-paged Pool                     128 GB    256 MB
System cache                       1 TB      1 GB
System PTE (page table entries)    128 GB    660 MB
Page replacement
Windows 2003 uses a form of the popular LRU page replacement algorithm
  Pages in process working sets are aged using the access bits maintained by the hardware
  Older pages without their access bits set are “trimmed” first
  Recently trimmed pages are retained in the Standby List (aka the page cache) and returned to the process working set via transition faults.
  Eventually, older trimmed pages on the Standby List are “re-purposed” and placed on the Free List
Page replacement
[Figure: page state flow diagram. Labels: Process Working sets (Working set bytes); Working set Page Aging (LRU); Trimmed Pages; Trimmed Dirty Pages; Modified List (Dirty pages); Modified Page Writer (Pages Output/sec); Standby List; Transition Faults/sec; Re-purposed Transition Pages; Freed Pages; Free List; Zero Page Thread; Zero List; Demand Zero Page Faults/sec; Pages Input/sec; Available Bytes.]
Page replacement
Age of a page
  Most recent: access bit is set
  Older: access bit was set at the last page trimming scan
  Even older: not accessed at the time of the last scan
  Oldest: recently trimmed pages marked “in transition” that are kept in the Standby List (page cache)
Page trimming working set scans
  Threshold-driven: the actual rate is not reported
  Efficiency is very important!
  Only enough pages to replenish the Standby List are trimmed
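The aging and trimming steps on these slides can be sketched as a toy model. This is an illustration under simplifying assumptions, not the Windows memory manager: the `Page` class, the integer `age` counter, and both function names are hypothetical, and a single list stands in for a process working set.

```python
from dataclasses import dataclass


@dataclass
class Page:
    number: int
    accessed: bool = False  # stands in for the hardware access bit
    age: int = 0            # scans since the page was last seen accessed


def scan_and_trim(working_set, standby, pages_needed):
    """One trimming scan: age every page, then move the oldest to standby."""
    for page in working_set:
        if page.accessed:
            page.age = 0
            page.accessed = False  # clear the bit so the next scan sees new use
        else:
            page.age += 1
    # Trim only as many pages as the standby list needs, oldest first.
    victims = sorted(working_set, key=lambda p: p.age, reverse=True)[:pages_needed]
    for page in victims:
        working_set.remove(page)
        standby.append(page)
    return victims


def transition_fault(working_set, standby, number):
    """Soft fault: pull a trimmed page back from standby if it is still there."""
    for page in standby:
        if page.number == number:
            standby.remove(page)
            page.accessed, page.age = True, 0
            working_set.append(page)
            return True
    return False
```

In this model a page that is referenced again while it sits on the standby list comes back through `transition_fault` without any disk I/O, which is the behavior the Transition Faults/sec counter measures.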
Page replacement
Available Bytes = (free + zero + standby)
Windows Server 2003 utilizes RAM much more efficiently than previous versions of the OS
  Previously, the only way pages were aged was using the transition fault mechanism
Applications that attempt to allocate as much real memory as they can get require special consideration
  MS Exchange
  MS SQL Server
Page replacement
SQL Server memory tuning
For processes that perform their own working set management:
  CreateMemoryResourceNotification to create a memory resource notification object
  Sends two events to “listening” processes:
    LowMemoryResourceNotification
    HighMemoryResourceNotification
IIS memory tuning
IIS 6.0 memory caching
  Kernel cache (new)
    http Responses
  System File Cache
    .htm, .jpg, .gif files, etc.
  IIS Object cache
    File handles
  Active Server Pages
    Script engines
    Templates
[Figure: IIS 6.0 request flow. Labels: http GET Request; TCP/IP Stack; Kernel; HTTP.SYS; Response Object Cache; File Cache; Inetinfo.exe; w3wp.exe; ASP; ASPX; Request; Thread Pool (x2); Web Gardens.]
IIS memory tuning
Monitoring IIS 6.0 memory caching
  Kernel cache (new)
    Web Server Cache object
  System File Cache
    Cache object (MDL interface)
  IIS Object cache
    IIS Global object
  Active Server Pages
    Active Server Pages
IIS memory tuning
Tuning parameters for controlling IIS memory caching:
  MaxCachedFileSize - defaults to 256 KB
  MemCacheSize - defaults to dynamic sizing
  ObjectCacheTTL - default is 30 seconds
    objects include File handles, Directories
    .htm, .gif, and .jpg files are cached in the system file cache
  OpenFilesInCache - default is 1000 per 32 MB (obsolete)
IIS ASP memory tuning
Tuning parameters for controlling IIS Active Server Pages memory caching:
  AspTemplateCache - defaults to 250 files in IIS 5.0
  AspScriptEngineCacheMax - defaults to 120
    Allow each processing Thread to cache its script engine
  AspScriptFileCacheSize - defaults to 500
    Allow each processing Thread to cache its precompiled script engine
Monitor Active Server Pages: ASP Templates Cached, Template Cache Hit Rate, Free Script Engines in Cache
New Win 2003 tools
Resource Kit tools are now supported!
  Consume
  Kernrate
  Poolmon
Windows NT kernel Measurement Interface (Trace) reports
Win 2003 performance monitoring
NTSMF support
  Configuration file
    Analogous to Disable Performance Counters Registry flag
    Use it with Perflib DLLs that have persistent problems and pernicious side-effects
      .NET Framework perflibs
      Lotus Notes Perflib
  Logical & Physical Disk counters are always on!
  IPv6 and TCPv6 support
  Discourage use of WMI interface
Priority Queuing/Preemptive Scheduling
Windows timer services
  Internal processor clock (MHz) is not accessible
  External clock hardware generates timer interrupts
    Clock interrupt every “quantum”
    GetSystemTimeAdjustment API call
  Virtual system clock
    100 nanosecond units, SetTimer API
    Ticks occur every 10 ms.
  High precision timer APIs to access external clock directly
    QueryPerformanceCounter and QueryPerformanceFrequency
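The high precision timer pair works by dividing a difference of counter readings by the counter frequency. The sketch below shows that arithmetic only; it does not call the Win32 functions. The helper names are hypothetical, and Python's `time.perf_counter_ns` is used as a portable stand-in whose effective frequency is one billion counts per second.

```python
import time


def elapsed_seconds(start_count, end_count, counts_per_second):
    """Convert a difference of two counter readings into seconds."""
    return (end_count - start_count) / counts_per_second


def to_100ns_units(seconds):
    """Express a duration in the 100 nanosecond units of the system clock."""
    return round(seconds * 10_000_000)


# Portable stand-in: perf_counter_ns counts nanoseconds, so the
# "frequency" passed to elapsed_seconds is 1,000,000,000.
start = time.perf_counter_ns()
sum(range(100_000))  # some work to time
stop = time.perf_counter_ns()
print("elapsed seconds:", elapsed_seconds(start, stop, 1_000_000_000))
print("one 10 ms tick in 100 ns units:", to_100ns_units(0.010))
```

The second helper makes the relationship between the two clocks on this slide explicit: a 10 ms tick is 100,000 units of the 100 nanosecond virtual system clock.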
Processor performance monitoring
Each clock interval, the external clock interrupt routine determines what thread was executing.
  CPU Time is sampled and then accumulated in a Counter at the Thread and Process level
  Several hundred samples are gathered every second
  How accurate is this sampling?
A System Idle Thread is dispatched when there are no Ready Threads
  Processor Busy = 100% - Idle Thread CPU Time
  Not a true thread: processr.sys hardware-dependent implementation
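The Processor Busy formula can be worked through with two readings of an accumulated idle time counter. This is a minimal sketch assuming the idle time is reported in 100 nanosecond units, as the virtual system clock is; the function name and the sample readings are hypothetical.

```python
def percent_busy(idle_start, idle_end, interval_seconds):
    """Processor Busy = 100% - Idle Thread CPU Time over the interval.

    idle_start, idle_end: cumulative Idle Thread CPU time in 100 ns units,
    read at the beginning and end of the measurement interval.
    """
    idle_seconds = (idle_end - idle_start) / 10_000_000
    idle_pct = 100.0 * idle_seconds / interval_seconds
    # Clamp, because sampled readings can slightly overshoot the interval.
    return max(0.0, min(100.0, 100.0 - idle_pct))


# Hypothetical readings: 2.5 seconds of idle time in a 10 second interval.
print(percent_busy(0, 25_000_000, 10))
```

Because CPU time is accumulated from clock-interval samples, the result is an estimate whose accuracy improves as the measurement interval grows.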
What is the relationship between processor utilization and queuing?
“Interpreting NT Queue Length Measurements,” by Yiping, Bolker & Yefim, BMC Software, CMG 2002
  Research questions the validity of the System:Processor Queue Length Counter
  At low utilization, the Processor Queue Length can be a relatively higher number than queuing theory predicts
Interpreting NT Queue Length Measurements
Validity of the System:Processor Queue Length Counter
  At low utilization, the observed Processor Queue Length can be a relatively high number
  Instantaneous Counter (i.e., sampled value)
  Heisenberg Uncertainty Principle: measurement tools can get in the way of the measurement data
    e.g., dmperfss collection thread always observed in the Running state
  Worst case results if you use a 1 second sample interval due to:
    single Timer queue
    HAL virtualization of the system clock
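To judge whether an observed queue length is higher than queuing theory predicts, a baseline is needed. The slides do not say which model the paper uses, so the sketch below assumes the simplest one, a single-server M/M/1 queue, where the mean number of waiting requests is rho squared divided by (1 - rho). The function name is hypothetical.

```python
def mm1_queue_length(utilization):
    """Mean number waiting in an M/M/1 queue at the given utilization (0 to 1)."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in the range [0, 1)")
    return utilization ** 2 / (1.0 - utilization)


for rho in (0.2, 0.5, 0.8, 0.9):
    print(f"utilization {rho:.0%}: expected queue length {mm1_queue_length(rho):.2f}")
```

Under this assumption a processor at 20% busy should show an average queue well below one thread, so sampled Processor Queue Length values of several threads at that utilization point to a measurement effect rather than real contention.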
Lab exercise
View the STPSQL02PD.200311060200.sum.smf data file in PerfGal
  Determine the relationship between Processor utilization and Processor queuing
  Can you make a case for a CPU upgrade?
Plus, examine the Processes in the Ready state to determine if your Loved Ones are impacted…
Support & Resource Kit Tools
Required
  Documentation
  Trace collection and formatting tools (logman, etc.)
  Debugger
  Interrupt Affinity Filter
  Tasklist
  Kernrate
  pfmon
  poolmon
  NT 4.0 Performance Monitor
Windows 2003 Time Service
Allows you to synchronize your servers to a common machine or to an authoritative external time source using SNTP, e.g.,
  net time /setsntp:"tock.usno.navy.mil ntp2.usno.navy.mil"
  net stop w32time
  w32tm -once
  net start w32time
See KB Q216734 for more info.
Resource Kit tools
Miscellaneous measurement tools
  Pfmon.exe - page fault monitor
  Consume.exe - CPU & memory stress program
  Kernrate.exe - detailed CPU profiler
  Vadump.exe - process virtual address dump
Miscellaneous Counter tools
Documentation
  Counters.chm
  Regentry.chm
Kernrate profile
High frequency sampler in the Reskit
  Records processor utilization by process virtual address
  Includes instructions executing inside the operating system kernel, the HAL, device drivers, etc.
e.g.,
  kernrate -n smlogsvc -s 120 -k 100 -v 2 -a -e -z pdh
See kernrate.doc for more information
Lab exercise
Start Sysmon log manager gathering process and thread data to a .csv file
Monitor CPU usage using kernrate
  Hint: prepare the kernrate command in advance!
Start smlogsvc
kernrate
Starting to collect profile data
Will collect profile data for 120 seconds
===> Finished Collecting Data, Starting to Process Results
------------Overall Summary:--------------
P0 K 0:00:03.094 ( 2.6%) U 0:01:19.324 (66.1%) I 0:00:37.584 (31.3%) DPC 0:00:00.200 ( 0.2%) Interrupt 0:00:00.991 ( 0.8%)
Interrupts= 190397, Interrupt Rate= 1587/sec.
kernrate
Results for Kernel Mode:
-----------------------------
OutputResults: KernelModuleCount = 619
Percentage in the following table is based on the Total Hits for the Kernel
Time 20585 hits, 19531 events per hit --------
Kernel CPU Usage (including idle process) based on the profile interrupt total possible hits is 33.50%
Module     Hits   Shared  msec    %Total  %Certain  Events/Sec
ntoskrnl   15446  0       120002  75 %    75 %      2513923
processr   4381   0       120002  21 %    21 %      713032
hal        524    0       120002  2 %     2 %       85283
win32k     114    0       120002  0 %     0 %       18554
Ntfs       42     0       120002  0 %     0 %       6835
nv4_disp   22     0       120002  0 %     0 %       3580
nv4_mini   12     0       120002  0 %     0 %       1953
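The derived columns in the kernrate report can be reproduced from the raw figures, which is a useful check when reading a profile. The sketch below is an assumption-laden reconstruction, not kernrate's documented logic: the function names are hypothetical, and the rounding rules (one decimal place in the summary, truncation in the module table) are inferred from the numbers shown on these slides.

```python
def cpu_share(seconds, total_seconds):
    """Percent of the profiling interval, rounded to one decimal place."""
    return round(100.0 * seconds / total_seconds, 1)


def percent_of_hits(hits, total_hits):
    """Whole-number percent, truncated (524 of 20585 hits is shown as 2 %)."""
    return hits * 100 // total_hits


def events_per_second(hits, events_per_hit, msec):
    """Events/Sec column, truncated to an integer."""
    return hits * events_per_hit * 1000 // msec


# Overall summary for processor P0: kernel + user + idle seconds.
TOTAL = 3.094 + 79.324 + 37.584
print("kernel %:", cpu_share(3.094, TOTAL))
print("user %:", cpu_share(79.324, TOTAL))
print("idle %:", cpu_share(37.584, TOTAL))
print("interrupts/sec:", round(190397 / TOTAL))

# Kernel-mode module table: ntoskrnl row.
print("ntoskrnl %Total:", percent_of_hits(15446, 20585))
print("ntoskrnl Events/Sec:", events_per_second(15446, 19531, 120002))
```

The kernel, user, and idle times sum to the 120.002 second interval reported in the msec column, which is why the same total serves as the denominator for every percentage.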
kernrate
Results for User Mode Process SMLOGSVC.EXE (PID = 2120)
OutputResults: ProcessModuleCount (Including Managed-Code JITs) = 41
Percentage in the following table is based on the Total Hits for this Process
Time 40440 hits, 19531 events per hit --------
User-Mode CPU Usage for this Process based on the profile interrupt total possible hits is 65.82%
Module     Hits   Shared  msec    %Total  %Certain  Events/Sec
pdh        31531  0       120002  77 %    77 %      5131847
kernel32   6836   0       120002  16 %    16 %      1112597
msvcrt     1700   0       120002  4 %     4 %       276684
ntdll      362    0       120002  0 %     0 %       58917
kernrate
===> Processing Zoomed Module pdh.dll...
----- Zoomed module pdh.dll (Bucket size = 16 bytes, Rounding Down) --------
Percentage in the following table is based on the Total Hits for this Zoom Module
Time 31671 hits, 19531 events per hit -------- (33371 total hits from summing-up the module components)
(51.55% of Total Possible Hits based on the profile interrupt)
Module                     Hits   Shared  msec    %Total  %Certain  Events/Sec
StringLengthWorkerA        23334  1       119992  69 %    69 %      3798056
IsMatchingInstance         1513   0       119992  4 %     4 %       246269
GetInstanceByName          1374   0       119992  4 %     4 %       223644
IsMatchingInstance         1332   0       119992  3 %     3 %       216808
NextInstance               1305   12      119992  3 %     3 %       212413
GetInstanceName            829    0       119992  2 %     2 %       134935
PdhiHeapFree               495    428     119992  1 %     0 %       80570
GetInstance                469    11      119992  1 %     1 %       76338
PdhiHeapReAlloc            428    0       119992  1 %     1 %       69665
GetCounterDataPtr          338    338     119992  1 %     0 %       55015
NextCounter                246    1       119992  0 %     0 %       40041
GetInstanceByName          233    229     119992  0 %     0 %       37925
FirstInstance              121    121     119992  0 %     0 %       19695
PdhiHeapFree               121    0       119992  0 %     0 %       19695
PdhiMakePerfPrimaryLangId  108    108     119992  0 %     0 %       17579
WMI architecture
[Figure: WMI architecture diagram. Labels: Management application (x2); COM interfaces; CIMOM (WinMgmt); CIM repository; Provider (DLL) (x3); Provider (EXE) (x3).]
System Monitor architecture
[Figure: System Monitor architecture diagram. Labels: Windows NT Performance Monitor; System Monitor graph control; Custom Performance Tool; Sysmon log and alert service (Sysmon Log Service); Files; PDH.DLL; RegQueryValueEx(); Perflib; WMI; Hi-Perf Data Provider Object; System Performance DLL; Performance Extension DLL.]
Windows kernel trace facility
Secure kernel trace facility
Architected for high volume
  NT Scheduler trace abandoned due to overhead
Designed for application extensions
  But only Microsoft understands the format of the Trace data well enough to create reporting applications.
Operating System Traces
[Figure: trace record layouts.
 Process Start/End: Trace Header; Process Id (4 bytes); Parent Process Id (4 bytes); Security Id (variable length); Image Name (variable length).
 Disk I/O and Page Faults (field-to-record assignment not recoverable from the extraction): Trace Header; Disk Signature (4 bytes); Transfer Size (4 bytes); IRP Flags (4 bytes); Response (4 bytes); Virtual Address (4 bytes); Byte Offset (8 bytes).
 TCP/IP: Trace Header; Source Ip Address (4 bytes); Dest Ip Address (4 bytes); Source Port (2 bytes); Dest Port (2 bytes); Size (4 bytes); Process Id (4 bytes).]
Windows kernel event trace
Event trace header
Field       Description                    Data Type       Size
Size        Event record size              unsigned long   4 bytes
TimeStamp   Wall clock time (in 100ns)     quad integer    8 bytes
ThreadId    Thread responsible for event   handle          8 bytes
GUID        Globally Unique Identifier     structure       16 bytes
UserTime    User-mode CPU time (ticks)     unsigned long   4 bytes
KernelTime  Kernel-mode CPU time           unsigned long   4 bytes
Version     Version of record              unsigned short  2 bytes
Type        Event type                     unsigned char   1 byte
Level       Event level                    unsigned char   1 byte
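The field sizes in the header table add up to 48 bytes, which can be checked by describing the table as a packed binary layout. This is an illustrative sketch only: the field order simply follows the table as listed on the slide, which is an assumption and not the actual in-memory layout of the Windows event trace header, and the names `HEADER_FORMAT`, `Header`, and `parse_header` are hypothetical.

```python
import struct
from collections import namedtuple

# Little-endian, no padding: 4 + 8 + 8 + 16 + 4 + 4 + 2 + 1 + 1 bytes.
# Field order follows the slide's table (an assumption, see lead-in).
HEADER_FORMAT = "<IqQ16sIIHBB"

Header = namedtuple(
    "Header",
    "size timestamp thread_id guid user_time kernel_time version type level",
)


def parse_header(buf):
    """Unpack one header from the start of a bytes-like buffer."""
    return Header(*struct.unpack_from(HEADER_FORMAT, buf))


# Round-trip a made-up record to show the layout is self-consistent.
raw = struct.pack(HEADER_FORMAT, 48, 123456789, 0x1234, bytes(16), 5, 7, 1, 10, 0)
header = parse_header(raw)
print(struct.calcsize(HEADER_FORMAT), header.size, header.thread_id, header.type)
```

A reporting tool built on the trace facility would read the Size field first and use it to step from one variable-length event record to the next.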
Windows kernel trace facility
Resource Kit tools:
  Tracelog.exe
  Tracedmp.exe
  Reducer.exe
Third party tools ???
Lab exercise
View the Workload.txt Reducer sample
Windows kernel trace facility
Reducer:
+-----------------------------------------------------------------------+
| WINDOWS 2000 Capacity Planning Trace
| Version : 2128
| Type : Default
+-----------------------------------------------------------------------+
| Build : 2195
| Processors: 1
| Start Time: 24 Aug 2001 12:36:06.830
| End Time : 24 Aug 2001 12:37:02.786
| Duration : 55 Sec
|
| Trace Name: NT Kernel Logger
| File Name : C:\LogFile.Etl
| Start Time: 24 Aug 2001 12:36:06.830
| End Time : 24 Aug 2001 12:37:02.786
| Duration : 55 Sec
+-----------------------------------------------------------------------+
Windows kernel trace facility
Windows Event Trace Session Report
----------------------------------------------------------
Window Build: 3700
Computer: IISPERFSRV
Processors: 8
CPU Speed: 700 MHz
Memory: 2048 Mb
Trace Name: NT Kernel Logger
File: S:\tools\etw\krnl.etl
Start Time: Wednesday, October 23, 2002 9:23:53 AM
End Time: Wednesday, October 23, 2002 9:24:13 AM
Trace Name: IIS Trace
File: S:\tools\etw\iis.etl
Start Time: Wednesday, October 23, 2002 9:23:58 AM
End Time: Wednesday, October 23, 2002 9:24:11 AM
Lab exercise
View the iisdata.xml Report sample
Windows kernel trace facility
Windows kernel trace facility
Evaluation:
  Process End event does provide process end performance statistics (CPU, I/O, etc.)
  New per Process Disk I/O Counters satisfy most capacity planning requirements
  The I/O trace namespace for capturing I/O requests by file is formidable
  Captures per process “hard” page faults
  TCP trace data vs. Network Monitor data
Windows Scheduler
WSRM controls
  Establish a workload’s processor utilization target
  WSRM will raise and lower process Dispatching Priority based on the management policy
  Controls are effective only when there is processor contention!
MS Internet Information Server
[Figure: IIS 6.0 request flow. Labels: http GET Request; TCP/IP Stack; Kernel; HTTP.SYS; Response Object Cache; File Cache; Inetinfo.exe; w3wp.exe; ASP; ASPX; Request; Thread Pool (x2); Web Gardens.]
Windows 2003: Assessment
Rapidly becoming a corporate de facto standard for mission critical application servers
  Wintel application servers provide industry best price/performance
Ample, but less than perfect measurement data
  Plenty of resource utilization measures
  Precious few service level measures
Performance Monitor API provides strong foundation
Extended Counter support provided chiefly by Microsoft, but others are getting into the act: Oracle, Veritas, etc.
Where to get more information
“Inside Windows 2000” by David Solomon and Mark Russinovich
Windows 2000 Professional Resource Kit
TechNet or the Microsoft Developer Network (MSDN) CD
“Microsoft Windows NT Resource Kit, Volume 4: Optimizing Windows NT” by Russ Blake