Windows Operating System Internals - by David A. Solomon and Mark E. Russinovich with Andreas Polze Unit OS B: Comparing the Linux and Windows Kernels.

Dec 14, 2015

Nickolas Deason
Copyright Notice

© 2000-2005 David A. Solomon and Mark Russinovich

These materials are part of the Windows Operating System Internals Curriculum Development Kit, developed by David A. Solomon and Mark E. Russinovich with Andreas Polze

Microsoft has licensed these materials from David Solomon Expert Seminars, Inc. for distribution to academic organizations solely for use in academic environments (and not for commercial use)

Roadmap for Section B

A Brief History of Windows and Linux

Comparing the Windows and Linux kernel architectures

Linux: becoming more like Windows

Benchmarks and other lies

What does the future hold?

Scope

We’re going to look at the technology of the kernels

We’re not going to look at:

Cost

Support

Applications

Management

Use as a desktop system

The History of Linux

The real history of Linux starts in 1969, when Ken Thompson developed the first version of UNIX at Bell Labs

After Dennis Ritchie, designer of the C programming language, joined the project, it debuted to the research community in an academic paper in 1974

Bell Labs released the first commercial version in 1976 as UNIX Version 6 (V6)

UNIX spread throughout universities, and in 1978 Bell Labs released UNIX Time-Sharing System, a version with portability in mind

Linux History Continued

Because Bell Labs distributed UNIX with source code, the early 1980s saw three major branches grow on the UNIX tree:

UNIX System III from Bell Labs’ UNIX Support Group (USG)

UNIX Berkeley Software Distribution (BSD) from the University of California at Berkeley

Microsoft’s XENIX

The UNIX market fragmented further in the 1980s, despite the IEEE’s POSIX standard and the X/Open Group’s Portability Guide

Linus and Linux

In 1991 Linus Torvalds took a college computer science course that used the Minix operating system

Minix is a “toy” UNIX-like OS written by Andrew Tanenbaum as a learning workbench

Linus wanted to make Minix more usable, but Tanenbaum wanted to keep it ultra-simple

Linus went in his own direction and began working on Linux

In October 1991 he announced Linux v0.02

In March 1994 he released Linux v1.0

The History of Windows (NT)

The history of Windows really begins in the mid-1970s, when Dick Hustvedt, Peter Lipman and David Cutler designed the VMS operating system for Digital’s 32-bit VAX processor

Digital shipped VMS v1.0 in 1978

Cutler moved to Seattle to open DECWest and worked on the Digital Mica OS for a new CPU codenamed Prism

12 engineers went with him and the facility grew to 200

In 1988 Digital cancelled the project

The History of Windows Continued

Bill Gates wanted a UNIX rival

He hired Cutler and 20 Digital engineers in 1989

The new project was called NT OS/2 because it focused on OS/2 backward compatibility

With the success of Windows 3.0’s 1990 release, Gates refocused the project on Windows compatibility

The project was renamed Windows NT

Microsoft released Windows NT 3.1 in August 1993

Windows and Linux

Both Linux and Windows are based on foundations developed in the mid-1970s

[Figure: two timelines, 1970-2000. UNIX/Linux: UNIX born, UNIX public, UNIX V6, Linux v1.0, v2.0, v2.1, v2.2, v2.3, v2.4, v2.6. Windows: VMS v1.0, Windows NT 3.1, NT 4.0, Windows 2000, Windows XP, Server 2003]

Comparing the Architectures

Both Linux and Windows are monolithic

All core operating system services run in a shared address space in kernel mode

All core operating system services are part of a single module

Linux: vmlinuz

Windows: ntoskrnl.exe

Windowing is handled differently:

Windows has a kernel-mode windowing subsystem

Linux has a user-mode X Window System

Kernel Architectures

[Figure: layered diagrams for Linux and Windows. Both show Application above the User Mode/Kernel Mode boundary, then System Services; Process Management, Memory Management, I/O Management, etc.; Device Drivers; and Hardware Dependent Code. Linux places X-Windows in user mode; Windows places Win32 Windowing in kernel mode]

Linux Kernel

Linux is a monolithic but modular system

All kernel subsystems form a single piece of code with no protection between them

Modularity is supported in two ways:

Compile-time options

Most kernel components can be built as a dynamically loadable kernel module (DLKM)

DLKMs

Built separately from the main kernel

Loaded into the kernel at runtime and on demand (infrequently used components take up kernel memory only when needed)

Kernel modules can be upgraded incrementally

Support for minimal kernels that automatically adapt to the machine and load only those kernel components that are used
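On a running Linux system the DLKMs currently loaded are listed in /proc/modules (the same data lsmod prints). The sketch below parses that format, assuming the usual layout of name, size in bytes, reference count, comma-separated list of modules using it (or "-"), state and load address. The SAMPLE text is made up for illustration, not captured from a real machine.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Module:
    name: str
    size: int            # bytes of kernel memory the module occupies
    refcount: int        # current users of the module
    used_by: list[str]   # modules that depend on this one
    state: str           # e.g. "Live", "Loading", "Unloading"


def parse_proc_modules(text: str) -> list[Module]:
    """Parse /proc/modules-style text (assumed field layout, see lead-in)."""
    modules = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        name, size, refs, deps, state = fields[:5]
        used_by = [] if deps == "-" else [d for d in deps.split(",") if d]
        modules.append(Module(name, int(size), int(refs), used_by, state))
    return modules


def loaded_modules(path: str = "/proc/modules") -> list[Module]:
    """Read the live module list (Linux only)."""
    return parse_proc_modules(Path(path).read_text())


# Illustrative sample only; sizes and names are not from a real system.
SAMPLE = """\
nf_nat 49152 1 xt_MASQUERADE, Live 0x0000000000000000
xt_MASQUERADE 20480 1 - Live 0x0000000000000000
ext4 741376 1 - Live 0x0000000000000000
"""
```

For example, `sum(m.size for m in loaded_modules())` gives the kernel memory taken by the modules that happen to be loaded, which is the quantity on-demand loading keeps small.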

Windows Kernel

Windows is a monolithic but modular system

No protection among pieces of kernel code and drivers

Support for modularity is somewhat weak:

Windows drivers allow for dynamic extension of kernel functionality

Windows XP Embedded has special tools / packaging rules that allow coarse-grained configuration of the OS

Windows drivers are dynamically loadable kernel modules

A significant amount of code runs as drivers (including network stacks such as TCP/IP and many services)

Built independently from the kernel

Can be loaded on demand

Dependencies among drivers can be specified
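Declared dependencies constrain the load order: a driver must not start before the drivers it depends on. The sketch below derives one valid order with a topological sort (Python 3.9+ graphlib). The driver names and dependency map are hypothetical, and real Windows ordering also involves start types and service groups (for example the DependOnService and DependOnGroup registry values), which are not modelled here.

```python
from graphlib import TopologicalSorter


def load_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Return an order in which every driver follows all of its dependencies.

    Raises graphlib.CycleError if the dependencies are circular.
    """
    # TopologicalSorter expects node -> predecessors, which is exactly
    # "driver -> drivers it depends on".
    return list(TopologicalSorter(depends_on).static_order())


# Hypothetical dependency map; names are illustrative only.
DRIVERS = {
    "tcpip": {"ndis"},
    "afd": {"tcpip"},
    "ndis": set(),
    "netbt": {"tcpip"},
}

order = load_order(DRIVERS)
```

Any order the sorter returns is acceptable as long as each dependency precedes its dependents; independent drivers could equally be started in parallel.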

Comparing Portability

Both Linux and Windows kernels are portable

Mainly written in C

Have been ported to a range of processor architectures

Windows

i486, MIPS, PowerPC, Alpha, IA-64, x86-64

Only x86, x86-64 and IA-64 currently supported

More than 64 MB of memory required

Linux

Alpha, ARM, ARM26, CRIS, H8300, i386, IA-64, M68000, MIPS, PA-RISC, PowerPC, S/390, SuperH, SPARC, VAX, v850, x86-64

DLKMs allow for minimal kernels for microcontrollers

More than 4 MB of memory required

Comparing Layering, APIs, Complexity

Windows

Kernel exports about 250 system calls (accessed via ntdll.dll)

Layered Windows/POSIX subsystems

Rich Windows API (17,500 functions on top of native APIs)

Linux

Kernel supports about 200 different system calls

Layered BSD, Unix Sys V, POSIX shared system libraries

Compact APIs (1,742 functions in Single Unix Specification Version 3; not including X Window APIs)
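The layering is visible from user mode: applications normally call a library wrapper (ntdll.dll on Windows, the C library on Linux) instead of issuing the system-call instruction themselves. As a small Linux-only sketch, the snippet below calls the C library's getpid() wrapper directly through ctypes and checks it against os.getpid(). It assumes that ctypes.CDLL(None) exposes the already-loaded C library's symbols, which holds on typical Linux systems.

```python
import ctypes
import os

# Assumption (Linux): dlopen(NULL) returns a handle to the running program,
# whose symbol namespace includes the C library.
libc = ctypes.CDLL(None, use_errno=True)
libc.getpid.restype = ctypes.c_int
libc.getpid.argtypes = []


def pid_via_libc() -> int:
    """Call the C library's getpid() wrapper, which issues the getpid system call."""
    return libc.getpid()
```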

Comparing Architectures

Processes and scheduling

SMP support

Memory management

I/O

File Caching

Security

Process Management

Windows

Process

Address space, handle table, statistics and at least one thread

No inherent parent/child relationship

Threads

Basic scheduling unit

Fibers - cooperative user-mode threads

Linux

Process is called a Task

Basic address space, handle table, statistics

Parent/child relationship

Basic scheduling unit

Threads

No threads per se

Tasks can act like Windows threads by sharing handle table, PID and address space

PThreads - POSIX threads; in the NPTL implementation each pthread is a kernel-scheduled task created with clone(), not a cooperative user-mode thread
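On Linux each thread is a task with its own kernel thread ID that shares the process's PID and address space. The sketch below, which assumes Linux and Python 3.8+ for threading.get_native_id(), starts a few threads and records what each one sees: the same PID and the same shared list, but distinct thread IDs.

```python
import os
import threading


def observe_tasks(n: int = 4) -> list[tuple[int, int]]:
    """Start n threads; each records (pid, kernel thread id) in a shared list."""
    seen: list[tuple[int, int]] = []   # a single list object shared by all threads
    lock = threading.Lock()
    # All n threads must reach the barrier, so they are alive at the same
    # moment and their thread IDs cannot be reused within this set.
    barrier = threading.Barrier(n)

    def worker() -> None:
        barrier.wait()
        with lock:
            seen.append((os.getpid(), threading.get_native_id()))

    threads = [threading.Thread(target=worker) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen
```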

Scheduling Priorities

Windows

Two scheduling classes

“Real time” (fixed) - priority 16-31

Dynamic - priority 1-15

Higher priorities are favored

Priorities of dynamic threads get boosted on wakeups

Thread priorities are never lowered below their base priority

[Figure: Windows priority scale 0-31: levels 16-31 fixed, levels 0-15 dynamic, with I/O boosts]
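The boost-and-decay behaviour of dynamic threads can be modelled in a few lines. This is a simplified sketch, not Windows code: it assumes a boost is added to the base priority on wakeup and capped at 15, and that the boosted priority then drops one level each time the thread uses up a quantum until it is back at, never below, its base. Boost amounts are plain parameters here.

```python
DYNAMIC_MAX = 15  # boosts never push a dynamic thread into the real-time range


class DynamicThread:
    """Simplified model of a Windows dynamic-priority thread (assumptions above)."""

    def __init__(self, base: int) -> None:
        if not 1 <= base <= DYNAMIC_MAX:
            raise ValueError("dynamic base priority must be in 1..15")
        self.base = base
        self.current = base

    def wake_with_boost(self, boost: int) -> None:
        # The boost is applied to the base priority and capped at 15;
        # a wakeup never lowers an already higher current priority.
        self.current = max(self.current, min(self.base + boost, DYNAMIC_MAX))

    def quantum_expired(self) -> None:
        # Decay one level per quantum, but never below the base priority.
        self.current = max(self.base, self.current - 1)
```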

Linux

Has 3 scheduling classes:

Normal – priority 100-139

Fixed Round Robin – priority 0-99

Fixed FIFO – priority 0-99

Lower priorities are favored

Priority values of normal threads go up (decay) as they use CPU

Priority values of interactive threads go down (boost)

Scheduling Priorities (cont)

[Figure: priority scales side by side. Windows: 0-15 dynamic (I/O boosts), 16-31 fixed. Linux: 0-99 Fixed FIFO and Fixed Round-Robin, 100-139 Normal, with CPU use and I/O moving normal threads within that range]

Linux Scheduling Details

Most threads use a dynamic priority policy

Normal class - similar to the classic UNIX scheduler

A newly created thread starts with a base priority

Threads that block frequently (I/O bound) will have their priority gradually increased

Threads that always exhaust their time slice (CPU bound) will have their priority gradually decreased

“Nice value” sets a thread’s base priority

Larger values = less priority, lower values = higher priority

Valid nice values are in the range of -20 to +19

Nonprivileged users can only specify positive nice values (that is, only lower their priority)

Dynamic priority policy threads have static priority zero

Execute only when there are no runnable real-time threads
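The mapping from nice values onto the 100-139 range, plus the interactivity bonus, can be sketched as below. The constants follow the 2.6-era O(1) scheduler as we understand it (static priority = 120 + nice; effective priority moves at most 5 levels either way), so treat them as an approximation. sleep_fraction is a hypothetical stand-in for the kernel's sleep_avg bookkeeping.

```python
MAX_RT_PRIO = 100             # internal priorities 0..99 belong to real-time tasks
MAX_PRIO = MAX_RT_PRIO + 40   # normal tasks use 100..139
MAX_BONUS = 10                # effective priority moves +/- MAX_BONUS // 2 levels


def nice_to_static_prio(nice: int) -> int:
    if not -20 <= nice <= 19:
        raise ValueError("nice must be in -20..19")
    return MAX_RT_PRIO + 20 + nice   # nice 0 -> 120


def effective_prio(nice: int, sleep_fraction: float) -> int:
    """sleep_fraction in [0, 1]: share of recent time spent sleeping (assumption)."""
    bonus = int(sleep_fraction * MAX_BONUS) - MAX_BONUS // 2   # -5 .. +5
    prio = nice_to_static_prio(nice) - bonus                   # lower value = better
    return max(MAX_RT_PRIO, min(prio, MAX_PRIO - 1))           # stay in 100..139


def rt_to_internal_prio(rt_priority: int) -> int:
    """Map a user-visible real-time priority 1..99 onto internal 98..0."""
    if not 1 <= rt_priority <= 99:
        raise ValueError("real-time priority must be in 1..99")
    return MAX_RT_PRIO - 1 - rt_priority
```

An I/O-bound task at nice 0 that mostly sleeps ends up at 115, ahead of a CPU-bound nice-0 task at 125, while every real-time task still outranks both.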

Real-Time Scheduling on Linux

Linux supports two static priority scheduling policies:

Round-robin and FIFO (first in, first out)

Selected with the sched_setscheduler() system call

Use static priority values in the range of 1 to 99

Executed strictly in order of decreasing static priority

FIFO policy lets a thread run to completion

The thread keeps the CPU until it blocks, is preempted by a higher-priority thread, or gives it up by calling sched_yield()

Round-robin lets threads run for up to one time slice

Then switches to the next thread with the same static priority

RT threads can easily starve lower-priority threads from executing

Root privileges or the CAP_SYS_NICE capability are required for the selection of a real-time scheduling policy

Long-running system calls can cause priority inversion

Same as in Windows; but compare RTLinux
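From user space the policy and priority are set with sched_setscheduler(), which Python exposes on Linux as os.sched_setscheduler(). The sketch below queries the priority range the kernel reports for a policy and attempts to move the calling process to SCHED_FIFO, treating PermissionError (no root or CAP_SYS_NICE) as an expected outcome. The helper names are illustrative.

```python
import os


def priority_range(policy: int) -> tuple[int, int]:
    """Static priority range the kernel accepts for a scheduling policy."""
    return os.sched_get_priority_min(policy), os.sched_get_priority_max(policy)


def try_realtime(policy: int = os.SCHED_FIFO, priority: int = 10, pid: int = 0) -> bool:
    """Try to move pid (0 = calling process) to a real-time policy.

    Returns False instead of raising when the caller lacks root or CAP_SYS_NICE.
    """
    lo, hi = priority_range(policy)
    if not lo <= priority <= hi:
        raise ValueError(f"priority must be in {lo}..{hi}")
    try:
        os.sched_setscheduler(pid, policy, os.sched_param(priority))
    except PermissionError:
        return False
    return True
```

Because a runaway FIFO thread can starve everything below it, a caller would normally switch back with `os.sched_setscheduler(0, os.SCHED_OTHER, os.sched_param(0))` once the time-critical work is done.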

Windows Scheduling Details

Most threads run in variable priority levels

Priorities 1-15

A newly created thread starts with a base priority

Threads that complete I/O operations experience priority boosts (but never higher than 15)

A thread’s priority will never be below base priority

The Windows API function SetThreadPriority() sets the priority value for a specified thread

This value, together with the priority class of the thread's process, determines the thread's base priority level

Windows will dynamically adjust priorities for non-realtime threads
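The combination of process priority class and relative thread priority behaves like a lookup table. The sketch below uses the commonly documented values (for example NORMAL class maps to base 8 and HIGH to 13, relative offsets run from -2 to +2, and IDLE/TIME_CRITICAL saturate to 1/15, or 16/31 in the REALTIME class). The string keys are simplified stand-ins for the Windows constants; check the SetThreadPriority documentation before relying on exact numbers.

```python
# Simplified stand-ins for the *_PRIORITY_CLASS constants (assumed names).
CLASS_BASE = {
    "IDLE": 4,
    "BELOW_NORMAL": 6,
    "NORMAL": 8,
    "ABOVE_NORMAL": 10,
    "HIGH": 13,
    "REALTIME": 24,
}

# Simplified stand-ins for the THREAD_PRIORITY_* values.
RELATIVE_OFFSET = {
    "LOWEST": -2,
    "BELOW_NORMAL": -1,
    "NORMAL": 0,
    "ABOVE_NORMAL": 1,
    "HIGHEST": 2,
}


def base_priority(priority_class: str, relative: str) -> int:
    """Base priority for a thread, given its process class and relative priority."""
    realtime = priority_class == "REALTIME"
    # IDLE and TIME_CRITICAL saturate to the bottom/top of the class's range.
    if relative == "IDLE":
        return 16 if realtime else 1
    if relative == "TIME_CRITICAL":
        return 31 if realtime else 15
    return CLASS_BASE[priority_class] + RELATIVE_OFFSET[relative]
```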

Real-Time Scheduling on Windows

Windows supports a static round-robin scheduling policy for threads with priorities in the real-time range (16-31)

Threads run for up to one quantum

Quantum is reset to full turn on preemption

Priorities never get boosted

RT threads can starve important system services

Such as CSRSS.EXE

SeIncreaseBasePriorityPrivilege is required to elevate a thread’s priority into the real-time range (this privilege is assigned to members of the Administrators group)

System calls and DPC/APC handling can cause priority inversion
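Among threads of equal priority, round-robin and FIFO differ in the order in which work gets CPU time. The toy simulation below runs CPU-bound threads that each need a given number of time units. It deliberately ignores blocking, preemption by higher priorities and the Windows rule that a preempted thread's quantum is reset, and the thread names and burst lengths are made up.

```python
from collections import deque


def run_fifo(bursts: dict[str, int]) -> list[str]:
    """FIFO: each thread runs until its burst is done; return the per-unit trace."""
    trace = []
    for name, need in bursts.items():
        trace.extend([name] * need)
    return trace


def run_round_robin(bursts: dict[str, int], quantum: int) -> list[str]:
    """Round-robin: run at most `quantum` units, then go to the back of the queue."""
    ready = deque(bursts.items())
    trace = []
    while ready:
        name, need = ready.popleft()
        used = min(quantum, need)
        trace.extend([name] * used)
        if need > used:
            ready.append((name, need - used))  # quantum expired: requeue at the tail
    return trace
```

With bursts of 3, 1 and 2 units for A, B and C, FIFO finishes B only after all of A, while round-robin with a one-unit quantum gives B the CPU in the second time unit.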

Scheduling Timeslices

Windows

The thread timeslice (quantum) is 10ms-120ms

When quanta can vary, a quantum has one of 2 values

Reentrant and preemptible

[Figure: Windows quanta: fixed 120ms; variable 20ms (background) or 60ms (foreground)]

Linux

The thread quantum is 10ms-200ms

Default is 100ms

Varies across entire range based on priority, which is based on interactivity level

Reentrant and preemptible

[Figure: Linux quantum scale from 10ms to 200ms, default 100ms]
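Both ranges can be reproduced with simple arithmetic under stated assumptions. For Windows, this sketch assumes a 10 ms clock interval, a client quantum of 2 clock ticks that a foreground process may have tripled, and a fixed server quantum of 12 ticks, giving 20/60/120 ms; the real clock interval and tick counts depend on hardware and configuration. For Linux, it assumes the timeslice scales linearly from 200 ms at priority 100 to 10 ms at priority 139, the shape we understand early 2.6 O(1) schedulers to have used; later kernels changed the constants.

```python
def windows_quantum_ms(server: bool, foreground: bool,
                       clock_ms: int = 10, foreground_multiplier: int = 3) -> int:
    """Quantum length under the assumptions stated above (not read from a system)."""
    if server:
        return 12 * clock_ms                     # fixed long quantum
    ticks = 2 * (foreground_multiplier if foreground else 1)
    return ticks * clock_ms                      # short variable quantum


MIN_TIMESLICE_MS, MAX_TIMESLICE_MS = 10, 200    # assumed early-2.6 constants
MAX_RT_PRIO, MAX_PRIO = 100, 140                # normal priorities are 100..139


def linux_timeslice_ms(static_prio: int) -> int:
    """Linear interpolation: priority 100 -> 200 ms, priority 139 -> 10 ms."""
    if not MAX_RT_PRIO <= static_prio < MAX_PRIO:
        raise ValueError("static priority must be in 100..139")
    span = MAX_TIMESLICE_MS - MIN_TIMESLICE_MS
    steps = MAX_PRIO - MAX_RT_PRIO - 1           # 39 steps between the extremes
    return MIN_TIMESLICE_MS + span * (MAX_PRIO - 1 - static_prio) // steps
```

Nice 0 (priority 120) lands at 102 ms, close to the 100 ms default quoted above.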

Multiprocessor Support

Windows

Supports symmetric multiprocessing (SMP)

Up to 32 processors on 32-bit Windows

Up to 64 processors on 64-bit Windows

All CPUs can take interrupts

Supports Non-Uniform Memory Access systems

Scheduler favors the node a thread prefers to run on

Memory manager tries to allocate memory on the node a thread prefers to run on

Supports Hyperthreading

Scheduler favors idle physical processors when it has a choice

Doesn’t count logical CPUs against licensing limits

[Figure: a ready thread and two physical CPUs (Physical CPU 0 and Physical CPU 1), each with two logical processors]

Linux

Supports SMP

No fixed upper CPU limit: the maximum is set as a kernel build constant (NR_CPUS)

All CPUs can take interrupts

Supports Non-Uniform Memory Access systems

Scheduler favors the node a thread last ran on

Memory manager tries to allocate memory on the node a thread is running on

Supports Hyperthreading

Scheduler favors idle physical processors when it has a choice
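"Favor idle physical processors" amounts to a preference order when picking a CPU for a ready thread. The sketch below is a simplified model, not either kernel's algorithm: given a map from logical CPU to physical core and the set of idle logical CPUs, it prefers a logical CPU whose whole core is idle, falls back to any idle logical CPU, and uses a hypothetical `preferred` (ideal or last-run) CPU as the tie-breaker.

```python
from collections import defaultdict
from typing import Optional


def pick_cpu(core_of: dict[int, int], idle: set[int],
             preferred: Optional[int] = None) -> Optional[int]:
    """Return the logical CPU to run a ready thread on, or None if none is idle."""
    siblings: dict[int, list[int]] = defaultdict(list)
    for cpu, core in core_of.items():
        siblings[core].append(cpu)

    def whole_core_idle(cpu: int) -> bool:
        return all(s in idle for s in siblings[core_of[cpu]])

    candidates = sorted(idle)
    if not candidates:
        return None
    # Prefer logical CPUs on a completely idle physical core; otherwise take
    # any idle logical CPU even though its hyperthread sibling is busy.
    fully_idle = [c for c in candidates if whole_core_idle(c)]
    tier = fully_idle or candidates
    # Within the chosen tier, keep the preferred CPU if it qualifies.
    return preferred if preferred in tier else tier[0]
```

With two physical CPUs exposing logical CPUs {0, 1} and {2, 3}, a thread is placed on core 1 when only CPU 0 is busy, rather than on CPU 1, whose sibling is running.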

Virtual Memory Management

Windows

32-bit versions split the address space between user mode and kernel mode anywhere from 2GB/2GB to 3GB/1GB

Demand-paged virtual memory

32 or 64-bits

Copy-on-write

Shared memory

Memory mapped files

[Figure: address space with User from 0 to 2GB and System from 2GB to 4GB]

Linux

Splits the address space between user mode and kernel mode anywhere from 1GB/3GB to 3GB/1GB

2.6 has “4/4 split” option where the kernel has its own address space

Demand-paged virtual memory

32-bits and/or 64-bits

Copy-on-write

Shared memory

Memory mapped files

[Figure: address space with User from 0 to 3GB and System from 3GB to 4GB]
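Memory-mapped files, supported by both kernels, let a process read and write a file through ordinary memory accesses, with the pages supplied on demand. This minimal sketch uses Python's mmap module (a wrapper over mmap() on Linux and file mappings on Windows) to write through a shared mapping and then read the bytes back with the normal file API. The temporary file is only for illustration, and `data` must be non-empty.

```python
import mmap
import os
import tempfile


def write_through_mapping(data: bytes) -> bytes:
    """Write `data` into a file via a shared mapping, then read it back with read()."""
    fd, path = tempfile.mkstemp()
    try:
        os.ftruncate(fd, len(data))              # a mapping cannot grow the file
        with mmap.mmap(fd, len(data)) as view:   # shared, read/write mapping
            view[:] = data                       # plain memory stores
            view.flush()                         # push dirty pages to the file
        os.lseek(fd, 0, os.SEEK_SET)
        return os.read(fd, len(data))
    finally:
        os.close(fd)
        os.remove(path)
```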

Physical Memory Management

Windows

Per-process working sets

Working set tuner adjusts sets according to memory needs using the “clock” algorithm

No “swapper”

[Figure: a process's LRU page list and a reused page]

Linux

Global working set management

Uses “clock” algorithm

No “swapper” (the working set trimmer code is called the swap daemon, however)

[Figure: an LRU page list shared with other processes and a reused page]
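Both kernels are described as using the clock algorithm, a cheap approximation of LRU: frames sit in a circular list with a referenced bit, and a sweeping hand clears bits until it finds a page not referenced since its last pass. The sketch below is the textbook version over page numbers, not either kernel's working-set code; the reference strings in the test are made up.

```python
class ClockReplacer:
    """Textbook clock (second-chance) page replacement over a fixed set of frames."""

    def __init__(self, frames: int) -> None:
        self.pages: list = [None] * frames   # page number held by each frame
        self.referenced = [False] * frames   # "accessed" bit per frame
        self.hand = 0                        # next frame the clock hand examines
        self.faults = 0

    def access(self, page: int) -> None:
        if page in self.pages:               # hit: mark the page recently used
            self.referenced[self.pages.index(page)] = True
            return
        self.faults += 1
        # Sweep: give referenced pages a second chance by clearing their bit.
        while self.pages[self.hand] is not None and self.referenced[self.hand]:
            self.referenced[self.hand] = False
            self.hand = (self.hand + 1) % len(self.pages)
        # The frame under the hand is empty or unreferenced: evict and reuse it.
        self.pages[self.hand] = page
        self.referenced[self.hand] = True
        self.hand = (self.hand + 1) % len(self.pages)
```

In the reference string 1, 2, 3, 4, 2, 5 with three frames, page 2 survives the final fault because it was touched after page 4 arrived, whereas plain FIFO would evict it.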

I/O Management

Windows

Centered around the file object

Layered driver architecture throughout driver types

Most I/O supports asynchronous operation

Internal interrupt request level (IRQL) controls interruptibility

Interrupts are split between an Interrupt Service Routine (ISR) and a Deferred Procedure Call (DPC)

Supports plug-and-play

Linux

Centered around the vnode

No layered I/O model

Most I/O is synchronous

Only sockets and direct disk I/O support asynchronous I/O

Internal interrupt request level (IRQL) controls interruptibility

Interrupts are split between an ISR and soft IRQ or tasklet

Supports plug-and-play

[Figure: IRQL diagram showing masked interrupt levels]
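The ISR/DPC split (ISR plus softirq or tasklet on Linux) keeps the time spent with interrupts masked short: the ISR does only the urgent device work and queues the rest to run later at a lower interrupt level. The toy model below only illustrates that ordering. The class, method names and log strings are hypothetical, and real ISRs, DPCs and softirqs are kernel code with far stricter rules.

```python
from collections import deque
from typing import Callable


class DeferredWorkModel:
    """Toy model: ISRs queue deferred routines that run once the level drops."""

    def __init__(self) -> None:
        self.queue: deque[Callable[[], None]] = deque()
        self.log: list[str] = []

    def interrupt(self, device: str, payload: int) -> None:
        # "ISR": acknowledge the device and capture its data, nothing more.
        self.log.append(f"isr:{device}")
        self.queue.append(lambda: self.log.append(f"dpc:{device}:{payload}"))

    def lower_irql(self) -> None:
        # Dropping below the device level lets queued deferred work run in order.
        while self.queue:
            self.queue.popleft()()
```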

File Caching

Windows

Single global common cache

Virtual file cache

Caching is at the file level rather than the disk block level

Files are memory mapped into kernel memory

Cache allows for zero-copy file serving

[Figure: File Cache, File System Driver, and Disk Driver layers]

Linux

Single global common cache

Virtual file cache

Caching is at the file level rather than the disk block level

Files are memory mapped into kernel memory

Cache allows for zero-copy file serving

[Figure: File Cache, File System Driver, and Disk Driver layers]
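Both columns describe file-level caching in which files are mapped into memory. The kernel-mode mapping itself is not reachable from user code, but memory-mapped file I/O is the user-mode counterpart of the same mechanism: reads become page accesses that the OS satisfies from its file cache. A minimal sketch using Python's standard `mmap` module; `read_mapped` is a hypothetical helper.

```python
import mmap


def read_mapped(path, offset, length):
    """Read bytes from a file through a read-only memory-mapped view."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file (which must not be empty)
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            # Slicing touches the mapped pages; the OS faults them in from its file cache
            return view[offset:offset + length]
```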

Security

Windows

Very flexible security model based on Access Control Lists

Users are defined with:

Privileges

Member groups

Security can be applied to any Object Manager object

Files, processes, synchronization objects, …

Supports auditing

Linux

Two models:

Standard UNIX model

Access Control Lists (SELinux)

Users are defined with:

Capabilities (privileges)

Member groups

Security is implemented on an object-by-object basis

Has no built-in auditing support

Version 2.6 includes the Linux Security Module (LSM) framework for add-on security models
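The two permission models can be contrasted in a few lines. The sketch assumes a simplified view of each: the Windows-style check walks the DACL in order, denying as soon as a matching deny entry covers a still-ungranted bit and allowing once every requested bit has been granted; the UNIX check picks exactly one of the owner, group, or other bit triplets. `Ace`, `acl_check`, `unix_check`, and the `R`/`W`/`X` values are hypothetical; real Windows access masks, privileges, ownership rules, and inheritance, as well as the Linux superuser and capabilities, are all left out.

```python
from dataclasses import dataclass

R, W, X = 4, 2, 1  # illustrative access bits shared by both models (not real Windows mask values)


@dataclass(frozen=True)
class Ace:
    allow: bool   # True = access-allowed entry, False = access-denied entry
    sid: str      # user or group the entry applies to
    mask: int     # access bits the entry allows or denies


def acl_check(token_sids, dacl, desired):
    """Windows-style DACL walk (simplified): entries are evaluated in order."""
    granted = 0
    for ace in dacl:
        if ace.sid not in token_sids:
            continue                              # entry does not apply to this caller
        if not ace.allow and ace.mask & desired & ~granted:
            return False                          # deny on a still-ungranted bit ends the walk
        if ace.allow:
            granted |= ace.mask & desired
        if granted == desired:
            return True
    return False                                  # ran out of entries before granting everything


def unix_check(uid, gids, owner_uid, owner_gid, mode, desired):
    """Classic UNIX check: exactly one of the owner/group/other triplets applies."""
    if uid == owner_uid:
        bits = (mode >> 6) & 7
    elif owner_gid in gids:
        bits = (mode >> 3) & 7
    else:
        bits = mode & 7
    return (bits & desired) == desired
```

The ACL model can express per-user exceptions (the deny entry for one group below), while the UNIX model can only describe three fixed classes of users per object.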

Monitoring - Linux procfs

Linux supports a number of special filesystems

Like special files, they are of a more dynamic nature and tend to have side effects when accessed

Prime example is procfs (mounted at /proc)

Provides access to and control over various aspects of Linux (e.g., scheduling and memory management)

/proc/meminfo contains detailed statistics on the current memory usage of Linux

Content changes as memory usage changes over time

Services for Unix implements procfs on Windows
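Reading `/proc/meminfo` is ordinary file I/O; each line is a field name, a colon, and a value. A small parser sketch, assuming the usual `Name:   value kB` layout (most fields are in kB, while the `HugePages_*` counts have no unit). `parse_meminfo` and `read_meminfo` are hypothetical helper names, and `read_meminfo` only works on Linux.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            # Most values are in kB; HugePages_* fields are plain page counts
            fields[name.strip()] = int(parts[0])
    return fields


def read_meminfo(path="/proc/meminfo"):
    """Take a snapshot of the kernel's current memory statistics (Linux only)."""
    with open(path) as f:
        return parse_meminfo(f.read())
```

Calling `read_meminfo()` twice a few seconds apart shows fields such as `MemFree` changing, which is the dynamic behavior the slide describes.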

Windows’ Evolution Towards Linux

Services for Unix 3.5 - really targeted at POSIX, not Linux

POSIX threads, full POSIX subsystem (Interix)

X Window clients+server (X-Win32 LX)

NFS, NIS, PAM

proc file system for Windows

Configurability / Module Management

Windows XP Embedded

Target Designer / Component Designer / Component Management Database

Editions targeting new Application Domains

Windows Compute Cluster Server 2003

POSIX compatibility in Windows actually predates Linux and was one of the original design goals

Linux’s Evolution Towards Windows

I/O processing

Kernel reentrancy

Kernel preemptibility

Per-processor memory allocation

O(1) scheduler and per-CPU ready queues

Zero-Copy SendFile

Wake-One socket semantics

Asynchronous I/O

Light-weight synchronization

I/O Processing

Linux 2.2 had the notion of bottom halves (BH) for low-priority interrupt processing

Fixed number of BHs

Only one BH of a given type could be active at a time, even on an SMP system

Linux 2.4 introduced tasklets, which are non-preemptible procedures called with interrupts enabled

Tasklets are the equivalent of Windows Deferred Procedure Calls (DPCs)
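The ISR/deferred-work split behind tasklets and DPCs can be modeled as a per-CPU queue: the interrupt handler does the minimum and queues a routine, which runs later once the handler has returned. The sketch mirrors one real property of both mechanisms, namely that scheduling an already-pending tasklet or DPC does not queue it a second time. `DeferredQueue`, `nic_isr`, and `process_packets` are hypothetical names, and real code identifies the pending item by its tasklet or DPC structure rather than by the function.

```python
from collections import deque


class DeferredQueue:
    """Per-CPU queue of deferred routines, modeling Windows DPCs / Linux tasklets."""

    def __init__(self):
        self._queue = deque()
        self._queued = set()

    def schedule(self, routine, *args):
        """Queue routine unless it is already pending; return True if newly queued."""
        if routine in self._queued:
            return False   # like an already-scheduled tasklet or an already-queued DPC
        self._queued.add(routine)
        self._queue.append((routine, args))
        return True

    def run_pending(self):
        """Drain the queue; in a kernel this happens after the ISR returns."""
        while self._queue:
            routine, args = self._queue.popleft()
            self._queued.discard(routine)
            routine(*args)

    def __len__(self):
        return len(self._queue)


# Hypothetical network driver using the split
received = []
dpc_queue = DeferredQueue()


def process_packets(ring):
    """Deferred half: the slower protocol work, done outside the ISR."""
    while ring:
        received.append(ring.popleft())


def nic_isr(ring, packet):
    """Interrupt half: grab the data, queue the deferred work, return quickly."""
    ring.append(packet)
    dpc_queue.schedule(process_packets, ring)
```

Two interrupts in quick succession queue the deferred routine only once, and a single run then processes both packets.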

Kernel Reentrancy

Mark Russinovich’s April 1999 Windows NT Magazine article, “Linux and the Enterprise”, pointed out that much of the Linux 2.2 kernel was not reentrant

Ingo Molnar stated in rebuttal:

“his example is a clear red herring.”

A month later he made all major paths reentrant

[Figure: cpu 1 and cpu 2 timelines for a non-reentrant vs. a reentrant kernel, showing the time saved]
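In this slide, “reentrant” means that several CPUs can execute kernel code at the same time. A toy contrast between one global lock (in the spirit of Linux's old big kernel lock) and per-subsystem locks is sketched below; `BigLockKernel`, `FineGrainedKernel`, and the `fs`/`net` subsystem names are hypothetical.

```python
import threading


class BigLockKernel:
    """Non-reentrant style: one global lock guards every kernel path."""

    def __init__(self, subsystems):
        self._big_lock = threading.Lock()
        self.calls = {name: 0 for name in subsystems}

    def lock_for(self, subsystem):
        return self._big_lock            # every subsystem shares the same lock

    def syscall(self, subsystem):
        with self.lock_for(subsystem):
            self.calls[subsystem] += 1


class FineGrainedKernel(BigLockKernel):
    """Reentrant style: each subsystem has its own lock, so unrelated paths overlap."""

    def __init__(self, subsystems):
        super().__init__(subsystems)
        self._locks = {name: threading.Lock() for name in subsystems}

    def lock_for(self, subsystem):
        return self._locks[subsystem]
```

With the global lock, a CPU entering the network code waits for an unrelated file-system call to finish; with per-subsystem locks it proceeds, which is the time saved in the figure.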

Kernel Preemptibility

A preemptible kernel is more responsive to high-priority tasks

Through the base release of v2.4, Linux was only cooperatively preemptible

There are well-defined safe places where a thread running in the kernel can be preempted

The kernel is preemptible with v2.4 patches and in v2.6

Windows NT has always been preemptible
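The responsiveness difference can be quantified with a toy latency model: a high-priority task becomes ready while a long kernel operation is running, and the model counts how many units of work run before the kernel can switch to it. A cooperatively preemptible kernel can only switch at its safe points; a preemptible one can switch almost immediately. `preemption_latency` and its parameters are hypothetical, and the model ignores regions where even a preemptible kernel disables preemption (for example, while holding a spinlock).

```python
def preemption_latency(work_units, wakeup_at, safe_point_every=None):
    """Units of kernel work executed after a high-priority task becomes ready
    (just before unit `wakeup_at`) and before the kernel path yields the CPU.

    safe_point_every=None models a fully preemptible kernel, which can yield
    before any unit; an integer N models cooperative preemption with a safe
    point (a need-resched check) before every Nth unit.
    """
    for unit in range(wakeup_at, work_units):
        if safe_point_every is None or unit % safe_point_every == 0:
            return unit - wakeup_at
    # No safe point was reached: the whole remaining operation ran first
    return max(0, work_units - wakeup_at)
```

For a 1000-unit operation with a safe point every 100 units, a task woken just before unit 5 waits 95 units, while the preemptible kernel switches to it at once.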

Per-CPU Memory Allocation

Keeping accesses to memory localized to a CPU minimizes CPU cache thrashing

Cache thrashing hurts performance on enterprise SMP workloads

Linux 2.4 introduced per-CPU kernel memory buffers

Windows introduced per-CPU buffers in an NT 4 Service Pack in 1997

[Figure: CPUs 0 and 1, each with its own buffer cache (Buffer Cache 0, Buffer Cache 1)]
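A per-CPU free list is the core of the technique: each CPU allocates from and frees to its own list without locking, and touches the shared pool only to refill in batches. The sketch below assumes fixed-size buffers; `PerCpuAllocator` and its parameters are hypothetical, and a real allocator would also return excess buffers to the shared pool and protect that pool with a lock.

```python
class PerCpuAllocator:
    """Buffers come from a per-CPU free list; the shared pool is touched
    only to refill an empty list, one batch at a time."""

    def __init__(self, ncpus, buf_size=64, batch=4):
        self.buf_size = buf_size
        self.batch = batch
        self.shared_pool = []                       # would need a lock on real SMP hardware
        self.per_cpu = [[] for _ in range(ncpus)]   # private to one CPU: no lock, cache-warm
        self.shared_refills = 0                     # how often the shared pool was touched

    def alloc(self, cpu):
        free_list = self.per_cpu[cpu]
        if not free_list:
            self._refill(cpu)
        return free_list.pop()

    def free(self, cpu, buf):
        # LIFO reuse: the most recently freed buffer is likeliest to still be in this CPU's cache
        self.per_cpu[cpu].append(buf)

    def _refill(self, cpu):
        self.shared_refills += 1
        for _ in range(self.batch):
            buf = self.shared_pool.pop() if self.shared_pool else bytearray(self.buf_size)
            self.per_cpu[cpu].append(buf)
```

With a batch size of 4, only every fourth allocation on a CPU touches shared state, and a buffer freed on a CPU is handed straight back to the next allocation on that CPU.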

Scheduling

The Linux 2.4 scheduler is O(n)

If there are 10 active tasks, it scans 10 of them in a list in order to decide which should execute next

This means long scans and long durations under the scheduler lock

[Figure: a single ready list (tasks with priorities 103, 112, 112, 101) scanned to find the highest-priority task]
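The O(n) behavior comes from examining every runnable task on each decision. A minimal sketch, assuming each task carries a precomputed score; the field is named after the 2.4 `goodness()` function, whose real calculation folds in the remaining timeslice and other factors. `Task` and `pick_next_linear` are hypothetical.

```python
from typing import NamedTuple


class Task(NamedTuple):
    name: str
    goodness: int   # higher = better candidate to run next


def pick_next_linear(ready_list):
    """O(n) selection: every runnable task is examined on every decision.
    Returns (best_task, number_of_tasks_examined)."""
    best = None
    examined = 0
    for task in ready_list:
        examined += 1
        if best is None or task.goodness > best.goodness:
            best = task
    return best, examined
```

The work per decision grows with the length of the ready list, and in 2.4 the whole scan happened while holding the scheduler lock.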

Scheduling

Linux 2.6 has a revamped O(1) scheduler from Ingo Molnar that:

Calculates a task’s priority at the time it makes a scheduling decision

Has per-CPU ready queues where the tasks are pre-sorted by priority

[Figure: per-priority queues (101, 103, and 112 holding two tasks); the scheduler takes the highest-priority non-empty queue]
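The O(1) selection can be sketched with one FIFO queue per priority level plus a bitmap of non-empty levels, so picking the next task is a find-first-set-bit operation whose cost does not depend on the number of tasks. The sketch assumes the Linux 2.6 convention of 140 levels with lower numbers meaning higher priority; Windows applies a similar idea to its 32 levels, where higher numbers mean higher priority. `O1RunQueue` is a hypothetical name, and the real 2.6 scheduler's active/expired arrays and timeslice handling are omitted.

```python
from collections import deque


class O1RunQueue:
    """Per-priority FIFO queues plus a bitmap of which queues are non-empty."""

    NUM_PRIOS = 140   # Linux 2.6: 0-99 real-time, 100-139 normal; lower value = higher priority

    def __init__(self):
        self.queues = [deque() for _ in range(self.NUM_PRIOS)]
        self.bitmap = 0   # bit p is set exactly when queues[p] is non-empty

    def enqueue(self, task, prio):
        self.queues[prio].append(task)
        self.bitmap |= 1 << prio

    def pick_next(self):
        if not self.bitmap:
            return None
        prio = (self.bitmap & -self.bitmap).bit_length() - 1   # index of lowest set bit
        queue = self.queues[prio]
        task = queue.popleft()
        if not queue:
            self.bitmap &= ~(1 << prio)   # that level is now empty
        return task
```

One instance per CPU gives per-CPU ready queues, so CPUs do not contend for a single scheduler lock.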

Scheduling

Windows NT has always had an O(1) scheduler based on pre-sorted thread priority queues

Server 2003 introduced per-CPU ready queues

Linux load balances queues

Windows does not

Not seen as an issue in performance testing by Microsoft

Applications where it might be an issue are expected to use affinity
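Load balancing across per-CPU queues can be sketched as a greedy pass that migrates tasks from the longest queue to the shortest. This is a simplification under stated assumptions: `balance` is hypothetical, every task counts equally, and the real Linux balancer runs periodically, pulls work toward lightly loaded CPUs, and weighs cache affinity. The alternative the slide mentions for Windows, explicit affinity, is set through APIs such as `SetThreadAffinityMask`.

```python
def balance(run_queues, threshold=1):
    """Greedy pass: move tasks from the longest to the shortest run queue until
    no two queues differ in length by more than `threshold`. Returns tasks moved."""
    if threshold < 1:
        raise ValueError("threshold must be at least 1")   # 0 could oscillate forever
    moves = 0
    while True:
        lengths = [len(q) for q in run_queues]
        busiest = lengths.index(max(lengths))
        idlest = lengths.index(min(lengths))
        if lengths[busiest] - lengths[idlest] <= threshold:
            return moves
        run_queues[idlest].append(run_queues[busiest].pop())   # migrate one task
        moves += 1
```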

Zero-Copy Sendfile

Linux 2.2 introduced Sendfile to efficiently send file data over a socket

I pointed out that the initial implementation incurred a copy operation, even if the file data was cached

Linux 2.4 introduced zero-copy Sendfile

Windows NT pioneered zero-copy file sending with TransmitFile, the Sendfile equivalent, in Windows NT 4

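[Figure: 1-copy path, where file data is copied from the file data buffer into a network adapter buffer before the network driver sends it, vs. 0-copy path, where the network driver sends directly from the file data buffer]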
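Python exposes the same system call as `os.sendfile` on Linux and other Unix systems (not on Windows, where `TransmitFile` plays this role). A minimal sketch, assuming a connected stream socket; `send_file` is a hypothetical helper. Whether the kernel path is truly zero-copy depends on the kernel version and network hardware, which is exactly the 2.2 vs. 2.4 difference described on this slide.

```python
import os


def send_file(path, sock):
    """Send a whole file over a connected socket with sendfile(2): the kernel moves
    the data from its file cache to the socket without copying it through a
    user-space buffer."""
    sent = 0
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        while sent < size:
            n = os.sendfile(sock.fileno(), f.fileno(), sent, size - sent)
            if n == 0:
                break   # the file was truncated while we were sending
            sent += n
    return sent
```

Python's higher-level `socket.socket.sendfile()` uses `os.sendfile` when it is available and falls back to ordinary `send()` otherwise.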

Wake-one Socket Semantics

The Linux 2.2 kernel had the thundering herd, or overscheduling, problem

In a network server application there are typically several threads waiting for a new connection

In v2.2, when a new connection came in, all the waiters would race to get it

Ingo Molnar’s response:

5/2/99: “here he again forgets to _prove_ that overscheduling happens in Linux.”

5/7/99: “as of 2.3.1 my wake-one implementation and waitqueues rewrite went in”

In Linux 2.4 only one thread wakes up to claim the new connection

Windows NT has always had wake-one semantics
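The cost difference is easy to count. The sketch below models a wait queue without running real threads: waking everyone costs one wakeup (and context switch) per sleeper even though only one can claim the connection, while wake-one semantics cost a single wakeup. `WaitQueue` and `deliver_connection` are hypothetical names.

```python
from collections import deque


class WaitQueue:
    """Threads sleeping until a new connection arrives."""

    def __init__(self):
        self.sleepers = deque()
        self.wakeups = 0   # each wakeup costs a context switch

    def sleep(self, thread):
        self.sleepers.append(thread)

    def wake(self, wake_one):
        """Wake-one semantics wake a single sleeper; the old behavior woke them all."""
        count = 1 if wake_one else len(self.sleepers)
        woken = [self.sleepers.popleft() for _ in range(min(count, len(self.sleepers)))]
        self.wakeups += len(woken)
        return woken


def deliver_connection(queue, wake_one):
    """Return the thread that claims the connection; the others go back to sleep."""
    woken = queue.wake(wake_one)
    if not woken:
        return None
    winner, losers = woken[0], woken[1:]
    for thread in losers:
        queue.sleep(thread)   # lost the race: a wasted wakeup
    return winner
```

With eight waiting threads, the thundering-herd version performs eight wakeups to hand out one connection; the wake-one version performs one.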

Asynchronous I/O

Linux 2.2 only supported asynchronous I/O on socket connect operations and ttys

Linux 2.6 adds asynchronous I/O for direct-disk access

The AIO model includes efficient management of asynchronous I/O

Also added the alternative epoll model

Useful for database servers managing their database on a dedicated raw partition

Database servers that manage a file-based database suffer from synchronous I/O

Windows I/O is inherently asynchronous

Windows has had completion ports since NT 3.5

A more advanced form of AIO
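The epoll model is readiness-based: one thread asks the kernel which of many descriptors are ready and then services them. Python's `selectors.DefaultSelector` uses epoll on Linux (kqueue, poll, or select elsewhere), so a small sketch can show the pattern; `collect_messages` is a hypothetical helper. This is not the completion-based model of Linux AIO or Windows completion ports, where the kernel reports finished operations rather than ready descriptors; on Windows, asyncio's `ProactorEventLoop` is the Python route to completion ports.

```python
import selectors


def collect_messages(sockets, timeout=1.0):
    """Wait for one message on each socket, servicing them in whatever order
    they become ready. Returns {socket: bytes received}."""
    sel = selectors.DefaultSelector()   # epoll on Linux
    for s in sockets:
        s.setblocking(False)
        sel.register(s, selectors.EVENT_READ)
    received = {}
    try:
        while len(received) < len(sockets):
            events = sel.select(timeout)
            if not events:
                break                                  # timed out; return what arrived
            for key, _mask in events:
                sock = key.fileobj
                received[sock] = sock.recv(4096)       # reported ready, so this will not block
                sel.unregister(sock)
    finally:
        sel.close()
    return received
```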

Light-Weight Synchronization

Linux 2.6 introduces futexes

A transition to kernel mode occurs only when there is contention

Windows has always had CriticalSections

Same behavior

Futexes go further:

Allow for prioritization of waits

Work interprocess as well
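The fast-path/slow-path split can be modeled with the three-state mutex popularized by Ulrich Drepper's “Futexes Are Tricky”: the lock word is 0 (unlocked), 1 (locked), or 2 (locked, possibly with waiters), and the kernel is entered only to wait or to wake. Python has no user-mode compare-and-swap, so in this sketch a tiny internal lock stands in for the atomic instructions and a `threading.Condition` stands in for the kernel's `FUTEX_WAIT`/`FUTEX_WAKE` operations; `FutexStyleMutex` and `kernel_calls` are hypothetical names used to count would-be kernel transitions.

```python
import threading


class FutexStyleMutex:
    """Three-state mutex: 0 = unlocked, 1 = locked, 2 = locked and possibly contended."""

    def __init__(self):
        self._state = 0
        self._atomic = threading.Lock()                            # models atomic CAS/XCHG
        self._futex_queue = threading.Condition(threading.Lock())  # models the kernel wait queue
        self.kernel_calls = 0                                      # would-be kernel transitions

    def _cas(self, expected, new):
        with self._atomic:
            old = self._state
            if old == expected:
                self._state = new
            return old

    def _swap(self, new):
        with self._atomic:
            old, self._state = self._state, new
            return old

    def _futex_wait(self, expected):
        with self._futex_queue:
            self.kernel_calls += 1
            if self._state == expected:   # sleep only if the word has not changed meanwhile
                self._futex_queue.wait()

    def _futex_wake(self):
        with self._futex_queue:
            self.kernel_calls += 1
            self._futex_queue.notify(1)

    def acquire(self):
        c = self._cas(0, 1)
        if c == 0:
            return                        # fast path: uncontended, no kernel transition
        if c != 2:
            c = self._swap(2)             # announce that there may be waiters
        while c != 0:
            self._futex_wait(2)
            c = self._swap(2)

    def release(self):
        if self._swap(0) == 2:            # someone may be sleeping in the kernel
            self._futex_wake()
```

An uncontended acquire/release pair never touches the modeled kernel, which is the behavior the futex and CriticalSection designs share.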

A Look at the Future

The kernel architectures are fundamentally similar

There are differences in the details

The Linux implementation is adopting more of the good ideas used in Windows

Windows has an edge and will maintain it for the next 2-4 years

Linux is still behind on the cutting edge of performance tricks

A large performance team and lab at Microsoft have direct ties to the kernel developers

As time goes on, the technological gap will narrow

Open Source Development Labs (OSDL) will feed performance test results to the kernel team

IBM and other vendors have Linux technology centers

Squeezing performance out of the OS gets much harder as the OS gets more tuned

Linux Technology Unknowns

Linux kernel forking

Red Hat has already done it: Red Hat Enterprise Server v3.0 is Linux 2.4 with some Linux 2.6 features

Backward compatibility philosophy

Linus Torvalds makes decisions on kernel APIs and architecture based on technical reasons, not business reasons

Further Reading

Transaction Processing Performance Council: www.tpc.org

SPEC: www.spec.org

NT vs. Linux benchmarks: www.kegel.com/nt-linux-benchmarks.html

The C10K problem: http://www.kegel.com/c10k.html

Linus Torvalds’s home: http://www.osdl.org/

Linux Kernel Archives: http://www.kernel.org/

Linux history: http://www.firstmonday.dk/issues/issue5_11/moon/

Veritest NetBench result: http://www.veritest.com/clients/reports/microsoft/ms_netbench.pdf

Mark Russinovich’s 1999 article, “Linux and the Enterprise”: http://www.winntmag.com/Articles/Index.cfm?ArticleID=5048

The Open Group’s Single UNIX Specification: http://www.unix.org/version3/