Page 1: Operating Systems 1 (12/12) - Summary

Operating Systems I - Summary

Beuth Hochschule, Summer Term 2014

Page 2: Operating Systems 1 (12/12) - Summary

Intro & History

Page 3: Operating Systems 1 (12/12) - Summary

Operating Systems I | Intro, History PT/FF 2014

The First Computer(s)

• 1801: Power loom driven by wooden punch cards

• 1822: Steam-driven analytical engine by Charles Babbage

• Mechanical decimal stored-program computer, programmable by punch cards, support for calculation and conditional statements

• Remained unbuilt; Ada Byron nevertheless invented the subroutine and the loop as programming concepts

• 1890: U.S. census supported by Hollerith desk - punch card reader, counting units, wall of dial indicators

• Built by Tabulating Machine Company, which eventually became International Business Machines

• Invented the idea of output punch cards independent from Babbage

3

(C) computersciencelab.com

Page 4: Operating Systems 1 (12/12) - Summary

Operating Systems I | Intro, History PT/FF 2014

The First Computer(s)

• 1944: Harvard Mark I developed in partnership between Harvard and IBM (ballistic firing tables)

• First programmable digital computer made in the U.S.

• Constructed of switches, relays, rotating shafts, clutches

• Grace Hopper found the first computer bug, invented the predecessor of COBOL and the first compiler

• 1941: Konrad Zuse completed the work on the Z3

• First programmable electromechanical computer

• Punch film for program and data (lack of paper)

• Mapping of Boolean algebra to relays, developed independently from original Shannon work

4

(C) computersciencelab.com

Page 5: Operating Systems 1 (12/12) - Summary

Operating Systems I | Intro, History PT/FF 2014

Von Neumann Architecture

5

(C) Wikipedia

• 1946: ENIAC as first fully electronic computer in the U.S.

• No program memory, re-wiring for each program

• EDVAC: Revolutionary extension with a stored program computer concept by John von Neumann

• Memory contains both the program and the data

• Introduction of a general purpose computer concept

Page 6: Operating Systems 1 (12/12) - Summary

Operating Systems I | Intro, History PT/FF 2014

Batch Processing

• IBM 1401 - October 5th, 1959

• IBM‘s first affordable general purpose computer, e.g. for accounting

• 1401 Processing Unit, 1402 Card Read-Punch (250 cards/minute), printer

• First concepts of batch processing for multiple job input cards

• Operator loads the monitor to run batched jobs from a prepared input tape

• Programs are constructed to branch back to the monitor after termination (early Fortran language)

• First version of a scheduler, better utilization of the extremely expensive hardware

[Figure: The elements of the basic IBM 1401 system are the 1401 Processing Unit, 1402 Card Read-Punch, and 1403 Printer. A second sketch shows memory partitions holding the OS and jobs 1-3 (multiprogramming).]

An early IBM batch system (Tanenbaum, A. S. (2001): Modern Operating Systems, 2nd Edition):

a) programmer brings cards to IBM 1401
b) 1401 reads batch of jobs onto tape
c) operator carries input tape to IBM 7094
d) 7094 does computing
e) operator carries output tape to 1401
f) 1401 prints output

Page 7: Operating Systems 1 (12/12) - Summary

Operating Systems I | Intro, History PT/FF 2014

Batch Processing

• A job control language (JCL) operates the monitor application

• Instructions about the compiler to use, data to work on etc. (Fortran prefix $)

• Early version of system calls

• Monitor needed to switch between itself and the application

• Resident monitor parts always in memory

• Demands on hardware: memory protection, timer, privileged instructions for I/O

• User mode vs. monitor mode (,system mode‘, ,kernel mode‘, ,supervisor mode‘)

7

(C) Stallings

Page 8: Operating Systems 1 (12/12) - Summary

Operating Systems I | Intro, History PT/FF 2014

Multi-Programming

8

[Figure: Evolution of CPU utilization - (a) serial uniprogramming with the human operator's setup (mount tape, etc.) and takedown (unmount) between jobs 1-3; (b) batch uniprogramming with spooling, where the CPU still idles during each job's I/O waits; (c) multiprogramming, where jobs 1-3 are interleaved so the CPU runs one job while another waits for I/O.]

(C) CS446/646

Page 9: Operating Systems 1 (12/12) - Summary

Operating Systems I | Intro, History PT/FF 2014

Time-Sharing

• Compatible Time-Sharing System (CTSS)

• Operating system developed at MIT, first for the IBM 7094 in 1961 (32,768 36-bit words of memory)

• Program always loaded to start at the location of the 5000th word

• System clock generated interrupts roughly every 0.2 seconds

• At each clock interrupt, the system regained control and assigned the processor to another user - time slicing

• Parts of the active program that would be overwritten are written to disk

• Other parts remained inactive in the system memory

• Direct successor MULTICS pioneered many modern operating system concepts

Page 10: Operating Systems 1 (12/12) - Summary

Operating Systems I | Intro, History PT/FF 2014

Time Sharing

• Users started to demand interaction with their program, e.g. for retry on errors

• Perform multi-tasking, but act like the machine is solely used

• Advent of time-sharing / preemptive multi-tasking systems

• Goal: Minimize single user response time

• Extension of multi-programming to multiple interactive (non-batch) jobs

• Starting point for Unix operating systems in the 1960‘s

• Preemptive multi-tasking became a single user demand in modern times

• Leave application running while starting another one

• Pure batch processing systems are still significant (TPM, SAP R/3, HPC)

10

Page 11: Operating Systems 1 (12/12) - Summary

Operating Systems I | Intro, History PT/FF 2014

History of Modern Operating Systems

11

[Figure: Operating systems evolution timeline, 1955-2003 - mainframe line: IOCS, IBSYS, DOS/360, OS/360, TSO, DOS/VS, DOS/VSE, MVS/370, MVS/XA, MVS/ESA, VM/370, VM/XA, VM/ESA; time-sharing and Unix line: CTSS, CP/CMS, MULTICS, UNIX, UNIX V.7, System III, System V, System V.4, XENIX, SUN OS, Solaris 2, Solaris 10, AIX, AIX/370, AIX/ESA, POSIX, 4.1BSD-4.4BSD, Mach, OSF/1, Linux, Linux 2.6; DEC line: RT-11, RSX-11M, VMS 1.0, VMS 5.4, VMS 7.3; PC line: CP/M, MS-DOS 1.0, DR/DOS, OS/2, Windows 3.0/3.1, Windows 9x, Windows NT, Windows 2000, Windows XP, Windows Server 2003.]

Page 12: Operating Systems 1 (12/12) - Summary

Operating Systems I | Intro, History PT/FF 2014

History of Unix

• 1983: AT&T UNIX System V released as one of the first commercial versions

• Interface definition still used for modern Unix systems such as AIX and Solaris

• Richard Stallman starts the GNU project for free Unix-compatible software

• 1988: Different APIs ultimately united by IEEE POSIX specification

• 1989: SVR4 unification of BSD and System V / Windows NT development starts

• 1991: Linus Torvalds begins to work on a Unix clone for IBM PCs - Linux

• 1992: Berkeley alumni publish 386BSD, a port of BSD to Intel‘s 386, the foundation for the later FreeBSD and NetBSD Unix versions

• Since 1996: Unix trademark owned by Open Group, certification process

• 1997: Apple creates Darwin kernel out of the Mach kernel and Unix BSD parts

• 1999: System V Release 4 (SVR4) binary format ELF became the agreed Unix standard

12

Page 13: Operating Systems 1 (12/12) - Summary

Operating Systems I | Intro, History PT/FF 2014

History of Windows [Lucovsky]

• Initial team formed in November 1988

• 6 former Digital developers, one Microsoft guy

• Focus on secure, scalable SMP design for desktops and servers

• Original schedule for 18 months, missed by 3 years

• Goal setting

• Portability: Initial focus on Intel i860, intentionally late focus on i386

• Reliability: Nothing should be able to crash the OS (promise fulfilled)

• Extensibility and compatibility (DOS, OS/2, POSIX)

• After all of the above - performance

• NT 3.1 grew from 6 to 200 developers, NT 4.0 had 800 developers, Windows 2000 had 1400

13

Page 14: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 14

Computer Systems Today

14

[Figure: Layered view of a computer system - Users, Application Programs, System Programs, Operating System, Firmware, Hardware. The operating system controls the execution of programs and abstracts from the hardware.]

Page 15: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 14

Classes of Operating Systems

• Desktop / Server Operating Systems

• Distributed Operating Systems

• Implements a single operating system instance spanning multiple machines

• Applications have single memory space view

• No significant real-world adoption, mainly research topic

• Real-Time Operating System (RTOS)

• Deterministic timing behavior of operating system services

• Support for real-time application scheduling and resource management

• Wide adoption for industry applications

• Examples: LynxOS, OSE, QNX, RTLinux, VxWorks

• Embedded Operating System

Page 16: Operating Systems 1 (12/12) - Summary

Hardware

Page 17: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 14

Hardware Basics

17

Page 18: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 14

Hardware Basics

• Symmetric Multi-Processing (SMP)

• Two or more processors in one system, can perform the same operations (symmetric)

• Processors share the same main memory and all devices

• Increased performance and scalability for multi-tasking

• No master, any processor can cause another to reschedule

• Multi-Core / many-core processor combines computational cores on one chip with shared caches

• Challenges for an SMP operating system:

• Reentrant kernel, scheduling policies, synchronization, memory re-use, ...

(C) Stallings

Page 19: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 14

Hardware Basics

• Hyperthreading

• Make a single processor appear to be two virtual processors by maintaining separate CPU states, while execution engine and caches are still shared

• Also called Simultaneous multithreading (SMT)

• Operating systems must consider them separately in scheduling (in Windows since XP)

19

(C) Intel
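As a small illustration of how the extra logical processors show up to software, the following sketch (not from the slides; it assumes Linux/glibc, where _SC_NPROCESSORS_ONLN is available) asks the operating system how many logical processors are currently online - on an SMT machine this count includes the hyperthreading siblings that the scheduler has to treat as separate processors.

#include <stdio.h>
#include <unistd.h>

int main(void) {
    /* Logical processors currently online: physical cores plus any
       SMT/hyperthreading siblings, as seen by the scheduler. */
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    if (n < 0) {
        perror("sysconf");
        return 1;
    }
    printf("Online logical processors: %ld\n", n);
    return 0;
}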

Page 20: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 14

Hardware Basics

• Parallelism

• Inside the processor (instruction-level parallelism, multicore)

• Through multiple processors in one machine (multiprocessing)

• Through multiple machines (multicomputer)

20

[Figure 1: Hardware parallelism hierarchy - within one computer, each processor chip contains computational engines (cores) and logical processors; instruction-level parallelism (ILP) and simultaneous multi-threading (SMT) exploit the instruction stream inside a logical processor, chip multi-threading (CMT) and chip multi-processing (CMP) work at the chip level, symmetric multi-processing (SMP) combines processor chips in one computer, and a multicomputer combines several computers; tasks provide task parallelism, data provides data parallelism.]

Page 21: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 14

Hardware Basics

• Major constraints in memory as a resource are amount, speed, and costs

• Faster access time results in greater costs per bit

• Greater capacity results in smaller costs per bit

• Greater capacity results in slower access

• Idea: Going down a memory hierarchy

• Decreasing costs per bit

• Increasing capacity for fixed costs

• Increasing access time

21

http://tjliu.myweb.hinet.net/

Page 22: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 14

(C) Stallings

Hardware Basics

• Principle of Locality

• Memory referenced by a processor (program and data) tends to cluster

• Iterative loops and subroutines, small set of instructions inside

• Operations on tables and arrays involve access to clustered data sets

• Data should be organized so that the percentage of accesses to lower levels is substantially less than to the level above

• Typically implemented by caching concept

• I/O devices provide non-volatile memory on lower levels, which is an additional advantage

22
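To make the locality argument concrete, here is a small sketch (not part of the slides) that sums a 2D array twice: the row-major traversal follows the memory layout and is served mostly from the cache, while the column-major traversal jumps across rows and defeats spatial locality.

#include <stdio.h>

#define N 1024

static double a[N][N];

int main(void) {
    double sum = 0.0;

    /* Row-major traversal: consecutive accesses are adjacent in memory,
       so most of them hit the processor cache. */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            sum += a[i][j];

    /* Column-major traversal: each access jumps N * sizeof(double) bytes,
       which causes far more cache misses for the same amount of work. */
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            sum += a[i][j];

    printf("%f\n", sum);
    return 0;
}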

Page 23: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 14

Hardware Basics

• Caching

• Keep a copy of a portion of the lower-level memory in the smaller, faster memory

• Leverages the principle of locality

• Processor caches work in hardware, but must be considered by an operating system

23

(C) Stallings

Page 24: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 14

Hardware Basics

• NUMA (non uniform memory architecture) systems

• Groups of physical processors (called “nodes”) that have local memory

• Connected to the larger system through a cache-coherent interconnect bus

• Still an SMP system (e.g. any processor can access all of memory), but node-local memory is faster

• Operating system tries to schedule close activities on the same node

• Became the default model in all recent architectures

24

[Figure: Two NUMA nodes, each with two processors (A/B and C/D), per-processor caches, and node-local memory, connected by a high-speed interconnect.]

Page 25: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 14

Hardware Basics

• Central Processing Units (CPUs) + volatile memory + I/O devices

• Fetch instruction and execute it - typically memory access, computation, and / or I/O

25

(C) Stallings

• I/O devices and memory controller may interrupt the instruction processing

• Improve processor utilization by asynchronous operations

Page 26: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 14

Hardware Basics

26

• I/O program prepares an I/O operation, waits for its completion and prepares the result for further processing

• Usage of interrupts reduces the application I/O wait time to the pre- and post phases of I/O processing

• Interrupt can occur at any point in the execution of the user program, must be managed by the operating system

Page 27: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 14

Hardware Basics

• All computers have mechanisms to let I/O and memory modules interrupt the current processor work

• Consider the speed aspect of I/O devices in the memory hierarchy

• Different classes of interrupt

• Program interrupt: Condition from program execution leads to exceptional situation, such as arithmetic overflow, division by zero, illegal instruction

• Timer interrupt: Programmed hardware timer signals the time event, e.g. for regular operating system activities

• I/O interrupt: Generated by any kind of hardware unit to signal I/O completion or an error condition

• Hardware failure interrupt: Hardware module signals permanent issue

27

Page 28: Operating Systems 1 (12/12) - Summary

OS Architectures

Page 29: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 14

Basic Concepts: Process and Virtual Memory

• Unit of execution in operating system is traditionally a process

• Term introduced with MULTICS in the 60‘s to generalize job concept

• Instance of an executed program binary

• Management of multiple running processes is a core operating system task

• Every process consists of an executable program, associated data, and state

• Concurrent utilization of resources demands some management

• Operating system must prevent independent processes from mutual code and data access (isolation)

• Sharing should be explicitly allowable under given security constraints

• Same holds for devices offering resources (storage, communication, ...)

• Operating system should leverage the memory hierarchy transparently ...

Page 30: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 14

Basic Concepts: Process and Virtual Memory

30

[Figure: Processes 1-3 (instructions, state, data) cannot all be placed directly into the single physical memory address space (0x00000000 - 0xFFFFFFFF); instead each process gets its own virtual memory address space from 0x00000000 to 0xFFFFFFFF that is mapped onto physical memory.]

Page 31: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 14

Basic Concepts: Process and Virtual Memory

• Virtual memory concept in nearly all existing operating systems

• Programs address memory from a logical point of view, independent from the amount of physical memory

• Multiple processes can concurrently exist at the same time if physical memory is abstracted

• Today‘s systems can rely on hardware support for this

• Rules

• No user process can touch another user process address space without passing through the operating system security check

• No user process can touch the memory reserved for the operating system itself

31
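A small demonstration of the per-process virtual address space, using only standard fork()/waitpid() (the variable name and values are made up): parent and child print the same virtual address for the global variable, yet see different values after the child modifies its copy, because that shared virtual address is mapped to different physical memory in each process.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int value = 42;   /* one copy per process after fork() */

int main(void) {
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return 1;
    }
    if (pid == 0) {
        value = 4711;   /* modifies only the child's copy */
        printf("child : &value = %p, value = %d\n", (void *)&value, value);
        exit(0);
    }
    waitpid(pid, NULL, 0);   /* let the child print first */
    printf("parent: &value = %p, value = %d\n", (void *)&value, value);
    return 0;
}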

Page 32: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 14

Operating System Design Goals

• (A) Hide complexity and heterogeneity of the underlying hardware

• Enable the development of portable applications

• (B) Manage system resources

• Time multiplexing: Each process gets to use a resource

• Space multiplexing: Each process gets some part of a resource

• Manage concurrent utilization

• (C) Ensure flexibility, portability and security through layering

• New, fixed or updated services without application modification

• Encapsulation and protection of concurrent users

• Integration of new or modified hardware components

32

Page 33: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 14

(C) Layered Architecture

33

(C) Stallings

Page 34: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 14

Layered Architecture

• Old monitor concept translates to separation based on two operation modes:

• Kernel mode

• Privileged mode with strict assumptions about reliability/security of code

• Code is typically memory resident (depends on OS type)

• Management of CPU(s), main memory and I/O, parts of the file system and networking functionality, all device drivers

• Kernel-mode code shares one view of the available memory

• User mode

• Flexible mode for applications with simpler maintenance and debugging

• Each process in user mode has its own view of the available memory

• User mode processes access kernel mode code via system calls

34

Page 35: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 14

Layered Architecture

• System call

• Mode change of the CPU - „trap“

• Parameter transmission possible via trap parameter, register or stack, e.g. count = read(file, buffer, nbytes);

• Monolithically designed system

• Modularized kernel components compiled as one large binary

• Hardware abstraction and resource management provided through system service interface

35

[Figure: Structuring of operating systems, monolithic ("unstructured") case - applications in user mode invoke system services; the supervisor call changes from user mode into kernel mode, where the OS procedures run directly on the hardware.]
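To illustrate the trap in code, a short sketch assuming Linux and glibc (not part of the original slides): the first call goes through the C library wrapper write(), the second issues the same system call explicitly by number via syscall(2); both end up in the kernel through the mode change described above.

#include <unistd.h>
#include <sys/syscall.h>

int main(void) {
    const char msg1[] = "written via the libc wrapper\n";
    const char msg2[] = "written via syscall(SYS_write, ...)\n";

    /* Library wrapper: loads the parameters and executes the trap instruction. */
    write(STDOUT_FILENO, msg1, sizeof msg1 - 1);

    /* Explicit system call by number (Linux-specific interface). */
    syscall(SYS_write, STDOUT_FILENO, msg2, sizeof msg2 - 1);

    return 0;
}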

Page 36: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 14

Layered Architecture

• Layered operating system

• Each layer in the kernel (!) is given access only to lower-level interfaces

36

[Figure: Layered OS - application programs run in user mode; below them, in kernel mode, system services, the file system, memory and I/O device management, and processor scheduling sit on top of the hardware; each layer is given access only to lower-level interfaces.]

Page 37: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 14

Layered Architecture

• Microkernel operating system

• Kernel mode implements only functionality that cannot be put into user mode without breaking functionality [Jochen Liedtke]

• Commonly in kernel mode: Scheduling, memory management, and interprocess communication (IPC)

• Remaining tasks are covered by user-mode servers

37

[Figure: Microkernel OS (client/server OS) - the microkernel implements scheduling, memory management, and interprocess communication (IPC); memory, process, file, display, and network servers run as user-mode servers; a client application sends a request and receives a reply via the microkernel, which runs in kernel mode on the hardware.]

Page 38: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 14

Application Programming Interface (API)

38

[Figure: An application can use the C library API (operating systems acting as hosted C environment), the Windows API (Windows operating systems), the POSIX API (Unix and Windows operating systems), or a UNIX OS API (BSD / SYSV flavor); each API is implemented on top of the respective operating system's system calls at the user/kernel boundary.]

Page 39: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 14

API Standardization

• Standard C Library

• 1989 first version from the American National Standards Institute (ANSI) - C89

• Last version from 2011 (C11), glibc as most common implementation

• Focus on C source code portability for all operating systems

• Portable C programs are easy

• Popular examples: Apache, MySQL, PostgreSQL

• Portable non-C programs typically have a C-based runtime system

• Hadoop (Java), Django (Python)

• Non-portable C programs are possible

• Compiler-specific libraries, operating system - specific system calls

39

Page 40: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 2014

Windows Operating System Family

• Extensibility: Code must be able to grow and change as market requirements change.

• Portability: The system must be able to run on multiple hardware architectures and must be able to move with relative ease to new ones as market demands dictate.

• Dependability: Protection against internal malfunction and external tampering.

• Applications should not be able to harm the OS or other running applications.

• Compatibility: User interface and APIs should be compatible with older versions of Windows as well as older operating systems such as MS-DOS.

• It should also interoperate well with UNIX, OS/2, and NetWare.

• Performance: Within the constraints of the other design goals, the system should be as fast and responsive as possible on each hardware platform.

40

Page 41: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 2014

Simplified Windows Architecture

41

[Figure: VMS and Windows - a bird's-eye view on architectures. Layered VAX/VMS design: program development tools, layered products (apps), utilities and support libraries in user mode; Command Language Interpreter (CLI) in supervisor mode; Record Management Service (RMS) and system services in executive mode; memory management, I/O subsystem, and process and time management in the kernel, on top of the Platform-Adaptation Layer (PAL, Alpha). Windows high-level architecture: user applications, environment subsystems (OS/2, Windows, POSIX), subsystem DLLs, and system and service processes in user mode; Executive, Kernel, device drivers, the Windows User/GDI device driver, and the Hardware Abstraction Layer (HAL) in kernel mode, sharing system-wide data structures.]

Page 42: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 2014

Windows Executive

• Upper layer of the operating system

• Process and thread manager - additional semantics to lower level objects

• Object manager - manages representation of resources

• Configuration manager - implementation of the system registry

• Memory manager / cache manager - implementation of virtual memory

• Security reference monitor (SRM) - policy enforcement, auditing, object protection

• I/O manager - device-independent I/O dispatching

• Power manager, Plug-and-Play manager, LPC (local procedure call) facility

• Almost completely portable C code, runs in kernel mode

• Most interfaces to executive services not officially documented

42


Page 43: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 2014

Windows API

• System resources are kernel objects referenced by a handle

• handle vs. UNIX file descriptors & PIDs

• Kernel objects can be manipulated only via a subsystem API

• Objects have security attributes

• Files, processes, threads, IPC pipes, memory mappings, events

• Windows API is rich & flexible

• Convenience functions often combine common sequences of function calls

• Function names are long and descriptive (as in VMS)

• WaitForSingleObject(), WaitForMultipleObjects()

• Windows API offers numerous synchronization and communication mechanisms

43
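A minimal sketch of the handle-based style, using only documented Win32 calls (CreateThread, WaitForSingleObject, CloseHandle); the worker routine and its message are invented for illustration.

#include <windows.h>
#include <stdio.h>

/* Hypothetical worker routine, just for the example. */
static DWORD WINAPI Worker(LPVOID param) {
    printf("worker says: %s\n", (const char *)param);
    return 0;
}

int main(void) {
    /* CreateThread returns a handle referring to a kernel thread object. */
    HANDLE hThread = CreateThread(NULL, 0, Worker, "hello", 0, NULL);
    if (hThread == NULL) {
        fprintf(stderr, "CreateThread failed: %lu\n", GetLastError());
        return 1;
    }

    /* Block until the thread object becomes signaled (thread terminated). */
    WaitForSingleObject(hThread, INFINITE);

    /* Release our reference to the kernel object. */
    CloseHandle(hThread);
    return 0;
}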

Page 44: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 2014

Windows Security

• Foundational concepts: Objects and handles

• Objects are placeholders for (protected) system resources that may be shared

• Process, thread, file, event objects from user space are mapped on executive objects

• Object services offer read/write access to attributes

• All security and protection based on NT Executive objects

• Discretionary control: read/write/access rights

• Privileged access: administrator may take ownership of files

• Windows API functions take handles to system “objects” as parameters

• Handle table in kernel address space, unique per process

• Security check at handle creation time only

Page 45: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 2014

Windows Subsystem Call

45

[Figure: Example of invoking a Windows kernel API - a Windows application calls WriteFile(...) in Kernel32.Dll (Windows-specific), which calls NtWriteFile in NtDll.Dll (used by all subsystems). NtDll issues the software interrupt (INT 2E, SYSCALL, or SYSENTER), crossing from user (U) to kernel (K) mode into KiSystemService in NtosKrnl.Exe, which calls NtWriteFile in NtosKrnl.Exe to do the operation; each layer then returns to its caller and the interrupt is dismissed.]

Page 46: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 2014

Modern UNIX Systems

• System V Release 4 (SVR4) was a major milestone

• AT&T and Sun Microsystems (R.I.P.) combined the previously diverging Unix flavors

• Intention to provide uniform platform for commercial UNIX deployment

• Added preemptive kernel, virtual memory concepts, virtual file system support

• Solaris is the successor of Sun‘s SVR4-based UNIX release

• 4.4BSD was the final version from the University of California, Berkeley

• Meanwhile many successful derivatives, including Mac OS X

• Most modern UNIX kernels are monolithic

• All functional components of the kernel have access to all data and methods

• Loadable modules (object files) that can be linked to / unlinked from the kernel at runtime, stackable

46

Page 47: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 2014

System Programming in Unix

• Unix system interface is a mixture of C library, POSIX, and custom functions

• Linux

• POSIX 1003.1 (mostly) + Standard C library + SVR4 + BSD functions

• Every system call has a platform-dependent symbolic constant(asm-<arch>/unistd.h) and a symbolic name

• Classes: Process management, time-related functions, signal processing, scheduling, kernel modules, file system, memory management, IPC, network, monitoring, security

• MacOS X

• BSD portion derived from FreeBSD (4.4BSD) + Standard C library + ObjC specials

• FreeBSD

• POSIX 1003.1 (mostly) + Standard C library + BSD functions

47

Page 48: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 2014

Unix: Everything Is A File

• „The UNIX Time-Sharing System“ - D. M. Ritchie and K. Thompson, 1974

48

Page 49: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 2014

Unix: Everything Is A File

• Hierarchical namespace of special files, ordinary files and directories

• Support for mountable sub trees in one hierarchy

• Today typically referred to as the Virtual File System (VFS) concept

• Each supported I/O device is associated with at least one special file in /dev

• Read and written as ordinary files, but leads to device interaction

• Protection relies on filesystem mechanisms

• „Everything can have a file descriptor“ is a better description than „Everything is a file“ [Brown2007]

• /proc

• Special file system mounted by the kernel at boot time (since SVR4 / BSD)

• Representation of kernel information as files, possibility for user - kernel mode interaction (e.g. ps tool)

49
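A short sketch of the "everything can have a file descriptor" idea, using only standard open()/read()/write() calls and the Linux /proc layout: /proc/self/status is opened like an ordinary file, but its contents are generated by the kernel on demand and describe the calling process.

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int main(void) {
    char buf[4096];

    /* Not stored on disk: reading this file asks the kernel for
       status information about the calling process. */
    int fd = open("/proc/self/status", O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    ssize_t n = read(fd, buf, sizeof buf);
    if (n > 0)
        write(STDOUT_FILENO, buf, (size_t)n);

    close(fd);
    return 0;
}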

Page 50: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 2014

Linux

50

Page 51: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 2014

Linux Modules

• Support for dynamically loaded and linked binary kernel parts - modules

• Reduces size of the compiled monolithic kernel binary

• Allows driver integration without re-compilation of the kernel

• Also solves some GPL licensing issues with modern hardware drivers

• Modules are relocatable object files that are linked into the kernel

• Kernel has table of registered functions with their address (/proc/kallsyms)

• Dynamic linker (ld.so) can load and re-locate the code accordingly (more later)

• modprobe tool, relies on insmod tool which uses the init_module system call

• Considers module dependencies determined by depmod utility (modules.dep)

• Kernel can trigger the kmod daemon to automatically load a missing module (request_module)

51
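A minimal module sketch, assuming a current Linux kernel build environment (kbuild with the module_init/module_exit macros); it only logs a message on load and unload and is meant to show the shape of a module, not a real driver. Once built against the running kernel's headers it can be loaded with insmod hello.ko or modprobe hello and removed with rmmod hello.

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>

static int __init hello_init(void)
{
    pr_info("hello: module loaded\n");    /* runs at insmod/modprobe time */
    return 0;
}

static void __exit hello_exit(void)
{
    pr_info("hello: module unloaded\n");  /* runs at rmmod time */
}

module_init(hello_init);
module_exit(hello_exit);

MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("Minimal example module");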

Page 52: Operating Systems 1 (12/12) - Summary

Processes & Dispatching

Page 53: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 14

Basic Concept: Process

• Unit of execution in operating system is traditionally a process

• Term introduced with MULTICS in the 60‘s to generalize job concept

• Instance of an executed program binary

• Management of multiple running processes is a core operating system task

• Concurrent utilization of resources demands some management

• Operating system must prevent independent processes from mutual code and data access (isolation)

• Sharing should be explicitly allowable under given security constraints

• Same holds for devices offering resources (storage, communication, ...)

53

Page 54: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 14

Process Image

• Collection of process information is often called the image

• Context / execution state

• Status information in the CPU for time sharing

• Management information, such as priority, security tokens and call stacks

• Program code

• Potentially shared between running processes (e.g. libraries)

• Associated system resources

• Must be subject to security checks

• Data

54

Page 55: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 14

Life of a Process

• Reasons for process creation

• New application / job start

• Interactive logon by a user

• Operating system service is started (e.g. printer spooler)

• Spawned by an existing process - parent process vs. child process

• Reasons for process termination

• Completion signaled by process itself - HALT instruction, system call, return jump

• Indication from the user in an interactive process

• Request from parent process or administrator

• Failure condition - execution time limits, lack of memory, protection error, arithmetic error, I/O failure, invalid instruction, privileged instruction, parent termination, ...

55

Page 56: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 14

Unix: Process Creation

• Forking: Fundamental Unix concept, difference to VMS / Windows

• Child processes come to life as a copy of their parent process

• A new unique ID is created

• The operating system adds the new process to the management tables

• The process image of the parent is copied

• Shared memory regions are always left out

• Modern fork() variations can copy only parts of it

• Reference counters for parent-owned resources are increased

• Child process state is changed to ready to run

• fork() system call returns the process ID of the newly created child to the parent (and 0 to the child)

56

Page 57: Operating Systems 1 (12/12) - Summary

Process Tree in Unix

Page 58: Operating Systems 1 (12/12) - Summary

58

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>

int main ()
{
  int pid, j, i;
  pid = fork();
  if (pid == 0)
  {
    /* child */
    for (j = 0; j < 10; j++) {
      printf ("Child process: %d (PID: %d)\n", j, getpid());
      sleep (1);
    }
    exit (0);
  } else if (pid > 0) {
    /* parent */
    for (i = 0; i < 10; i++) {
      printf ("Parent process: %d (PID: %d)\n", i, getpid());
      sleep (1);
    }
  } else {
    /* Negative result means we have a problem */
    fprintf (stderr, "Error");
    exit (1);
  }
  return 0;
}

Page 59: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 14

Process Control Block

• A process is internally characterized by a process control block

• Unique identifier

• Execution state (e.g. running, suspended, terminated)

• Priority level in comparison to other processes

• CPU context state related to this process, such as the program counter

• Memory regions for code and data used by the process

• I/O status information, such as outstanding requests, devices and open files

• Accounting information: time limits, clock time used, ...

• Security information, e.g. process owner

Page 60: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 14

Dispatching

• Process control block

• Information allows the interruption and continuation of any given process at any given time

• Interruption is not directly visible to the application

• Multi-tasking is managed by the operating system

• Dispatcher

• Code that switches a processor from one process to another

• Relies on queue of ready-to-be-executed processes

• Simplest process state model: running vs. not running

60

Page 61: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 14

Dispatching

• Process dispatching relies on some kind of execution interruption

• Interrupt: An external event triggers the execution of a handler function

• Clock interrupt, I/O interrupt, memory controller interrupt

• Trap: An error or exception condition occurred within the running process

• Typically noticed by hardware itself, such as the memory controller

• Operating system trap handler decides on necessary activity

• Processor hardware detects pending interrupt / trap

• Sets the program counter to the starting address of the interrupt handler

• Leads to switch from user to kernel mode, in order to allow privileged instructions

• Handler for the clock interrupt typically activates the dispatcher code

61

Page 62: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 14

Dispatching

• Several steps for a process switch

• Save the context of the processor

• Update the control block of the running process to reflect the state change

• Add a reference to this control block in the appropriate queue

• Select another process for execution

• Update the control block of the selected process

• Update memory management data structures

• Restore the saved processor context of the selected process, including the program counter

62

Page 63: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 14

Five-State Process Model

63

• Waiting for something renders a process unusable for dispatching

• Examples: Blocking system call, synchronization primitives

• Single queue approach is inappropriate in practice

• Extension of state model

• Dispatch queue contains only processes in „ready“ state

• Explicit consideration of preparation phase in the operating system kernel(code loading, memory allocation, ...)

• Explicit consideration of blocked (not-runnable) processes

Page 64: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 14

Running the Operating System ?

• If dispatching is given, how to combine operating system and user processes ?

• (a) Kernel is a separate entity, running in privileged mode

• On interrupt or system call, kernel code is activated

• Dedicated memory region, dedicated call stacks

• (b) Kernel executes within user processes

• Operating system as set of utility functions

• Small amount of code for memory management and process switching

• (c) Kernel is executed as collection of system processes

• Utilization of operating system functions by inter-process communication

64

Page 65: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 14

Services and Daemons

• Operating system code running in user processes is default

• Still demands permanently running (operating system) activities

• Windows: Core system processes (e.g. csrss.exe, lsass.exe), service processes

• Unix: Daemons

• Services and daemons have no direct user interface

• Executed with specialized security credentials, independent from the current user

• Managed by operating system mechanisms

65

[Figure: Life of a service (C) Russinovich et al. - Install time: the setup application tells the Service Controller/Manager (Services.Exe) about the service via CreateService, which records it in the registry. System boot/initialization: the SCM reads the registry and starts service processes as directed. Management/maintenance: the control panel can start and stop services and change startup parameters.]

Page 66: Operating Systems 1 (12/12) - Summary

Threads & Concurrency

Page 67: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 2014

Single and Multithreaded Processes

67

[Figure: A single-threaded process has its code, data, and files plus one set of registers and one stack; in a multithreaded process the threads share code, data, and files, while each thread has its own registers and its own stack.]

Page 68: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 2014

Multithreading

• Each thread has

• An execution state (Running, Ready, etc.)

• Saved thread context when not running

• An execution stack

• Some per-thread static storage for local variables

• Access to the memory and resources of its process (all threads of a process share this)

• Suspending a process involves suspending all threads of the process

• Termination of a process terminates all threads within the process

68

Page 69: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 2014

Multithreading

69

• Advantages

• Better responsiveness - dedicated threads for handling user events

• Simpler resource sharing - all threads in a process share same address space

• Utilization of multiple cores for parallel execution

• Faster creation and termination of activities

• Disadvantages

• Coordinated termination

• Signal and error handling

• Reentrant vs. non-reentrant system calls

Page 70: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 2014

Control Blocks

70

[Figure: Process control block (PCB) - next process block, process ID (PID), parent PID, program counter, registers, handle table, image file name, list of open files, list of thread control blocks, ...; each thread control block (TCB) holds a pointer to the next TCB, ...]

Page 71: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 2014

Thread States

• The typical states for a thread are running, ready, blocked

• Typical thread operations associated with a change in thread state are:

• Spawn: a thread within a process may spawn another thread

• Provides instruction pointer and arguments for the new thread

• New thread gets its own register context and stack space

• Block: a thread needs to wait for an event

• Saving its user registers, program counter, and stack pointers

• Unblock: When the event for which a thread is blocked occurs

• Finish: When a thread completes, its register context and stacks are deallocated.

71

Page 72: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 2014

Example: Windows

• The Windows kernel dispatches threads for multi-tasking

• Implementation of 1:1 mapping - „processes do not run, threads run“

• In principle, all threads are equal - no consideration of process in scheduling

• Thread context switch always involves the kernel

• Every process starts with one main thread, may create more

• Per-process data:

• Virtual address space, working set („owned“ physical memory), access token, handle table for kernel objects, environment strings, command line

• Per-thread data:

• User-mode stack (call frames, arguments), kernel-mode stack, thread-local storage, scheduling state, thread priority, hardware context, optional access token

72

Page 73: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 2014

Windows - Thread States

• Init: The thread is being created.

• Ready: The thread is waiting to be assigned to a CPU.

• Running: The thread’s instructions are being executed.

• Waiting: The thread is waiting for some event to occur.

• Terminated: The thread has finished execution.

73

[Figure: Thread state diagram - init -(admitted)-> ready -(scheduler dispatch)-> running -(exit)-> terminated; running -(interrupt / quantum expired)-> ready; running -(waiting for I/O or event)-> waiting -(I/O or event completion)-> ready.]

Page 74: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 2014

Linux Threads

• No explicit distinction between threads and processes

• User-level threads are mapped to kernel tasks

• User threads in the same user process share the same thread group ID

• Enables resource sharing, avoids context switch on dispatching

• clone() as extended version of fork()

• On cloning, the decision about sharing of memory, file handles, and so on is made

• Thread libraries use this capability

74

CLONE_CLEARID - Clear the task ID.

CLONE_DETACHED - The parent does not want a SIGCHLD signal sent on exit.

CLONE_FILES - Shares the table that identifies the open files.

CLONE_FS - Shares the table that identifies the root directory and the current working directory, as well as the value of the bit mask used to mask the initial file permissions of a new file.

CLONE_IDLETASK - Set PID to zero, which refers to an idle task. The idle task is employed when all available tasks are blocked waiting for resources.

CLONE_NEWNS - Create a new namespace for the child.

CLONE_PARENT - Caller and new task share the same parent process.

CLONE_PTRACE - If the parent process is being traced, the child process will also be traced.

CLONE_SETTID - Write the TID back to user space.

CLONE_SETTLS - Create a new TLS for the child.

CLONE_SIGHAND - Shares the table that identifies the signal handlers.

CLONE_SYSVSEM - Shares System V SEM_UNDO semantics.

CLONE_THREAD - Inserts this process into the same thread group of the parent. If this flag is true, it implicitly enforces CLONE_PARENT.

CLONE_VFORK - If set, the parent does not get scheduled for execution until the child invokes the execve() system call.

CLONE_VM - Shares the address space (memory descriptor and all page tables).
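A sketch of using clone() directly, assuming Linux with glibc (_GNU_SOURCE enables the clone() wrapper); the stack size and the shared counter are arbitrary choices for the example. CLONE_VM makes the child run in the parent's address space, which is the core of how thread libraries build threads on top of kernel tasks.

#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

static int counter = 0;             /* visible to the child because of CLONE_VM */

static int child_fn(void *arg) {
    counter++;                      /* writes directly into the parent's memory */
    printf("child task: counter = %d\n", counter);
    return 0;
}

int main(void) {
    const size_t stack_size = 1024 * 1024;
    char *stack = malloc(stack_size);
    if (stack == NULL) { perror("malloc"); return 1; }

    /* Child shares the address space; SIGCHLD lets the parent wait for it.
       The stack grows downwards, so we pass the top of the allocated block. */
    int pid = clone(child_fn, stack + stack_size, CLONE_VM | SIGCHLD, NULL);
    if (pid < 0) { perror("clone"); return 1; }

    waitpid(pid, NULL, 0);
    printf("parent: counter = %d\n", counter);   /* prints 1 - memory was shared */
    free(stack);
    return 0;
}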

Page 75: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 2014

POSIX Threads

• POSIX standardized (IEEE 1003.1c) API for thread creation and synchronization

• API specifies behavior of the thread library, not an implementation

• Thread creation and termination, stack management

• Synchronization between threads

• Scheduling hints

• Thread-local storage

• Implemented on many UNIX operating systems to allow portable concurrent C code

• Windows: Services for Unix (SFU) implement pthreads on Windows

• Linux 2.6: Native POSIX Thread Library (NPTL) implementation of pthreads

75

Page 76: Operating Systems 1 (12/12) - Summary

76

#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

void * hello_thread( void *arg ) {
    printf( "hello " );
    return( 0 );
}

void * world_thread( void *arg ) {
    int n;
    pthread_t tid = (pthread_t) arg;
    if ( n = pthread_join( tid, NULL ) ) {
        fprintf( stderr, "pthread_join: %s\n", strerror( n ) );
        return( NULL );
    }
    printf( "world\n" );
    pthread_exit( 0 );
}

int main( int argc, char *argv[] ) {
    int n;
    pthread_t htid, wtid;
    if ( n = pthread_create( &htid, NULL, hello_thread, NULL ) ) {
        fprintf( stderr, "pthread_create: %s\n", strerror( n ) );
        return( 1 );
    }
    if ( n = pthread_create( &wtid, NULL, world_thread, (void *) htid ) ) {
        fprintf( stderr, "pthread_create: %s\n", strerror( n ) );
        return( 1 );
    }
    if ( n = pthread_join( wtid, NULL ) ) {
        fprintf( stderr, "pthread_join: %s\n", strerror( n ) );
        return( 1 );
    }
    return( 0 );
}

Page 77: Operating Systems 1 (12/12) - Summary

77

What happens?

Page 78: Operating Systems 1 (12/12) - Summary

ParProg | Introduction PT / FF 14

Abstraction of Concurrency [Breshears]

• Processes / threads represent the execution of atomic statements

• „Atomic“ can be defined on different granularity levels, e.g. source code line, so concurrency should be treated as an abstract concept

• Concurrent execution is the interleaving of atomic statements from multiple sequential processes

• Unpredictable execution sequence of atomic instructions due to non-deterministic scheduling and dispatching, interrupts, and other activities

• Concurrent algorithm should maintain properties for all possible interleavings

• Example: All atomic statements are eventually included (fairness)

• Some literature distinguishes between interleaving (uniprocessor) and overlapping (multiprocessor) of statements - same problem

78

Page 79: Operating Systems 1 (12/12) - Summary

ParProg | Introduction PT / FF 14

Concurrency

• Management of concurrent activities in an operating system

• Multiple applications in progress at the same time, non-sequential operating system activities

• Time sharing for interleaved execution

• Demands dispatching and synchronization

• Parallelism: Actions are executed simultaneously

• Demands parallel hardware

• Relies on a concurrent application

79

[Figure: On a single core, thread 1 and thread 2 are interleaved over time; with two cores and their memories, thread 1 and thread 2 execute simultaneously.]

Page 80: Operating Systems 1 (12/12) - Summary

ParProg | Introduction PT / FF 14

Concurrency is Hard

• Sharing of global resources

• Concurrent reads and writes on the same variable make the order critical

• Optimal management of resource allocation

• Process gets control over an I/O channel and is then suspended before using it

• Programming errors become non-deterministic

• Order of interleaving may / may not activate the bug

• This all happens with concurrent execution, which means even on uniprocessors

• Race condition

• The final result of an operation depends on the order of execution

• Well-known issue since the 60‘s, identified by E. Dijkstra

80

Page 81: Operating Systems 1 (12/12) - Summary

ParProg | Introduction PT / FF 14

Terminology

• Deadlock („Verklemmung“)

• Two or more processes / threads are unable to proceed

• Each is waiting for one of the others to do something

• Livelock

• Two or more processes / threads continuously change their states in response to changes in the other processes / threads

• No global progress for the application

• Race condition

• Two or more processes / threads are executed concurrently

• Final result of the application depends on the relative timing of their execution

81

Page 82: Operating Systems 1 (12/12) - Summary

ParProg | Introduction PT / FF 14

Terminology

• Starvation („Verhungern“)

• A runnable process / thread is overlooked indefinitely

• Although it is able to proceed, it is never chosen to run (dispatching / scheduling)

• Atomic Operation („Atomare Operation“)

• Function or action implemented as a sequence of one or more instructions

• Appears to be indivisible - no other process / thread can see an intermediate state or interrupt the operation

• Executed as a group, or not executed at all

• Mutual Exclusion („Gegenseitiger Ausschluss“)

• The requirement that when one process / thread is using a resource, no other shall be allowed to do that

82

Page 83: Operating Systems 1 (12/12) - Summary

ParProg | Introduction PT / FF 14

Example: The Dining Philosophers (E.W.Dijkstra)

• Five philosophers work in a college, each philosopher has a room for thinking

• Common dining room, furnished with a circular table, surrounded by five labeled chairs

• In the center stood a large bowl of spaghetti, which was constantly replenished

• When a philosopher gets hungry:

• Sits on his chair

• Picks up his own fork on the left and plunges it in the spaghetti, then picks up the right fork

• When finished, he puts down both forks and gets up

• May wait for the availability of the second fork

Page 84: Operating Systems 1 (12/12) - Summary

ParProg | Introduction PT / FF 14

Critical Section

• Only 2 threads, T0 and T1

• General structure of thread Ti (other thread Tj)

• Threads may share some common variables to synchronize their actions

84

do {
    enter section
    critical section
    exit section
    remainder section
} while (1);

Page 85: Operating Systems 1 (12/12) - Summary

ParProg | Introduction PT / FF 14

Critical Section Protection with Hardware

• Traditional solution was interrupt disabling, but this works only on a uniprocessor

• Concurrent threads cannot overlap on one CPU

• Thread will run until it performs a system call or an interrupt happens

• Software-based algorithms also do not work, due to missing atomic statements

• Modern architectures need hardware support with atomic machine instructions

• Test and Set instruction - read & write memory at once

• If not available, atomic swap instruction is enough

• Busy waiting, starvation or deadlock are still possible

85

#define LOCKED 1

/* SwapAtomic stands for an atomic swap machine instruction: it writes
   LOCKED to *lockPtr and returns the previous value in one step. */
int TestAndSet(int* lockPtr) {
    int oldValue;
    oldValue = SwapAtomic(lockPtr, LOCKED);
    return oldValue;
}

void Lock(int *lock) {
    while (TestAndSet(lock) == LOCKED);   /* busy waiting until the lock was free */
}
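The same test-and-set spinning can be written portably with C11 atomics; a sketch that is not from the slides, where atomic_exchange plays the role of the atomic swap instruction:

#include <stdatomic.h>

static atomic_int lock_word = 0;    /* 0 = free, 1 = locked */

void lock_acquire(void) {
    /* Atomically store 1 and obtain the previous value; if it was already 1,
       another thread holds the lock and we keep spinning (busy waiting). */
    while (atomic_exchange(&lock_word, 1) == 1)
        ;
}

void lock_release(void) {
    atomic_store(&lock_word, 0);    /* make the lock available again */
}

int main(void) {
    lock_acquire();   /* enter the critical section */
    lock_release();   /* leave it again */
    return 0;
}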

Page 86: Operating Systems 1 (12/12) - Summary

86

„Manual“ implementation of a critical section for interleaved output

Page 87: Operating Systems 1 (12/12) - Summary

ParProg | Introduction PT / FF 14

Binary and General Semaphores [Dijkstra]

• Find a solution to allow waiting processes to ,sleep‘

• Special purpose integer called semaphore

• P-operation: Decrease value of its argument semaphore by 1 as atomic step

• Blocks if the semaphore is already zero - wait operation

• V-operation: Increase value of its argument semaphore by 1 as atomic step

• Releases one instance of the resource for other processes - signal operation

• Solution for critical section shared between N processes

• Binary semaphore has initial value of 1, counting semaphore of N

wait(S):   while (S <= 0) ;  S--;   // atomic
signal(S): S++;                     // atomic

do {
    wait(mutex);
    // critical section
    signal(mutex);
    // remainder section
} while (1);
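POSIX provides these operations as counting semaphores; a sketch (assuming an unnamed, process-local semaphore as on Linux) where sem_wait is the P/wait operation and sem_post the V/signal operation protecting a shared counter:

#include <semaphore.h>
#include <pthread.h>
#include <stdio.h>

static sem_t mutex;                  /* binary semaphore guarding the counter */
static int counter = 0;

static void *worker(void *arg) {
    for (int i = 0; i < 100000; i++) {
        sem_wait(&mutex);            /* P: blocks while the value is 0 */
        counter++;                   /* critical section */
        sem_post(&mutex);            /* V: signal, releases the semaphore */
    }
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    sem_init(&mutex, 0, 1);          /* initial value 1 -> binary semaphore */
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %d\n", counter);   /* 200000 - no lost updates */
    sem_destroy(&mutex);
    return 0;
}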

Page 88: Operating Systems 1 (12/12) - Summary

ParProg | Introduction PT / FF 14

Shared Data Protection by Semaphores

88

Page 89: Operating Systems 1 (12/12) - Summary

ParProg | Introduction PT / FF 14

POSIX Pthreads

89

Page 90: Operating Systems 1 (12/12) - Summary

ParProg | Introduction PT / FF 14

POSIX Pthreads

90

• pthread_mutex_init()

• Initialize new mutex, which is unlocked by default

• pthread_mutex_lock(), pthread_mutex_trylock()

• Blocking / non-blocking wait for a mutex lock

• pthread_mutex_unlock()

• Operating system scheduling decides about wake-up preference

• Focus on speed of operation, no deadlock or starvation protection mechanism

int pthread_mutex_lock(pthread_mutex_t *mutex);
int pthread_mutex_trylock(pthread_mutex_t *mutex);
int pthread_mutex_unlock(pthread_mutex_t *mutex);
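A minimal usage sketch with standard pthreads calls only (the account balance is an invented example): pthread_mutex_lock/unlock turn the update into a critical section, while pthread_mutex_trylock lets a caller skip the work instead of blocking.

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;   /* statically initialized, unlocked */
static long balance = 0;

void deposit(long amount) {
    pthread_mutex_lock(&m);          /* blocks until the mutex is free */
    balance += amount;               /* critical section */
    pthread_mutex_unlock(&m);
}

int try_deposit(long amount) {
    if (pthread_mutex_trylock(&m) != 0)
        return 0;                    /* mutex busy: give up instead of blocking */
    balance += amount;
    pthread_mutex_unlock(&m);
    return 1;
}

int main(void) {
    deposit(100);
    printf("balance = %ld, trylock deposit %s\n",
           balance, try_deposit(50) ? "succeeded" : "skipped");
    return 0;
}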

Page 91: Operating Systems 1 (12/12) - Summary

ParProg | Introduction PT / FF 14

Spinlocks

91

[Figure: Spinlock protecting the shared DPC queue - processors A and B both execute: do acquire_spinlock(DPC) until (SUCCESS); begin remove DPC from queue end; release_spinlock(DPC). The spinlock guards the critical section on the queue.]

Page 92: Operating Systems 1 (12/12) - Summary

Memory Management

Page 93: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 14

Memory Management - Address Space

• CPU fetches instructions from memory according to the program counter value

• Instructions may cause additional loading from and storing to memory locations

• Address space: Set of unique location identifiers (addresses)

• Memory address regions that are available to the program at run-time

• All systems today work with a contiguous / linear address space per process

• In a concurrent system, address spaces must be isolated from each other -> mapping of address spaces to physical memory

• Mapping approach is predefined by the hardware / operating system combination

• Not every mapping model works on all hardware

• Most systems today implement a virtual address space per process

93

Page 94: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 14

Linear Address Space

94

(from IBM developerWorks)

[Figure: Linear address spaces - address spaces 1, 2, and 3 are mapped into different regions of physical memory.]

Page 95: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 14

Memory Management Unit (MMU)

• Hardware device that maps logical to physical addresses

• The MMU is part of the processor

• Re-programming the MMU is a privileged operation, can only be performed in privileged (kernel) mode

• The MMU typically implements one or more mapping approaches

• The user program deals with logical addresses only

• Never sees the real physical addresses

• Transparent translation with each instruction executed

95

Page 96: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 14

Memory Hierarchy

96

http://tjliu.myweb.hinet.net/

• The operating system has to manage the memory hierarchy

• Programs should have comparable performance on different memory architectures

• In some systems, parts of the cache invalidation are a software task (e.g. TLB)

Page 97: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 14

Memory Hierarchy

• Available main and secondary memory is a shared resource among all processes

• Can be allocated and released by operating system and application

• Programmers are not aware of other processes in the same system

• Main memory is expensive, volatile and fast, good for short-term usage

• Secondary memory is cheaper, typically not volatile and slower, good for long-term

• Flow between levels in the memory hierarchy is necessary for performance

• Traditionally solved by overlaying and swapping

• Recurring task for software developers - delegation to the operating system

• In multiprogramming, this becomes a must

97

Page 98: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 14

Swapping

• In a multiprogramming environment

• Blocked (and ready) processes can be temporarily swapped out of main to secondary memory

• Allows for execution of other processes

• With physical addresses

• Processes must be swapped back into the same memory space that they occupied previously

• With logical addresses

• Processes can be swapped in at arbitrary physical addresses

• Demands relocation support

98

Figure: Swapping - processes P1 and P2 in user space are swapped out of main memory to the backing store and swapped back in later; the operating system stays resident in main memory.

Page 99: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 14

Memory Management - Partitioning

• With relocation and isolation, the operating system can manage memory partitions

• Reserve memory partitions on request

• Recycle unused / no longer used memory partitions, implicitly or explicitly

• Swap out the content of temporarily unused memory partitions

• Memory management must keep state per memory partition

• Different partitioning approaches have different properties

• Traditional approach was one partition per process

99

(Figure, repeated from the linear address space slide: Address Spaces 1, 2 and 3 mapped onto partitions of physical memory.)

Page 100: Operating Systems 1 (12/12) - Summary

• Partitioning approaches can be evaluated by their

• Fragmentation behavior, performance, overhead

• Hypothetical example: Fixed partition size, bit mask for partition state

• Small block size -> large bit mask -> small fragmentation

• Large block size -> small bit mask -> large fragmentation (see the bit-mask sketch below)

• External Fragmentation

• Total memory space exists to satisfy a request, but it is not contiguous

• Internal Fragmentation

• Allocated memory may be slightly larger than requested memory

• Size difference is memory internal to a partition, but not being used

Operating Systems I PT / FF 14

Memory Management - Partitioning

100
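A hypothetical sketch of the bit-mask bookkeeping from the example above: fixed-size blocks, one bit per block (block count and all names are assumptions):

#include <stdint.h>

#define NUM_BLOCKS 1024                        /* fixed number of equal-sized blocks */

static uint8_t bitmap[NUM_BLOCKS / 8];         /* 1 bit per block: 0 = free, 1 = used */

static int  block_used(int i) { return bitmap[i / 8] &   (1u << (i % 8)); }
static void mark_used(int i)  {        bitmap[i / 8] |=  (1u << (i % 8)); }
static void mark_free(int i)  {        bitmap[i / 8] &= ~(1u << (i % 8)); }

/* allocate one block; returns the block index or -1 if memory is exhausted */
static int alloc_block(void)
{
    for (int i = 0; i < NUM_BLOCKS; i++)
        if (!block_used(i)) { mark_used(i); return i; }
    return -1;
}

Smaller blocks mean a larger bitmap but less wasted space per allocation; larger blocks shrink the bitmap at the price of more internal fragmentation, which is the trade-off stated on the slide.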

Page 101: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 14

Memory Partitioning - Dynamic Partitioning

• External fragmentation can be overcome by compaction

• Operating system is shifting partitions so that free memory becomes one block

• Time investment vs. performance gain

• Demands relocation support

• Placement algorithms for unequal fixed and dynamic partitioning (sketched below)

• Best-fit: Choose the partition that is closest in size

• First-fit: Pick first partition that is large enough

• Next-fit: Start check from the last chosen partition and pick next match

101

Example: 16MB allocation
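The placement strategies above can be sketched over a simple free list (the list structure and names are assumptions; real allocators also split and coalesce blocks):

#include <stddef.h>

struct free_block { size_t size; struct free_block *next; };

/* first-fit: take the first free block that is large enough */
static struct free_block *first_fit(struct free_block *head, size_t request)
{
    for (struct free_block *b = head; b != NULL; b = b->next)
        if (b->size >= request)
            return b;
    return NULL;                       /* no partition fits - compaction needed */
}

/* best-fit: take the free block whose size is closest to the request */
static struct free_block *best_fit(struct free_block *head, size_t request)
{
    struct free_block *best = NULL;
    for (struct free_block *b = head; b != NULL; b = b->next)
        if (b->size >= request && (best == NULL || b->size < best->size))
            best = b;
    return best;
}

Next-fit works like first-fit but remembers the block at which the previous search stopped and resumes from there.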

Page 102: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 14

Compaction

• Combine adjacent free memory regions to a larger region

• Removes external fragmentation

• Can be performed ...

• ... with each memory de-allocation

• ... when the system is inactive

• ... if an allocation request fails

• ... if a process terminates

• Compaction demands support for relocation

102

Compaction (1): ... must be triggered at the latest when an allocation request can no longer be satisfied.

Figure: starting from an initial layout (OS and jobs 1-4 between addresses 0 and 2100 with free gaps), three compaction variants merge the free space into one block by moving 600, 400 or 200 words respectively.

Page 103: Operating Systems 1 (12/12) - Summary

• Segmentation:

• Split process address space into segments

• Variable length up to a given maximum

• Like dynamic partitioning, but

• Partitions don‘t need to be contiguous - no internal fragmentation

• External fragmentation is reduced with multiple partitions per process

• Large segments can be used for process isolation (like in the partitioning idea)

• Mid-size segments can be used for separating application code, libraries and stack

• Small segments can be used for object and record management

• Each logical memory address is a tuple of segment number and segment-relative address (offset)

• Translated to base / limit values by the MMU (see the translation sketch below)

Operating Systems I PT / FF 14

Partitioning by Segmentation

103
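A conceptual sketch of the (segment, offset) translation described above, including the limit check that also underlies the bounds-based protection on the following slides (table layout and names are assumptions; a real MMU does this in hardware and raises a trap on violation):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

struct segment { uint32_t base; uint32_t limit; };   /* one segment-table entry */

static uint32_t translate(const struct segment *table, uint32_t table_len,
                          uint32_t seg, uint32_t offset)
{
    if (seg >= table_len || offset >= table[seg].limit) {
        fprintf(stderr, "segmentation violation: seg=%u offset=%u\n", seg, offset);
        exit(EXIT_FAILURE);            /* hardware would raise a trap instead */
    }
    return table[seg].base + offset;   /* physical address = base + offset */
}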

Page 104: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 14

Segmentation - Linux Example

104

Figure: Internal organisation - a process address space is composed of code, data, heap and stack segments that are placed individually in memory.

Figure: Address spaces in Unix derivatives - from the start to the end of the address space: code, initialised data, BSS (zeroed), an expansion area growing up to the end of data (break), and the stack growing down from the top.

BSS = „Block Started By Symbol“ -> non-initialized static and global variables

(C) J. Nolte, BTU

Page 105: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 14

Memory Protection - Bounds / Limit

• Every process has different limits, either as a base/limit pair or a bounds pair

• Processor has only one valid configuration at a time

• Operating system manages limits as part of the processor context per process

105

Figure: Bounds (4) - the hardware provides a single base/limit register pair (hardware prototype); the operating system keeps a software prototype (base/limit) per address space (operating system, user 1, user 2, unused) and loads it into the hardware when dispatching.

(C) J. Nolte, BTU

Page 106: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 14

Segmentation Granularity

106

Figure: Mid-granular segmentation (1) - modules 0-2 each contribute a code and a data segment; the logical segments (Code 0-2, Data 0-2) are mapped to physical memory through individual base/limit pairs.

(C) J. Nolte, BTU

Page 107: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 14

Segment Tables

• With multiple base/limit pairs per process, a segment table must be maintained

• Table is in main memory, but must be evaluated by the MMU

107

Page 108: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 14

Memory Management - Paging

• Segmentation / partitioning always have a fragmentation problem

• Fixed-size partitions lead to internal fragmentation

• Variable-sized partitions lead to external fragmentation

• Solution: Paging

• Partition memory into small equal fixed-size chunks - (page) frames

• Partition process address space into chunks of the same size - pages

• No external fragmentation, only small internal fragmentation in the last page

• One page table per process

• Maps each process page to a frame - entries for all pages needed

• Used by the processor MMU to translate logical to physical addresses (see the sketch below)

108
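A conceptual sketch of the page-table lookup the MMU performs for each memory access (page size, entry layout and names are assumptions; real page tables are multi-level and also hold protection bits):

#include <stdint.h>

#define PAGE_SHIFT 12                          /* 4 KB pages, as an example */
#define PAGE_SIZE  (1u << PAGE_SHIFT)
#define PAGE_MASK  (PAGE_SIZE - 1)

struct pte { uint32_t frame; int valid; };     /* simplified page-table entry */

static uint64_t translate(const struct pte *page_table, uint32_t logical)
{
    uint32_t page   = logical >> PAGE_SHIFT;   /* upper bits: page number */
    uint32_t offset = logical &  PAGE_MASK;    /* lower bits: offset within page */

    if (!page_table[page].valid)
        return (uint64_t)-1;                   /* a real MMU raises a page fault trap */

    return ((uint64_t)page_table[page].frame << PAGE_SHIFT) | offset;
}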

Page 109: Operating Systems 1 (12/12) - Summary

109

Page 110: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 14

Memory Management - Paging

110

Page 111: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 14

Page Table Sizes

111

Table (columns: Address Space, Page Size, Number of Pages, Page Table Size per Process) - all entries are powers of two; the resulting per-process page table sizes in the examples range from 256 MB and 256 GB up to 16 TB and 16.7 PB.
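The table entries follow from a simple calculation: number of pages = address-space size / page size, and page table size = number of pages x entry size. An illustrative computation (the 32-bit / 4 KB / 4-byte values are an example, not a row of the original table):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint64_t addr_space = 1ULL << 32;              /* 32-bit address space */
    uint64_t page_size  = 1ULL << 12;              /* 4 KB pages */
    uint64_t entry_size = 4;                       /* bytes per page-table entry */

    uint64_t pages      = addr_space / page_size;  /* 2^20 = 1,048,576 pages */
    uint64_t table_size = pages * entry_size;      /* 4 MB per process */

    printf("%llu pages, %llu MB page table per process\n",
           (unsigned long long)pages,
           (unsigned long long)(table_size >> 20));
    return 0;
}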

Page 112: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 14

Protection and Sharing

• Logical addressing with paging allows sharing of address space regions

• Shared code - multiple program instances, libraries, operating system code

• Shared data - concurrent applications, inter-process communication

• Protection based on paging mechanisms

• Individual rights per page maintained in the page table (read, write, execute)

• On violation, the MMU triggers a processor exception (trap)

• Address space often has unused holes (e.g. between stack and heap)

• If neither process nor operating system allocated this region, it is marked as invalid in the page table

• On access, the processor traps

112

Page 113: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 14

The NX Bit

• The No eXecute (NX) bit marks a page as not executable

• Very good protection against stack or heap-based overflow attacks

• Well-known for decades in non-X86 processor architectures

• AMD decided to add it to the AMD64 instruction set, Intel adopted it since P4

• Demands operating system support for new page table structure

• Support in all recent Windows and Linux versions

113

Page 114: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 14

Page Swapping

114


Page 115: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 14

No Swapping ?

• Some sources argue that systems without swapping perform better

• Some counter-arguments:

• Swapping removes information only used once from main memory

• Initialization code or dead code

• Event-driven code that may never be triggered in the current system run

• Constant data

• Resources being loaded on start-up

• Extra memory generated by swapping is typically used for the file system cache

• Operating systems are heavily optimized for not swapping the wrong pages

• Memory ‚hogs‘ would get an unfair advantage in system resource usage

115

Page 116: Operating Systems 1 (12/12) - Summary

Scheduling

Page 117: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 14

Scheduling

• Assign activities (processes / threads) to processor(s)

• System objectives to be considered: response time, throughput, efficiency, ...

• Long-term scheduling: Decision to add a process to the pool of executed processes

• Example: Transition of a new process into „ready“ state; batch processing queue

• Medium-term scheduling: Decision to load process into memory for execution

• Example: Resume suspended processes from backing store

• Short-term scheduling: Decision which particular ready process will be executed

• Example: Move a process from „ready“ state into „running“ state

• I/O scheduling: Decision which process is allowed to perform device activities

• Overall goal is to minimize queuing time for all processes

117

Page 118: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 14

Short-Term Scheduler

• In cooperation with the dispatcher as part of the core operating system function

• Frequent fine-grained decision about what runs next, happens on:

• Clock interrupt (regular scheduling interval)

• I/O interrupts

• Operating system calls

• Signals

• Any event that blocks the currently running process / thread

• Needs decision criteria to choose the next process / thread

• User perspective vs. system perspective

118

Page 119: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 14

CPU and I/O Bursts

• Processes / threads can be described as either:

• I/O-bound – spends more time doing I/O than computations, many short CPU bursts

• Compute-bound – spends more time doing computations, few very long CPU bursts

• Behavior can change during run time

• Many short CPU bursts are typical

119

Figure: a process alternates between CPU bursts (load val, inc val, read file; inc count, add data, val, write file; load val, inc val, read from file; ...) and I/O bursts (wait for I/O). A histogram of burst duration (0-30 msec) shows that short CPU bursts dominate the distribution.

Page 120: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 14

Round Robin

• Uses preemption based on a clock interrupt, manages „ready“ processes in a queue

• Also known as time slicing - each process gets a time quantum (see the toy simulation below)

• Particularly effective in time-sharing system or transaction processing system

• Compute-bound processes are favored over I/O bound processes in mixed load

• I/O wait delays the move-back to the „ready“ list

• Better for short jobs in comparison to FCFS

• Very short quantum brings overhead penalty, typical lower limit of 10ms

120
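A toy simulation of the time-slicing idea (the process demands, the quantum value and the simple circular scan over the ready processes are made up for illustration):

#include <stdio.h>

#define NPROC   3
#define QUANTUM 10                          /* time quantum in ms (example value) */

int main(void)
{
    int remaining[NPROC] = { 25, 7, 16 };   /* outstanding CPU demand per process */
    int finished = 0, clock = 0;

    while (finished < NPROC) {
        for (int p = 0; p < NPROC; p++) {   /* cycle through the ready processes */
            if (remaining[p] <= 0)
                continue;
            int slice = remaining[p] < QUANTUM ? remaining[p] : QUANTUM;
            clock += slice;                 /* process p runs for at most one quantum */
            remaining[p] -= slice;
            if (remaining[p] == 0) {
                printf("process %d finishes at t=%d ms\n", p, clock);
                finished++;
            }
        }
    }
    return 0;
}

The short job (process 1) finishes after 17 ms instead of waiting behind the long ones, which is the advantage over FCFS mentioned above.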

Page 121: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 14

Multilevel Queue Scheduling

• Ready queue is partitioned into separate queues

• Real-time (system, multimedia) and Interactive

• Queues may have different scheduling algorithms

• Real-Time – Round Robin

• Interactive – Round Robin + priority-elevation + quantum stretching

• Scheduling must be done between the queues

• Fixed priority scheduling (i.e., serve all real-time threads first, then the interactive ones)

• Possibility of starvation

• Time slice – each queue gets a certain amount of CPU time which it can schedule

• Established approach in Solaris operating system family

121

Page 122: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 14

Windows Scheduling Principles

• 32 priority levels

• Threads within same priority are scheduled following round robin policy

• Realtime priorities (i.e. > 15) are assigned statically to threads

• Non-realtime priorities are adjusted dynamically

• Priority elevation as response to certain I/O and dispatch events

• Quantum stretching to optimize responsiveness

• In multiprocessor systems, affinity mask is considered

• No attempt to share processors fairly among processes, only among threads

122


Page 123: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 14

Multiprocessor Systems

• Threads can run on any CPU, unless specified otherwise

• Scheduling tries to keep threads on same CPU (soft affinity)

• Threads can be bound to particular CPUs (hard affinity)

• SetThreadAffinityMask, SetProcessAffinityMask, SetInformationJobObject (see the usage sketch below)

• Bit mask where each bit corresponds to a CPU number

• Thread affinity mask must be a subset of process affinity mask, which must be a subset of the active processor mask and may be derived from the image affinity mask, if given

• The scheduling code runs fully distributed, no ,master‘ processor

• Any processor can interrupt another processor to schedule a thread

• Scheduling database as per-CPU data structure of ready queues

123
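A minimal Win32 sketch of setting a hard affinity with SetThreadAffinityMask (the chosen mask value, CPU 0 only, is an arbitrary example):

#include <windows.h>
#include <stdio.h>

int main(void)
{
    /* bit 0 set -> restrict the current thread to CPU 0 (hard affinity) */
    DWORD_PTR previous = SetThreadAffinityMask(GetCurrentThread(), 0x1);
    if (previous == 0)
        printf("SetThreadAffinityMask failed: %lu\n", GetLastError());
    else
        printf("previous affinity mask: 0x%llx\n", (unsigned long long)previous);
    return 0;
}

The new mask must be a subset of the process affinity mask, otherwise the call fails.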

Page 124: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 14

Windows vs. Kernel Priorities

124

Page 125: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 14

Special Thread Priorities

• One idle thread per CPU

• When no threads want to run, idle thread is executed

• Appears to have priority zero, but actually runs “below” priority 0

• Provides CPU idle time accounting - unused clock ticks are charged to idle thread

• Loop:

• Calls HAL to allow for power management, processes DPC list

• Dispatches to a thread if selected

• One zero page thread per system

• Zeroes pages of memory in anticipation of “demand zero” page faults

• Runs at priority zero (lower than reachable with Windows API) in the „system“ process

125

Page 126: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 14

Scheduling Scenarios

• Preemption

• A thread becomes ready at a higher priority than the currently running thread

• The lower-priority running thread is preempted

• The preempted thread goes back to the head of its ready queue

• Scheduler needs to pick the lowest priority thread to preempt

• Preemption is strictly event-driven, does not wait for the next clock tick

• Threads in kernel mode may be preempted (unless they raise IRQL to >= 2)

126

Page 127: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 14

Priority Adjustments

• Dynamic priority adjustments are applied to threads in dynamic classes

• Disable if desired with SetThreadPriorityBoost or SetProcessPriorityBoost

• Types of priority adjustment

• I/O completion

• Wait completion on executive events or semaphores

• When threads in the foreground process complete a wait operation

• Boost value of 2, lost after one full quantum

• Quantum is decremented by 1 so that threads boosted after I/O completion don't keep running without ever experiencing quantum end

• GUI threads that wake up to windowing input (e.g. messages) get a boost of 2

• Added to the current priority, not the base priority

127

Page 128: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 14

Priority Adjustments

• No automatic adjustments in real-time class (16 or above)

• Real time here really means “system won’t change the relative priorities of your real-time threads”

• Hence, scheduling is predictable with respect to other “real-time” threads,but not for absolute latency

• Example: Boost on I/O completion

• Specified by the device driver through IoCompleteRequest(Irp, PriorityBoost)

• Common boost values (see NTDDK.H): 1 - disk, CD-ROM, parallel, video; 2 - serial, network, named pipe, mailslot; 6 - keyboard or mouse; 8 - sound

128

Page 129: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 14

New since Windows 7

• Core Parking

• Historically, CPU workload was distributed fairly evenly across logical processors, even on low utilization

• Core Parking tries to keep the load on as few logical processors as possible, so all others can sleep; only overridden by hard affinity and the thread ideal processor

• Power management code notifies scheduling code about parked cores

• Considers socket topology - newer processors put sockets into deep sleep if all the cores are idle

• At least one CPU in each NUMA node is left unparked for fast memory access

• Core Parking is active on server and hyperthreading systems

• Best returns on medium utilization workloads, but typical Desktop client systems tend to run at extremes

129

Page 130: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 14

Unix SVR4 Scheduling

• Differentiation between three different priority classes across 160 priority levels

• Real-time processes (159-100)

• kernel-mode processes (99-60)

• time-shared processes (59-0, user mode)

• Kernel was not preemptible, so specific preemption points were defined

• Region of code where all kernel data structures are either updated and consistent, or locked via a semaphore

• One dispatch queue per priority level, each handled in round-robin

• Each time a time-shared process uses up a quantum, its priority is decreased

• Each time it blocks on an event or resource, its priority is increased

• Time-shared process quantum depends on priority, fixed for real-time processes

130

Page 131: Operating Systems 1 (12/12) - Summary

Operating Systems I PT / FF 14

Linux Scheduling

• schedule function as the central organization point for scheduling

• Runtime of the scheduler became thread-count-independent with Linux 2.6 - O(1) scheduler

• Also established for a while in BSD and Windows NT kernels

• Internal priorities: real-time processes (0-99), regular processes (100-139)

• The nice system call allows modifying the static priority between -20 and +19 (less means higher priority); see the sketch below

131
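A small user-space illustration of the nice interface mentioned above (the increment of 5 is arbitrary; negative increments require appropriate privileges):

#include <unistd.h>
#include <stdio.h>
#include <errno.h>

int main(void)
{
    errno = 0;
    int value = nice(5);              /* lower this process's priority by 5 steps */
    if (value == -1 && errno != 0)    /* -1 is also a legal return value, so check errno */
        perror("nice");
    else
        printf("new nice value: %d\n", value);

    /* ... any CPU-bound work now competes at a lower static priority ... */
    return 0;
}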

Page 132: Operating Systems 1 (12/12) - Summary

I/O

Page 133: Operating Systems 1 (12/12) - Summary

Operating Systems I PT/FF 14

Input / Output Devices

• Hardware devices engaged in I/O can be categorized into:

• Responsible for interaction with the user

• Printers, terminals, video display, keyboard, mouse

• Provisioning of system-local hardware functionality

• Disk drives, USB keys, sensors, controllers

• Provisioning of communication support

• Modems, WLAN sticks, network cards

• Devices differ according to multiple factors

• Data transfer rates, complexity of control, data encoding, error conditions, ...

• I/O devices either operate as block device (fixed-size data blocks) or character / stream device (stream of bytes)

133

Page 134: Operating Systems 1 (12/12) - Summary

Operating Systems I PT/FF 14

I/O Functionality

• I/O functionality in an operating system can take place as:

• Programmed I/O with polling

• Process issues I/O command indirectly through the operating system

• Busy-waits for completion (see the polling sketch after this list)

• Interrupt-driven I/O

• Process issues I/O command indirectly through the operating system

• Operating system blocks the process until an interrupt signals I/O completion

• With non-blocking I/O, the process continues to work instead of being blocked

• In both cases, the processor must fetch I/O read results from the device

• Alternative: Direct Memory Access (DMA)

134
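A sketch of the programmed-I/O-with-polling case from this list: the CPU busy-waits on a device status register and then fetches the data itself (the register block and bit layout are hypothetical):

#include <stdint.h>

/* hypothetical device register block; on real hardware this would be a
 * memory-mapped region written by the device */
struct device_regs {
    volatile uint32_t status;              /* bit 0 = data ready */
    volatile uint32_t data;
};

#define DEV_READY 0x1u

static uint32_t poll_read(struct device_regs *dev)
{
    while ((dev->status & DEV_READY) == 0)
        ;                                  /* busy wait - wastes CPU cycles */
    return dev->data;                      /* the processor fetches the result itself */
}

With interrupt-driven I/O the busy-wait loop disappears: the process is blocked and the interrupt handler resumes it; with DMA even the final data transfer is done by the DMA unit.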

Page 135: Operating Systems 1 (12/12) - Summary

Operating Systems I PT/FF 14

Direct Memory Access (DMA)

• Special hardware unit realizes the data transfer between device and main memory

135

Page 136: Operating Systems 1 (12/12) - Summary

Operating Systems I PT/FF 14

Device Drivers

• Lowest level of software to interact with I/O hardware

• Contains all device-dependent code for one type of hardware unit

• Direct communication with the device hardware

• Writes control registers, accesses hardware-mapped memory regions

• Starting and completion of I/O operations

• Loaded drivers become part of the running operating system image

• User mode applications deal with logical I/O

• Processes are enabled to access data records in files

• Driver may block after issuing a request, unblocked later by a device interrupt

136

Page 137: Operating Systems 1 (12/12) - Summary

Operating Systems I PT/FF 14

Layers of the I/O System

• Device-independent software layer combines common management tasks

• Uniform interfacing for device interaction (e.g. read / write / create)

• Device naming

• Device protection

• Buffering

• Storage allocation on block devices

• Allocation and releasing

• Error reporting

137

User process

Device-independent software layer

Device driver layer

Interrupt handlers

Hardware

Page 138: Operating Systems 1 (12/12) - Summary

Operating Systems I PT/FF 14

Buffering

• Typical optimization is buffering of I/O requests

• Perform input transfers in advance of requests being made

• Perform output transfers delayed, after the request is made

138

Page 139: Operating Systems 1 (12/12) - Summary

Operating Systems I PT/FF 14

Buffering

• No buffer - Operating system directly accesses the device

• Single buffer - Operating system assigns buffer in main memory for I/O request

• For block-oriented devices, read-ahead may manage to prepare blocks in memory that are fetched later by the process

• For stream-oriented devices, read-ahead is based on the notion of text lines (terminals) or bytes (terminals, video stream)

• Swapping logic of the operating system is affected

• Double buffer / buffer swapping - Process can transfer data from / to one buffer while the operating system empties or fills the other buffer (sketched below)

• Circular buffer - Two or more buffers are used in a circular fashion

• Buffering smoothes out peaks in I/O demand, less advantage under heavy load

139
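A minimal sketch of the double-buffer idea (structure and names are assumptions): while the process works on one buffer, the other one is available for the next device transfer.

#include <stddef.h>

#define BUF_SIZE 4096

struct double_buffer {
    char buf[2][BUF_SIZE];
    int  active;                    /* index of the buffer currently being filled */
};

/* hand the just-filled buffer to the consumer and switch filling to the other one */
static char *swap_buffers(struct double_buffer *db)
{
    char *ready = db->buf[db->active];
    db->active ^= 1;                /* flip between buffer 0 and buffer 1 */
    return ready;
}

A circular buffer generalizes this to more than two buffers managed in a ring.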

Page 140: Operating Systems 1 (12/12) - Summary

Operating Systems I PT/FF 14

Windows - I/O Architecture

• I/O Manager

• Connects applications and components to virtual, logical, and physical devices

• Windows APIs: ReadFile, WriteFile, CreateFile, CloseHandle, DeviceIoControl

• Defines the infrastructure that supports device drivers

• Manages buffers for I/O requests

• Provides time-out support for drivers

• Knows which installable file systems are loaded

• Provides flexible I/O services for environment subsystems

• Framework for delivery of I/O request packets (IRPs)

140

Page 141: Operating Systems 1 (12/12) - Summary

Operating Systems I PT/FF 14

Driver Layering

• Windows drivers can be stacked to add functionality

• Only the lowest layer talks to the hardware

• Filter drivers attach their devices to other devices

• See‘s all requests first an can manipulate them

• Examples: File replication, file encryption, shadow copies, licensing, virus scanner

141

Page 142: Operating Systems 1 (12/12) - Summary

Operating Systems I PT/FF 14

Windows - IRPs

• I/O manager creates an IRP for each I/O operation and passes it to correct drivers

• Deletes IRP when I/O operation is complete

• Device drivers

• Receive IRPs routed to them by the I/O manager and perform the operation

• Inform the I/O manager when those commands complete by passing back the IRP

• Often use the I/O manager to forward IRPs to other device drivers

• Fast I/O

• Bypass generation of IRPs, go directly to file system driver or cache manager

• Scatter/Gather I/O

• Read/write multiple buffers with a single system call

142