Processes in Unix and Nachos
Elements of the Unix Process and I/O Model
1. rich model for IPC and I/O: “everything is a file”
file descriptors: most/all interactions with the outside world are through system calls that read/write file descriptors, with a unified set of syscalls for operating on open descriptors of different types.
2. simple and powerful primitives for creating and initializing child processes
fork: easy to use, expensive to implement
3. general support for combining small simple programs to perform complex tasks
standard I/O and pipelines: good programs don’t know/care where their input comes from or where their output goes
Unix File Descriptors
Unix processes name I/O and IPC objects by integers known as file descriptors.
• File descriptors 0, 1, and 2 are reserved by convention for standard input, standard output, and standard error.
“Conforming” Unix programs read input from stdin, write output to stdout, and write errors to stderr by default.
• Other descriptors are assigned by syscalls that open/create files, create pipes, or bind to devices or network sockets.
pipe, socket, open, creat
• A common set of syscalls operates on open file descriptors independent of their underlying types.
read, write, dup, close
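As a minimal sketch of that uniformity (copy_fd is a hypothetical helper, not a standard call), the same read/write loop works whether the descriptors name files, pipes, sockets, or a tty:

```c
#include <unistd.h>

/* copy_fd: hypothetical helper for illustration.
   Copy everything readable from `in` to `out`, whatever kind of object each
   descriptor names. Returns the number of bytes copied, or -1 on error. */
long copy_fd(int in, int out)
{
    char buf[4096];
    long total = 0;
    ssize_t n;

    while ((n = read(in, buf, sizeof buf)) > 0) {
        ssize_t off = 0;
        while (off < n) {                 /* a write may be partial, e.g., on a pipe */
            ssize_t w = write(out, buf + off, n - off);
            if (w < 0)
                return -1;
            off += w;
        }
        total += n;
    }
    return n < 0 ? -1 : total;            /* n == 0 means end of file */
}
```

Calling copy_fd(0, 1) gives a tiny cat. Because the loop only uses read and write, it neither knows nor cares where its input comes from.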
Unix File Descriptors Illustrated
File descriptors are a special case of kernel object handles.
[Figure: per-process file descriptor tables point through the kernel's system open file table to objects such as a file, pipe, socket, or tty. Disclaimer: this drawing is oversimplified.]
The binding of file descriptors to objects is specific to each process, like the virtual translations in the virtual address space.
The Concept of Fork
The Unix system call for process creation is called fork().
The fork system call creates a child process that is a clone of the parent.
• Child has a (virtual) copy of the parent’s virtual memory.
• Child is running the same program as the parent.
• Child inherits open file descriptors from the parent.
(Parent and child file descriptors point to a common entry in the system open file table.)
• Child begins life with the same register values as parent.
The child process may execute a different program in its context with a separate exec() system call.
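Here is a small sketch of the copy semantics, assuming a POSIX system; fork_copy_demo and counter are illustrative names. The child changes a global, but the change lands only in the child's copy of memory. The child reports its value through its exit status, which is limited to 0..255.

```c
#include <sys/wait.h>
#include <unistd.h>

static int counter = 100;        /* a global in the parent's data segment */

/* fork_copy_demo: hypothetical helper for illustration.
   Stores the value the child saw and the parent's value afterwards. */
int fork_copy_demo(int *child_value, int *parent_value)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {              /* child: fork returned 0 */
        counter += 1;            /* modifies only the child's copy */
        _exit(counter);          /* report it via the exit status */
    }
    int status;                  /* parent: fork returned the child's pid */
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    *child_value = WEXITSTATUS(status);
    *parent_value = counter;     /* unchanged: the parent's memory was not touched */
    return 0;
}
```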
Unix Fork/Exec/Exit/Wait Example
[Figure: the parent forks a child; the child execs a new program to initialize its context and eventually exits, while the parent waits for it.]
int pid = fork();
Create a new process that is a clone of its parent.
exec*("program" [, argvp, envp]);
Overlay the calling process virtual memory with a new program, and transfer control to it.
exit(status);
Exit with status, destroying the process.
int pid = wait*(&status);
Wait for exit (or other status change) of a child.
Example: Process Creation in Unix
int pid;
int status = 0;

if ((pid = fork()) != 0) {    /* parent: fork returned the child's pid (-1 on error) */
    /* ….. */
    pid = wait(&status);      /* sleep until the child exits */
} else {                      /* child: fork returned 0 */
    /* ….. */
    exit(status);
}
Parent uses wait to sleep until the child exits; wait returns child pid and status.
Wait variants allow wait on a specific child, or notification of stops and other signals.
The fork syscall returns twice: it returns a zero to the child and the child process ID (pid) to the parent.
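The fork/exec/exit/wait sequence can be sketched with POSIX execvp and waitpid. The name run_command and the use of status 127 for a failed exec are choices made for this illustration, not part of the slides. Passing the pid to waitpid is one of the wait variants: it waits for that specific child.

```c
#include <sys/wait.h>
#include <unistd.h>

/* run_command: hypothetical helper for illustration.
   Fork a child, exec argv[0] (searched on PATH) in it, and wait for that
   child. Returns the child's exit status, or -1 on failure. */
int run_command(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;                        /* fork failed: no child */
    if (pid == 0) {                       /* child */
        execvp(argv[0], argv);            /* returns only on error */
        _exit(127);                       /* conventional "command not found" */
    }
    int status;                           /* parent */
    if (waitpid(pid, &status, 0) != pid)  /* wait for this child only */
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The child calls _exit rather than exit after a failed exec. This keeps it from flushing stdio buffers it inherited from the parent a second time.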
What’s So Cool About Fork
1. fork is a simple primitive that allows process creation without troubling with what program to run, args, etc.
Serves some of the same purposes as threads.
2. fork gives the parent program an opportunity to initialize the child process…especially the open file descriptors.
Unix syscalls for file descriptors operate on the current process.
Parent program running in child process context may open/close I/O and IPC objects, and bind them to stdin, stdout, and stderr.
Also may modify environment variables, arguments, etc.
3. Using the common fork/exec sequence, the parent (e.g., a command interpreter or shell) can transparently cause children to read/write from files, terminal windows, network connections, pipes, etc.
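Here is a sketch of point 2, assuming POSIX open/dup2/execvp; run_redirected is a hypothetical name. Between fork and exec, the parent's code runs in the child and rebinds fd 1 to a file, much as a shell does for `cmd > file`. The exec'd program just writes to stdout.

```c
#include <fcntl.h>
#include <sys/wait.h>
#include <unistd.h>

/* run_redirected: hypothetical helper for illustration.
   Run argv with its stdout bound to `path`. Returns the exit status or -1. */
int run_redirected(char *const argv[], const char *path)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Still the parent's program, now running in the child's context. */
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0666);
        if (fd < 0)
            _exit(126);
        dup2(fd, 1);                  /* fd 1 (stdout) now names the file */
        close(fd);
        execvp(argv[0], argv);        /* the new program inherits fd 1 */
        _exit(127);
    }
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```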
Producer/Consumer Pipes
char inbuffer[1024];
char outbuffer[1024];
ssize_t inbytes, outbytes;

while ((inbytes = read(0, inbuffer, sizeof inbuffer)) > 0) {   /* fd 0: stdin */
    /* process data from inbuffer to outbuffer (process() is a hypothetical filter) */
    outbytes = process(inbuffer, inbytes, outbuffer);
    write(1, outbuffer, outbytes);                              /* fd 1: stdout */
}
Pipes support a simple form of parallelism with built-in flow control.
e.g.: sort <grades | grep Dan | mail sprenkle
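Here is a two-stage sketch of `producer | consumer`, assuming POSIX pipe/dup2/execvp; run_pipe2 is an illustrative name. Flow control comes from the kernel. A writer blocks when the pipe buffer is full, and a reader blocks when it is empty.

```c
#include <sys/wait.h>
#include <unistd.h>

/* run_pipe2: hypothetical helper for illustration.
   Run "left | right" and return right's exit status (as a shell reports the
   status of a pipeline), or -1 on failure. */
int run_pipe2(char *const left[], char *const right[])
{
    int pfd[2];                       /* pfd[0] is read, pfd[1] is write */
    if (pipe(pfd) < 0)
        return -1;

    pid_t lpid = fork();
    if (lpid == 0) {                  /* producer: stdout -> pipe */
        dup2(pfd[1], 1);
        close(pfd[0]);
        close(pfd[1]);
        execvp(left[0], left);
        _exit(127);
    }
    pid_t rpid = (lpid > 0) ? fork() : -1;
    if (rpid == 0) {                  /* consumer: stdin <- pipe */
        dup2(pfd[0], 0);
        close(pfd[0]);
        close(pfd[1]);
        execvp(right[0], right);
        _exit(127);
    }

    /* The parent must close both ends, or the consumer never sees EOF. */
    close(pfd[0]);
    close(pfd[1]);

    int status = -1;
    if (lpid > 0)
        waitpid(lpid, NULL, 0);
    if (rpid > 0 && waitpid(rpid, &status, 0) == rpid && WIFEXITED(status))
        return WEXITSTATUS(status);
    return -1;
}
```

For example, `sh -c "echo Bob; echo Dan"` piped into `grep -q Dan` reports 0 (match), and without the Dan line it reports 1.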
Unix as an Extensible System
“Complex software systems should be built incrementally from components.”
• independently developed
• replaceable, interchangeable, adaptable
The power of fork/exec/exit/wait makes Unix highly flexible/extensible...at the application level.
• write small, general programs and string them together
general stream model of communication
• this is one reason Unix has survived
These system calls are also expressive enough to implement powerful command interpreters (shells).
The Shell
The Unix command interpreters run as ordinary user processes with no special privilege.
This was novel at the time Unix was created: other systems viewed the command interpreter as a trusted part of the OS.
Users may select from a range of interpreter programs available, or even write their own (to add to the confusion).
csh, sh, ksh, tcsh, bash: choose your flavor...or use perl.
Shells use fork/exec/exit/wait to execute commands composed of program filenames, args, and I/O redirection symbols.
Shells are general enough to run files of commands (scripts) for more complex tasks, e.g., by redirecting shell’s stdin.
Shell’s behavior is guided by environment variables.
Limitations of the Unix Process Model
The pure Unix model has several shortcomings/limitations:
• Any setup for a new process must be done in its context.
• Separated Fork/Exec is slow and/or complex to implement.
A more flexible process abstraction would expand the ability of a process to manage another externally.
This is a hallmark of systems that support multiple operating system “personalities” (e.g., NT) and “microkernel” systems (e.g., Mach).
Pipes are limited to transferring linear byte streams between a pair of processes with a common ancestor.
Richer IPC models are needed for complex software systems built as collections of separate programs.
Two Views of Threads in Nachos
1. Nachos is a thread library running inside a Unix (Solaris) process, with no involvement from the kernel.
SPARC interrupts and Solaris timeslicing are invisible.
the Nachos scheduler does its own pseudo-random timeslicing.
2. Nachos is a toolkit for building a simulated OS kernel.
Threads are a basis for implementing Nachos processes;
when running in kernel mode they interact/synchronize as threads.
Nachos kernel’s timeslicing is implemented in the scheduler.
- driven by timer interrupts on the “simulated machine”
A Nachos kernel could provide a kernel interface for threads.
Nachos Thread States and Transitions
[Figure: thread states running (user), running (kernel), ready, and blocked, with transitions labeled Scheduler::Run, Scheduler::ReadyToRun, Thread::Sleep, Thread::Yield, interrupt or exception, and Machine::Run/ExceptionHandler.]
When running in user mode, the thread executes within the SPIM machine simulator.
In Labs 1-3 we are only concerned with the states in this box.
A Simple Page Table
[Figure: a user virtual address split into page #i and offset; entry i of the process page table holds PFN i, which selects a page frame in physical memory, and the offset locates the byte within that frame.]
In this example, each VPN j maps to PFN j, but in practice any physical frame may be used for any virtual page.
Each process/VAS has its own page table. Virtual addresses are translated relative to the current page table.
The page tables are themselves stored in memory; a protected register holds a pointer to the current page table.
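Here is a minimal one-level translation sketch. The entry layout is loosely modeled on Nachos' TranslationEntry (valid bit, physical page). PAGE_SIZE = 128 is an assumption for this sketch, and translate is a hypothetical name. An invalid or out-of-range page is reported as a fault instead of a physical address.

```c
#include <stdbool.h>

#define PAGE_SIZE 128u           /* assumed page size for this sketch */

/* One page table entry (hypothetical layout, after Nachos' TranslationEntry). */
struct pte {
    bool valid;
    unsigned physical_page;
};

/* translate: hypothetical helper for illustration.
   Returns false (a page fault) if the page is out of range or invalid. */
bool translate(const struct pte *table, unsigned num_pages,
               unsigned vaddr, unsigned *paddr)
{
    unsigned vpn = vaddr / PAGE_SIZE;      /* page #i */
    unsigned offset = vaddr % PAGE_SIZE;   /* byte within the page */

    if (vpn >= num_pages || !table[vpn].valid)
        return false;
    *paddr = table[vpn].physical_page * PAGE_SIZE + offset;
    return true;
}
```

With VPN 0 → PFN 3 and VPN 1 → PFN 0, address 130 (page 1, offset 2) translates to physical address 2.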
Nachos: A Peek Under the Hood
[Figure: user programs (e.g., shell, cp) are MIPS instructions executed by SPIM, the MIPS emulator inside the Machine object, which holds the registers (SP, Rn, PC), memory, and page table. The Nachos kernel starts user code with Machine::Run() and is re-entered through ExceptionHandler(). Other labels: fetch/execute, examine/deposit, SaveState/RestoreState, process page tables.]
The User-Mode Context for Nachos
[Figure: a user virtual address (page #i, offset) translated through the process page table (PFN i) into physical page frames, for an address space with text, data, BSS, user stack, and args/env; the machine registers (SP, Rn, PC) complete the user-mode context.]
boolean Machine::Translate(uva, alignment, &kva)
Translate a user virtual address to a kernel memory address, checking access and alignment.
Creating a Nachos Process
void StartProcess(char *filename)
{
    OpenFile *executable;
    AddrSpace *space;

    executable = fileSystem->Open(filename);
    if (executable == NULL) {
        printf("Unable to open file %s\n", filename);
        return;
    }
    space = new AddrSpace(executable);
    currentThread->space = space;

    delete executable;          // close file

    space->InitRegisters();     // set the initial register values
    space->RestoreState();      // load page table register

    machine->Run();             // jump to the user program
    ASSERT(FALSE);              // machine->Run never returns
}
Create a handle for reading text and initial data out of the executable file.
Create an AddrSpace object, allocating physical memory and setting up the process page table.
Set address space of current thread/process.
Initialize registers and begin execution in user mode.
Creating a Nachos Address Space
AddrSpace::AddrSpace(OpenFile *executable)
{
    NoffHeader noffH;
    unsigned int i, size;

    executable->ReadAt((char *)&noffH, sizeof(noffH), 0);

    // how big is address space?
    size = noffH.code.size + noffH.initData.size + noffH.uninitData.size
           + UserStackSize;   // we need to increase the size to leave room for the stack
    numPages = divRoundUp(size, PageSize);
    size = numPages * PageSize;

    pageTable = new TranslationEntry[numPages];
    for (i = 0; i < numPages; i++) {
        pageTable[i].virtualPage = i;   // for now, virtual page # = phys page #
        pageTable[i].physicalPage = i;
        pageTable[i].valid = TRUE;
    }
    ....
Initializing a Nachos Address Space
    bzero(machine->mainMemory, size);

    // copy in the code and data segments into memory
    if (noffH.code.size > 0) {
        DEBUG('a', "Initializing code segment, at 0x%x, size %d\n",
              noffH.code.virtualAddr, noffH.code.size);
        executable->ReadAt(&(machine->mainMemory[noffH.code.virtualAddr]),
                           noffH.code.size, noffH.code.inFileAddr);
    }
    if (noffH.initData.size > 0) {
        DEBUG('a', "Initializing data segment, at 0x%x, size %d\n",
              noffH.initData.virtualAddr, noffH.initData.size);
        executable->ReadAt(&(machine->mainMemory[noffH.initData.virtualAddr]),
                           noffH.initData.size, noffH.initData.inFileAddr);
    }
Join Scenarios
Several cases must be considered for join (e.g., exit/wait).
• What if the child exits before the parent joins?
“Zombie” process object holds child status and stats.
• What if the parent continues to run but never joins?
How not to fill up memory with zombie processes?
• What if the parent exits before the child?
Orphans become children of init (process 1).
• What if the parent can’t afford to get “stuck” on a join?
Unix makes provisions for asynchronous notification.
Review: The Virtual Address Space
A typical process VAS includes:
• user regions in the lower half
V->P mappings specific to each process
accessible to user or kernel code
• kernel regions in upper half
shared by all processes
accessible only to kernel code
• Nachos: process virtual address space includes only user portions.
mappings change on each process switch
[Figure: a virtual address space from 0x0 to 0xffffffff (2^n-1): user text, data, BSS, user stack, and args/env in the lower half; kernel text and kernel data in the upper half.]
A VAS for a private address space system (e.g., Unix) executing on a typical 32-bit architecture.
Process Internals
[Figure: a process descriptor (user ID, process ID, parent PID, sibling and children links, resources) bound to a virtual address space and to a thread with its stack.]
Each process has a thread bound to the VAS. The thread has a saved user context as well as a system context. The kernel can manipulate the user context to start the thread in user mode wherever it wants.
Process state includes a file descriptor table, links to maintain the process tree, and a place to store the exit status.
The address space is represented by a page table, a set of translations to physical memory allocated from a kernel memory manager.
The kernel must initialize the process memory with the program image to run.
What’s in an Object File or Executable?
int j = 327;
char* s = "hello\n";
char sbuf[512];

int p() {
    int k = 0;
    j = write(1, s, 6);
    return(j);
}
[Figure: an object file laid out as header, program sections (text; data split into idata and wdata), symbol table, and relocation records.]
Header: a “magic number” indicates the type of image.
Section table: an array of (offset, len, startVA).
Program sections:
• text: program instructions (p)
• idata: immutable data (constants), e.g., "hello\n"
• wdata: writable global/static data (j, s)
Symbol table (j, s, p, sbuf) and relocation records: used by the linker; may be removed after the final link step and strip.
The Birth of a Program
int j;
char* s = "hello\n";

int p() {
    j = write(1, s, 6);
    return(j);
}
myprogram.c → compiler → myprogram.s → assembler → myprogram.o (object file) → linker (with libraries and other objects) → myprogram (executable file)
[Figure: myprogram.s holds assembly such as “p: store this, store that, push, jsr _write, ret, etc.”]
The Program and the Process VAS
[Figure: the program’s sections (header, text, idata, wdata, symbol table, relocation records) map to the segments of the process VAS (text, data, BSS, user stack, args/env), with the kernel above.]
BSS: “Block Started by Symbol” (uninitialized global data), e.g., the heap and sbuf go here.
Args/env strings are copied in by the kernel when the process is created.
The process text segment is initialized directly from the program text section.
Process data segment(s) are initialized from the idata and wdata sections.
Process stack and BSS (e.g., heap) segment(s) are zero-filled.
The process BSS segment may be expanded at runtime with a system call (e.g., Unix sbrk) called by the heap manager routines.
Text and idata segments may be write-protected.
Processes and the Kernel
[Figure: three process virtual address spaces, each with its own text, data, BSS, user stack, and args/env from 0 up to 2^n-1, and each mapping the same shared kernel area.]
Nachos as a Thread Library
The Nachos library implements concurrent threads.
• no special support needed from the kernel (use any Unix)
• thread creation and context switch are fast (no syscall)
• defines its own thread model and scheduling policies
• library threads are sometimes called coroutines, lightweight threads, or fibers in NT.
[Figure: the scheduler picks the next thread from its readyList.]
while (1) {
    t = scheduler->FindNextToRun();
    scheduler->Run(t);
}
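To make the library-thread idea concrete, here is a rough sketch using POSIX getcontext/makecontext/swapcontext. These calls are obsolescent in POSIX.1-2001 and were removed in POSIX.1-2008, but glibc still provides them. This is not Nachos code. The thread struct, yield, worker, and run_threads are hypothetical names. Each thread gets its own stack, and a round-robin loop plays the role of FindNextToRun/Run. One caveat: glibc's swapcontext also saves and restores the signal mask with a system call, so this is not as cheap as a purpose-built library context switch.

```c
#include <ucontext.h>

#define NTHREADS 2
#define STACK_BYTES (64 * 1024)

/* Hypothetical library thread: saved context plus a private stack. */
struct thread {
    ucontext_t ctx;                /* saved registers, SP, PC */
    int done;
    char stack[STACK_BYTES];
};

static struct thread threads[NTHREADS];
static ucontext_t scheduler_ctx;
static int trace[16];              /* which thread ran, in order */
static int ntrace;

/* Like Thread::Yield: save this thread's context, resume the scheduler. */
static void yield(int id)
{
    swapcontext(&threads[id].ctx, &scheduler_ctx);
}

static void worker(int id)
{
    for (int i = 0; i < 3; i++) {
        trace[ntrace++] = id;      /* "do some work" */
        yield(id);
    }
    threads[id].done = 1;          /* returning resumes uc_link (the scheduler) */
}

/* run_threads: hypothetical helper. Create the threads, then loop like a
   scheduler, running each runnable thread until it yields or finishes. */
int run_threads(void)
{
    ntrace = 0;
    for (int i = 0; i < NTHREADS; i++) {
        getcontext(&threads[i].ctx);
        threads[i].ctx.uc_stack.ss_sp = threads[i].stack;
        threads[i].ctx.uc_stack.ss_size = STACK_BYTES;
        threads[i].ctx.uc_link = &scheduler_ctx;
        threads[i].done = 0;
        makecontext(&threads[i].ctx, (void (*)(void))worker, 1, i);
    }
    for (int live = NTHREADS; live > 0; ) {
        live = 0;
        for (int i = 0; i < NTHREADS; i++) {   /* round-robin "ready list" */
            if (threads[i].done)
                continue;
            swapcontext(&scheduler_ctx, &threads[i].ctx);   /* like Scheduler::Run */
            if (!threads[i].done)
                live++;
        }
    }
    return ntrace;
}
```

The two workers interleave as 0, 1, 0, 1, 0, 1. The kernel never sees more than one thread.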
Fork/Exit/Wait Example
[Figure: parent forks child; both share OS resources; the child exits and the parent waits for it (a “join”).]
Child process starts as clone of parent: increment refcounts on shared resources.
Parent and child execute independently: memory states and resources may diverge.
On exit, release memory and decrement refcounts on shared resources.
Child enters zombie state: process is dead and most resources are released, but process descriptor remains until parent reaps exit status via wait.
Parent sleeps in wait until child stops or exits.
Sharing Open File Instances
[Figure: parent and child process objects (user ID, process ID, process group ID, parent PID, signal state, siblings, children), each with its own process file descriptor table; corresponding descriptors point to one entry in the system open file table, which holds the shared seek offset and refers to the shared file (inode or vnode).]
File Sharing Between Parent/Child
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
    char c;
    int fdrd, fdwt;
    if (argc < 3) exit(1);
    if ((fdrd = open(argv[1], O_RDONLY)) == -1) exit(1);
    if ((fdwt = creat(argv[2], 0666)) == -1) exit(1);
    fork();   /* parent and child now share both open file table entries */
    for (;;) {
        if (read(fdrd, &c, 1) != 1) exit(0);
        write(fdwt, &c, 1);
    }
}
[Bach]
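Here is a sketch that makes the shared seek offset visible, assuming POSIX; shared_offset_demo is a hypothetical name. The child's descriptor points at the same system open file table entry, so its one-byte read moves the offset for the parent too. In the [Bach] program above, that is why each input byte is read by only one of the two processes, although the order of the output is still unpredictable.

```c
#include <fcntl.h>
#include <sys/wait.h>
#include <unistd.h>

/* shared_offset_demo: hypothetical helper for illustration.
   Opens `path`, lets a child read one byte, then reads one byte in the
   parent. Returns the byte the parent got, or -1 on failure. */
int shared_offset_demo(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }
    if (pid == 0) {                           /* child: same open file entry */
        char c;
        _exit(read(fd, &c, 1) == 1 ? 0 : 1);  /* advances the shared offset */
    }
    int status;
    char c;
    int result = -1;
    if (waitpid(pid, &status, 0) == pid && WIFEXITED(status)
        && WEXITSTATUS(status) == 0 && read(fd, &c, 1) == 1)
        result = (unsigned char)c;            /* the parent gets the second byte */
    close(fd);
    return result;
}
```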
Join Scenarios
Several cases must be considered for join (e.g., exit/wait).
• What if the child exits before the parent joins?
“Zombie” process object holds child status and stats.
• What if the parent can’t afford to get “stuck” on a join?
Unix provides for asynchronous notification via SIGCHLD.
• What if the parent exits before the child?
Orphans become children of init (process 1).
• What if the parent continues to run but never joins?
How not to fill up memory with zombie processes?
(Don’t create zombies if SIGCHLD ignored.)
Unix Process States
[Figure: state diagram with states new, ready, run (user), run (kernel), blocked, suspended, and zombie; transition labels include fork, run, preempted, trap/fault, interrupt, kernel interrupt, sleep, wakeup, suspend/run, swapout/swapin, and exit.]
Example: Unix Signals
Unix systems can notify a user program of a fault with a signal.
The system defines a fixed set of signal types (e.g., SIGSEGV, SIGBUS, etc.).
A user program may choose to catch some signal types, using a syscall to specify a (user mode) signal handler procedure.
system passes interrupted context to handler
handler may munge and/or return to interrupted context
Signals are also used for other forms of asynchronous event notifications.
E.g., a process may request a SIGALRM after some interval has passed, or signal another process using the kill syscall or command.
Unix Signals 101
Signals notify processes of internal or external events.
• the Unix software equivalent of interrupts/exceptions
• only way to do something to a process “from the outside”
• Unix systems define a small set of signal types
Examples of signal generation:
• keyboard ctrl-c and ctrl-z signal the foreground process
• synchronous fault notifications, syscall errors
• asynchronous notifications from other processes via kill
• IPC events (SIGPIPE, SIGCHLD)
• alarm notifications
signal == “upcall”
Process Handling of Signals
1. Each signal type has a system-defined default action.
abort and dump core (SIGSEGV, SIGBUS, etc.)
ignore, stop, exit, continue
2. A process may choose to block (inhibit) or ignore some signal types.
3. The process may choose to catch some signal types by specifying a (user mode) handler procedure.
specify alternate signal stack for handler to run on
system passes interrupted context to handler
handler may munge and/or return to interrupted context
Delivering Signals
1. Signal delivery code always runs in the process context.
2. All processes have a trampoline instruction sequence installed in user-accessible memory.
3. Kernel delivers a signal by doctoring user context state to enter user mode in the trampoline sequence.
First copies the trampoline stack frame out to the signal stack.
4. Trampoline sequence invokes the signal handler.
5. If the handler returns, trampoline returns control to kernel via sigreturn system call.
Handler gets a sigcontext (machine state) as an arg; handler may modify the context before returning from the signal.
When to Deliver Signals?
[Figure: the Unix process state diagram (run user, run kernel, ready, blocked, suspended, zombie), annotated with the points listed below.]
Interrupt low-priority sleep if signal is posted.
Check for posted signals after wakeup.
Deliver signals when resuming to user mode.
Deliver signals when returning to user mode from trap/fault.
Questions About Signals
1. What if handler corrupts the sigcontext before sigreturn?
2. What if a process signal handler...
• makes a system call?
• never returns?
3. What if a process is signalled again while it is executing in a signal handler?
4. How to signal a process sleeping in a system call?
Process Blocking with Sleep/Wakeup
A Unix process executing in kernel mode may block by calling the internal sleep() routine.
• wait for a specific event, represented by an address
• kernel suspends execution, switches to another ready process
• wait* is the first example we’ve seen
also: external input, I/O completion, elapsed time, etc.
Another process or interrupt handler may call wakeup (event address) to break the sleep.
• search sleep hash queues for processes waiting on event
• processes marked runnable, placed on internal run queue
Interruptible Sleeps
A Unix process entering a sleep specifies a scheduler priority.
• Determines scheduling priority after wakeup.
• Sleep priority is always higher than base priority.
• Sleeps for internal kernel resources wait at a higher priority.
Low-priority sleeps are interruptible.
• A process in an interruptible sleep may awaken for a signal.
• Interrupted system calls must back out all side effects.
Return errno EINTR….but the system call may be restartable.
• FreeBSD uses tsleep variant for interruptible sleeps.
A process entering tsleep may specify a timeout.
Process Groups
It is sometimes useful to signal all processes in a group.
children of a common parent
• ctrl-c and ctrl-z to a group of children executing together
job control facilities in BSD derivatives
• Kill all children of shell on terminal hangup.
• Kill children of shell if controlling process (shell) dies.
sessions
System calls: setpgrp, killpg.
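Here is a sketch of signaling a whole group. It uses POSIX setpgid, which is the standardized form of the older setpgrp variants, together with killpg. kill_group_demo is a hypothetical name. Both parent and child call setpgid, so the group membership is in place before killpg runs, whichever process is scheduled first.

```c
#include <signal.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_KIDS 8

/* kill_group_demo: hypothetical helper. Fork n children into one new
   process group, signal the group with killpg, and return how many
   children were terminated by that signal. */
int kill_group_demo(int n)
{
    pid_t kids[MAX_KIDS];
    pid_t pgid = 0;                  /* 0 until the first child leads a group */

    if (n <= 0)
        return 0;                    /* never call killpg with pgid 0 */
    if (n > MAX_KIDS)
        n = MAX_KIDS;
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0) {
            setpgid(0, pgid);        /* pgid 0: start a group led by this child */
            for (;;)
                pause();             /* wait to be signalled */
        }
        if (pgid == 0)
            pgid = pid;
        setpgid(pid, pgid);          /* parent does it too: no race */
        kids[i] = pid;
    }

    killpg(pgid, SIGTERM);           /* one call reaches every member */

    int killed = 0;
    for (int i = 0; i < n; i++) {
        int status;
        if (waitpid(kids[i], &status, 0) == kids[i]
            && WIFSIGNALED(status) && WTERMSIG(status) == SIGTERM)
            killed++;
    }
    return killed;
}
```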
Controlling Children
1. After a fork, the parent program has complete control over the behavior of its child.
2. The child inherits its execution environment from the parent...but the parent program can change it.
• user ID (if superuser), global variables, etc.
• sets bindings of file descriptors with open, close, dup
• pipe sets up data channels between processes
3. Parent program may cause the child to execute a different program, by calling exec* in the child context.
Setting Up Pipes
int pfd[2] = {0, 0};   /* pfd[0] is read, pfd[1] is write */
int in, out;           /* pipeline entrance and exit */
int i;

pipe(pfd);             /* create pipeline entrance */
out = pfd[0];
in = pfd[1];

/* loop to create a child and add it to the pipeline */
for (i = 1; i < procCount; i++) {
    out = setup_child(out);
}

/* pipeline is a producer/consumer bounded buffer */
write(in, ..., ...);
read(out, ..., ...);
[Figure: the parent writes into the pipeline entrance (in) and reads from its exit (out); the children form the stages in between.]
Setting Up a Child in a Pipeline
int setup_child(int rfd)
{
    int pfd[2] = {0, 0};   /* pfd[0] is read, pfd[1] is write */
    int wfd;

    pipe(pfd);             /* create right-hand pipe */
    wfd = pfd[1];          /* this child’s write side */

    if (fork()) {          /* parent */
        close(wfd);
        close(rfd);
    } else {               /* child */
        close(pfd[0]);     /* close far end of right pipe */
        close(0);          /* free fds 0 and 1; dup returns the lowest free fd, */
        close(1);
        dup(rfd);          /* so rfd becomes stdin (0) */
        dup(wfd);          /* and wfd becomes stdout (1) */
        close(rfd);
        close(wfd);
        ...
    }
    return(pfd[0]);
}
[Figure: the new child reads from rfd and writes to wfd (pfd[1]); pfd[0] is the read end of the new right-hand pipeline segment.]
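Here is a runnable variant of the pipeline setup, under stated assumptions: each stage execs `cat` (an arbitrary choice), dup2 replaces the close/dup pair, and add_cat_stage and through_pipeline are hypothetical names. The slides leave one detail open. Every child also inherits the pipeline entrance `in`, and unless the child closes it, EOF can never propagate down the pipeline.

```c
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

/* add_cat_stage: hypothetical helper. Fork one stage that reads rfd and
   writes a new pipe; return that pipe's read end (the new pipeline exit).
   `in` is the pipeline entrance, which the child must close. */
static int add_cat_stage(int rfd, int in)
{
    int pfd[2];                        /* pfd[0] is read, pfd[1] is write */
    if (pipe(pfd) < 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {                    /* child */
        dup2(rfd, 0);                  /* stdin  <- previous segment */
        dup2(pfd[1], 1);               /* stdout -> new right-hand pipe */
        close(rfd);
        close(pfd[0]);
        close(pfd[1]);
        close(in);
        execlp("cat", "cat", (char *)NULL);
        _exit(127);
    }
    close(rfd);                        /* parent keeps only the new exit */
    close(pfd[1]);
    return pfd[0];
}

/* through_pipeline: hypothetical helper. Send a short msg through nstages
   processes and read the result into out. (A long message could fill every
   pipe while the parent is still writing, and deadlock.) */
ssize_t through_pipeline(const char *msg, int nstages, char *out, size_t cap)
{
    int pfd[2];
    if (pipe(pfd) < 0)
        return -1;
    int in = pfd[1], exit_fd = pfd[0];
    for (int i = 0; i < nstages && exit_fd >= 0; i++)
        exit_fd = add_cat_stage(exit_fd, in);
    if (exit_fd < 0)
        return -1;

    write(in, msg, strlen(msg));
    close(in);                         /* EOF now propagates stage by stage */

    size_t total = 0;
    ssize_t n;
    while (total < cap && (n = read(exit_fd, out + total, cap - total)) > 0)
        total += (size_t)n;
    close(exit_fd);
    while (wait(NULL) > 0)             /* reap every stage */
        ;
    return (ssize_t)total;
}
```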
Exec, Execve, etc.
Children should have lives of their own.
Exec* “boots” the child with a different executable image.
• parent program makes exec* syscall (in forked child context) to run a program in a new child process
• exec* overlays child process with a new executable image
• restarts in user mode at predetermined entry point (e.g., crt0)
• no return to parent program (it’s gone)
• arguments and environment variables passed in memory
• file descriptors etc. are unchanged
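Here is a sketch of passing arguments and environment explicitly with POSIX execve; exec_with_env is a hypothetical name, and /bin/sh is assumed to exist. The new image sees only the argv and envp it is handed. Any file descriptors the child holds would carry over unchanged.

```c
#include <sys/wait.h>
#include <unistd.h>

/* exec_with_env: hypothetical helper. Run /bin/sh -c script in a child
   with exactly the environment envp; return its exit status, or -1. */
int exec_with_env(const char *script, char *const envp[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        char *const argv[] = { "sh", "-c", (char *)script, NULL };
        execve("/bin/sh", argv, envp);   /* no return on success: image replaced */
        _exit(127);
    }
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, with envp = {"CODE=5", NULL}, the script `exit $CODE` exits with status 5. The variable reached the new program only through the environment passed to execve.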
The Program and the Process VAS
[Figure: program sections (header, text, idata, wdata, symbol table, relocation records) mapped into process segments (text, data, BSS, user stack, args/env, kernel u-area); labels include sbrk() and jsr.]
BSS: “Block Started by Symbol” (uninitialized static data).
Symbol table and relocation records may be removed after the final link step and strip.
Header: a “magic number” indicates the type of image.
Section table: an array of (offset, len, startVA).
Args/env are copied in by the kernel on exec.
Linking 101
[Figure: several object files, each with a header, text, idata, wdata, symbol table, relocation records, and unresolved external references, are linked step by step into larger objects with the same layout.]
Shared Libraries and DLLs
Multiple modules are attached to the address space with mmap.
[Figure: an executable image and a shared library (or DLL), each with header, text, idata, wdata, symbol table, and relocation records, mapped by the loader into one process VAS (text and data for each, plus BSS, user stack, args/env, kernel u-area).]
A dynamic linker/loader dynamically imports DLLs.
How to trap references to nonresident symbols?
How to address external symbols from a DLL?
Questions for Exec
1. How to copy argv, env?
what if the process passes endless strings?
use kernel stack as intermediate buffer?
2. How to effect a return back to user mode after exec?
child stack and context
3. What happens when the child returns from main?
4. What about virtual caches?
5. What about shell scripts etc.?