02unixintro

II–1

CS 167 II–1 Copyright © 2006 Thomas W. Doeppner. All rights reserved.

Introduction to Unix

II–2


Outline

• Processes
• File Abstraction
• Directories
• File Representation
• File-Oriented System Calls

In this lecture we present a brief introduction to a few of the more important aspects of the Unix operating system. In particular, we look at the Unix notions of processes and files—two concepts that are important throughout the course.

II–3


Processes

• Fundamental abstraction of program execution
  – memory
  – processor(s)
    - each processor abstraction is a thread
  – “execution context”

Unix, as do many operating systems, uses the notion of a process as its fundamental abstraction of program execution. Each program runs in a separate process. Processes are protected from one another in the sense that the actions of one process cannot directly harm others. The abstraction comprises the memory of a program (known as its address space—the collection of locations that can be referenced by the process), the execution agents (processor abstractions), and other information, known collectively as the execution context, representing such things as the files the process is currently accessing, how it responds to exceptions, to external stimuli, etc.

The processor abstraction is often called a thread. In “traditional” Unix programs, processes have only one thread, so we’ll use the word process to include the single thread running inside of it. Later in this course, when we cover multithreaded programming, we’ll be more careful and use the word thread when we are discussing the processor abstraction.

II–4


The Unix Address Space

text

data

bss

dynamic

stack

A Unix process’s address space appears to be three regions of memory: a read-only text region (containing executable code); a read-write region consisting of initialized data (simply called data), uninitialized data (BSS, a directive from an ancient assembler for the IBM 704 series of computers, standing for Block Started by Symbol and used to reserve space for uninitialized storage), and a dynamic area; and a second read-write region containing the process’s user stack (a standard Unix process contains only one thread of control).

The first area of read-write storage is often collectively called the data region. Its dynamic portion grows in response to sbrk system calls. Most programmers do not use this system call directly, but instead use the malloc and free library routines, which manage the dynamic area and allocate memory when needed by in turn executing sbrk system calls.

The stack region grows implicitly: whenever an attempt is made to reference beyond the current end of stack, the stack is implicitly grown to the new reference. (There are system-wide and per-process limits on the maximum data and stack sizes of processes.)

II–5


Creating a Process: Before

fork( )

parent process

The only way to create a new process is to use the fork system call.

II–6


Creating a Process: After

fork( )// returns p

parent process

fork( )// returns 0

child process (pid = p)

By executing fork the parent process creates an almost exact clone of itself which we call the child process. This new process executes the same text as its parent, but contains a copy of the data and a copy of the stack. This copying of the parent to create the child can be very time-consuming. We discuss later how it is optimized.

Fork is a very unusual system call: one thread of control flows into it but two threads of control flow out of it, each in a separate address space. From the parent’s point of view, fork does very little: nothing happens to the parent except that fork returns the process ID (PID—an integer) of the new process. The new process starts off life by returning from fork. It always views fork as returning a zero.

II–7


Loading a New Image

exec(prog, args)

Before

prog’s text

prog’s data

prog’s bss

args

After

Most of the time the purpose of creating a new process is to run a new (i.e., different) program. Once a new process has been created, it can use the exec system call to load a new program image into itself, replacing the prior contents of the process’s address space. Exec is passed the name of a file containing a fully relocated program image (which might require further linking via a runtime linker). The previous text region of the process is replaced with the text of the program image. The data, BSS and dynamic areas of the process are “thrown away” and replaced with the data and BSS of the program image. The contents of the process’s stack are replaced with the arguments that are passed to the main procedure of the program.

II–8


Fork/Exec Example

    if (fork( ) == 0) {
        // child process
        /* set up I/O in child */
        execv(newprogram, parameters); // load new image
        // if we get here, there’s a problem
    }
    // parent process continues here

The slide shows a typical example of the use of the fork and exec system calls. The parent process calls fork and two threads of control, one in the parent process and one in the child process, return. If a thread sees that fork returns 0, then it must be in the child process. What typically happens next is that this thread sets up the I/O descriptors of the new process (which we discuss shortly) and then execs a new program. Here we use the execv system call, which passes the arguments to the system in a vector (hence the v). If execv succeeds, it does not return: there is nothing to return to! The process contains an entirely new image. Thus if execv does return, there must have been an error (for example, the file did not exist).

If fork returns a positive value, then this value is the process ID of the child process and the thread executing this code must be in the parent process (i.e., it is the original thread that called fork in the first place). It might continue on and do something totally independent of the child process, or it might at some point wait until the child terminates.
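The fork/exec pattern, with the error checks the slide omits, might be sketched as follows. The function name run_program is ours, not part of any library, and we use waitpid (a variant of the wait call discussed later) so that the parent collects the child's exit status. Note the third possibility for fork's return value: -1, meaning no child was created.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run the program at `path` with argument vector `argv` in a child
 * process and wait for it. Returns the child's exit status, or -1 if
 * fork failed or the child did not exit normally.
 * (run_program is a hypothetical helper name.) */
int run_program(const char *path, char *const argv[])
{
    pid_t pid = fork();
    if (pid == -1) {            /* fork failed: no child was created */
        perror("fork");
        return -1;
    }
    if (pid == 0) {             /* child: fork returned 0 */
        execv(path, argv);      /* returns only on failure */
        perror("execv");
        _exit(127);             /* conventional "could not exec" status */
    }
    /* parent: fork returned the child's PID */
    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For instance, calling run_program("/bin/sh", args) with args set to {"sh", "-c", "exit 7", NULL} should return 7, assuming /bin/sh exists on the system.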

II–9


Environment

• Set of pairs: <name, value>
  – e.g. <term, xterm>, <TZ, US/Eastern>
  – maintained in user-mode address space
    - accessed using library routines
      • getenv
      • putenv
• Initially established via exec:
  – execve(char *file, char *argv[ ], char *envp[ ])
    - supplied in third argument
  – execv(char *file, char *argv[ ])
    - inherited from caller

A process’s environment is a set of names and values that’s maintained in its user-mode address space. It’s established when the process image is set up via exec. As shown in the slide, there’s a variant of exec, execve, with which one can supply the environment explicitly. If execv is used, the environment is passed to the new image directly from the old.
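As a rough illustration of the difference, the sketch below runs a shell command in a child whose environment is exactly the vector handed to execve; nothing is inherited from the caller. The helper name run_with_env and the use of /bin/sh are our assumptions, not part of the lecture.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run `sh -c command` with exactly the environment in `envp`.
 * Returns the shell's exit status, or -1 on failure.
 * (run_with_env is a hypothetical helper name.) */
int run_with_env(const char *command, char *const envp[])
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        char *const argv[] = {"sh", "-c", (char *)command, NULL};
        execve("/bin/sh", argv, envp);  /* environment supplied explicitly */
        _exit(127);                     /* reached only if execve failed */
    }
    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With envp set to {"GREETING=hello", NULL}, the child sees GREETING but not variables that exist only in the parent's environment; had execv been used instead, the child would have seen the parent's variables.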

II–10


System Calls

• Sole interface between user and kernel
• Implemented as library routines that execute trap instructions to enter kernel
• Errors indicated by return of –1; error code is in errno

    if (write(fd, buffer, bufsize) == -1) {
        // error!
        printf("error %d\n", errno); // see perror
    }

System calls, such as fork, execv, read, write, etc., are the only means for application programs to communicate directly with the kernel: they form an API (application program interface) to the kernel. When a program calls such a routine, it is actually placing a call to a subroutine in a system library. The body of this subroutine contains a hardware-specific trap instruction which transfers control and some parameters to the kernel. On return to this library routine, the kernel provides an indication of whether or not there was an error and what the error was. The error indication is passed back to the original caller via the functional return value of the library routine. If there was an error, a positive-integer code identifying it is stored in the global variable errno. Rather than simply print this code out, as shown in the slide, one might instead print out an informative error message. This can be done via the perror routine.
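A slightly more informative version of the slide's error check might look like the following sketch. The wrapper name checked_write is ours; strerror is the standard routine that produces the same message text that perror prints.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Try to write `len` bytes to `fd`. Returns 0 on success, or the errno
 * value describing the failure after reporting it on standard error.
 * (checked_write is a hypothetical helper name.) */
int checked_write(int fd, const void *buf, size_t len)
{
    if (write(fd, buf, len) == -1) {
        int err = errno;    /* save it: later library calls may change errno */
        fprintf(stderr, "write failed: %s (errno %d)\n", strerror(err), err);
        return err;
    }
    return 0;
}
```

Writing to a descriptor that isn't open, such as -1, should yield EBADF.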

II–11


System Calls

write(fd, buf, len)

kernel text

other stuff

kernel stack

trap into kernel

II–12






Multiple Processes

kernel text

II–13


The File Abstraction

• A file is a simple array of bytes
• Files are made larger by writing beyond their current end
• Files are named by paths in a naming tree
• System calls on files are synchronous

As discussed three pages ago, most programs perform file I/O using library code layered on top of kernel code. In this section we discuss just the kernel aspects of file I/O, looking at the abstraction and the high-level aspects of how this abstraction is implemented.

The Unix file abstraction is very simple: files are simply arrays of bytes. Many systems have special system calls to make a file larger. In Unix, you simply write where you’ve never written before, and the file “magically” grows to the new size (within limits). The names of files are equally straightforward—just the names labeling the path that leads to the file within the directory tree. Finally, from the programmer’s point of view, all operations on files appear to be synchronous—when an I/O system call returns, as far as the process is concerned, the I/O has completed. (Things are different from the kernel’s point of view, as discussed later.)
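To see a file grow simply by being written, consider the sketch below. It uses lseek (covered later in this lecture) to position the file pointer, and fstat to ask for the file's size; the helper name and the /tmp scratch location are our assumptions.

```c
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create a temporary file, write one byte at `offset`, and return the
 * resulting file size (or -1 on error).
 * (size_after_write_at is a hypothetical helper name.) */
off_t size_after_write_at(off_t offset)
{
    char path[] = "/tmp/growXXXXXX";     /* assumed scratch location */
    int fd = mkstemp(path);
    if (fd == -1)
        return -1;
    unlink(path);                        /* file persists until fd is closed */

    off_t size = -1;
    struct stat st;
    if (lseek(fd, offset, SEEK_SET) != -1 &&
        write(fd, "x", 1) == 1 &&
        fstat(fd, &st) == 0)
        size = st.st_size;
    close(fd);
    return size;
}
```

Writing a single byte at offset 1000 of an empty file should leave a file of 1001 bytes; no separate "extend" call is involved.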

II–14


Directories

[Figure: directory tree]

/
    unix
    etc
        passwd
        motd
    home
        twd
            unix
                slide1
                slide2
            ...
    pro
    dev

Here is a portion of a Unix directory tree. The ovals represent files, the rectangles represent directories (which are really just special cases of files).

II–15


Directory Representation

Component Name    Inode Number
.                 1
..                1
unix              117
etc               4
home              18
pro               36
dev               93

(each row is a directory entry)

A directory consists of an array of pairs of component name and inode number, where the latter identifies the target file’s inode to the operating system (an inode is a data structure maintained by the operating system that represents a file). Note that every directory contains two special entries, “.” and “..”. The former refers to the directory itself, the latter to the directory’s parent (in the case of the slide, the directory is the root directory and has no parent, thus its “..” entry is a special case that refers to the directory itself).

II–16


Hard Links


twd

image motdunix ...

slide1 slide2

% ln /unix /etc/image    # link system call

Here are two directory entries referring to the same file. This is done, via the shell, through the ln command which creates a (hard) link to its first argument, giving it the name specified by its second argument.

The shell’s “ln” command is implemented using the link system call.

II–17


Directory Representation

Root directory (/):

.        1
..       1
unix     117
etc      4
home     18
pro      36
dev      93

/etc directory:

.        4
..       1
image    117
motd     33

Here are the (abbreviated) contents of both the root (/) and /etc directories, showing how /unix and /etc/image are the same file. Note that if the directory entry /unix is deleted (via the shell’s “rm” command), the file (represented by inode 117) continues to exist, since there is still a directory entry referring to it. However if /etc/image is also deleted, then the file has no more links and is removed. To implement this, the file’s inode contains a link count, indicating the total number of directory entries that refer to it. A file is actually deleted only when its inode’s link count reaches zero.

Note: suppose a file is open, i.e. is being used by some process, when its link count becomes zero. Rather than delete the file while the process is using it, the file will continue to exist until no process has it open. Thus the inode also contains a reference count indicating how many times it is open: in particular, how many system file table entries point to it. A file is deleted when and only when both the link count and this reference count become zero.

The shell’s “rm” command is implemented using the unlink system call. Note that /etc/.. refers to the root directory.
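The link count can be observed from user code through the stat system call, which reports it in the st_nlink field. The sketch below, with helper names and /tmp file names of our own choosing, creates a file, adds a second hard link, and then removes the original name.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Return the link count of `path`, or -1 on error. */
long link_count(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 ? (long)st.st_nlink : -1;
}

/* Create a file, add a second hard link, then remove the first name.
 * Stores the three observed link counts in counts[0..2].
 * Returns 0 on success, -1 on error.
 * (hard_link_demo is a hypothetical helper name.) */
int hard_link_demo(long counts[3])
{
    char first[] = "/tmp/linkdemoXXXXXX";   /* assumed scratch names */
    char second[64];
    int fd = mkstemp(first);
    if (fd == -1)
        return -1;
    close(fd);
    snprintf(second, sizeof second, "%s.alias", first);

    counts[0] = link_count(first);          /* one directory entry */
    if (link(first, second) == -1) {
        unlink(first);
        return -1;
    }
    counts[1] = link_count(first);          /* two entries, same inode */
    unlink(first);                          /* what rm does */
    counts[2] = link_count(second);         /* file survives under 2nd name */
    unlink(second);                         /* last link gone: file deleted */
    return 0;
}
```

On a file system that supports hard links, the three counts should come out as 1, 2, and 1.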

II–18


Soft Links


twd

image twdunix ...

slide1 slide2

% ln -s /unix /home/twd/mylink
% ln -s /home/twd /etc/twd    # symlink system call

mylink

/unix/home/twd

Differing from a hard link, a soft link (or symbolic link) is a special kind of file containing the name of another file. When the kernel processes such a file, rather than simply retrieving its contents, it makes use of the contents by replacing the portion of the directory path that it has already followed with the contents of the soft-link file and then following the resulting path. Thus referencing /home/twd/mylink results in the same file as referencing /unix. Referencing /etc/twd/unix/slide1 results in the same file as referencing /home/twd/unix/slide1.

The shell’s “ln” command with the “-s” flag is implemented using the symlink system call.
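The sketch below shows, under our own choice of helper and file names, that a soft link is a separate file whose contents are a path: readlink retrieves those contents, lstat describes the link itself, and stat follows the link to the target.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create a regular file and a symbolic link to it in /tmp.
 * Returns 1 if the link is a distinct file whose contents name the
 * target and which resolves to the target's inode; 0 if not; -1 on error.
 * (symlink_demo is a hypothetical helper name.) */
int symlink_demo(void)
{
    char target[] = "/tmp/symdemoXXXXXX";   /* assumed scratch names */
    char linkname[64], contents[64];
    struct stat via_link, direct, link_itself;
    int fd = mkstemp(target);
    if (fd == -1)
        return -1;
    close(fd);
    snprintf(linkname, sizeof linkname, "%s.lnk", target);

    int result = -1;
    if (symlink(target, linkname) == 0) {
        ssize_t n = readlink(linkname, contents, sizeof contents - 1);
        if (n >= 0 && stat(linkname, &via_link) == 0 &&
            stat(target, &direct) == 0 && lstat(linkname, &link_itself) == 0) {
            contents[n] = '\0';              /* readlink does not terminate */
            result = strcmp(contents, target) == 0 &&
                     via_link.st_ino == direct.st_ino &&   /* followed */
                     S_ISLNK(link_itself.st_mode) &&       /* not followed */
                     link_itself.st_ino != direct.st_ino;
        }
        unlink(linkname);
    }
    unlink(target);
    return result;
}
```

Contrast this with a hard link, where both names lead to the same inode and there is no intermediate file at all.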

II–19


Working Directory

• Maintained in kernel (as an inode number) for each process
  – paths not starting from “/” start with the working directory
  – changed by use of the chdir system call
  – displayed (via shell) using “pwd”
    - how is this done?

The working directory is maintained (as the inode number of the directory) in the kernel for each process. Whenever a process attempts to follow a path that doesn’t start with “/”, it starts at its working directory (rather than at “/”).

II–20


Standard File Descriptors

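    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    #define BUFSIZE 1024    /* assumed value; the slide leaves it unspecified */

    int main( ) {
        char buf[BUFSIZE];
        int n;
        const char* note = "Write failed\n";

        while ((n = read(0, buf, sizeof(buf))) > 0)
            if (write(1, buf, n) != n) {
                (void)write(2, note, strlen(note));
                exit(EXIT_FAILURE);
            }
        return(EXIT_SUCCESS);
    }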

The file descriptors 0, 1, and 2 are opened to access your terminal when you log in, and are preserved across forks, unless redirected.

II–21


Representing an Open File (1)

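[Figure: each entry (0, 1, 2, 3, …, n–1) of the per-process file descriptor table points to an entry in the system file table, which holds a ref count, the access mode, the file pointer, and a pointer to the file’s inode in the active inode table; inodes in turn refer to file data held in the buffer cache and on disk.]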

A process’s set of open files is represented by a data structure known as the file descriptor table, which is used to map file descriptors, representing open files, to system file table entries, each representing an open file. Each process has a separate file descriptor table, but there is exactly one system file table for the entire system. Each active file (i.e., a file that is open or otherwise being used) is represented by an inode that is entered in the active inode table. Recently accessed blocks of data from the file are kept in the kernel’s data section in what is known as the buffer cache—an area of memory reserved for buffering data from files. Thus data from these blocks can be accessed again and again without wasting time going to the disk. When a disk block is modified, the modifications are written in the buffer cache and only later are written to disk. This allows the process doing the modification to proceed without having to wait for the data to be written to disk.

II–22



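[Figure: file descriptor fdrw points to a system file table entry (ref count 1, access rw, file pointer 0), which points to the file’s inode in the active inode table; the inode refers to file data in the buffer cache and on disk.]

    fdrw = open("x", O_RDWR);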

This slide shows what happens when a file is opened. The lowest-numbered available file descriptor is allocated from the file descriptor table.

Next, an entry in the system file table is allocated, and the file descriptor table entry is set to point to the system file table entry. An inode-table entry for the file is allocated (or found if it already exists) and the system file table entry is set to point to it. Additional fields of the system file table entry are initialized, including:

• a reference count
  • each file-descriptor-table entry pointing to it contributes one to the reference count
  • the entry is freed when the reference count goes to zero
• the allowed access (i.e., how the file was opened: read-only, read-write, etc.)
• the file pointer (i.e., the location within the file at which the next transfer will start)
  • after every read and write system call the file pointer is incremented by the number of bytes actually transferred, thus facilitating sequential I/O

II–23


Global View

other stuff

kernel stack

other stuff

kernel stack

other stuff

kernel stack

other stuff

kernel stack

kernel text

II–24


Allocation of File Descriptors

• Whenever a process requests a new file descriptor, the lowest numbered file descriptor not already associated with an open file is selected; thus

    #include <fcntl.h>
    #include <unistd.h>

    close(0);
    fd = open("file", O_RDONLY);

  – will always associate file with file descriptor 0 (assuming that the open succeeds)

One can depend on always getting the lowest available file descriptor.
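The rule can be checked without disturbing the standard descriptors. In the sketch below (the helper name is ours, and /dev/null is used only as a file that is certain to exist), we open two descriptors, close the lower one, and open again.

```c
#include <fcntl.h>
#include <unistd.h>

/* Open /dev/null twice, close the first descriptor, and open again.
 * Returns 1 if the third open reuses the descriptor just freed,
 * 0 if not, -1 on error.
 * (lowest_fd_is_reused is a hypothetical helper name.) */
int lowest_fd_is_reused(void)
{
    int a = open("/dev/null", O_RDONLY);
    int b = open("/dev/null", O_RDONLY);
    if (a == -1 || b == -1) {
        if (a != -1) close(a);
        if (b != -1) close(b);
        return -1;
    }
    close(a);                               /* a is now the lowest free slot */
    int c = open("/dev/null", O_RDONLY);
    int reused = (c == a);
    close(b);
    if (c != -1)
        close(c);
    return reused;
}
```

In a single-threaded process the function should return 1, since nothing else can claim the freed slot between the close and the open.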

II–25


Reading From a File

    #include <sys/types.h>
    #include <unistd.h>
    ssize_t read(int fd, void *buffer, size_t n)

– read up to n bytes of data into buffer
– returns number of bytes transferred
– returns 0 on end of file
– returns –1 on error

The read system call tells the system to read up to n bytes of data into the area of memory pointed to by buffer from the file referred to by fd. It returns the number of bytes actually transferred, or, as usual, –1 if there was an error.

Read will transfer fewer bytes than specified if:
• the number of bytes left in the file was less than n
• the read request was interrupted by a signal after some of the bytes were transferred
• the file is a pipe, FIFO, or special device with less than n bytes immediately available for reading

It transfers as many bytes as are present up to the given maximum; if it returns a zero, that means the end of the file has been reached.
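A caller that really needs n bytes therefore has to loop. The following is one possible sketch; read_fully is our own name, not a system routine, and the treatment of EINTR (retrying when a signal arrives before any bytes are transferred) is a common convention that we assume here.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Read until `n` bytes have arrived, end of file is reached, or an error
 * occurs. Returns the number of bytes read (less than n only at end of
 * file), or -1 on error.
 * (read_fully is a hypothetical helper name.) */
ssize_t read_fully(int fd, void *buffer, size_t n)
{
    char *p = buffer;
    size_t total = 0;
    while (total < n) {
        ssize_t got = read(fd, p + total, n - total);
        if (got == 0)               /* end of file */
            break;
        if (got == -1) {
            if (errno == EINTR)     /* interrupted before any transfer: retry */
                continue;
            return -1;
        }
        total += (size_t)got;       /* short read: go around again */
    }
    return (ssize_t)total;
}
```

When reading from a pipe into which two separate writes of 3 and 2 bytes were made before the write end was closed, read_fully with a 16-byte buffer should return 5.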

II–26


Writing To a File

    #include <sys/types.h>
    #include <unistd.h>
    ssize_t write(int fd, const void *buffer, size_t n)

– write up to n bytes of data from buffer
– returns number of bytes transferred
– returns –1 on error

The write system call tells the system to write up to n bytes of data from the area of memory pointed to by buffer into the file referred to by fd. It returns the number of bytes actually transferred, or, as usual, –1 if there was an error.

If write transfers fewer bytes than specified, then something caused the transfer to stop prematurely:

• a signal interrupted the system call
  • this is pretty complicated and we’ll discuss it in more detail later. However, this would happen if the requested size was large enough so that the write was split by the kernel into a number of segments, at least one segment was written (otherwise the call would have returned with an error and errno set to EINTR), and the signal occurred while a subsequent segment was being written
• a filesize limit was reached
• an I/O error occurred

The response in all these cases should be to attempt to rewrite those bytes that were not transferred; if a signal had interrupted the previous try, then the next try will succeed (unless again interrupted by a signal); if a filesize limit had been reached or an I/O error has occurred, this next write will yield the appropriate error code.

II–27


Example

    int main( ) {
        char buf[BUFSIZE];
        int n, nread;
        const char* note = "Write failed\n";

        while ((nread = read(0, buf, sizeof(buf))) > 0) {
            int bytes_left = nread;
            int bpos = 0;
            while ((n = write(1, &buf[bpos], bytes_left)) < bytes_left) {
                if (n == -1) {
                    write(2, note, strlen(note));
                    exit(EXIT_FAILURE);
                }
                bytes_left -= n;
                bpos += n;
            }
        }
        return(EXIT_SUCCESS);
    }

Here we copy from the program’s standard input to its standard output, but take advantage of what we’ve learned about the behavior of write. In particular, since it’s not guaranteed that write will transfer all the bytes requested, we must supply code to make sure that all the data does get transferred. Thus, after each call to write (except, for reasons of space on the slide, when we write to file descriptor 2), we check to see how much data was written and call write again, if necessary, to handle the unwritten data.

II–28


Random Access

    #include <sys/types.h>
    #include <unistd.h>
    off_t lseek(int fd, off_t offset, int whence)

– sets the file pointer for fd:
  - if whence is SEEK_SET, the pointer is set to offset bytes
  - if whence is SEEK_CUR, the pointer is set to its current value plus offset bytes
  - if whence is SEEK_END, the pointer is set to the size of the file plus offset bytes
– it returns the (possibly) updated value of the file pointer relative to the beginning of the file. Thus,

    n = lseek(fd, (off_t)0, SEEK_CUR);

  returns the current value of the file pointer for fd

To effect random access to files (i.e., access files other than sequentially), you first set the file pointer, then perform a read or write. Setting the file pointer is done with the lseek system call.

II–29


lseek Example

• Example: printing a text file backwards:

    fd = open("textfile", O_RDONLY);
    /* go to last char in file */
    fptr = lseek(fd, (off_t)-1, SEEK_END);
    while (fptr != -1) {
        read(fd, buf, 1);
        write(1, buf, 1);
        fptr = lseek(fd, (off_t)-2, SEEK_CUR);
    }

This example prints the contents of a file backwards. Note what’s happening in the while statement: it continues as long as fptr is not –1, the value lseek returns if it fails. Each read advances the file pointer by one byte, so seeking back two bytes positions it at the preceding character. Our intent is that the last successful call to lseek sets the file pointer to 0. The next call, which attempts to set it to –1, will fail, thus causing lseek to return –1 (recall that failed system calls always return –1).

II–30



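[Figure: file descriptors fdrw and fdr point to two separate system file table entries, one with ref count 1, access rw, file pointer 20 and one with ref count 1, access r, file pointer 10; both entries point to the same inode in the active inode table.]

    fdrw = open("x", O_RDWR);
    fdr = open("x", O_RDONLY);
    write(fdrw, buf, 20);
    read(fdr, buf2, 10);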

In this slide we see the effect of two opens of the same file within the same process.
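The independence of the two file pointers can be observed with lseek(fd, 0, SEEK_CUR), which reports a descriptor's current file pointer. The sketch below mirrors the slide's sequence on a scratch file; the helper name and the /tmp location are our assumptions.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Open one file twice, write 20 bytes through the first descriptor and
 * read 10 through the second, then report each descriptor's file pointer.
 * Returns 0 on success, -1 on error.
 * (two_opens_demo is a hypothetical helper name.) */
int two_opens_demo(off_t *rw_offset, off_t *r_offset)
{
    char path[] = "/tmp/twoopenXXXXXX";     /* assumed scratch location */
    char buf[20] = {0}, buf2[10];
    int fdrw = mkstemp(path);               /* opened read-write */
    if (fdrw == -1)
        return -1;
    int fdr = open(path, O_RDONLY);         /* second, independent open */
    unlink(path);
    if (fdr == -1) {
        close(fdrw);
        return -1;
    }

    int ok = write(fdrw, buf, 20) == 20 && read(fdr, buf2, 10) == 10;
    *rw_offset = lseek(fdrw, 0, SEEK_CUR);  /* pointer of first entry */
    *r_offset = lseek(fdr, 0, SEEK_CUR);    /* pointer of second entry */
    close(fdrw);
    close(fdr);
    return ok ? 0 : -1;
}
```

The two reported offsets should be 20 and 10, matching the file pointer fields in the figure.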

II–31


Multiple Descriptors; One File

• How are standard file descriptors set up?
  – suppose 1 and 2 are opened separately

    while ((n = read(0, buf, sizeof(buf))) > 0)
        if (write(1, buf, n) != n) {
            (void)write(2, note, strlen(note));
            exit(EXIT_FAILURE);
        }

  – error message clobbers data bytes!

By convention, file descriptors 1 and 2 are used for processes’ normal and diagnostic output. Normally they both refer to the display, and thus diagnostic output is intermingled with normal output. Suppose, however, one wanted to redirect both file descriptors so that all output, normal and diagnostic, was sent to a file. One might open this file twice, once as file descriptor 1 and again as file descriptor 2, thereby creating two system file table entries. As file descriptor 1 receives output, the offset field of its file table entry advances with each write. After 1000 bytes have been written (sequentially), the offset field is set to 1000, representing the current end-of-file.

If at this point a diagnostic message is written to file descriptor 2, it will start at the beginning of the file, overwriting the data already there, since file descriptor 2’s file table entry’s offset is still at 0. This outcome is certainly not desirable.

II–32


dup System Calls

• dup returns a new file descriptor referring to the same file as its argument

int dup(int fd)

• dup2 is similar, but it allows you to specify the new file descriptor

int dup2(int oldfd, int newfd)

We can use one of the dup system calls to solve this problem. dup obeys our rule of always allocating the lowest available file descriptor. However, with dup2, one can specify, via the second argument, which file descriptor is allocated. If the second argument is the file descriptor of an open file, that descriptor is first closed and then made to refer to the file of the first argument.

II–33


dup Example

    /* redirect stdout and stderr to same file */
    /* assumes file descriptor 0 is in use */
    close(1);
    open("file", O_WRONLY|O_CREAT, 0666);
    close(2);
    dup(1);

    /* alternatively, replace last two lines with: */
    dup2(1, 2);

Here we see how to use dup and dup2 to set file descriptors 1 and 2 to refer to the same system file-table entry. Note the extra argument to open. We’ve given open the O_CREAT flag, which tells the system that if the file does not exist, it should create it. The third argument helps to specify the access permissions assigned to the file if it’s created by this call. We discuss this in detail in a few slides.

II–34



[Figure: file descriptors fdrw and fdrw2 point to the same system file table entry (ref count 2, access rw, file pointer 20), which points to the file’s inode in the active inode table.]

    fdrw = open("x", O_RDWR);
    fdrw2 = dup(fdrw);
    write(fdrw, buf, 20);

The dup system call causes two file descriptors to refer to the same file table entry and hence share the offset.
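A small sketch makes the sharing visible: we write through one descriptor and then ask for the file pointer through its duplicate. The helper name and scratch file are our own.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Write 20 bytes through one descriptor and return the file pointer as
 * seen through a dup of it (-1 on error).
 * (offset_seen_through_dup is a hypothetical helper name.) */
off_t offset_seen_through_dup(void)
{
    char path[] = "/tmp/dupdemoXXXXXX";   /* assumed scratch location */
    char buf[20] = {0};
    int fdrw = mkstemp(path);
    if (fdrw == -1)
        return -1;
    unlink(path);

    off_t pos = -1;
    int fdrw2 = dup(fdrw);                /* same system file table entry */
    if (fdrw2 != -1) {
        if (write(fdrw, buf, sizeof buf) == (ssize_t)sizeof buf)
            pos = lseek(fdrw2, 0, SEEK_CUR);
        close(fdrw2);
    }
    close(fdrw);
    return pos;
}
```

The result should be 20 even though nothing was ever written through fdrw2, which is exactly why diagnostic output sent to a dup'ed descriptor lands at the current end of the file rather than on top of earlier output.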

II–35



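[Figure: after fork( ), the parent’s and the child’s file descriptor tables point to the same system file table entries. The entry for fdr now has ref count 2 (access r, file pointer 10); the entry shared by fdrw and fdrw2 has ref count 4 (access rw, file pointer 20).]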

If our process executes a fork system call, creating a child process, the child is given a file-descriptor table that’s a copy of its parent’s. Of course, the reference counts on the system-file-table entries are increased appropriately.
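Because parent and child share system file table entries, a write by the child moves the file pointer the parent sees. The following sketch (helper name and scratch file are ours; waitpid is a variant of the wait call discussed later) illustrates this.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Open a file, fork, let the child write 10 bytes, and return the file
 * pointer the parent observes afterwards (-1 on error).
 * (offset_after_child_write is a hypothetical helper name.) */
off_t offset_after_child_write(void)
{
    char path[] = "/tmp/forkfdXXXXXX";    /* assumed scratch location */
    int fd = mkstemp(path);
    if (fd == -1)
        return -1;
    unlink(path);

    pid_t pid = fork();
    if (pid == -1) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        /* child: descriptor table is a copy, file table entry is shared */
        ssize_t n = write(fd, "0123456789", 10);
        _exit(n == 10 ? 0 : 1);
    }
    int status;
    off_t pos = -1;
    if (waitpid(pid, &status, 0) == pid &&
        WIFEXITED(status) && WEXITSTATUS(status) == 0)
        pos = lseek(fd, 0, SEEK_CUR);     /* parent never wrote, yet moved */
    close(fd);
    return pos;
}
```

The parent should observe a file pointer of 10.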

II–36


I/O Redirection

% who > file &

    if (fork( ) == 0) {
        char *args[ ] = {"who", 0};
        close(1);
        open("file", O_WRONLY|O_TRUNC|O_CREAT, 0666);
        execv("who", args);
        printf("you screwed up\n");
        exit(1);
    }

This is an example of what a shell might do to handle I/O redirection: it first creates a new process in which to run a command (“who”, in this case). In the new process it closes file descriptor 1 (standard output, to which normal output is written). It then opens “file” (the arguments indicate that “file” is only to be written to, that any prior contents are erased, and that if “file” didn’t already exist, it will be created with read and write permission for all, subject to the umask discussed later). Assuming that file descriptor 0 is not available (it’s assigned to standard input), file descriptor 1 will be assigned to “file”. Assuming that execv succeeds, when “who” runs, its output is written to “file”.

Note that the parent process does not wait for its child to terminate; it goes on to execute further commands. (This behavior occurs because we’ve placed an “&” at the end of the command line.)

Note the args argument to execv: By convention, the first argument to each command is the name of the command (“who” in this case). To indicate that there are no further arguments, a zero is supplied.

Note that we aren’t checking for errors: this is only because doing so would cause the resulting code not to fit in the slide. You should always check for errors.

II–37


More I/O Redirection

% who >& file &

    if (fork( ) == 0) {
        char *args[ ] = {"who", 0};
        close(1);
        close(2);
        open("file", O_WRONLY|O_TRUNC|O_CREAT, 0666);
        dup(1);
        execv("who", args);
        …
    }

In this example (using csh syntax), we run “who” with both standard output (file descriptor 1) and standard error (file descriptor 2) redirected to “file”. It’s done in such a way that writes to either standard output or standard error go to the current end of the file.
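Here is a sketch of the same redirection with error checks, using dup2 rather than relying on close followed by dup. The helper name run_redirected is ours, and for convenience the command is run through /bin/sh with execl (another member of the exec family); both choices are assumptions made for the example.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run `sh -c command` with descriptors 1 and 2 both redirected to `file`.
 * Returns the size of `file` after the command exits, or -1 on error.
 * (run_redirected is a hypothetical helper name.) */
off_t run_redirected(const char *command, const char *file)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        int fd = open(file, O_WRONLY | O_TRUNC | O_CREAT, 0666);
        if (fd == -1)
            _exit(126);
        /* both descriptors now share one system file table entry */
        if (dup2(fd, 1) == -1 || dup2(fd, 2) == -1)
            _exit(126);
        if (fd > 2)
            close(fd);
        execl("/bin/sh", "sh", "-c", command, (char *)NULL);
        _exit(127);                       /* reached only if exec failed */
    }
    int status;
    struct stat st;
    if (waitpid(pid, &status, 0) == -1 || stat(file, &st) == -1)
        return -1;
    return st.st_size;
}
```

Running the command "echo out; echo err >&2" this way should produce an 8-byte file holding both lines: since the two descriptors share a file pointer, the diagnostic line follows the normal output instead of overwriting it.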

II–38


Waiting For Termination

% who

    if ((pid = fork( )) == 0) {
        . . .
        execv("who", args);
        . . .
    }
    while (pid != wait(0))
        ;

Here we execute a shell command and wait for it to terminate before going on to the next command (i.e., what’s normally done). The wait system call causes the caller to wait (i.e., not execute) until one of its children terminates and then returns the process ID of the process that’s terminated (if its argument is nonzero, it points to an area of memory at which will be written status information about the child, such as the argument it passed to exit). Here our first process waits until the process it created to run “who” terminates.
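The status information is decoded with the standard macros WIFEXITED and WEXITSTATUS. In the sketch below (the helper name is ours) the child simply exits with a given code instead of exec'ing a program, and the parent uses the same wait loop as the slide, with one addition: it gives up if wait itself reports an error other than an interruption by a signal.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits with `code`; wait for it and return the exit
 * code the parent observes, or -1 on error.
 * (wait_for_child_exit is a hypothetical helper name.) */
int wait_for_child_exit(int code)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0)
        _exit(code);                      /* child terminates at once */

    int status;
    pid_t done;
    while ((done = wait(&status)) != pid)
        if (done == -1 && errno != EINTR) /* e.g. no children left */
            return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Only the low-order eight bits of the exit argument are reported to the parent, so codes should be kept in the range 0 to 255.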

II–39


Open

    #include <sys/types.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    int open(const char *path, int options [, mode_t mode])

– options
  - O_RDONLY      open for reading only
  - O_WRONLY      open for writing only
  - O_RDWR        open for reading and writing
  - O_APPEND      set the file offset to end of file prior to each write
  - O_CREAT       if the file does not exist, then create it, setting its mode to mode adjusted by umask
  - O_EXCL        if O_EXCL and O_CREAT are set, then open fails if the file exists
  - O_TRUNC       delete any previous contents of the file
  - O_NONBLOCK    don’t wait if I/O can’t be done immediately

Here’s a partial list of the options available as the second argument to open. (Further options are often available, but they depend on the version of Unix.) Note that the first three options are mutually exclusive: one, and only one, must be supplied. We discuss the third argument to open, mode, shortly.

II–40


File Access Permissions

• Who’s allowed to do what?
  – who
    - user (owner)
    - group
    - others (rest of the world)
  – what
    - read
    - write
    - execute

Each file has associated with it a set of access permissions indicating, for each of three classes of principals, what sorts of operations on the file are allowed. The three classes are the owner of the file, known as user, the group owner of the file, known simply as group, and everyone else, known as others. The operations are grouped into the classes read, write, and execute, with their obvious meanings. The access permissions apply to directories as well as to ordinary files, though the meaning of execute for directories is not quite so obvious: one must have execute permission for a directory file in order to follow a path through it.

The system, when checking permissions, first determines the smallest class of principals the requester belongs to: user (smallest), group, or others (largest). It then, within the chosen class, checks for appropriate permissions.
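The two-step check can be written down as a small function. This is only a sketch of the rule as stated above: it ignores the superuser and supplementary group membership, and the names may_access, PERM_READ, PERM_WRITE, and PERM_EXEC are ours.

```c
#include <sys/types.h>

enum { PERM_READ = 4, PERM_WRITE = 2, PERM_EXEC = 1 };

/* Returns 1 if a requester with IDs (uid, gid) may perform every
 * operation in `want` on a file with the given permission bits and
 * owners, 0 otherwise. Simplified: no superuser, one group per user.
 * (may_access is a hypothetical helper name.) */
int may_access(mode_t mode, uid_t file_uid, gid_t file_gid,
               uid_t uid, gid_t gid, int want)
{
    int bits;
    if (uid == file_uid)
        bits = (mode >> 6) & 7;     /* smallest class: user */
    else if (gid == file_gid)
        bits = (mode >> 3) & 7;     /* next: group */
    else
        bits = mode & 7;            /* everyone else: others */
    return (bits & want) == want;   /* only the chosen class is consulted */
}
```

A consequence worth noticing is that only one class is ever consulted: with permission bits 0606, a non-owner who belongs to the file's group is denied read access even though others are granted it.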

II–41


Permissions Example

% ls -lR
.:
total 2
drwxr-x--x  2 tom   adm  1024 Dec 17 13:34 A
drwxr-----  2 tom   adm  1024 Dec 17 13:34 B

./A:
total 1
-rw-rw-rw-  1 tom   adm   593 Dec 17 13:34 x

./B:
total 2
-r--rw-rw-  1 tom   adm   446 Dec 17 13:34 x
-rw----rw-  1 trina adm   446 Dec 17 13:45 y

In the current directory are two subdirectories, A and B, with access permissions as shown in the slide. Note that the permissions are given as a string of characters: the first character indicates whether or not the file is a directory, the next three characters are the permissions for the owner of the file, the next three are the permissions for the members of the file’s group, and the last three are the permissions for the rest of the world.

Quiz: the users tom and trina are members of the adm group; andy is not.
• May andy list the contents of directory A?
• May andy read A/x?
• May trina list the contents of directory B?
• May trina modify B/y?
• May tom modify B/x?
• May tom read B/y?

II–42


Setting File Permissions

    #include <sys/types.h>
    #include <sys/stat.h>
    int chmod(const char *path, mode_t mode)

– sets the file permissions of the given file to those specified in mode
– only the owner of a file and the superuser may change its permissions
– nine combinable possibilities for mode (read/write/execute for user, group, and others)
  - S_IRUSR (0400), S_IWUSR (0200), S_IXUSR (0100)
  - S_IRGRP (040), S_IWGRP (020), S_IXGRP (010)
  - S_IROTH (04), S_IWOTH (02), S_IXOTH (01)

The chmod system call (and the similar chmod shell command) is used to change the permissions of a file. Note that the symbolic names for the permissions are rather cumbersome; what is often done is to use their numerical equivalents instead. Thus the combination of read/write/execute permission for the user (0700), read/execute permission for the group (050), and execute-only permission for others (01) can be specified simply as 0751.
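To check that the symbolic and numeric forms agree, the sketch below sets a scratch file's permissions with chmod and reads them back with stat. The helper name and the /tmp location are our assumptions; S_IRWXU is the standard shorthand for S_IRUSR|S_IWUSR|S_IXUSR.

```c
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create a scratch file, set its permissions to `mode`, and return the
 * permission bits the system reports for it (-1 on error).
 * (chmod_and_report is a hypothetical helper name.) */
int chmod_and_report(mode_t mode)
{
    char path[] = "/tmp/chmoddemoXXXXXX"; /* assumed scratch location */
    struct stat st;
    int result = -1;
    int fd = mkstemp(path);
    if (fd == -1)
        return -1;
    close(fd);
    if (chmod(path, mode) == 0 && stat(path, &st) == 0)
        result = (int)(st.st_mode & 0777); /* keep only the 9 permission bits */
    unlink(path);
    return result;
}
```

Passing S_IRWXU | S_IRGRP | S_IXGRP | S_IXOTH should report 0751. Unlike file creation, chmod applies the mode exactly as given.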

II–43


Creating a File

• Use either open or creat
  – open(const char *pathname, int flags, mode_t mode)
    - flags must include O_CREAT
  – creat(const char *pathname, mode_t mode)
    - open is preferred
• The mode parameter helps specify the permissions of the newly created file
  – permissions = mode & ~umask

Originally in Unix one created a file only by using the creat system call. A separate O_CREAT flag was later given to open so that it, too, can be used to create files. The creat system call behaves like open with the flags O_WRONLY|O_CREAT|O_TRUNC: if the file already exists, the call still succeeds, the file’s previous contents are eliminated, and the file is opened for writing. For open, what happens if the file already exists depends upon the use of the flags O_EXCL and O_TRUNC. If O_EXCL is included with the flags (e.g., open("newfile", O_CREAT|O_EXCL, 0777)), then the call fails if the file exists. Otherwise, the call succeeds and the (existing) file is opened. If O_TRUNC is included in the flags, then, if the file exists, its previous contents are eliminated and the file (whose size is now zero) is opened.

When a file is created by either open or creat, the file’s initial access permissions are the bitwise AND of the mode parameter and the complement of the process’s umask (explained in the next slide).
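The effect of O_EXCL is easy to observe: the first exclusive create of a name succeeds and the second fails with errno set to EEXIST. The helper name create_exclusive in this sketch is ours.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Create `path` exclusively. Returns 0 if this call created the file,
 * EEXIST if it already existed, or another errno value on failure.
 * (create_exclusive is a hypothetical helper name.) */
int create_exclusive(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0666);
    if (fd == -1)
        return errno;       /* EEXIST when the file was already there */
    close(fd);
    return 0;
}
```

Because the existence test and the creation happen in a single system call, this is a common way for cooperating programs to agree on which of them created a file.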

II–44


Umask

• Standard programs create files with “maximum needed permissions” as mode
  – compilers: 0777
  – editors: 0666
• Per-process parameter, umask, used to turn off undesired permission bits
  – e.g., turn off all permissions for others, write permission for group: set umask to 027
    - compilers: permissions = 0777 & ~(027) = 0750
    - editors: permissions = 0666 & ~(027) = 0640
  – set with umask system call or (usually) shell command

The umask (often called the “creation mask”) allows programs to have wired into them a standard set of maximum needed permissions as their file-creation modes. Users then have, as part of their environment (via a per-process parameter that is inherited by child processes from their parents), a limit on the permissions given to each of the classes of security principals. This limit (the umask) looks like the 9-bit permissions vector associated with each file, but each one-bit indicates that the corresponding permission is not to be granted. Thus, if umask is set to 022, then, whenever a file is created, regardless of the settings of the mode bits in the open or creat call, write permission for group and others is not to be included with the file’s access permissions.

You can determine the current setting of umask by executing the umask shell command without any arguments.
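The arithmetic on the slide can be confirmed with a short experiment: set the umask, create a file with a given mode, and look at the permissions it actually receives. The helper name mode_after_umask and the scratch path used with it are our assumptions; the umask system call returns the previous mask, which lets us restore it afterwards.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create `path` with mode `mode` while the umask is `mask`, and return
 * the permission bits the new file actually received (-1 on error).
 * The previous umask is restored before returning.
 * (mode_after_umask is a hypothetical helper name.) */
int mode_after_umask(const char *path, mode_t mode, mode_t mask)
{
    struct stat st;
    int result = -1;
    mode_t old = umask(mask);        /* umask returns the previous mask */
    unlink(path);                    /* make sure open really creates it */
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, mode);
    if (fd != -1) {
        if (fstat(fd, &st) == 0)
            result = (int)(st.st_mode & 0777);
        close(fd);
        unlink(path);
    }
    umask(old);
    return result;
}
```

With a umask of 027, a mode of 0666 should yield 0640 and a mode of 0777 should yield 0750, as computed on the slide; with the common umask of 022, a mode of 0666 yields 0644.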
