Page 1: SPSA

System programming:

System programming (or systems programming) is the activity of programming system software. The primary distinguishing characteristic of systems programming, compared to application programming, is that application programming aims to produce software which provides services to the user (e.g. a word processor), whereas systems programming aims to produce software which provides services to the computer hardware (e.g. a disk defragmenter). It also requires a greater degree of hardware awareness. In system programming more specifically:

- the programmer will make assumptions about the hardware and other properties of the system that the program runs on, and will often exploit those properties (for example by using an algorithm that is known to be efficient when used with specific hardware)
- usually a low-level programming language or programming language dialect is used that can operate in resource-constrained environments, is very efficient with little runtime overhead, has a small runtime library (or none at all), allows direct and "raw" control over memory access and control flow, and lets the programmer write parts of the program directly in assembly language
- debugging can be difficult if it is not possible to run the program in a debugger due to resource constraints; running the program in a simulated environment can reduce this problem.

Systems programming is sufficiently different from application programming that programmers tend to specialize in one or the other. In system programming, often only limited programming facilities are available. The use of automatic garbage collection is not common and debugging is sometimes hard to do. The runtime library, if available at all, is usually far less powerful and does less error checking. Because of those limitations, monitoring and logging are often used; operating systems may have extremely elaborate logging subsystems.

Implementing certain parts of an operating system or networking stack requires systems programming (for example implementing paging (virtual memory) or a device driver).

Originally systems programmers invariably wrote in assembly language. Experiments with hardware support in high-level languages in the late 1960s led to such languages as BLISS, BCPL, and extended Algol for Burroughs large systems, but C, helped by the growth of UNIX, became ubiquitous in the 1980s. More recently Embedded C++ has seen some use, for instance in the I/O Kit drivers of Mac OS X. For historical reasons, some organizations use the term systems programmer to describe a job function which would be more accurately termed systems administrator. This is particularly true in organizations whose computer resources have historically been dominated by mainframes, although the term is even used to describe job functions which do not involve mainframes. This usage arose because administration of IBM mainframes often involved the writing of custom assembler code which integrated with the operating system; indeed, some IBM software products had substantial code contributions from customer programming staff. This type of programming is progressively less common, but the term systems programmer is still the de facto job title for staff directly administering IBM mainframes.

Page 2: SPSA

Assemblers:

Assemblers involve a set of concepts that enable code to be written in a language and then used to control the computer's operations. These concepts include the assignment of labels to represent locations in memory (e.g. letters such as X, Y, Z). Fixed names are given to operations such as STORE, LOAD and ADD as well as registers (e.g. R1, R2). Numbers written in decimal are also converted to binary. Each line in assembly language translates to a single machine code instruction. Assembly languages are machine specific and require an understanding of the structure of the machine in order for that machine to be used to its potential. Assembly language is also very hard to learn.
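These concepts can be sketched in a short program for a hypothetical assembly language (the mnemonics LOAD, ADD and STORE, the registers R1 and R2, and the labels X, Y and Z below are invented for illustration and do not belong to any real machine's instruction set):

```asm
      LOAD  R1, X      ; copy the value at label X into register R1
      LOAD  R2, Y      ; copy the value at label Y into register R2
      ADD   R1, R2     ; add R2 to R1
      STORE R1, Z      ; write the sum to the location labelled Z
X:    5                ; labelled data locations
Y:    7
Z:    0
```

An assembler would translate each of these lines into a single machine instruction or data word, replacing the labels with binary memory addresses and the mnemonics with binary operation codes.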

FORTRAN (1957) was one of the earliest high-level languages, appearing a few years before COBOL. COBOL came along in 1959 and combined what would otherwise be many assembly code instructions into an easier-to-write language. Some other languages: BASIC (Beginner's All-purpose Symbolic Instruction Code), a simple language originally designed for beginners and students.

PASCAL (designed early 1970s) - originally designed for simplicity in order to teach programming

C++ (1986) - commonly used object-oriented language

VB (1991) - visual programming system designed by Microsoft

Programming languages involve declarations of variables, literals and constants. Variables in a programming language are simply labels pointing to a location or they represent a type of data. Constants are labels for literals which are bit patterns. There are four main types of imperative instructions within programming languages. These are Assignments, Expressions, Control Statements, and Procedural Units (which are also known as methods).

An assembler is a program that takes basic computer instructions and converts them into a pattern of bits that the computer's processor can use to perform its basic operations. Some people call these instructions assembler language and others use the term assembly language. Here's how it works:

Most computers come with a specified set of very basic instructions that correspond to the basic machine operations that the computer can perform. For example, a "Load" instruction causes the processor to move a string of bits from a location in the processor's memory to a special holding place called a register. Assuming the processor has at least eight registers, each numbered, the following instruction would move the value (string of bits of a certain length) at memory location 3000 into the holding place called register 8: L 8,3000

The programmer can write a program using a sequence of these assembler instructions. This sequence of assembler instructions, known as the source code or source program, is then specified to the assembler program when that program is started.

Page 3: SPSA

The assembler program takes each program statement in the source program and generates a corresponding bit stream or pattern (a series of 0's and 1's of a given length).

The output of the assembler program is called the object code or object program relative to the input source program. The sequence of 0's and 1's that constitute the object program is sometimes called machine code.

The object program can then be run (or executed) whenever desired.

In the earliest computers, programmers actually wrote programs in machine code, but assembler languages or instruction sets were soon developed to speed up programming. Today, assembler programming is used only where very efficient control over processor operations is needed. It requires knowledge of a particular computer's instruction set, however. Historically, most programs have been written in "higher-level" languages such as COBOL, FORTRAN, PL/I, and C. These languages are easier to learn and faster to write programs with than assembler language. The program that processes the source code written in these languages is called a compiler. Like the assembler, a compiler takes higher-level language statements and reduces them to machine code.

A newer idea in program preparation and portability is the concept of a virtual machine. For example, using the Java programming language, language statements are compiled into a generic form of machine language known as bytecode that can be run by a virtual machine, a kind of theoretical machine that approximates most computer operations. The bytecode can then be sent to any computer platform that has previously downloaded or built in the Java virtual machine. The virtual machine is aware of the specific instruction lengths and other particularities of the platform and ensures that the Java bytecode can run.

What do linkers and loaders do?

The basic job of any linker or loader is simple: it binds more abstract names to more concrete names, which permits programmers to write code using the more abstract names. That is, it takes a name written by a programmer such as getline and binds it to "the location 612 bytes from the beginning of the executable code in module iosys." Or it may take a more abstract numeric address such as "the location 450 bytes beyond the beginning of the static data for this module" and bind it to a numeric address.

Linking vs. loading

Linkers and loaders perform several related but conceptually separate actions.

Program loading: Copy a program from secondary storage (which since about 1968 invariably means a disk) into main memory so it's ready to run. In some cases loading just involves copying the data from disk to memory, in others it involves allocating storage, setting protection bits, or arranging for virtual memory to map virtual addresses to disk pages.

Relocation: Compilers and assemblers generally create each file of object code with the program addresses starting at zero, but few computers let you load your program at location zero. If a program is created from multiple subprograms, all the subprograms have to be loaded at non-overlapping addresses. Relocation is the process of assigning load addresses to the various parts of the program, adjusting the code and data in the program to reflect the assigned addresses.

Page 4: SPSA

In many systems, relocation happens more than once. It's quite common for a linker to create a program from multiple subprograms, and create one linked output program that starts at zero, with the various subprograms relocated to locations within the big program. Then when the program is loaded, the system picks the actual load address and the linked program is relocated as a whole to the load address.

Symbol resolution: When a program is built from multiple subprograms, the references from one subprogram to another are made using symbols; a main program might use a square root routine called sqrt, and the math library defines sqrt. A linker resolves the symbol by noting the location assigned to sqrt in the library, and patching the caller's object code so that the call instruction refers to that location.

Although there's considerable overlap between linking and loading, it's reasonable to define a program that does program loading as a loader, and one that does symbol resolution as a linker. Either can do relocation, and there have been all-in-one linking loaders that do all three functions. The line between relocation and symbol resolution can be fuzzy. Since linkers already can resolve references to symbols, one way to handle code relocation is to assign a symbol to the base address of each part of the program, and treat relocatable addresses as references to the base address symbols. One important feature that linkers and loaders share is that they both patch object code, the only widely used programs to do so other than perhaps debuggers. This is a uniquely powerful feature, albeit one that is extremely machine specific in the details, and can lead to baffling bugs if done wrong.

Two-pass linking

Now we turn to the general structure of linkers. Linking, like compiling or assembling, is fundamentally a two-pass process. A linker takes as its input a set of input object files, libraries, and perhaps command files, and produces as its result an output object file, and perhaps ancillary information such as a load map or a file containing debugger symbols.

Each input file contains a set of segments, contiguous chunks of code or data to be placed in the output file. Each input file also contains at least one symbol table. Some symbols are exported, defined within the file for use in other files, generally the names of routines within the file that can be called from elsewhere. Other symbols are imported, used in the file but not defined, generally the names of routines called from but not present in the file.

When a linker runs, it first has to scan the input files to find the sizes of the segments and to collect the definitions and references of all of the symbols. It creates a segment table listing all of the segments defined in the input files, and a symbol table with all of the symbols imported or exported. Using the data from the first pass, the linker assigns numeric locations to symbols, determines the sizes and locations of the segments in the output address space, and figures out where everything goes in the output file.

The second pass uses the information collected in the first pass to control the actual linking process. It reads and relocates the object code, substituting numeric addresses for symbol references, adjusting memory addresses in code and data to reflect relocated segment addresses, and writes the relocated code to the output file. It then writes the output file, generally with header information, the relocated segments, and symbol table information. If the program uses dynamic linking, the symbol table contains the info the runtime linker will need to resolve dynamic symbols.

In many cases, the linker itself will generate small amounts of code or data in the output file, such as "glue code" used to call routines in overlays or dynamically linked libraries, or an array of pointers to initialization routines that need to be called at program startup time.

Page 5: SPSA

Whether or not the program uses dynamic linking, the file may also contain a symbol table for relinking or debugging that isn't used by the program itself, but may be used by other programs that deal with the output file.

Some object formats are relinkable, that is, the output file from one linker run can be used as the input to a subsequent linker run. This requires that the output file contain a symbol table like one in an input file, as well as all of the other auxiliary information present in an input file.

Nearly all object formats have provision for debugging symbols, so that when the program is run under the control of a debugger, the debugger can use those symbols to let the programmer control the program in terms of the line numbers and names used in the source program. Depending on the details of the object format, the debugging symbols may be intermixed in a single symbol table with symbols needed by the linker, or there may be one table for the linker and a separate, somewhat redundant table for the debugger.

A few linkers appear to work in one pass. They do that by buffering some or all of the contents of the input file in memory or disk during the linking process, then reading the buffered material later. Since this is an implementation trick that doesn't fundamentally affect the two-pass nature of linking, we don't address it further here.

Object code libraries

All linkers support object code libraries in one form or another, with most also providing support for various kinds of shared libraries.

The basic principle of object code libraries is simple enough (Figure 2). A library is little more than a set of object code files. (Indeed, on some systems you can literally catenate a bunch of object files together and use the result as a link library.) After the linker processes all of the regular input files, if any imported names remain undefined, it runs through the library or libraries and links in any of the files in the library that export one or more undefined names. Shared libraries complicate this task a little by moving some of the work from link time to load time. The linker identifies the shared libraries that resolve the undefined names in a linker run, but rather than linking anything into the program, the linker notes in the output file the names of the libraries in which the symbols were found, so that the shared library can be bound in when the program is loaded. See Chapters 9 and 10 for the details.

Relocation and code modification

The heart of a linker or loader's actions is relocation and code modification. When a compiler or assembler generates an object file, it generates the code using the unrelocated addresses of code and data defined within the file, and usually zeros for code and data defined elsewhere. As part of the linking process, the linker modifies the object code to reflect the actual addresses assigned. For example, consider this snippet of x86 code that moves the contents of variable a to variable b using the eax register.
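The snippet itself did not survive transcription; in Intel syntax it would plausibly look something like the following, assuming a and b are 32-bit variables in memory:

```asm
mov  eax, [a]   ; load the contents of a into eax
mov  [b], eax   ; store eax into b
```

In the object file, the compiler emits these instructions with placeholder (unrelocated or zero) address fields for a and b; during linking, those fields are patched to hold the addresses actually assigned to the variables.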

Page 6: SPSA

A macro (from the Greek μακρός, 'long, large') in computer science is a rule or pattern that specifies how a certain input sequence (often a sequence of characters) should be mapped to an output sequence (also often a sequence of characters) according to a defined procedure. The mapping process that instantiates a macro into a specific output sequence is known as macro expansion. The term originated with macro-assemblers, where the idea is to make available to the programmer a sequence of computing instructions as a single program statement, making the programming task less tedious and less error-prone.

Keyboard macros and mouse macros allow short sequences of keystrokes and mouse actions to be transformed into other, usually more time-consuming, sequences of keystrokes and mouse actions. In this way, frequently-used or repetitive sequences of keystrokes and mouse movements can be automated. Separate programs for creating these macros are called macro recorders.

During the 1980s, macro programs (originally SmartKey, then SuperKey, KeyWorks, Prokey) were very popular, first as a means to automatically format screenplays, then for a variety of user input tasks. These programs were based on the TSR (terminate and stay resident) mode of operation and applied to all keyboard input, no matter in which context it occurred. They have to some extent fallen into obsolescence following the advent of mouse-driven user interfaces and the availability of keyboard and mouse macros in applications such as word processors and spreadsheets, which make it possible to create application-sensitive keyboard macros.

Keyboard macros have more recently come to life as a method of exploiting the economy of massively multiplayer online role-playing games (MMORPGs). By tirelessly performing a boring, repetitive, but low-risk action, a player running a macro can earn a large amount of the game's currency. This effect is even larger when a macro-using player operates multiple accounts simultaneously, or operates the accounts for a large amount of time each day. As this money is generated without human intervention, it can dramatically upset the economy of the game by causing runaway inflation. For this reason, use of macros is a violation of the TOS or EULA of most MMORPGs, and administrators of MMORPGs fight a continual war to identify and punish macro users.

Compilers:

A compiler is a computer program (or set of programs) that transforms source code written in a computer language (the source language) into another computer language (the target language, often having a binary form known as object code). The most common reason for wanting to transform source code is to create an executable program.

The name "compiler" is primarily used for programs that translate source code from a high-level programming language to a lower level language (e.g., assembly language or machine code). A program that translates from a low level language to a higher level one is a decompiler. A program that translates between high-level languages is usually called a language translator, source to source translator, or language converter. A language rewriter is usually a program that translates the form of expressions without a change of language.

Page 7: SPSA

A compiler is likely to perform many or all of the following operations: lexical analysis, preprocessing, parsing, semantic analysis, code generation, and code optimization.

Program faults caused by incorrect compiler behavior can be very difficult to track down and work around, so compiler implementors invest a lot of time ensuring the correctness of their software.

The term compiler-compiler is sometimes used to refer to a parser generator, a tool often used to help create the lexer and parser.

Page 8: SPSA

Incremental Compiler

A computer-aided software development system includes programs to implement edit, compile, link and run sequences, all from memory, at very high speed. The compiler operates on an incremental basis, line-by-line, so if only one line is changed in an edit session, then only that line need be recompiled if no other code is affected. Scanning is done incrementally, and the resulting token list saved in memory to be used again where no changes are made. All of the linking tables are saved in memory so there is no need to generate link tables for increments of code where no changes in links are needed.

The parser is able to skip lines or blocks of lines of source code which haven't been changed; for this purpose, each line of source text in the editor has a change-tag to indicate whether this line has been changed, and from this change-tag information a clean-lines table is built having a clean-lines indication for each line of source code, indicating how many clean lines follow the present line. All of the source code text modules, the token lists, symbol tables, code tables and related data saved from one compile to another are maintained in virtual memory rather than in files so that speed of operation is enhanced. Also, the object code created is maintained in memory rather than in a file, and executed from this memory image, to reduce delays. A virtual memory management arrangement for the system ensures that all of the needed data modules and code are present in real memory by page swapping, but with a minimum of page faults, again to enhance operating speed.

An overview of the compilation process

The sequence of commands executed by a single invocation of GCC consists of the following stages:

- preprocessing (to expand macros)
- compilation (from source code to assembly language)
- assembly (from assembly language to machine code)
- linking (to create the final executable)

As an example, we will examine these compilation stages individually using the Hello World program ‘hello.c’:

#include <stdio.h>

int main (void)
{
    printf ("Hello, world!\n");
    return 0;
}

Page 9: SPSA

Note that it is not necessary to use any of the individual commands described in this section to compile a program. All the commands are executed automatically and transparently by GCC internally, and can be seen using the -v option described earlier.

Although the Hello World program is very simple, it uses external header files and libraries, and so exercises all the major steps of the compilation process.

The Compilation Process

Stages from Source to Executable

1. Compilation: source code ==> relocatable object code (binaries)
2. Linking: many relocatable binaries (modules plus libraries) ==> one relocatable binary (with all external references satisfied)
3. Loading: relocatable ==> absolute binary (with all code and data references bound to the addresses occupied in memory)
4. Execution: control is transferred to the first instruction of the program

At compile time (CT), absolute addresses of variables and statement labels are not known.

In static languages (such as Fortran), absolute addresses are bound at load time (LT).

In block-structured languages, bindings can change at run time (RT).

Phases of the Compilation Process

1. Lexical analysis (scanning): the source text is broken into tokens.
2. Syntactic analysis (parsing): tokens are combined to form syntactic structures, typically represented by a parse tree.

The parser may be replaced by a syntax-directed editor, which directly generates a parse tree as a product of editing.

3. Semantic analysis: intermediate code is generated for each syntactic structure.

Type checking is performed in this phase. Complicated features such as generic declarations and operator overloading (as in Ada and C++) are also processed.

4. Machine-independent optimization: intermediate code is optimized to improve efficiency.
5. Code generation: intermediate code is translated to relocatable object code for the target machine.
6. Machine-dependent optimization: the machine code is optimized.

Page 10: SPSA

On some systems (e.g., C under Unix), the compiler produces assembly code, which is then translated by an assembler.

Text editor

An example of a text editor: Vim.

A text editor is a type of program used for editing plain text files.

Text editors are often provided with operating systems or software development packages, and can be used to change configuration files and programming language source code.

Plain text files vs. word processor files

There are important differences between plain text files created by a text editor, and document files created by word processors such as Microsoft Word, WordPerfect, or OpenOffice.org. Briefly:

A plain text file is represented and edited by showing all the characters as they are present in the file. The only characters usable for 'mark-up' are the control characters of the character set in use; in practice these are newline, tab and form feed. The most commonly used character set is ASCII, especially recently, as plain text files are used more for programming and configuration and less for documentation than in the past.

Documents created by a word processor generally contain file-format-specific "control characters" beyond what is defined in the character set. These enable functions like bold, italic, fonts, columns, tables, etc. These and other common page formatting symbols were once associated only with desktop publishing but are now commonplace in even the simplest word processor.

Word processors can usually edit a plain text file and save in the plain text file format. However, one must take care to tell the program that this is what is wanted. This is especially important in cases such as source code, HTML, and configuration and control files. Otherwise the file will contain those "special characters" unique to the word processor's file format and will not be handled correctly by the utility the files were intended for.

Page 11: SPSA

History

A box of punch cards with several program decks.

Before text editors existed, computer text was punched into punched cards with keypunch machines. The text was carried as a physical box of these thin cardboard cards, and read into a card-reader.

The first text editors were line editors oriented to typewriter-style terminals and they did not provide a window or screen-oriented display. They usually had very short commands (to minimize typing) that reproduced the current line. Among them was a command to print a selected section or sections of the file on the typewriter (or printer) when necessary. An "edit cursor", an imaginary insertion point, could be moved by special commands that operated with line numbers or specific text strings (context). Later, the context strings were extended to regular expressions. To see the changes, the file needed to be printed on the printer. These "line-based text editors" were considered revolutionary improvements over keypunch machines. In cases where typewriter-based terminals were not available, line editors were adapted to keypunch equipment; the user needed to punch the commands into a separate deck of cards and feed them into the computer in order to edit the file.

When computer terminals with video screens became available, screen-based text editors became common. One of the earliest "full screen" editors was O26, which was written for the operator console of the CDC 6000 series machines in 1967. Another early full-screen editor is vi. Written in the 1970s, vi is still a standard editor for Unix and Linux operating systems. The productivity of editing using full-screen editors (compared to the line-based editors) motivated many of the early purchases of video terminals.

Page 12: SPSA

Types of text editors

Some text editors are small and simple, while others offer a broad and complex range of functionality. For example, Unix and Unix-like operating systems have the vi editor (or a variant), but many also include the Emacs editor. Microsoft Windows systems come with the very simple Notepad, though many people, especially programmers, prefer to use one of many other Windows text editors with more features. Under the classic Mac OS there was the native SimpleText, which was replaced under Mac OS X by TextEdit. Some editors, such as WordStar, have dual operating modes allowing them to be either a text editor or a word processor.

Text editors geared for professional computer users place no limit on the size of the file being opened. In particular, they start quickly even when editing large files, and are capable of editing files that are too large to fit the computer's main memory. Simpler text editors often just read files into an array in RAM. On larger files this is a slow process, and very large files often do not fit.

The ability to read and write very large files is needed by many professional computer users. For example, system administrators may need to read long log files. Programmers may need to change large source code files, or examine unusually large texts, such as an entire dictionary placed in a single file.

Some text editors include specialized computer languages to customize the editor (programmable editors). For example, Emacs can be customized by programming in Lisp. These usually permit the editor to simulate the keystroke combinations and features of other editors, so that users do not have to learn the native command combinations.

Another important group of programmable editors use REXX as their scripting language. These editors permit entering both commands and REXX statements directly in the command line at the bottom of the screen (can be hidden and activated by a keystroke). These editors are usually referred to as "orthodox editors", and most representatives of this class are derivatives of Xedit, IBM's editor for VM/CMS. Among them are THE, Kedit, SlickEdit, X2, Uni-edit, UltraEdit, and Sedit. Some vi derivatives such as Vim also support folding as well as macro languages, and have a command line at the bottom for entering commands. They can be considered another branch of the family of orthodox editors.

Many text editors for software developers include source code syntax highlighting and automatic completion to make programs easier to read and write. Programming editors often permit one to select the name of a subprogram or variable, and then jump to its definition and back. Often an auxiliary utility like ctags is used to locate the definitions.

Typical features of text editors

Search and replace


Text editors provide the ability to replace occurrences of a search string with a replacement string. Different methods are employed: global search and replace, conditional search and replace, and unconditional search and replace.
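The distinction can be illustrated with Python's re module: an unconditional (global) replace rewrites every occurrence, while a conditional replace lets a predicate decide match by match, much as an interactive editor would ask the user. The sample text and the predicate below are invented for illustration.

```python
import re

text = "the cat sat on the mat"   # invented sample text

# Unconditional (global) search and replace: every occurrence changes.
everywhere = re.sub("at", "og", text)    # 'the cog sog on the mog'

# Conditional search and replace: a callback decides per occurrence;
# an interactive editor would prompt the user instead of a predicate.
def maybe(match):
    return "og" if match.start() % 2 == 1 else match.group(0)

some = re.sub("at", maybe, text)         # the final 'at' is kept
```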

Cut, copy, and paste

Most text editors provide methods to duplicate and move text within the file, or between files.

Text formatting

Text editors often provide basic formatting features like line wrap, auto-indentation, bullet list formatting, comment formatting, and so on.

Undo and redo

As with word processors, text editors provide a way to undo and redo the last edit. Often, especially with older text editors, only one level of edit history is remembered, and successively issuing the undo command will only "toggle" the last change. Modern or more complex editors usually provide a multiple-level history, such that issuing the undo command repeatedly reverts the document to successively older edits. A separate redo command cycles the edits "forward" toward the most recent changes. The number of changes remembered depends on the editor and is often configurable by the user.
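A multiple-level history of this kind can be sketched as a pair of stacks. This is a simplification: whole-document snapshots are stored here, whereas real editors usually record deltas.

```python
class History:
    """Multi-level undo/redo over document snapshots (a sketch)."""
    def __init__(self, text="", limit=100):
        self.undo_stack = [text]    # oldest ... newest state
        self.redo_stack = []
        self.limit = limit          # how many changes are remembered

    def edit(self, new_text):
        self.undo_stack.append(new_text)
        if len(self.undo_stack) > self.limit:
            self.undo_stack.pop(0)  # forget the oldest state
        self.redo_stack.clear()     # a fresh edit invalidates redo

    def text(self):
        return self.undo_stack[-1]

    def undo(self):
        if len(self.undo_stack) > 1:
            self.redo_stack.append(self.undo_stack.pop())
        return self.text()

    def redo(self):
        if self.redo_stack:
            self.undo_stack.append(self.redo_stack.pop())
        return self.text()
```

After `h = History(""); h.edit("a"); h.edit("ab")`, calling `h.undo()` returns "a" and a subsequent `h.redo()` returns "ab"; issuing a new edit after an undo discards the redo history, as in most editors.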

Importing

Reading or merging the contents of another text file into the file currently being edited. Some text editors provide a way to insert the output of a command issued to the operating system's shell.

Filtering

Some advanced text editors allow you to send all or part of the file being edited to another utility and read the result back into the file in place of the lines being "filtered". This is useful, for example, for sorting a series of lines alphabetically or numerically, performing mathematical computations, and so on.
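A sketch of this mechanism in Python: a region of the editor buffer is piped through the external Unix sort utility and the output is spliced back over the same region. It assumes a POSIX-like system where sort is available; the buffer contents are invented.

```python
import subprocess

buffer = ["pear", "apple", "cherry", "banana"]   # editor buffer lines

# Send lines 0..3 to the external 'sort' utility and splice the
# sorted output back over the same region, as a filtering editor does.
start, end = 0, 4
region = "\n".join(buffer[start:end]) + "\n"
result = subprocess.run(["sort"], input=region,
                        capture_output=True, text=True, check=True)
buffer[start:end] = result.stdout.splitlines()
```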

Syntax highlighting


Another useful feature of many text editors is syntax highlighting, where the editor recognises, or can be instructed, that you are writing in a particular language, such as HTML or C++, and colour-codes your text, breaking up the source and making tags, keywords, and so on easy to identify.

Special features

Some editors include special features and extra functions, for instance,

Source code editors are text editors with additional functionality to facilitate the production of source code. These often feature user-programmable syntax highlighting, and coding tools or keyboard macros similar to an HTML editor (see below).

Folding editors. This subclass includes the so-called "orthodox editors" that are derivatives of Xedit. The specialized version of folding is usually called outlining (see below).

IDEs (integrated development environments) are designed to manage and streamline larger programming projects. They are usually only used for programming as they contain many features unnecessary for simple text editing.

World Wide Web programmers are offered a variety of text editors dedicated to the task of web development. These create the plain text files that deliver web pages. HTML editors include: Dreamweaver, E (text editor), Frontpage, HotDog, Homesite, Nvu, Tidy, GoLive, and BBedit. Many offer the option of viewing a work in progress on a built-in web browser.

Mathematicians, physicists, and computer scientists often produce articles and books using TeX or LaTeX in plain text files. Such documents are often produced by a standard text editor, but some people use specialized TeX editors.

Outliners. Also called tree-based editors, because they combine a hierarchical outline tree with a text editor. Folding (see above) can generally be considered a generalized form of outlining.

Debug Monitors

A debug monitor is a very powerful graphical or console-mode tool that monitors all the activities handled by the WinDriver Kernel. You can use the debug monitor to see how each command sent to the kernel is executed.

The WinDriver Kernel is a driver development toolkit that simplifies the creation of drivers. A driver is used so that the computer can communicate with the devices inside it or attached to it. If you hook up a printer to your computer, you first need to install its driver so that the computer can present a graphical or console interface through which you control the printer. The same goes for audio devices, network devices, and video devices.

A debug monitor, simply put, is a tool that helps find and reduce the number of bugs and defects in a computer program, or in any device within or attached to the computer, in order to make it behave as it should. While a driver is being created and installed, the debug monitor helps verify that it works properly. For example, when an armored car drives up to a bank and the guards transfer money from the truck to the bank, special guards stand watch to make sure no one robs them, so that the transaction goes smoothly. Those guards play the role of debug monitors in the computer industry.


If the debug monitor locates a bug or defect in any of the equipment, it first tries to reproduce the problem, which allows a programmer to view each string that lies within the range of the bug or defect and try to fix it. A programmer is a technician who understands the underlying formats and code that make computers run. These are strings of technical information that most people using computers will never see. Consider a clock: the general public plugs it in and uses it to tell time, but does not open it up to see how it works. That is left to the people who repair clocks; in the computer industry, the programmers play that role.

The programmer deletes strings or adds new ones and then uses the debug monitor to re-create the driver download and see whether the problem is fixed. This can be a tedious task, given all the processes running in the computer, but the debug monitor makes it a lot easier.

Assemblers:

An assembler is a translator that translates source instructions (in symbolic language) into target instructions (in machine language), on a one-to-one basis. This means that each source instruction is translated into exactly one target instruction. This definition has the advantage of clearly describing the translation process of an assembler. It is not a precise definition, however, because an assembler can do (and usually does) much more than just translation. It offers a lot of help to the programmer in many aspects of writing the program. The many types of help offered by the assembler are grouped under the general term directives (or pseudo-instructions).

Another good definition of assemblers is: an assembler is a translator that translates a machine-oriented language into machine language.

Obviously, future symbols (symbols that are used before they are defined) are not an error, and their use should not be prohibited. The programmer should be able to refer to source lines which either precede or follow the current line. Thus the future symbol problem has to be solved. It turns out to be a simple problem, and there are two solutions: a one-pass assembler and a two-pass assembler. They represent not just different solutions to the future symbol problem but two different approaches to assembler design and operation. The one-pass assembler, as the name implies, solves the future symbol problem by reading the source file once. Its most important feature, however, is that it does not generate a relocatable object file but rather loads the object code (the machine language program) directly into memory. Similarly, the most important feature of the two-pass assembler is that it generates a relocatable object file, which is later loaded into memory by a loader. It also solves the future symbol problem, by performing two passes over the source file. It should be noted at this point that a one-pass assembler can generate an object file. Such a file, however, would be absolute, rather than relocatable, and its use is limited. Absolute and relocatable object files are discussed later in this chapter. The figure below is a summary of the most important components and operations of an assembler.


The Main Components and Operations of an Assembler.

The Two-Pass Assembler

A two-pass assembler is easier to understand and will be discussed first. Such an assembler performs two passes over the source file. In the first pass it reads the entire source file, looking only for label definitions. All labels are collected, assigned values, and placed in the symbol table in this pass. No instructions are assembled and, at the end of the pass, the symbol table should contain all the labels defined in the program. In the second pass, the instructions are again read and are assembled, using the symbol table.

Exercise: What if a certain symbol is needed in pass 2, to assemble an instruction, and is not found in the symbol table?

To assign values to labels in pass 1, the assembler has to maintain the LC. This in turn means that the assembler has to determine the size of each instruction (in words), even though the instructions themselves are not assembled. In many cases it is easy to figure out the size of an instruction. On the IBM 360, the mnemonic determines the size uniquely. An assembler for this machine keeps the size of each instruction in the OpCode table together with the mnemonic and the OpCode (see table 1–1). On the DEC PDP-11 the size is determined both by the type of the instruction and by the addressing mode(s) that it uses. Most instructions are one word (16 bits) long. However, if they use either the index or index deferred modes, one more word is added to the instruction. If the instruction has two operands (source and destination), both using those modes, its size will be
3 words. On most modern microprocessors, instructions are between 1 and 4 bytes long, and the size is determined by the OpCode and the specific operands used. This means that, in many cases, the assembler has to work hard in the first pass just to determine the size of an instruction. It has to look at the mnemonic and, sometimes, at the operands and the modes, even though it does not assemble the instruction in the first pass. All the information about the mnemonic and the operands collected by the assembler in the first pass is extremely useful in the second pass, when instructions are assembled. This is why many assemblers save all the information collected during the first pass and transmit it to the second pass through an intermediate file. Each record in the intermediate file contains a copy of a source line plus all the information that has been collected about that line in the first pass. At the end of the first pass the original source file is closed and is no longer used. The intermediate file is reopened and is read by the second pass as its input file.

A record in a typical intermediate file contains:
The record type. It can be an instruction, a directive, a comment, or an invalid line.
The LC value for the line.
A pointer to a specific entry in the OpCode table or the directive table. The second pass uses this pointer to locate the information necessary to assemble or execute the line.
A copy of the source line. Notice that a label, if any, is not used by pass 2 but must be included in the intermediate file since it is needed in the final listing.

Fig. 1–2 is a flow chart summarizing the operations in the two passes.

There can be two problems with labels in the first pass: multiply-defined labels and invalid labels. Before a label is inserted into the symbol table, the table has to be searched for that label. If the label is already in the table, it is doubly (or even multiply-) defined. The assembler should treat this label as an error, and the best way of doing this is by inserting a special code in the type field of the symbol table entry. Thus a situation such as:

AB   ADD 5,X
     ..
AB   SUB 6,Y
     ..
     JMP AB

will generate the entry:

name  value  type
AB    —      MTDF

in the symbol table.

Labels normally have a maximum size (typically 6 or 8 characters), must start with a letter, and may only consist of letters, digits, and a few other characters. Labels that do not conform to these rules are invalid labels and are normally considered a fatal error. However, some assemblers will truncate a long label to the maximum size and will issue just a warning, not an error, in such a case.

Exercise: What is the advantage of allowing characters other than letters and digits in a label?

The only problem with symbols in the second pass is bad symbols. These are either multiply-defined or undefined symbols. When a source line uses a symbol in the operand field, the assembler looks it up in the symbol table. If the symbol is found but has a type of MTDF, or if the symbol is not found in the symbol table (i.e., it has not been defined), the assembler responds as follows:
It flags the instruction in the listing file.
It assembles the instruction as far as possible, and writes it on the object file.
It flags the entire object file. The flag instructs the loader not to start execution of the program. The object file is still generated and the loader will read and load it, but not start it. Loading such a file may be useful if the user wants to see a memory map.
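To make the two passes concrete, here is a minimal sketch in Python for a hypothetical machine in which every instruction occupies exactly one word. The opcodes and the word format (opcode × 100 + address) are invented for illustration; a real assembler must also compute per-instruction sizes, as described above.

```python
# Hypothetical one-word-per-instruction machine; opcodes and the
# word format (opcode*100 + address) are invented for illustration.
OPCODES = {"LOAD": 1, "ADD": 2, "JMP": 3, "HALT": 4}

def two_pass(lines):
    # Pass 1: collect label definitions only; since every instruction
    # is one word, the location counter (LC) advances by 1 per line.
    symtab, lc = {}, 0
    for line in lines:
        tokens = line.split()
        if tokens[0].endswith(":"):
            name = tokens[0][:-1]
            # a redefined label gets the special MTDF type
            symtab[name] = "MTDF" if name in symtab else lc
            tokens = tokens[1:]
        if tokens:
            lc += 1
    # Pass 2: assemble, using the now-complete symbol table.
    code = []
    for line in lines:
        tokens = line.split()
        if tokens[0].endswith(":"):
            tokens = tokens[1:]
        if not tokens:
            continue
        op = tokens[0]
        operand = tokens[1] if len(tokens) > 1 else "0"
        value = symtab.get(operand, operand)    # symbol or literal
        if value == "MTDF":
            raise ValueError("bad symbol: " + operand)
        code.append(OPCODES[op] * 100 + int(value))
    return code, symtab

program = ["LOAD X", "JMP L", "X: ADD 5", "L: HALT"]
```

The forward reference to X on the first line poses no problem: by the time pass 2 runs, the symbol table already maps X to 2 and L to 3.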

The Operations of the Two-Pass Assembler.

The JMP AB instruction above is an example of a bad symbol in the operand field. This instruction cannot be fully assembled, and thus constitutes our first example of a fatal error detected and issued by the assembler.

The last important point regarding a two-pass assembler is the box, in the flow chart above, that says write object instruction onto the object file. The point is that when the two-pass assembler writes the machine instruction on the object file, it has access to the source instruction. This does not seem to be an important point but, in fact, it constitutes the main difference between the one-pass and the two-pass assemblers. This point is the reason why a one-pass assembler can only produce an absolute object file (which has only limited use), whereas a two-pass assembler can produce a relocatable object file, which is much more general.



The One-Pass Assembler

The operation of a one-pass assembler is different. As its name implies, this assembler reads the source file once. During that single pass, the assembler handles both label definitions and assembly. The only problem is future symbols and, to understand the solution, let's consider the following example:

 LC
 36  BEQ AB   ;BRANCH ON EQUAL
     ..
 67  BNE AB   ;BRANCH ON NOT EQUAL
     ..
 89  JMP AB   ;UNCONDITIONALLY
     ..
126  AB  anything

Symbol AB is used three times as a future symbol. On the first reference, when the LC happens to stand at 36, the assembler searches the symbol table for AB, does not find it, and therefore assumes that it is a future symbol. It then inserts AB into the symbol table but, since AB has no value yet, it gets a special type. Its type is U (undefined). Even though it is still undefined, it now occupies an entry in the symbol table, an entry that will be used to keep track of AB as long as it is a future symbol. The next step is to set the 'value' field of that entry to 36 (the current value of the LC). This means that the symbol table entry for AB is now pointing to the instruction in which AB is needed. The 'value' field is an ideal place for the pointer since it is the right size, it is currently empty, and it is associated with AB. The BEQ instruction itself is only partly assembled and is stored, incomplete, in memory location 36. The field in the instruction where the value of AB should be stored (the address field) remains empty.

When the assembler gets to the BNE instruction (at which point the LC stands at 67), it searches the symbol table for AB, and finds it. However, AB has a type of U, which means that it is a future symbol and thus its 'value' field (=36) is not a value but a pointer. It should be noted that, at this point, a type of U does not necessarily mean an undefined symbol. While the assembler is performing its single pass, any undefined symbols must be considered future symbols.
Only at the end of the pass can the assembler identify undefined symbols (see below). The assembler handles the BNE instruction by:
Partly assembling it and storing it in memory location 67.
Copying the pointer 36 from the symbol table to the partly assembled instruction in location 67. The instruction has an empty field (where the value of AB should have been), where the pointer is now stored. There may be cases where this field in the instruction is too small to store a pointer. In such a case the assembler must resort to other methods, one of which is discussed below.
Copying the LC (=67) into the 'value' field of the symbol table entry for AB, overwriting the 36.


When the assembler reaches the JMP AB instruction, it repeats the three steps above. The situation at those three points is summarized below.

     memory          symbol table
loc  contents        n   v   t
 36  BEQ -           AB  36  U

     memory          symbol table
loc  contents        n   v   t
 36  BEQ -           AB  67  U
 67  BNE 36

     memory          symbol table
loc  contents        n   v   t
 36  BEQ -           AB  89  U
 67  BNE 36
 89  JMP 67

It is obvious that an indefinite number of instructions can refer to AB as a future symbol. The result will be a linked list linking all these instructions. When the definition of AB is finally found (the LC will be 126 at that point), the assembler searches the symbol table for AB and finds it. The 'type' field is still U, which tells the assembler that AB has been used as a future symbol. The assembler then follows the linked list of instructions using the pointers found in the instructions. It starts from the pointer found in the symbol table and, for each instruction in the list, the assembler:
Saves the value of the pointer found in the address field of the instruction. The pointer is saved in a register or a memory location ('temp' in the table below), and is later used to find the next incomplete instruction.
Stores the value of AB (=126) in the address field of the instruction, thereby completing it.

The last step is to store the value 126 in the 'value' field of AB in the symbol table, and to change the type to D. The individual steps taken by the assembler in our example are shown in the table below. It follows, therefore, that at the end of the single pass, the symbol table should only contain symbols with a type of D. At the end of the pass, the assembler scans the symbol table for undefined symbols. If it finds any symbols with a type of U, it issues an error message and will not start the program.

Figure 1–3 is a flow chart of a one-pass assembler.

The one-pass assembler loads the machine instructions in memory and thus has no trouble in going back and completing instructions.
However, the listing generated by such an assembler is incomplete, since it cannot backspace the listing file to complete lines previously printed. Therefore, when an incomplete instruction (one that uses a future symbol) is loaded in memory, it also goes into the listing file as incomplete. In the example above, the three lines using symbol AB will be printed with asterisks '*' or question marks '?', instead of the value of AB.

Address   Step 1    Step 2    Step 3
  36      BEQ -     BEQ -     BEQ 126
  67      BNE 36    BNE 126   BNE 126
  89      JMP 126   JMP 126   JMP 126
          temp=67   temp=36   temp=/

The key to the operation of a one-pass assembler is the fact that it loads the object code directly in memory and does not generate an object file. This makes it possible for the assembler to go back and complete instructions in memory at any time during assembly. The one-pass assembler can, in principle, generate an object file by simply writing the object program from memory to a file. Such an object file, however,


would be absolute. Absolute and relocatable object files are discussed below.

One more point needs to be mentioned here: the case where the address field in the instruction is too small for a pointer. This is a common case, since machine instructions are designed to be short and normally do not contain a full address. Instead of a full address, a typical machine instruction contains two fields, mode and displacement (or offset), such that the mode tells the computer how to obtain the full address from the displacement (see appendix A). The displacement field is small (typically 8–12 bits) and has no room for a full address. To handle this situation, the one-pass assembler has an additional data structure: a collection of linked lists, each corresponding to a future symbol. Each linked list contains, in its nodes, pointers to instructions that are waiting to be completed. The list for symbol AB is shown below in three successive stages of its construction. When symbol AB is found, the assembler uses the information in the list to complete all incomplete instructions. It then returns the entire list to the pool of available memory.
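The chain-of-pointers scheme described above (the chain threaded through the unfilled address fields themselves, not the separate linked lists) can be sketched as follows. The opcodes are invented; memory is modeled as a dictionary mapping LC values to [opcode, address] words.

```python
# Sketch of the pointer-chain backpatching scheme (invented opcodes;
# memory maps LC values to [opcode, address] words).
OPCODES = {"BEQ": 5, "BNE": 6, "JMP": 3}

def one_pass(lines):
    memory, symtab, lc = {}, {}, 0    # symtab: name -> (value, type)
    for line in lines:
        tokens = line.split()
        if tokens[0].endswith(":"):           # label definition reached
            name = tokens[0][:-1]
            if name in symtab and symtab[name][1] == "U":
                ptr = symtab[name][0]         # head of the chain of uses
                while ptr is not None:
                    nxt = memory[ptr][1]      # save pointer to next use
                    memory[ptr][1] = lc       # complete the instruction
                    ptr = nxt
            symtab[name] = (lc, "D")
            tokens = tokens[1:]
        if not tokens:
            continue
        op, operand = tokens
        if operand in symtab and symtab[operand][1] == "D":
            memory[lc] = [OPCODES[op], symtab[operand][0]]
        else:                                 # future symbol: grow chain
            prev = symtab[operand][0] if operand in symtab else None
            memory[lc] = [OPCODES[op], prev]
            symtab[operand] = (lc, "U")
        lc += 1
    return memory, symtab
```

Running it on the equivalent of the BEQ/BNE/JMP example (with LC values 0–3 instead of 36–126) leaves every address field patched to the location of AB and the symbol's type changed from U to D.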

High-Level Assemblers

As the name implies, these are assemblers for high-level assembler languages. Such languages are rare; there is no general agreement on how to define them, on what their main features should be, or on whether they are useful and should be developed at all. Existing high-level assemblers differ in many respects and, on looking at several of them, two possible definitions emerge:

A high-level assembler language (HLA) is a programming language where each instruction is translated into a few machine instructions. The translator is somewhat more complex than an assembler, but much simpler than a compiler. Such a language should not have features like the if, for, and case control structures, complex arithmetic, logical expressions, and multi-dimensional arrays. It should consist of simple instructions, closely resembling traditional assembler instructions, and of a few simple data types.

A high-level assembler language (HLA) is a language that combines most of the features of higher-level languages (easy-to-use control structures, variables, scope, data types, block structure) with one important feature of assembler languages, namely, machine dependence.

One may argue that the second definition defines a machine-dependent higher-level language rather than a high-level assembler language, the main reason being the definition of assembler language given in the introduction. The basis of that definition is the one-to-one correspondence between assembler and machine instructions. The languages discussed here do not have such a one-to-one correspondence and, in this respect, resemble a higher-level language more than an assembler language.

Symbol Table:

The organization of the symbol table is the key to fast assembly. Even when working on a small program, the assembler may use the symbol table hundreds of times and, consequently, an efficient implementation of the table can cut the assembly time significantly even for short programs.

The symbol table is a dynamic structure. It starts empty and should support two operations, insertion and search. In a two-pass assembler, insertions are done only in the first pass and searches only in the second. In a one-pass assembler, both insertions and searches occur in the single pass. The symbol table does not have to support deletions, and this fact affects the choice of data structure for implementing the table. A symbol table can be implemented in many different ways, but the following methods are almost always used, and will be discussed here:


A linear array.
A sorted array with binary search.
Buckets with linked lists.
A binary search tree.
A hash table.

A Linear Array

The symbols are stored in the first N consecutive entries of an array, and a new symbol is inserted into the table by storing it in the first available entry (entry N + 1) of the array. Typical Pascal code for such an array would be:

var symtab: record
      N: 0..lim;
      tabl: array[0..lim] of record
              name: string;
              valu: integer;
              type: char;
            end;
    end;

where lim is some suitable constant. The variable N is initially set to zero, and it always points to the last entry in the array. An insertion is done by:
Testing to make sure that N < lim (the symbol table is not full).
Incrementing N by 1.
Inserting the name, value, and type into the three fields, using N as an index.

The insertion takes fixed time, independent of the number of symbols in the table. To search, the array of names is scanned entry by entry. The number of steps involved varies from a minimum of 1 to a maximum of N. Every search for a nonexistent symbol involves N steps, thus a program with many undefined symbols will be slow to assemble because the average search time will be high. Assuming a program with only a few undefined symbols, the average search time is N/2. In a two-pass assembler, insertions are only done in the first pass so, at the end of that pass, N is fixed. All searches in the second pass are performed in a fixed table. In a one-pass assembler, N grows during the pass, and thus each search takes an average of N/2 steps, but the values of N are different.

Advantages: Fast insertion. Simple operations.
Disadvantages: Slow search, especially for large values of N. Fixed size.

2.2 A Sorted Array

The same as a linear array, but the array (actually, the three arrays) is sorted, by name, after the first pass is completed. This, of course, can only be done in a two-pass assembler. To find a symbol in such a table, binary search is used, which takes (see, for example, reference [15]) an average of log2 N steps. The difference between N and log2 N is small when N is small but, for large values of N, the difference can get large enough to justify the additional time spent on sorting the table.

Advantages: Fast insertion and fast search. Since the table is already sorted, the preparation of a cross-reference listing (see chapter 5) is simplified.
Disadvantages: The sort takes time, which makes this method useful only for a large number of symbols (at least a few hundred).
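The sorted-array scheme can be sketched with Python's standard bisect module. The three parallel arrays of the text are modeled as one list of (name, value, type) tuples, sorted once after pass 1; the sample entries are invented.

```python
import bisect

# Invented sample symbols, as they might stand at the end of pass 1.
entries = [("MED", 12, "D"), ("AB", 3, "D"), ("ZIP", 40, "D"), ("CC", 7, "D")]
entries.sort(key=lambda e: e[0])        # the one-time sort, by name
names = [e[0] for e in entries]         # names alone, for binary search

def search(name):
    i = bisect.bisect_left(names, name)  # about log2(N) comparisons
    if i < len(names) and names[i] == name:
        return entries[i]
    return None                          # undefined symbol
```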

2.3 Buckets with Linked Lists

An array of 26 entries is declared, to serve as the start of the buckets. Each entry points to a bucket that is a linked list of all those symbols that start with the same letter. Thus all the symbols that start with a 'C' are linked together in a list that can be reached by following the pointer in the third entry of the array. Initially all the buckets are empty (all pointers in the array are null). As symbols are inserted, each bucket is kept sorted by symbol name. Notice that there is no need to actually sort the buckets. The buckets are kept in sorted order by carefully inserting each new symbol into its proper place in the bucket. When a new symbol is presented, to be inserted in a bucket, the bucket is first located by using the first character in the symbol's name (one step). The symbol is then compared to the first symbol in the bucket (the symbol names are compared). If the new symbol is less (in lexicographic order) than the first, the new one becomes the first in the bucket. Otherwise, the new symbol is compared to the second symbol in the bucket, and so on. Assuming an even distribution of names over the alphabet, each bucket contains an average of N/26 symbols, and the average insertion time is thus 1 + (N/26)/2 = 1 + N/52. For a typical program with a few hundred symbols, the average insertion requires just a few steps.

A search is done by first locating the bucket (one step), and then performing the same comparisons as in the insertion process above. The average search thus also takes 1 + N/52 steps.

Such a symbol table has a variable size. More nodes can be allocated and added to the buckets, and the table can, in principle, use the entire available memory.

Advantages: Fast operations. Flexible table size.
Disadvantages: Although the number of steps is small, each step involves the use of a pointer and is therefore slower than a step in the previous methods (that use arrays).
Also, some programmers always tend to assign names that start with an A. In such a case all the symbols will go into the first bucket, and the table will behave essentially as a linear array. Such an implementation is recommended only if the assembler is designed to assemble large programs, and the operating system makes it convenient to allocate storage for list nodes.

Exercise 2.1: What if symbol names can start with a character other than a letter? Can this data structure still be used? If yes, how?

[Figure: buckets holding the symbols A345, BGH, CC, J12, MED, ON, PETS, QUE, TOM, ZIP]
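A sketch of the bucket scheme in Python. Python lists stand in for the linked lists of nodes, and names are assumed to start with an uppercase letter; each bucket is kept sorted by inserting new symbols in place, exactly as described above.

```python
# 26 buckets, indexed by the first letter of the name; each bucket is
# kept sorted by inserting every new symbol at its proper place.
buckets = [[] for _ in range(26)]

def insert(name, value, typ):
    b = buckets[ord(name[0]) - ord("A")]   # locate the bucket: one step
    i = 0
    while i < len(b) and b[i][0] < name:   # walk to the sorted position
        i += 1
    b.insert(i, (name, value, typ))

def search(name):
    for entry in buckets[ord(name[0]) - ord("A")]:
        if entry[0] == name:
            return entry
    return None                            # undefined symbol
```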

2.4 A Binary Search Tree

This is a general data structure, used not just for symbol tables, and it is quite efficient. It can be used by either a one-pass or a two-pass assembler with the same efficiency.

The table starts as an empty binary tree, and the first symbol inserted into the table becomes the root of the tree. Every subsequent symbol is inserted into the table by (lexicographically) comparing it with the root. If the new symbol is less than the root, the program moves to the left son of the root and compares the new symbol with that son. If the new symbol is greater than the root, the program moves to the right son of the root and compares as above. If the new symbol turns out to be equal to any of the existing tree nodes, then it is a doubly-defined symbol.


Otherwise, the comparisons continue until a node is reached that does not have a son. The new symbol becomes the (left or right) son of that node.

Example: Assume that the following symbols are defined, in this order, in a program:

BGH J12 MED CC ON TOM A345 ZIP QUE PETS

Symbol BGH becomes the root of the tree, and the final binary search tree is shown in Fig. 2–1 (A Binary Search Tree). The literature on binary search trees also discusses the average times for insertion, search, and deletion (which, in the case of a symbol table, is unnecessary). The minimum number of steps for insertion or search is obviously 1. The maximum number of steps depends on the height of the tree. The tree in Fig. 2–1 has a height of 7, so the next insertion will require from 1 to 7 steps. The height of a binary tree with N nodes varies between log2 N (which is the height of a fully balanced tree) and N (the height of a skewed tree). It can be proved that an average binary tree is closer to a balanced tree than to a skewed tree, and this implies that the average time for insertion or search in a binary search tree is of the order of log2 N.

Advantages: Efficient operation (as measured by the average number of steps). Flexible size.
Disadvantages: Each step is more complex than in an array-based symbol table.

The recommendations for use are the same as for the previous method.

2.5 A Hash Table

This method comes in two varieties: open hashing, which uses pointers and has a variable size, and closed hashing, which uses a fixed-size array.

2.5.1 Closed hashing

A closed hash table is an array (actually three arrays, for the name, value, and type), normally of size 2^N, where each symbol is stored in an entry. To insert a new symbol, it is necessary to obtain an index to the entry where the symbol will be stored. This is done by performing an operation on the name of the symbol, an operation that results in an N-bit number. An N-bit number has a value between 0 and 2^N − 1 and can thus serve as an index to the array. The operation is called hashing and is done by hashing, or scrambling, the bits that constitute the name of the symbol. For example, consider 6-character names, such as abcdef. Each character is stored in memory as an 8-bit ASCII code. The name is divided into three groups of two characters (16 bits) each: ab cd ef. The three groups are added, producing an 18-bit sum. The sum is split into two 9-bit halves, which are then multiplied to give an 18-bit product. Finally, N bits are extracted from the middle of the product to serve as the hash index. The hashing operations are meaningless in themselves, since they operate on character codes rather than on numbers. However, they produce an N-bit number that depends on all the bits of the original name.

A good hash function should have the following two properties:

It should consider all the bits in the original name. Thus when two names that are slightly different are hashed, there should be a good chance of producing different hash indexes.

For a group of names that are uniformly distributed over the alphabet, the function should produce indexes uniformly distributed over the range 0…2^N − 1.

Once the hash index is produced, it is used to insert the symbol into the array. Searching for symbols is done in an identical way. The given name is hashed, and the hash index is used to retrieve the value and the type from the array.

Ideally, a hash table requires fixed time for insert and search, and can be an excellent choice for a large symbol table. There are, however, two problems associated with this method, namely collisions and overflow, that make hash tables

Page 26: SPSA

less than ideal.

Collisions involve the case where two entirely different symbol names are hashed into identical indexes. Names such as SYMB and ZWYG6 can be hashed into the same value, say, 54. If SYMB is encountered first in the program, it will be inserted into entry 54 of the hash table. When ZWYG6 is found, it will be hashed, and the assembler should discover that entry 54 is already taken. The collision problem cannot be avoided just by designing a better hash function. The problem stems from the fact that the set of all possible symbols is very large, but any given program uses only a small part of it. Typically, symbol names start with a letter and consist of letters and digits only. If such a name is limited to six characters, then there are 26 × 36^5 (≈ 1.572 billion) possible names. A typical program rarely contains more than, say, 500 names, and a hash table of size 512 (= 2^9) may be sufficient. When 1.572 billion names are mapped into 512 positions, more than 3 million names map into each position. Thus even the best hash function will generate the same index for many different names, and a good solution to the collision problem is the key to an efficient hash table.

The simplest solution involves a linear search. All entries in the symbol table are originally marked as vacant. When the symbol SYMB is inserted into entry 54, that entry is marked occupied. If symbol ZWYG6 should be inserted into entry 54 and that entry is occupied, the assembler tries entries 55, 56, and so on. This implies that, in the case of a collision, the hash table degrades to a linear table.

Another solution involves trying entry 54 + P, where P and the table size are relatively prime. In either case, the assembler tries until a vacant entry is found or until the entire table has been searched and found to be fully occupied.

Morris [16] presents a complete analysis of hash tables, where it is shown that the average number of steps to insert (or search for) a symbol is 1/(1 − p), where p is the fraction of the table that is occupied. p = 0 corresponds to an empty table, p = 0.5 means a half-full table, etc. The following table gives the average number of steps for a few values of p.

  p     number of steps
  0          1
  .4         1.66
  .5         2
  .6         2.5
  .7         3.33
  .8         5
  .9        10
  .95       20

It is clear that when the hash table gets more than 50%–60% full, performance suffers, no matter how good the hashing function is. Thus a good hash table design makes sure that the table never gets more than 60% occupied. At that point the table is considered overflowed.

The problem of hash table overflow can be handled in a number of ways. Traditionally, a new, larger table is opened and the original table is moved to the new one by rehashing each element. The space taken by the original table is then released. Hopgood [17] gives a good analysis of this method. A better solution, though, is to use open hashing.
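The closed-hashing machinery above (the bit-scrambling hash and the probing on collisions) can be sketched in Python. This is a sketch under stated assumptions: the function names are mine, short names are padded with blanks, and the table size is assumed to be a power of two.

```python
def hash_name(name: str, n_bits: int = 9) -> int:
    """Scramble a symbol name of up to 6 characters into an n_bits index,
    following the scheme in the text: add three 16-bit groups, multiply
    the two 9-bit halves of the sum, take the middle bits of the product."""
    name = name.ljust(6)                      # pad short names with blanks
    groups = [(ord(name[i]) << 8) | ord(name[i + 1]) for i in (0, 2, 4)]
    total = sum(groups) & 0x3FFFF             # 18-bit sum
    product = ((total >> 9) * (total & 0x1FF)) & 0x3FFFF  # 18-bit product
    shift = (18 - n_bits) // 2                # middle n_bits of the product
    return (product >> shift) & ((1 << n_bits) - 1)

def probe_insert(table, name, value, step=1):
    """Insert (name, value) into a fixed-size closed hash table.
    step=1 is linear probing; a step P relatively prime to the table
    size gives the second scheme described above. Vacant entries are None."""
    size = len(table)                         # assumed to be 2**n_bits
    n_bits = size.bit_length() - 1
    i = hash_name(name, n_bits)
    for _ in range(size):
        if table[i] is None or table[i][0] == name:
            table[i] = (name, value)
            return i                          # entry actually used
        i = (i + step) % size                 # collision: try another entry
    raise OverflowError("hash table is full") # table 100% occupied
```

A table of size 512 would be declared as `table = [None] * 512`; once the table fills past the 60% mark discussed above, a real assembler would rehash into a larger table rather than let `probe_insert` run to overflow.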

2.5.2 Open hashing

An open hash table is a structure consisting of buckets, each of which is the start of a linked list of symbols. It is very similar to the buckets with linked lists discussed above. The principle of open hashing is to hash the name of the symbol and use the hash index to select a bucket. This is better than using the first


character in the name, since a good hash function can evenly distribute the names over the buckets, even in cases where many symbols start with the same letter. Aho et al. [18] present an analysis of open hashing.
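A minimal open hash table along these lines might look as follows; the bucket count and the multiplicative hash are illustrative choices, not prescribed by the text.

```python
class OpenHashTable:
    """Open hashing: an array of buckets, each heading a chain (here a
    Python list) of (name, value, type) symbols."""
    def __init__(self, n_buckets=64):
        self.n = n_buckets
        self.buckets = [[] for _ in range(n_buckets)]

    def _hash(self, name):
        # Any hash that mixes all the characters will do; this is one choice.
        h = 0
        for c in name:
            h = (h * 31 + ord(c)) % self.n
        return h

    def insert(self, name, value, typ):
        self.buckets[self._hash(name)].append((name, value, typ))

    def search(self, name):
        for sym, value, typ in self.buckets[self._hash(name)]:
            if sym == name:
                return value, typ
        return None                           # symbol not in the table
```

Note that overflow never occurs: a bucket's chain simply grows, although very long chains slow the search down.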

LOADERS:

To better understand loaders, some material from previous chapters should be reviewed: the principles of operation of one-pass and two-pass assemblers. These topics cover three of the four main tasks of a loader, namely loading, relocation, and linking. The fourth task is memory allocation (finding room in memory for the program). A loader therefore does more than its name implies. A loader performing all four tasks is called a linking loader. (However, some authors call it a relocating loader. Perhaps the best name would be a general loader.) A loader that does everything except loading is called a linker (in the UNIVAC literature it is called a collector, Burroughs calls it a binder, and IBM, a linkage editor). An absolute loader is one that supports neither relocation nor linking.

As a result, loaders come in many varieties, from very simple to very complex, and range in size from very small (a few tens of instructions for a bootstrap loader) to large (thousands of instructions). A few good references for loaders are [1, 3, 46, 64, 82].

Neither assemblers nor loaders are user programs. They are part of the operating system (OS). However, the loader can be intimately tied up with the rest

of the operating system (because of its memory allocation task), while the assembler is more of a stand-alone program, having little to do with the rest of the OS.

Most of this chapter is devoted to linking loaders, but it starts with two short sections describing assemble-go loaders and absolute loaders. It ends with a number of sections devoted to special features of loaders and to special types of loaders.

Before we start, here is a comment on the word ‘relocate’. Loaders do not relocate a program in the sense that they do not move it in memory from one area to another. The loader may reload the same program in different memory areas but, once loaded, the program normally is not relocated. There are some exceptions where a program is relocated, at run time, to another memory area but, in general, the term ‘relocate’ is a misnomer.

Exercise: If it is a misnomer, why do we use it?

7.1 Assemble-Go Loaders

Such a loader is just a part of the one-pass assembler; it is not an independent program. The one-pass assembler loads each object instruction into memory as it is being generated. At the end of the single pass, the entire program is loaded and, in the absence of any assembler errors, the assembler starts execution of the program. This is done by jumping, or branching, to the first instruction of the program. The user can specify a different start address by means of the END directive (Ch. 3), and, in such a case, the assembler will branch to that address.

This method of loading is fast and simple but has several important limitations:

The assembler has to reside in memory with the object program. This may not constitute a problem in today's computers with large memories, but it was a severe limitation in the past, and may still be on a small personal computer.

The assembler has to determine where to locate the program in memory. Most one-pass assemblers are used on small computers, where there is only one program in memory at any time (a single-user computer) and all programs always start at the same address. Typically the user program is loaded in lower memory locations,


the assembler itself is loaded high in memory (figure 7–1a), and the area occupied by the assembler can be used by the program for data storage at run time.

A one-pass assembler used in a large, multi-user computer has to ask the OS for an available area in memory, sufficiently large for the program. It has to have an idea of the program's size before it starts, and an estimate is normally supplied by the user.

7.2 Absolute Loaders

An absolute loader is the next step in the hierarchy of loaders. It can load an absolute object file generated by a one-pass assembler. (Note that some linkage editors also generate an absolute object file.) This partly solves some of the problems mentioned above. Still, such a loader is limited in what it can do.

An absolute object file consists of three parts:

The start address of the program. This is where the loader should start loading the program.

The object instructions.

The address of the first executable instruction. This is placed in the object file by the assembler in response to the END directive. It is either the address specified by the END or, in the absence of such an address, is identical to the first address of the program.

The loader reads the first item and loads the rest of the object file into successive memory locations. Its last step is to read item 3 (the address of the first executable instruction) from the object file, and to branch to that address, in order to start execution of the program.

Library routines are handled by an absolute loader in the same way as by an assemble-go system.

It turns out that even a one-pass assembler can, under certain conditions, generate code that will run when loaded in any memory area. This code is called position independent and is generated when certain addressing modes are used, or when the hardware uses base registers. Addressing modes are described in appendix A. Modes such as direct, immediate, relative, stack, and a few others, generate code that is position independent. A program using only such modes can be loaded and executed starting at any address in memory, and no relocation is necessary.

The use of base registers is not that common but, since they are one of the few ways of generating position independent code, they are also described in appendix A.

Linking Loaders

These are full-feature, general loaders that support the four tasks mentioned earlier. Such a loader can load several object files, relocating each, and linking them into one executable program. The loader, of course, has access neither to the source file nor to the symbol table. This is why the individual object files must contain all the information needed by the loader.

A word on terminology. In IBM terminology a load module is an absolute object file (or something very similar to it), and an object module is a relocatable object file. Those terms are discussed in detail in point 7 below.

The following is a summary of the main steps performed by such a loader:

1. It reads, from the standard input device, the names of all the object files to be loaded. Some may be library routines.

2. It locates all the object files, opens each, and reads the first record. This record (see figure 7–3b) is a loader directive containing the size of the program written in that file. The loader then adds the individual sizes to compute the total size of the program. With the OS's help, the loader then locates an available memory


area large enough to accommodate the program.

3. The next step is to read the next couple of items from the first object file. These are loader directives, each corresponding to a special symbol (EXTRN or ENTRY). This information is loaded in memory into a special symbol table (SST) to be used later for linking.

4. Step 3 is repeated for all remaining object files. After reading all the special symbol information from all the object files, the loader scans the SST, merging items as described below. This process converts the SST into a global external symbol table (GEST). If no errors are discovered during this process, the GEST is ready and the loader uses it later to perform linking.

5. The loader then reads the rest of the first object file and loads it, relocating instructions when necessary. All loader directives found in the file are executed. Any item requiring special relocation is handled as soon as it is read off the file, using information in the GEST. Some of those items may require loading routines off libraries (see later in this chapter).

6. Step 5 is repeated for all remaining object files. They are read and loaded in the order in which their names were read in step 1.

7. The loader generates three outputs. The main output is the loaded program. It is loaded in memory as one executable module where one cannot tell whether instructions came from different object files. In a computer where virtual memory is used, the program is physically divided into pages (or logically divided into segments) which are loaded in different areas of memory. In such a case, the program does not occupy a contiguous memory area. Nevertheless, it is considered a single module and it gets executed as one unit. Pages and segments are described in any operating systems or systems programming text.

The second (optional) output of the loader is a listing file with error messages, if any, and a memory map. The memory map contains, for each program, its name, start address, and size. The name is specified by the user in a special directive (IDENT or TITLE) or, in the absence of such a directive, it is the name of the object file.

The third loader output is also optional and is a single object file for the entire program. This file includes all the individual programs after linking, so it does not include any linking information, but it does include relocation bits. It is called a load module. Such a file can later be loaded by a relocating loader without having to do any linking, which speeds up the loading. Note that a load module is the main output of a linkage editor (see below).

The reason for this output is that, in a production environment—where programs are loaded and executed frequently, but rarely need to be reassembled (or recompiled)—fast loading becomes important. In such an environment it makes sense to use two types of loaders. The first is a linker or a linkage editor, which performs just linking and produces a load module. The second is a simple relocating loader that reads and loads a load module, performing just the three tasks of memory allocation, loading, and relocation. By eliminating linking, the relocating loader works fast.

On the other hand, when programs are being developed and tested, they have to be reassembled or recompiled very often. In such a case it makes more sense to use a full-feature loader, which performs all four tasks. Using two loaders would be slower, since most runs would involve a new version of the program and would necessitate executing both loaders.

Linking can be done at a number of different stages. It turns out that late linking allows for more flexibility. The latest possible moment to do the linking is at run time. This is the dynamic linking feature discussed later in this chapter. Consider an instruction that requires linking, something like a ‘CALL LB’ instruction, which calls a library routine LB. This instruction is loaded but is not always executed.
(Recall that, each time a program is run, different instructions are executed.) Doing the linking at run time has the


advantage that, if the ‘CALL LB’ instruction is not executed, the library routine does not have to be loaded. Of course, there is a tradeoff: run-time linking requires some of the loader routines to reside in memory with the program.
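The memory-allocation, loading, and relocation tasks described in the steps above can be sketched with a toy object-file format. The format here (a dict per file, with a set of relocation indices standing in for relocation bits) is invented for this example and does not follow any real object-file standard.

```python
def relocating_load(object_files, memory, base):
    """Load several object files one after another, adjusting the words
    flagged in each file's 'reloc' set by that file's load address.
    Returns the memory map (program name -> load address) that a loader
    would print in its listing."""
    load_addr = base
    memory_map = {}
    for name, obj in object_files.items():
        memory_map[name] = load_addr          # memory allocation
        for i, word in enumerate(obj["code"]):
            if i in obj["reloc"]:
                word += load_addr             # relocate an address constant
            memory[load_addr + i] = word      # loading
        load_addr += len(obj["code"])
    return memory_map
```

For example, loading a 3-word MAIN (whose word 2 holds a relocatable address) followed by a 2-word SUB at base 1000 yields the map {MAIN: 1000, SUB: 1003}, with MAIN's address constant adjusted from 2 to 1002. Linking would be a further pass that patches CALL words using a GEST-like table.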

Overlays

Many modern computers use virtual memory, which makes it possible to run programs larger than the physical memory. Either one program or several programs can be executed even if their total size is greater than the entire memory available. When a computer does not use virtual memory, running a large program becomes a problem. One solution is overlays (or chaining), which are discussed here because their implementation involves the loader.

Overlays are based on the fact that many programs can be broken into logical parts such that only one part is needed in memory at any time. The program is divided, by the programmer, into a main part (the overlay root), which resides in memory during the entire execution, and several overlays (links or segments) that can be called, one at a time, by the root, then loaded and executed. All the links share the same memory area, whose size should be the maximum size of the links. A link may contain one program or several programs, linked in the usual way. At any given time, only the root and one link are active (but see the discussion of sublinks and tree structure below). Two features are needed to implement overlays:

A directive declaring the start of each overlay. Those directives are recognized by the assembler which, in turn, prepares a separate object file for each overlay.

A special ‘CALL OVERLAY’ instruction to load an overlay (a link) at run time. Such an instruction calls a special loader routine, the overlay manager, resident in memory with the main program, which loads the specific overlay from the object file into the shared memory area. The last executable instruction in the overlay must be a return. It should return to the calling program, which is typically the main part, but could also be another overlay. Such a return works either by popping the return address from the stack, or by generating a software interrupt that transfers control to the overlay manager in the OS.

A typical directive declaring an overlay is ‘OVERLAY n’ (or ‘LINK n’), where n is the overlay number. Each such directive directs the assembler to finish the previous assembly, write an object file for the current overlay, and start a new assembly for the next overlay. The END directive terminates the last link. The result is a number of object files, the first of which is a regular one, containing the main program. All the rest are special, each containing a loader directive declaring it to be an overlay and specifying the number of the overlay.

The loader receives the names of all the object files; it loads the first one but, upon opening the other ones, finds that they are overlays. As a result, the other object files are not loaded but are left open, accessible to the loader. The loader uses the maximum size of those files as the size of the shared memory area and loads, following the main program, a routine that can locate and load an overlay. At run time, each ‘CALL OVERLAY[n]’ (or ‘CALL LINK’) instruction invokes that routine, which loads the overlay, on top of the previous one, into the shared area.

As far as relocating the different overlays, there are two possibilities. The first is to relocate each overlay while it is loaded. The other possibility is to prepare a pre-relocated (absolute) version of each overlay and load the absolute versions. This requires more work in advance but speeds up loading the overlays at run time. Generally, an overlay is a large part of the program and is not loaded many times. In such a case, the first alternative, of relocating the overlay each time it is loaded, seems a better choice.

In general, each overlay may be very large, and sub-overlays can be declared.


The result is a program organized as a tree, where each branch corresponds to an overlay, each smaller branch to a sub-overlay, and so on. Figure 7–7 is an example of such a tree. The table below assumes certain sizes for the different links and a start address

of 0 for the root A. It then shows the start address of each link and the total size of the program when that link is loaded.

Figure 7–7: An Overlay Tree.
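For a flat overlay structure (no sub-overlays), the start address and total memory requirement follow directly from the sizes. The helper below is a sketch under that simplifying assumption; the sizes in the example are made up, since the original table is not reproduced here.

```python
def overlay_layout(root_size, link_sizes):
    """All links load at the same address, just past the root, and the
    shared area must be able to hold the largest link. Returns
    (link start address, total memory needed), with the root at 0."""
    link_start = root_size                    # shared area begins here
    shared = max(link_sizes) if link_sizes else 0
    return link_start, root_size + shared
```

For instance, overlay_layout(100, [30, 50, 40]) reports that every link starts at address 100 and the program needs at most 150 words, even though root plus links total 220; that saving is the whole point of overlays.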

UNIX: OPERATING SYSTEM

Booting UNIX: Loading the Kernel

Most systems, particularly PCs, implement a two-stage loading process:

The system BIOS loads a small boot program. This small boot program in turn loads the kernel.

On PCs, this small boot program exists within the first 512 bytes of the boot device. This 512-byte segment is called the Master Boot Record. The MBR is what loads the kernel from disk. A popular boot loader used by most Linux distributions to boot Linux is called LILO. LILO can also be used to boot other operating

systems as well such as MS-DOS, Windows 98 and Windows NT. LILO can be installed to

either the MBR or to the boot record of the Linux root partition.

Install LILO to the boot record instead of the MBR when the MBR is occupied by another OS's boot loader, one that does not know how to boot Linux itself.


For example, say that you want to have both Windows NT and Linux on the same box and you want to dual boot between them. You can have NT's boot loader installed to the MBR and add the option to boot Linux to its boot menu. If the user elects to boot Linux,

NTLOADER will then pass control to LILO which will in turn load the Linux kernel.

FreeBSD has something similar to LILO for loading its kernel. It consists of two parts: one which lives in the MBR (see man boot0cfg) and another part which lives in the FreeBSD root partition (see man disklabel).

UNIX and UNIX-like systems for non-PC hardware typically follow a straightforward (but usually proprietary and system-specific) scheme for booting their kernels. The kernel itself is a program that usually lives in the root partition of the UNIX filesystem. Most Linux distributions call it “/vmlinuz”, and it is often a symbolic link to the real kernel file, which lives in “/boot”. Other UNIX and UNIX-like systems may call it “/unix”, “/vmunix”, or “/kernel”.

After the kernel is brought in from disk into main memory, it begins execution, and one of the first things it does is initialize the system's hardware. All those cryptic messages you see fly by when the Linux kernel first starts up are messages from the compiled-in kernel drivers initializing and configuring your hardware.

Other UNIX and UNIX-like systems do something similar. Sometimes the kernel needs help in configuring your hardware: information such as IRQ, DMA, and I/O base addresses may need to be specified to the kernel. With Linux these can be specified via its “command line”.

The BootPrompt-HOWTO has more information about the Linux command line. This can be had from http://www.linuxdoc.org.

The first program the kernel attempts to execute after basic system initialization is complete is called init. The init process is the mother of all processes running on a UNIX system; if this process dies, so does the system. init's job is to take over the system start-up procedure and complete the system bootstrap process.

The actual program which the Linux kernel executes as the init process can be specified via the “init” command line parameter. For example, to start bash instead of init, you can specify “init=/bin/bash” on the Linux command line. (see BootPrompt-HOWTO for details.)

Startup Scripts – System V Style:

All start-up scripts are typically kept in a directory named init.d which usually lives somewhere under “/etc”.

Red Hat Linux places this directory under “/etc/rc.d”.HP-UX places this directory under “/sbin”.

Each start-up script can usually accept at least two command line arguments: start and stop.

start tells the script to start whatever it is that script is responsible for.stop tells the script to stop whatever it is that script is responsible for.



Each run-level gets its own directory, usually under “/etc” but sometimes under “/sbin” on some systems. This directory follows the naming convention rcn.d, where n is the run-level; e.g., scripts for run-level 2 would be found under a directory named rc2.d. This directory contains the scripts which are executed when that run-level is entered.

While this directory can contain actual scripts, it usually consists of symbolic links to real scripts which live under the init.d directory.

Scripts in the run-level directory are executed in alphanumeric order. If a script's name begins with an “S”, it is passed the “start” command line parameter; if it begins with a “K”, it is passed the “stop” parameter.

SysV init's configuration file is “/etc/inittab”. This file tells init what script it should run for each run-level.

A common way to implement the SysV style start-up procedure is to have init execute some master control script passing to it as an argument the run-level number. This script then executes all of the scripts in that run-level’s script directory. For example, for run-level 2, init may execute the script “/etc/init.d/rc” passing it the argument “2”. This script in turn would execute every script in run-level 2’s script directory “/etc/rc2.d”.
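The S/K naming convention above can be expressed directly. This sketch only computes the ordered (script, argument) pairs that such a master control script would issue; it does not execute anything, and the function name is mine.

```python
def run_level_actions(script_names):
    """Given the file names from an rcN.d directory, return the
    (script, argument) pairs in the order a SysV-style init runs them:
    alphanumeric order, 'K' scripts get 'stop', 'S' scripts get
    'start'. Other names (e.g. a stray README) are ignored."""
    actions = []
    for name in sorted(script_names):
        if name.startswith("K"):
            actions.append((name, "stop"))
        elif name.startswith("S"):
            actions.append((name, "start"))
    return actions
```

The numeric part of a name (S10syslog before S20network) is what makes the alphanumeric sort give the administrator control over ordering.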

SINGLE USER MODE:

Single-user mode is a special administrative mode that usually starts the system with a minimal configuration. For example, no system daemons are started and extra filesystems may not be mounted.

Single-user mode is typically used to repair a broken system such as fscking a sick filesystem which cannot be repaired by the automatic fscking procedure.

Entering single-user mode varies from system to system, but it usually involves specifying to init a special flag before the system starts up.

This can be done in Linux by specifying the parameter “single” on the LILO boot prompt.

On SysV-ish systems, single user mode can also be entered by telling init to enter run-level 1 or S. This can be done via the telinit command.

SYSTEM SHUT DOWN


UNIX systems have to be gracefully powered down; you cannot just switch the system off. Doing so can damage the system, for example by corrupting filesystems that were not cleanly unmounted.

The typical way to shutdown the UNIX system is to use the shutdown command. shutdown allows the system administrator to broadcast a message to all currently logged in users that the system is about to be shutdown. The exact syntax of the shutdown command tends to vary from system to system. Check shutdown’s man page for details.

Operating Systems and Basics

An OS is system software, which may be viewed as a collection of software consisting of procedures for operating the computer & providing an environment for execution of programs. It is an interface between user & computer.

Types of Processing:

1. Serial Processing
2. Batch Processing
3. Multiprogramming

Types of OSs:

1. Batch OS
2. Multiprogramming OS

Ø Multitasking/Multiprocessing
Ø Multiuser OS
Ø Time Sharing OS
Ø Real Time OS

3. Network OS
4. Distributed OS

OS Structure:

1. Layered Structure

2. Kernel Structure

Ø Create & Delete process

Ø Processor scheduling, mem mgmt & I/O mgmt.

Ø Process synchronization.


Ø IPC help

3. Virtual Machine

4. Client Server model

Process Management

Process Status: New, ready to run, running, suspended, sleep, wait, terminate.

Types of Scheduler:

1. Long term/Job Scheduler
2. Medium term scheduler
3. Short term/CPU Scheduler

Processes can be either CPU bound or I/O bound.

Scheduling performance criteria:

CPU utilisation

Throughput
Turnaround time
Waiting time
Response Time

Scheduling Algorithms:

Scheduling algorithms may be preemptive or non-preemptive:

First-come-first-served (non-preemptive)
Shortest-job-first
Round Robin (preemptive)
Priority based scheduling
Multi-level Queue

Switching the CPU to another process requires saving all the registers for the old process & then loading the registers for the new process; this is known as Context Switching.

Scheduling Mechanisms

A multiprogramming operating system allows more than one process to be loaded into executable memory at a time and allows the loaded processes to share the CPU using time-multiplexing. Part of the reason for using multiprogramming is that the operating system itself is implemented as one or more processes, so there must be a way for the operating system and application processes to share the CPU. Another


main reason is the need for processes to perform I/O operations in the normal course of computation. Since I/O operations ordinarily require orders of magnitude more time to complete than CPU instructions do, multiprogramming systems allocate the CPU to another process whenever a process invokes an I/O operation.

Goals for Scheduling

A scheduling strategy should be judged against the following criteria:

Utilization/Efficiency: keep the CPU busy 100% of the time with useful work
Throughput: maximize the number of jobs processed per hour
Turnaround time: the time from submission to completion; minimize the time batch users must wait for output
Waiting time: sum of times spent in the ready queue; minimize this
Response time: time from submission until the first response is produced; minimize this for interactive users
Fairness: make sure each process gets a fair share of the CPU

Context Switching

Typically there are several tasks to perform in a computer system.

So if one task requires some I/O operation, you want to initiate the I/O operation and go on to the next task. You will come back to it later.

This act of switching from one process to another is called a "Context Switch".

When you return to a process, you should resume where you left off. For all practical purposes, this process should never know there was a switch; it should look as if this was the only process in the system.

To implement this, on a context switch, you have to

save the context of the current process
select the next process to run
restore the context of this new process

What is the context of a process?

Program Counter
Stack Pointer
Registers
Code + Data + Stack (also called the Address Space)
Other state information maintained by the OS for the process (open files, scheduling info, I/O devices being used, etc.)

All this information is usually stored in a structure called Process Control Block (PCB).

All the above has to be saved and restored.


What does a context_switch() routine look like?

context_switch()
{
    Push registers onto stack
    Save ptrs to code and data
    Save stack pointer
    Pick next process to execute
    Restore stack ptr of that process  /* You have now switched the stack */
    Restore ptrs to code and data
    Pop registers
    Return
}

Non-Preemptive Vs Preemptive Scheduling

Non-Preemptive: Non-preemptive algorithms are designed so that once a process enters the running state (is allocated the processor), it is not removed from the processor until it has completed its service time (or it explicitly yields the processor).

context_switch() is called only when the process terminates or blocks.

Preemptive: Preemptive algorithms are driven by the notion of prioritized computation. The process with the highest priority should always be the one currently using the processor. If a process is currently using the processor and a new process with a higher priority enters the ready list, the process on the processor should be removed and returned to the ready list until it is once again the highest-priority process in the system.

context_switch() may be called even while the process is running, usually via a timer interrupt.

First In First Out (FIFO)

This is a non-preemptive scheduling algorithm. The FIFO strategy assigns priority to processes in the order in which they request the processor: the process that requests the CPU first is allocated the CPU first. When a process comes in, add its PCB to the tail of the ready queue. When the running process terminates, dequeue the process (PCB) at the head of the ready queue and run it.

Consider the example with P1=24, P2=3, P3=3

Gantt Chart for FCFS: 0 – 24 P1, 24 – 27 P2, 27 – 30 P3

Turnaround time for P1 = 24
Turnaround time for P2 = 24 + 3 = 27
Turnaround time for P3 = 24 + 3 + 3 = 30

Average Turnaround time = (24*3 + 3*2 + 3*1) / 3 = 27

In general we have (n*a + (n-1)*b + ....) / n

If we want to minimize this, a should be the smallest, followed by b and so on.
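The arithmetic above generalizes to any list of bursts. A small helper (the name is mine, and all processes are assumed to arrive at time 0) computes the finish times and the average turnaround:

```python
def fcfs_turnaround(bursts):
    """Under first-come-first-served, each process finishes after its
    own burst plus all the bursts queued ahead of it. Returns the
    per-process turnaround times and their average."""
    clock, times = 0, []
    for burst in bursts:
        clock += burst
        times.append(clock)                   # turnaround = finish - 0
    return times, sum(times) / len(times)
```

fcfs_turnaround([24, 3, 3]) gives turnarounds [24, 27, 30] and average 27, as computed above; reordering to [3, 3, 24] drops the average to 13, which is exactly what the (n*a + (n-1)*b + ...)/n formula predicts when a is the smallest burst.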


Comments: While the FIFO algorithm is easy to implement, it ignores the requested service time and all other criteria that may influence the performance with respect to turnaround or waiting time.

Problem: One Process can monopolize CPU

Solution: Limit the amount of time a process can run without a context switch. This time is called a time slice.

Round Robin

Round Robin calls for the distribution of the processing time equitably among all processes requesting the processor. Run a process for one time slice, then move it to the back of the queue. Each process gets an equal share of the CPU. Most systems use some variant of this.

Choosing Time Slice

What happens if the time slice isn't chosen carefully?

For example, consider two processes, one doing 1 ms of computation followed by 10 ms of I/O, the other doing pure computation. Suppose we use a 20 ms time slice with round-robin scheduling: the I/O-bound process runs at 11/21 of full speed, and the I/O devices are utilized only 10/21 of the time.

Suppose we use a 1 ms time slice: then the compute-bound process gets interrupted 9 times unnecessarily before the I/O-bound process is runnable again.

Problem: Round robin assumes that all processes are equally important; each receives an equal portion of the CPU. This sometimes produces bad results. Consider three processes that start at the same time, each requiring three time slices to finish. Using FIFO, how long does it take the average job to complete (what is the average response time)? How about using round robin?


* Using FIFO: process A finishes after 3 slices, B after 6, and C after 9. The average is (3+6+9)/3 = 6 slices.

* Using round robin: process A finishes after 7 slices, B after 8, and C after 9, so the average is (7+8+9)/3 = 8 slices.
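A small simulation reproduces both averages (a Python sketch, counting time in slices; treating FIFO as round robin with an effectively infinite quantum):

```python
from collections import deque

def completion_slices(jobs, quantum_slices):
    """Simulate round-robin scheduling in units of time slices.
    jobs maps name -> slices of CPU needed; quantum_slices is how many
    slices a job runs before being moved to the back of the queue.
    Returns {name: slice count at which the job finishes}."""
    queue = deque(jobs.items())
    t = 0
    done = {}
    while queue:
        name, remaining = queue.popleft()
        run = min(quantum_slices, remaining)
        t += run
        remaining -= run
        if remaining == 0:
            done[name] = t
        else:
            queue.append((name, remaining))   # back of the queue
    return done

jobs = {"A": 3, "B": 3, "C": 3}
# FIFO = round robin with a quantum larger than any job (run to completion):
print(completion_slices(jobs, 99))  # {'A': 3, 'B': 6, 'C': 9} -> average 6
# Round robin with a one-slice quantum:
print(completion_slices(jobs, 1))   # {'A': 7, 'B': 8, 'C': 9} -> average 8
```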

Round robin is fair, but uniformly inefficient.

Solution: Introduce priority based scheduling.

Priority Based Scheduling

Run highest-priority processes first, use round-robin among processes of equal priority. Re-insert process in run queue behind all processes of greater or equal priority.

Allows CPU to be given preferentially to important processes. Scheduler adjusts dispatcher priorities to achieve the desired overall priorities for the processes,

e.g. one process gets 90% of the CPU.

Comments: In priority scheduling, processes are allocated to the CPU on the basis of an externally assigned priority. The key to the performance of priority scheduling is in choosing priorities for the processes.

Problem: Priority scheduling may cause low-priority processes to starve

Solution: (AGING) This starvation can be compensated for if the priorities are internally computed. Suppose one parameter in the priority assignment function is the amount of time the process has been waiting. The longer a process waits, the higher its priority becomes. This strategy tends to eliminate the starvation problem.
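One way to picture aging is below (a hedged Python sketch: the linear weighting and the aging_rate constant are our own illustrative choices, not a standard formula):

```python
def effective_priority(base_priority, waiting_time, aging_rate=0.1):
    """Aging: the longer a process waits, the higher its effective priority.
    Higher number = higher priority; aging_rate is an assumed tuning constant."""
    return base_priority + aging_rate * waiting_time

def pick_next(processes):
    """processes: list of (name, base_priority, waiting_time) tuples.
    Pick the process with the highest effective priority."""
    return max(processes, key=lambda p: effective_priority(p[1], p[2]))[0]

ready = [("low_prio_old", 1, 100), ("high_prio_new", 5, 0)]
print(pick_next(ready))  # 'low_prio_old' -- aging has overtaken the static priority
```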

Shortest Job First

Maintain the Ready queue in order of increasing job lengths. When a job comes in, insert it in the ready queue based on its length. When current process is done, pick the one at the head of the queue and run it.
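The ordered ready queue can be sketched as follows (Python, using the standard-library bisect module; the class and method names are ours):

```python
import bisect

class SJFQueue:
    """Ready queue kept ordered by (estimated) job length; a minimal sketch."""
    def __init__(self):
        self._jobs = []          # sorted list of (length, name) pairs

    def add(self, name, length):
        bisect.insort(self._jobs, (length, name))   # insert in order of length

    def next_job(self):
        return self._jobs.pop(0)[1]                 # shortest job first

q = SJFQueue()
q.add("P1", 24); q.add("P2", 3); q.add("P3", 3)
print(q.next_job())  # 'P2' (equal lengths tie-break on name in this sketch)
```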


This is provably optimal in terms of turnaround/response time.

But, how do we find the length of a job?

Make an estimate based on the past behavior.

Say the estimated time (burst) for a process is E0, suppose the actual time is measured to be T0.

Update the estimate by taking a weighted sum of these two, i.e. E1 = a*T0 + (1-a)*E0

In general, E(n+1) = a*Tn + (1-a)*En (exponential average)

If a = 0, recent history gets no weight; if a = 1, past history gets no weight.

Typically a = 1/2.

Expanding the recurrence: E(n+1) = a*Tn + (1-a)*a*T(n-1) + ... + (1-a)^j * a * T(n-j) + ... + (1-a)^(n+1) * E0

Older information carries less weight.
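The recurrence is easy to tabulate (a Python sketch; the burst values and the initial estimate are made up for illustration):

```python
def exponential_average(bursts, e0, a=0.5):
    """Successive burst estimates: E(n+1) = a*T(n) + (1-a)*E(n).
    bursts: measured CPU bursts T0, T1, ...; e0: initial guess E0.
    Returns the list of estimates [E0, E1, E2, ...]."""
    e = e0
    history = [e]
    for t in bursts:
        e = a * t + (1 - a) * e   # recent measurement weighted by a
        history.append(e)
    return history

# a = 1/2: each older measurement carries half the weight of the next newer one.
print(exponential_average([6, 4, 6, 4], e0=10))  # [10, 8.0, 6.0, 6.0, 5.0]
```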

Comments: SJF is proven optimal only when all jobs are available simultaneously.

Problem: SJF minimizes the average wait time because it services small processes before it services large ones. While it minimizes average wait time, it may penalize processes with high service time requests. If the ready list is saturated, then processes with large service times tend to be left in the ready list while small processes receive service. In the extreme case, where the system has little idle time, processes with large service times will never be served. This total starvation of large processes may be a serious liability of this algorithm.

Solution: Multi-Level Feedback Queues

Multi-Level Feedback Queue

Several queues arranged in some priority order.

Each queue could have a different scheduling discipline/ time quantum.

Generally, higher-priority queues get shorter time quanta.

Defined by:

1. the number of queues
2. the scheduling algorithm for each queue
3. when to upgrade a process's priority
4. when to demote it

Attacks both efficiency and response time problems.


Give a newly runnable process a high priority and a very short time slice. If the process uses up the time slice without blocking, decrease its priority by 1 and double its next time slice.

Often implemented by having a separate queue for each priority. How are priorities raised? By 1 if the process doesn't use its full time slice? What happens to a process that does a lot of computation when it starts, then waits for user input? Its priority needs to be boosted a lot, and quickly.
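The demote/boost policy might be sketched like this (an illustrative Python sketch: the level count, doubling quanta, and method names are our own choices, and the dispatcher is assumed to dequeue a process before running it):

```python
class MLFQ:
    """Multi-level feedback queue sketch: level 0 is the highest priority and
    has the shortest quantum; using a full time slice demotes a process,
    blocking for I/O boosts it back to the top."""
    def __init__(self, levels=3, base_quantum=1):
        self.queues = [[] for _ in range(levels)]
        self.quantum = [base_quantum * 2 ** i for i in range(levels)]  # 1, 2, 4, ...

    def admit(self, name):
        self.queues[0].append(name)          # new process: high priority

    def on_quantum_expired(self, name, level):
        new_level = min(level + 1, len(self.queues) - 1)
        self.queues[new_level].append(name)  # demote: CPU-bound behaviour
        return new_level

    def on_io_block(self, name):
        self.queues[0].append(name)          # boost: interactive behaviour
        return 0

m = MLFQ()
m.admit("P1")
name = m.queues[0].pop(0)                 # dispatcher takes P1 off the top queue
print(m.on_quantum_expired(name, 0))      # 1 -- demoted after burning its slice
print(m.quantum)                          # [1, 2, 4] -- longer slices lower down
```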

Swapping

The early development of UNIX systems transferred entire processes between primary memory and secondary storage but did not transfer parts of a process independently, except for shared text. Such a memory management policy is called swapping. UNIX was first implemented on the PDP-11, where the total physical memory was limited to 256 Kbytes. The total memory resources were insufficient to justify or support complex memory management algorithms, so UNIX swapped entire process memory images. Allocation of both main memory and swap space is done first-fit. When the size of a process's memory image increases (due to either stack expansion or data expansion), a new piece of memory big enough for the whole image is allocated. The memory image is copied, the old memory is freed, and the appropriate tables are updated. (An attempt is made in some systems to find memory contiguous to the end of the current piece, to avoid some copying.) If no single piece of main memory is large enough, the process is swapped out such that it will be swapped back in with the new size. There is no need to swap out a sharable text segment, because it is read-only, and there is no need to read in a sharable text segment for a process when another instance is already in memory. That is one of the main reasons for keeping track of sharable text segments: less swap traffic. The other reason is the reduced amount of main memory required for multiple processes using the same text segment. Decisions regarding which processes to swap in or out are made by the scheduler process (also known as the swapper). The scheduler wakes up at least once every 4 seconds to check for processes to be swapped in or out. A process is more likely to be swapped out if it is idle, has been in main memory for a long time, or is large; if no obvious candidates are found, other processes are picked by age. A process is more likely to be swapped in if it has been swapped out for a long time, or is small.

There are checks to prevent thrashing, basically by not letting a process be swapped out if it has not been in memory for a certain amount of time. If no jobs need to be swapped out, the process table is searched for a process deserving to be brought in (determined by how small the process is and how long it has been swapped out); processes are then swapped out until there is enough memory available. Many UNIX systems still use the swapping scheme just described. All Berkeley UNIX systems, on the other hand, depend primarily on paging for memory-contention management, and depend only secondarily on swapping. A scheme similar in outline to the traditional one is used to determine which processes get swapped in or out, but the details differ and the influence of swapping is less.

Demand Paging

As there is much less physical memory than virtual memory, the operating system must be careful not to use physical memory inefficiently. One way to save physical memory is to load only those virtual pages that are currently being used by the executing program. For example, a database program may be run to query a database. In this case not all of the database needs to be loaded into memory, just those data records that are being examined. Also, if the database query is a search query, it does not make sense to load the code from the database program that deals with adding new records. This technique of loading virtual pages into memory only as they are accessed is known as demand paging.

When a process attempts to access a virtual address that is not currently in memory, the CPU cannot find a page table entry for the virtual page referenced. For example, in Figure  there is no entry in Process X's page table for virtual PFN 2, and so if Process X attempts to read from an address within virtual PFN 2 the CPU cannot translate the address into a physical one. At this point the CPU cannot cope and needs the operating system to fix things up. It notifies the operating system that a page fault has occurred, and the operating system makes the process wait whilst it fixes things up. The appropriate page must be brought into memory from the image on disk. Disk access takes a long time, relatively speaking, and so the process must wait quite a while until the page has been fetched. If there are other processes that could run, the operating system will select one of them to run. The fetched page is written into a free physical page frame and an entry for the virtual PFN is added to the process's page table. The process is then restarted at the point where the memory fault occurred. This time the virtual memory access is made, the CPU can make the address translation, and so the process continues to run. Demand paging occurs when the system is busy but also when an image is first loaded into memory. This mechanism means that a process can execute an image that only partially resides in physical memory at any one time.
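Counting page faults under pure demand loading can be sketched as follows (Python; this sketch assumes enough free frames, so no page-replacement policy is modelled):

```python
def demand_paging_faults(reference_string):
    """Count page faults when virtual pages are loaded only on first access.
    Pure demand loading: enough free frames are assumed, so no eviction."""
    resident = set()        # virtual pages currently in physical memory
    faults = 0
    for page in reference_string:
        if page not in resident:
            faults += 1     # page fault: the OS fetches the page from disk
            resident.add(page)
        # otherwise the CPU translates the address directly, no OS involvement
    return faults

# A process touching pages 0, 1, 0, 2, 0, 1 only ever loads 3 distinct pages:
print(demand_paging_faults([0, 1, 0, 2, 0, 1]))  # 3
```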

Synchronization & IPC

The shared storage may be in main memory or it may be a shared file. Each process has a segment of code, the critical section, which accesses shared memory or files. Some way of making sure that if one process is executing in its critical section, the other processes will be excluded from doing the same thing, is known as mutual exclusion. Hardware support is available for mutual exclusion in the form of the test-and-set instruction, which is designed to allow only one process among several concurrent processes to enter its critical section.

Semaphore: It is a synchronization tool: a variable that accepts non-negative integer values and, except for initialization, may be accessed and manipulated only through two primitive operations, wait() and signal().
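The wait()/signal() pattern maps directly onto a counting semaphore initialized to 1 (a Python sketch; Python's threading.Semaphore spells the two operations acquire() and release()):

```python
import threading

counter = 0
mutex = threading.Semaphore(1)   # binary semaphore guarding the critical section

def worker(increments):
    global counter
    for _ in range(increments):
        mutex.acquire()          # wait(): blocks while the semaphore value is 0
        counter += 1             # critical section: access to the shared variable
        mutex.release()          # signal(): lets one waiting thread proceed

threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(counter)  # 40000 -- without the semaphore, updates could be lost
```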

Disadvantages:

1. Semaphores are unstructured.
2. Semaphores do not support data abstraction.

Alternatives to Semaphores:


1. Critical region
2. Conditional critical region
3. Monitors
4. Message passing

Necessary Conditions for Deadlock:

1. Mutual exclusion
2. Hold & wait
3. No preemption
4. Circular wait

Memory Management

In a single-process system, memory is protected through a hardware mechanism such as a dedicated register called the fence register.

In multiprogramming, memory can be allocated either statically or dynamically. Partition information is stored in a Partition Description Table. Two strategies used to allocate memory to a ready process are First Fit & Best Fit.

Loading a program into memory by a relocating loader or linker under static allocation is known as static relocation. In the dynamic method, run-time mapping of virtual addresses into physical addresses is done with the support of hardware such as base and limit registers. Protection is served by using limit registers to restrict the memory locations a program may access; sharing is achieved by using a dedicated common partition.

Static allocation does not support data structures like stacks & queues, and it limits the degree of multiprogramming.

Compaction is the process of collecting free space into a single large memory chunk to fit the available processes. It is often not done because it occupies a lot of CPU time; it is only supported on mainframes and supercomputers.

Paging is a memory management technique that permits a program's memory to be non-contiguous in physical memory, thus allowing a program to be allocated physical memory wherever it is available. This is done through virtual addresses, which are later converted to physical addresses.

Memory is divided into a number of fixed-size blocks called frames. The virtual address space (logical memory) of a process is broken into blocks of the same size called pages. When a program is to be run, its pages are loaded into any available frames from the disk. Mapping is done through the Page Map Table (PMT), which contains the base address of each page in physical memory. Hardware support is given to paging through the Page Map Table Register (PMTR), which points to the beginning of the PMT. Lookaside memory (content-addressable memory) is used to overcome the overhead of consulting the PMT on every reference.


Address translation is done by the associative memory, which converts a virtual address into a physical address from its page number and offset by looking into the PMT.
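The page/offset split can be sketched as follows (Python; the 4 KB page size and the PMT-as-dictionary representation are illustrative assumptions):

```python
PAGE_SIZE = 4096  # an assumed page size (4 KB)

def translate(virtual_address, page_map_table):
    """Split a virtual address into (page, offset), then look the page up in
    the PMT, which maps page number -> frame number in physical memory."""
    page, offset = divmod(virtual_address, PAGE_SIZE)
    if page not in page_map_table:
        raise KeyError(f"page fault: page {page} not in memory")
    frame = page_map_table[page]
    return frame * PAGE_SIZE + offset

pmt = {0: 5, 1: 9}           # pages 0 and 1 live in frames 5 and 9
print(translate(4100, pmt))  # page 1, offset 4 -> 9*4096 + 4 = 36868
```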

Segmentation is a memory management scheme and a more sophisticated form of address translation. It is done through the segment table, an important component of a segmented system. Segment access is supported by the Segment Table Base Register (STBR), and protection is enforced by the Segment Table Limit Register (STLR).

Virtual memory is a memory management technique which splits the process into small chunks called overlays. The first overlay will call the next overlay before quitting the CPU; the remaining overlays stay on the hard disk, and the swapping is done by the OS. Advantages:

1. Whatever the size of the program, memory can be allocated easily.
2. Since swapping is done between main & secondary memory, CPU utilization & throughput are increased.
3. It reduces external fragmentation & increases program execution speed.

In demand paging, pages are loaded only on demand, not in advance; it is essentially paging combined with swapping.

A page fault occurs when a page is missing from main memory, i.e., the program references an address in a page that has not yet been brought into memory.

File Management

Some systems support a single uniform set of file manipulation features for both file & I/O device management; this feature is known as device-independent I/O or device independence. The printer is one such example.

File organization may be

1. Byte sequenced, in which the OS does not impose any structure on the file organization.

2. Record sequenced: a sequence of fixed-size records; arbitrary records can be read or written, but records can't be inserted or deleted in the middle of the file.

3. ISAM, in which records are inserted into disk blocks indexed by keys; the organization looks like a tree of blocks.

Responsibilities of File Management:

1. Mapping of logical file addresses to physical disk addresses
2. Management of disk space & its allocation-deallocation
3. Keeping track of all files in the system
4. Support for protection & sharing of files


Files organized in hierarchical form are accessed by absolute pathname or relative pathname. File & directory searching is done using:

1. Linear list organization, which takes O(n) comparisons to locate a file
2. Hashing technique
3. Balanced binary tree, which takes O(log n) comparisons to locate a file and always provides a sorted list of files, which increases efficiency

The collection of tracks on all surfaces that are at the same distance from the center is called a cylinder.

Disk Space Management Methods:

1. Linked list

2. Bit Map

Disk allocation Methods

1. Contiguous: supports both sequential & direct access. Allocation is done using the First Fit & Best Fit methods.

2. Linked List

Advantages:
Simple
No disk compaction needed

Disadvantages

It doesn't support direct access well, since blocks are scattered over the disk
Slow direct access to any disk block
Space requirement for pointers
Reliability

3. Indexed

Uses an index block to support direct access. The size limit of a single index block can be overcome with multi-level indexing: indirect blocks and double indirect blocks.

Advantages

No external fragmentation
Efficient random access
Indexing of free space can be done with a bitmap
Can keep an index of bad blocks


Disk Scheduling

1. First come first served (FCFS)

2. Shortest Seek Time First

3. Scan Scheduling also called Elevator Algorithm
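FCFS and SSTF are easy to compare on total head movement (a Python sketch; the request list and start track are arbitrary illustrative values, not from this document):

```python
def total_seek(order, start):
    """Total head movement for servicing requests in the given order."""
    dist, pos = 0, start
    for track in order:
        dist += abs(track - pos)
        pos = track
    return dist

def sstf(requests, start):
    """Shortest Seek Time First: always service the nearest pending request."""
    pending, pos, order = list(requests), start, []
    while pending:
        nxt = min(pending, key=lambda t: abs(t - pos))
        pending.remove(nxt)
        order.append(nxt)
        pos = nxt
    return order

reqs = [98, 183, 37, 122, 14, 124, 65, 67]   # pending track requests
print(total_seek(reqs, 53))            # FCFS order: 640 tracks of movement
print(total_seek(sstf(reqs, 53), 53))  # SSTF order: 236 tracks of movement
```

SSTF cuts total movement sharply here, but like SJF it can starve requests far from the head; SCAN (the elevator algorithm) avoids that by sweeping in one direction at a time.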

Setup and Status Commands

logout    end your UNIX session
passwd    change password by prompting for old and new passwords
stty      set terminal options

TABLE 1. Special Keys and Control Characters

DELETE     Acts as a rubout or erase key. Pressing DELETE once will back up and erase one character, allowing you to correct and retype mistakes.
BACKSPACE  This key is sometimes used as the rubout key instead of the DELETE key. Otherwise, it is mapped as a backspace key, which generates a ^H on the display.
CTRL-U     ^U erases the entire command line. It is also called the line kill character.
CTRL-W     ^W erases the last word on the command line.
CTRL-S     ^S stops the flow of output on the display.
CTRL-Q     ^Q resumes the flow of output stopped by CTRL-S.
CTRL-C     ^C interrupts a command or process in progress and returns to the command line. This will usually work; if it doesn't, try typing several ^C's in a row. If it still doesn't work, try typing ^\, q (for quit), exit, ^D, or ^Z.
CTRL-Z     ^Z suspends a command or process in progress.
CTRL-D     ^D generates an end-of-file character. It can be used to terminate input to a program, or to end a session with a shell.
CTRL-\     ^\ quits a program and saves an image of the program in a file called core for later debugging.

A Selected Command List (from "Introduction to the UNIX Operating System on IT Systems")

date      display or set the date
finger    display information about users
ps        display information about processes
env       display or change current environment
set       C shell command to set shell variables
alias     C shell command to define command abbreviations
history   C shell command to display recent commands

File and Directory Commands

cat       concatenate and display file(s)
more      paginator - allows you to browse through a text file
less      more versatile paginator than more
mv        move or rename files
cp        copy files
rm        remove files
ls        list contents of directory
mkdir     make a directory
rmdir     remove a directory
cd        change working directory


pwd       print working directory name
du        summarize disk usage
chmod     change mode (access permissions) of a file or directory
file      determine the type of file
quota -v  display current disk usage for this account

Editing Tools

pico      simple text editor
vi        screen oriented (visual) display editor
diff      show differences between the contents of files
grep      search a file for a pattern
sort      sort and collate lines of a file (only works on one file at a time)
wc        count lines, words, and characters in a file
look      look up specified words in the system dictionary
awk       pattern scanning and processing language
gnuemacs  advanced text editor

Formatting and Printing Commands

lpq       view printer queue
lpr       send file to printer queue to be printed
lprm      remove job from printer spooling queue
enscript  convert text files to POSTSCRIPT format for printing
lprloc    locations & names of printers, prices per page
pacinfo   current billing info for this account

Program Controls, Pipes, and Filters

CTRL-C    interrupt current process or command
CTRL-D    generate end-of-file character
CTRL-S    stop flow of output to screen
CTRL-Q    resume flow of output to screen
CTRL-Z    suspend current process or command
jobs      list background jobs
bg        run a current or specified job in the background
fg        bring the current or specified job to the foreground
!!        repeat entire last command line
!$        repeat last word of last command line
sleep     suspend execution for an interval
kill      terminate a process
nice      run a command at low priority
renice    alter priority of running process
&         run process in background when placed at end of command line
>         redirect the output of a command into a file
>>        redirect and append the output of a command to the end of a file
<         redirect a file to the input of a command
>&        redirect standard output and standard error of a command into a file (C shell only)
|         pipe the output of one command into another

Other Tools and Applications

pine      electronic mail
bc        desk calculator
man       print UNIX manual page to screen
elm       another electronic mail program


About UNIX Files

Now that you understand UNIX commands, let's discuss the objects manipulated by most commands: files. As we said before, all files have a filename, and UNIX imposes few restrictions on filenames. This makes it easy for you to name your files so that you can easily recognize their contents. You will find it useful to adopt names and classes of names that indicate how important each file is and what connection it has with other files. For example, temporary files used to test commands and options could all begin with a "t." A filename can be up to 256 characters long, consisting of any alphanumeric character on the keyboard except the "/". In general, you should keep your filenames relatively short (to reduce typing effort) and use normal lower-case characters such as letters, numbers, periods and underscores. For instance, if your program calculates employee paychecks, you might call it payroll, or if your file is a research paper on Frank Lloyd Wright, you might call it wright. Do not include blanks in your filenames as they will make it difficult for you to work with the file. If you do wish to separate letters in a filename, use the underscore ("_") character (as in wright_paper) or the hyphen ("-") character.

Remember that UNIX is case sensitive, which means it recognizes the difference between upper-case and lower-case letters. For instance, Wright and wright would refer to two different files.

When you place a single period in the middle of a filename, the part after the period is commonly referred to as an extension or suffix and usually indicates what type of information is stored in the file. You may use any extension desired; a text file might have the extension .txt or .text; a note may have the extension .note, and so forth. UNIX does not require extensions, but they can be used to help identify similar types of files. Since some UNIX programs (especially compilers) look for certain standard extensions, it is common practice to use the following conventions: .h for header files, .c for C source files, .f for FORTRAN, .p for Pascal, and .s for assembler source files. So the file wright.txt indicates a text file whereas the file payroll.c indicates a C program called payroll. For more information on programming conventions, see the section, Additional Resources.

Some UNIX files begin with a period, for example, .cshrc or .login. Files that begin with a period will not appear in a normal directory listing and are usually UNIX environment and application setup files.

A large grouping of files and directories is referred to as a file system. File systems are related to the disk size and structure, and to the internal structure of UNIX. What you should remember is that users' files and directories are usually on a different file system than the system's files and directories. If the number of users is large, as on Owlnet, the user files and directories may be on more than one file system.

Creating Files

Many files are created using a text editor. A text editor is a program that allows you to enter and save text. You can also use a text editor to manipulate saved text through corrections, deletions, or insertions. The main text editors on Information Technology managed networks are vi, GNU Emacs, Pico, and aXe. (Note: vi is included with every UNIX system, but GNU Emacs is commonly installed separately by system managers. aXe is only available if you are using the X Window system.) You should learn how to use at least one of these tools. Information Technology has tutorial documents on each of these editors. Please see the section, Additional Resources, for information on the tutorials.

You can create a file without a text editor by using the cat command (short for concatenate) and the ">" (redirect output) symbol. To create a file using the cat command, type:

cat > new-filename

where new-filename is the name you wish to give the file. The command cat generally reads in a file and displays it to standard output. When there is no filename directly following the command, cat treats standard input as a file. The ">" symbol will redirect the output from cat into the new filename you specify. cat will keep reading and writing each line you type until it encounters an end-of-file character. By typing CTRL-D on a line by itself, you generate an end-of-file character. It will stop when it sees this character. Try it, using this example as a guide:

cat > practice

When you reach the end of each line, press the RETURN key. You can only correct mistakes on the line you are currently typing. Use the DELETE key to move the cursor back to the mistake and then


retype the rest of the line correctly. When you have completed the last line, press RETURN and type CTRL-D.

Displaying Files

Now that you have created a file, you can display it one of several ways. You could use the cat command. Just type cat followed by the name of the file that you want to see.

cat practice

Sometimes the files you want to view are very long. When using the cat command, the text will scroll by very quickly. You can control the flow of text by using CTRL-S and CTRL-Q. CTRL-S stops the flow of text and CTRL-Q restarts it. If you use CTRL-S, stopping the flow of text, and so on, you must remember to type CTRL-Q or the computer will not display any output, including anything that you type.

more is a program that displays only one screen of information at a time; it waits for you to tell it to continue. Type more followed by a filename.

more practice

The computer will display one screen of text and then wait for you to press the space bar before it displays the next page of text, until you reach the end of the file. Pressing the "?" character will show help for more. A utility of greater power called less is available on many systems; it allows reverse scrolling of files and other enhancements. It is invoked the same way as more.

Listing Files

The ls command will list the files in the current directory that do not begin with a period. Below is a list of options you can tack on to ls:

ls -a   lists all the contents of the current directory, including files with initial periods, which are not usually listed.
ls -l   lists the contents of the current directory in long format, including file permissions, size, and date information.
ls -s   lists contents and file sizes in kilobytes of the current directory.

If you have many files, your directory list might be longer than one screen. You can use the programs more or most with the "|" (vertical bar or pipe) symbol to pipe the directory list generated as output by the ls command into the more program. more or less will display the output from ls one page at a time.

ls | more

Copying Files

To make a copy of a file, use the cp (copy) command.

cp filename newfilename

where filename is the file you wish to copy and newfilename is the file you are creating.

cp practice sample   (make a copy of "practice" called "sample")
ls
practice sample

The example created a new file called sample that has the same contents as practice. If sample already exists, the cp command will overwrite the previous contents. New accounts are often set up so that cp will prompt for confirmation before it overwrites an existing file. If your account is not set up in this manner, use the -i option (cp -i) to get the confirmation prompt, like so:

cp -i practice sample

Renaming Files

To rename one of your files, use the mv (move) command.

mv oldfilename newfilename

where oldfilename is the original filename and newfilename is the new filename. For instance, to rename sample as workfile type:

mv sample workfile
ls
practice workfile

This moves the contents of sample into the new file workfile. (Note: Moving a file into an existing file overwrites the data in the existing file.) New accounts are often set up so that mv will prompt for confirmation


before doing this. If your account is not set up in this manner, use the -i option (mv -i) to get the confirmation prompt.

Deleting Files

To delete files, use the rm (remove) command. For instance, to delete workfile, type:

rm workfile
ls
practice

Creating Links Between Files

You can refer to one particular file by different names in different directories. The ln command creates a link, which "points" to the file. Note that links are simply alternative names for a single file; ln does not rename the file (as does mv) nor does it make a copy of the file (as does cp). It allows you to access the file from multiple directories. Since only one copy of the file actually exists, any changes that you make through one of its links will be reflected when you access it through another of its links, yet if you delete the link, you do not delete what it points to.

Links are useful for cross-referencing files. If you know that you will need to access a file from different directories, creating links is a better alternative to making a copy of the file for each directory (and then having to alter each one every time a change is made to the original). It is also more convenient than having to use the file's full pathname every time you need to access it. Another use for linking a file is to allow another user access to that particular file without also allowing entry into the directory that actually contains the file. The kind of link you will want to create is called a symbolic link. A symbolic link contains the pathname of the file you wish to create a link to. Symbolic links can tie into any file in the file structure; they are not limited to files within a file system. Symbolic links may also refer to directories as well as individual files. To create a symbolic link to a file within the same directory, type:

ln -s originalFile linkName

where originalFile is the file that you want to link to and linkName is the link to that file. To create a link in a directory other than that of the original file, type:

ln -s originalFile differentDirectoryName/linkName

If you create a link within the same directory as the original file, you cannot give it the same name as the original file. There is no restriction on a file's additional names outside of its own directory. Links do not change anything about a file, no matter what the link is named. If someone makes a link to one of your files, and you then delete that file, that link will no longer point to anything and may cause problems for the other user.

NOTE: You should always use symbolic links when linking to files owned by others!

Printing Files

To print a file, use the lpr command:


lpr filename
or
lpr [-Pprintername] filename   (for laser printers only)

To get a list of the printers available to your machine, type:

lprloc

lprloc lists all of the printers that your system knows about, by name, along with their type and location. To get some status information on the printers, use the command lpstat -p. Printer accounting information is available by running the command pacinfo.

DirectoriesAbout UNIX DirectoriesUNIX directories are similar to regular files; they both have names and both contain information. Directories,however, contain other files and directories. Many of the same rules and commands that apply tofiles also apply to directories.All files and directories in the UNIX system are stored in a hierarchical tree structure. Envision it as anupside-down tree, as in the figure below.FIGURE 2. UNIX Directory StructureAt the top of the tree is the root directory. Its directory name is simply / (a slash character). Below theroot directory is a set of major subdirectories that usually include bin, dev, etc, lib, pub, tmp, and usr.For example, the /bin directory is a subdirectory, or “child,” of / (the root directory). The root directory,in this case, is also the parent directory of the bin directory. Each path leading down, away from theroot, ends in a file or directory. Other paths can branch out from directories, but not from files.pages on lpq, lpr, andlprm.usr bin libbin libfile1 file2Many directories on a UNIX system have traditional names and traditional contents. For example,directories named bin contain binary files, which are the executable command and applicationfiles. A lib directory contains library files, which are often collections of routines that can beincluded in programs by a compiler. dev contains device files, which are the software componentsof terminals, printers, disks, etc. tmp directories are for temporary storage, such as when a programcreates a file for something and then deletes it when it is done. The etc directory is used for miscellaneousadministrative files and commands. pub is for public files that anyone can use, and usr hastraditionally been reserved for user directories, but on large systems it usually contains other bin,tmp, and lib directories.Your home directory is the directory that you start out from when you first login. It is the top leveldirectory of your account. 
Your home directory name is almost always the same as your userid.

Every directory and file on the system has a path by which it is accessed, starting from the root directory. The path to the directory is called its pathname. You can refer to any point in the directory hierarchy in two different ways: using its full (or absolute) pathname or its relative pathname. The full pathname traces the absolute position of a file or directory back to the root directory, using slashes (/) to connect every point in the path. For example, in the figure above, the full pathname of file2 would be /usr/bin/file2. Relative pathnames begin with the current directory (also called the working directory, the one you are in). If /usr were your current directory, then the relative pathname for file2 would be bin/file2.

If you are using the C shell, TC shell, or Bourne-Again shell, UNIX provides some abbreviations for a few special directories. The character "~" (tilde) refers to your home directory. The home directory of any user (including you, if you want) can be abbreviated from /parent-directories/userid to ~userid. Likewise, you can abbreviate /parent-directories/youruserid/file to ~/file. The current directory has the abbreviation . (period). The parent of the current directory uses .. (two consecutive periods) as its abbreviation.
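As a quick sketch of these ideas, the commands below build a small scratch directory under /tmp and step through it using relative paths, absolute paths, and the .. and ~ abbreviations. The names demo and projects are made up for illustration:

```shell
# The directory names demo and projects are hypothetical examples.
cd /tmp
mkdir -p demo/projects
cd demo

ls projects             # relative pathname, starting from the current directory
ls /tmp/demo/projects   # absolute (full) pathname of the same directory

cd projects
cd ..                   # .. names the parent directory, so we are back in demo
cd ~                    # ~ names your home directory
```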

Page 52: SPSA

Displaying Directories

When you initially log in, the UNIX system places you in your home directory. The pwd command will display the full pathname of the current directory you are in.

pwd
/home/userid

By typing the ls -a command, you can see every file and directory in the current directory, regardless of whether it is your home directory. To display the contents of your home directory when it is not your current directory, enter the ls command followed by the full pathname of your home directory.

ls /home/userid

If you are using a shell other than the Bourne shell, instead of typing the full pathname for your directory, you can also use the tilde symbol with the ls command to display the contents of your home directory.

ls ~

To help you distinguish between files and directories in a listing, the ls command has a -F option, which appends a distinguishing mark to the entry name showing the kind of data it contains: no mark for regular files; "/" for directories; "@" for links; "*" for executable programs:

ls -F ~
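The markers that -F appends can be seen by listing a small scratch directory; the entry names below are invented for the example:

```shell
# Scratch directory; the entry names are invented for the example.
cd "$(mktemp -d)"
mkdir subdir                # listed as subdir/
touch plain.txt             # listed with no mark
printf '#!/bin/sh\n' > run.sh
chmod +x run.sh             # listed as run.sh*
ln -s plain.txt alias.txt   # listed as alias.txt@ on most systems

ls -F
```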

Changing Directories

To change your current directory to another directory in the directory tree, use the cd command. For example, to move from your home directory to your projects directory, type:

cd projects (relative pathname from home directory)

or,

cd ~/projects (full pathname using ~)

or,

cd /home/userid/projects (full pathname)

Using pwd will show you your new current directory.

pwd
/home/userid/projects

To get back to the parent directory of projects, you can use the special ".." directory abbreviation.

cd ..
pwd
/home/userid

If you get lost, issuing the cd command without any arguments will place you in your home directory. It is equivalent to cd ~, but also works in the Bourne shell.

Moving Files Between Directories

You can move a file into another directory using the following syntax for the mv command:

mv source-filename destination-directory

For example,

mv sample.txt ~/projects

moves the file sample.txt into the projects directory. Since the mv command is capable of overwriting files, it would be prudent to use the -i option (confirmation prompt). You can also move a file into another directory and rename it at the same time by merely specifying the new name after the directory path, as follows:

mv sample.txt ~/projects/newsample.txt
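A short sketch of both forms of mv, using made-up file names in a scratch directory:

```shell
# Scratch directory; sample.txt and projects are made-up names.
cd "$(mktemp -d)"
mkdir projects
printf 'hello\n' > sample.txt

mv sample.txt projects                          # move into the directory, same name
mv projects/sample.txt projects/newsample.txt   # move and rename in one step
# In day-to-day use, mv -i asks for confirmation before overwriting a file.
```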

Copying Files to Other Directories

As with the mv command, you can copy files to other directories:

cp sample.txt ~/projects


As with mv, the new file will have the same name as the old one unless you change it while copying it.

cp sample.txt ~/projects/newsample.txt

Renaming Directories

You can rename an existing directory with the mv command:

mv oldDirectory newDirectory

The new directory name must not exist before you use the command. The new directory need not be in the current directory. You can move a directory anywhere within a file system.

Removing Directories

To remove a directory, first be sure that you are in the parent of that directory. Then use the command rmdir along with the directory's name. You cannot remove a directory with rmdir unless all the files and subdirectories contained in it have been erased. This prevents you from accidentally erasing important subdirectories. You could erase all the files in a directory by first going to that directory (use cd) and then using rm to remove all the files in that directory. The quickest way to remove a directory and all of its files and subdirectories (and their contents) is to use the rm -r (for recursive) command along with the directory's name. For example, to empty and remove your projects directory, move to that directory's parent, then type:

rm -r projects (remove the directory and its contents)
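The difference between rmdir and rm -r can be sketched as follows (the projects directory here is a scratch copy created just for the demonstration):

```shell
# Scratch directory; projects is a made-up name.
cd "$(mktemp -d)"
mkdir projects
touch projects/notes.txt

# rmdir refuses to remove a directory that still has contents:
rmdir projects 2>/dev/null || echo "rmdir failed: directory not empty"

# rm -r removes the directory together with everything inside it:
rm -r projects
```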

File and Directory Permissions

It is important to protect your UNIX files against accidental (or intentional) removal or alteration by yourself or other users. The UNIX operating system maintains information, known as permissions, for every file and directory on the system. This section describes how to inspect and change these permissions.

UNIX was designed and implemented by computer scientists working on operating system research. Many of the fundamentals of UNIX reflect this origin in academia. A low concern for security is one of the hallmarks of UNIX operating systems. Therefore, unless you act to restrict access to your files, chances are high that other users can read them.

Every file or directory in a UNIX file system has three types of permissions (or protections) that define whether certain actions can be carried out. The permissions are:

read ( r )     A user who has read permission for a file may look at its contents or make a copy of it. For a directory, read permission enables a user to find out what files are in that directory.

write ( w )    A user who has write permission for a file can alter or remove the contents of that file. For a directory, the user can create and delete files in that directory.

execute ( x )  A user who has execute permission for a file can cause the contents of that file to be executed (provided that it is executable). For a directory, execute permission allows a user to change to that directory.

For each file and directory, the read, write, and execute permissions may be set separately for each of the following classes of users:

User ( u )     The user who owns the file or directory.

Group ( g )    Several users purposely lumped together so that they can share access to each other's files.

Others ( o )   The remainder of the authorized users of the system.

The primary command that displays information about files and directories is ls. The -l option will display the information in a long format.
You can get information about a single UNIX file by using ls -l filename.

Each file or subdirectory entry in a directory listing obtained with the -l option consists of seven fields:


permission mode, link count, owner name, group name, file size in bytes, time of last modification, and the filename (the group name appears only if the "g" flag is also specified, as in ls -lg).

The first 10 characters make up the mode field. If the first character is a "d" then the item listed is a directory; if it is a "-" then the item is a file; if it is an "l" then it is a link to another file. Characters 2 through 4 refer to the owner's permissions, characters 5 through 7 to the group's permissions (groups are defined by the system administrator), and the last three to the general public's permissions. (You can type id to verify your userid and group membership.) If a particular permission is set, the appropriate letter appears in the corresponding position; otherwise, a dash indicates that the permission is not given.

The second field in the output from ls -l is the number of links to the file. In most cases it is one, but other users may make links to your files, thus increasing the link count. A special warning to people using links to other people's files: your "copies" of their files can be counted against them by the file quota system available on certain UNIX variants. That is why making links other than symbolic links to other people's files is strongly discouraged. The third field gives the userid of the owner of the file. The group name follows in the fourth field (if the -g option is used in conjunction with -l). The next two fields give the size of the file (in bytes) and the date and time at which the file was last modified. The last field gives the name of the file.

ls -l myfile
-rw-r--r-- 1 owner 588 Jul 15 14:39 myfile

A file's owner can change any or all of the permissions with the chmod (change mode) command. The chmod command allows you to dictate the type of access permission that you want each file to have.
In the previous example the current permissions for myfile are read for everybody, write for the owner, and execute by no one.

The arguments supplied to chmod are a symbolic specification of the changes required, followed by one or more filenames. The specification consists of whose permissions are to be changed: u for user (owner), g for group, o for others, or some combination thereof (a (all) has the same effect as ugo); how they are to be changed (+ adds a permission, - removes a permission, and = sets the specified permissions, removing the other ones); and which permission to add or remove (r for read, w for write, and x for execute). For example, to remove all the permissions from myfile:

chmod a-rwx myfile
ls -l myfile
---------- 1 owner 588 Jul 15 14:41 myfile

(Note: chmod a= myfile achieves the same effect.)

To allow read and write permissions for all users:

chmod ugo+rw myfile
ls -l myfile
-rw-rw-rw- 1 owner 588 Jul 15 14:42 myfile

To remove write permission for your group and other users:

chmod go-w myfile
ls -l myfile
-rw-r--r-- 1 owner 588 Jul 15 14:42 myfile

Finally, to allow only read permission to all users:

chmod a=r myfile
ls -l myfile
-r--r--r-- 1 owner 588 Jul 15 14:43 myfile

Now the file is protected by allowing only read access; it cannot be written to or executed by anyone, including you. Protecting a file against writing by its owner is a safeguard against accidental overwriting, although not against accidental deletion.

chmod will also accept a permission setting expressed as a 3-digit octal number. To determine this octal number, you first write a 1 if the permission is to be set and a 0 otherwise. This produces a binary number which can be converted into octal by grouping the digits in threes and replacing each group by the corresponding octal digit according to the table below.

TABLE 2. Symbolic to Octal Conversions

Symbolic  Binary  Octal
---       000     0
--x       001     1
-w-       010     2


-wx       011     3
r--       100     4
r-x       101     5
rw-       110     6
rwx       111     7

Thus, if the setting you want is rw-r--r--, determine the octal number with the following method:

symbolic   rw-   r--   r--
binary     110   100   100
octal       6     4     4

This shows that the octal equivalent of rw-r--r-- is 644. The following example illustrates that the permissions for myfile have been reset to the values with which we began.

chmod 644 myfile
ls -l myfile
-rw-r--r-- 1 owner 588 Jul 15 14:44 myfile

To change the permissions back to read only, you can execute chmod as follows:

chmod 444 myfile
ls -l myfile
-r--r--r-- 1 owner 588 Jul 15 14:45 myfile

As with files, directories may also have permissions assigned. When listing directories, you may use the -d option to keep from descending into the directories you list. Otherwise, the contents of the directories will be displayed as well as their names. Below is an example of permissions assigned to a directory:

ls -lgd home
drwxrwxr-x 1 owner caam223 588 Jul 15 9:45 home

The directory and the files and directories under it may be read and executed by anyone, but written to only by the owner and users in the caam223 group. Assuming you are the owner of this directory, you may decide to change the permission to allow only yourself and the caam223 group to read and execute files in the home directory. You would set the permissions accordingly:

chmod o-rx home
ls -lgd home
drwxrwx--- 1 owner caam223 588 Jul 15 9:46 home

You may decide that only you should be able to alter the contents of the directory. You must remove the write permission for the group.

chmod 750 home
ls -lgd home
drwxr-x--- 1 owner caam223 588 Jul 15 9:48 home

An alternative to the previous command is chmod g-w home.

When you create a file the system gives it a default set of permissions. These are controlled by the system administrator and will vary from installation to installation.
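The symbolic and octal forms of chmod described above can be tried on a scratch file; myfile here is created just for the demonstration:

```shell
# Scratch file; myfile is created just for this demonstration.
cd "$(mktemp -d)"
touch myfile

chmod a=r myfile    # read-only for user, group, and others: r--r--r--
chmod u+w myfile    # add write permission back for the owner: rw-r--r--
chmod 644 myfile    # the same rw-r--r-- setting, written in octal
ls -l myfile
```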
If you would like to change the default which is in effect for you, choose your own with the umask command. Note that the permission specified by the umask setting will be applied to the file as a whole, unlike that specified in the chmod command, which normally only adds or deletes individual permissions (few people use the = operator to chmod). First, issue the command without arguments to cause the current settings to be echoed as an octal number:

umask
022

If you convert these digits to binary, you will obtain a bit pattern of 1's and 0's. A 1 indicates that the corresponding permission is to be turned off, a 0, that it is to be turned on. (Notice that the bit patterns for chmod and umask are reversed.) Hence, the mask output above is 000010010, which produces a permission setting of rw-r--r-- (i.e., write permission is turned off for group and other).


Newly created files always have the execution bit turned off. Suppose you decide that the default setting you prefer is rwxr-x---. This corresponds to the masking bit pattern 000010111, so the required mask is 027:

umask 027

Now, if you create a new file during this session, the permissions assigned to the file will be the ones allowed by the mask value.
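A sketch of umask in action, run in a scratch directory (the subshell keeps the changed mask from affecting the rest of your session):

```shell
cd "$(mktemp -d)"
(
  umask 027          # turn off w for group and rwx for others
  touch newfile      # files start from rw-rw-rw-, giving rw-r-----
  mkdir newdir       # directories start from rwxrwxrwx, giving rwxr-x---
)
ls -l newfile
ls -ld newdir
```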

Wildcard Characters

Using wildcard characters that allow you to copy, list, move, remove, etc. items with similar names is a great help in manipulating files and directories.

1. The symbol ? will match any single character in that position in the file name.
2. The symbol * will match zero or more characters in the name.
3. Characters enclosed in brackets [ and ] will match any one of the given characters in the given position in the name. A consecutive sequence of characters can be designated by [char-char].

Examples of each follow:

1. ?ab2 would match a name that starts with any single character and ends with ab2. ?ab? would match all names that begin and end with any character and have ab in between.
2. ab* would match all names that start with ab, including ab itself. a*b would match all names that start with a and end with b, including ab.
3. s[aqz] would match sa, sq, and sz. s[2-7] would match s2, s3, s4, s5, s6 and s7.

These wildcard symbols help in dealing with groups of files, but you should remember that the instruction:

rm *

would erase all files in your current directory (although by default, you would be prompted to okay each deletion). The wildcard * should be used carefully.
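The expansions above can be observed directly with echo, which simply prints whatever the shell expands the pattern to (the file names are invented for the example):

```shell
# Scratch directory; the file names are invented to exercise each pattern.
cd "$(mktemp -d)"
touch ab abc aXb s2 s5 sz tab2

echo ab*       # ab abc   (zero or more characters after ab)
echo a*b       # matches both ab and aXb
echo ?ab2      # tab2     (exactly one character, then ab2)
echo s[2-7]    # s2 s5    (one character from the range 2-7)
```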

Processes

Every command or program running under UNIX is called a process. A sequence of related processes is called a job. Your applications and even your shell itself are processes. The windowing system is also a process, or a collection of processes. The UNIX kernel manages the processes on the system, usually without distinguishing among them. UNIX is a multi-tasking system: it allows you to continue to work in the foreground while running one or more jobs in the background. It also runs the processes of many users simultaneously. You could even log off and come back later if the background jobs do not require interaction with you.

Viewing Your Processes

The command ps will show you the status of your processes.

ps
PID  TT STAT TIME COMMAND
4804 p3 S    0:00 -sh (csh)
1352 p3 R    0:00 ps
3874 p7 IW   0:25 xclock -g 90x90-0+0
3875 p7 S    0:48 xbiff -g 90x90-95+0
3879 p7 S    0:10 twm
3880 p7 IW   0:00 -bin/csh (csh)
3892 p9 IW   0:24 /usr/local/bin/elm

ps displays the process ID, under PID; the control terminal (if any), under TT; the state of the process, under STAT; the cpu time used by the process so far (including both user and system time), under TIME; and finally, an indication of the COMMAND that is running.

The state of the process is indicated by a sequence of letters. The man pages for ps explain what the letters mean if you want to know. For most purposes, you won't really need to know what the letters mean.

Running Background Jobs


Putting a program into an unattended state where it continues to execute is referred to as putting it (the process or job) into the background. (Running a program on one machine and displaying its output on another via a windowing system like X is not considered backgrounding the job.) Adding an & (ampersand) at the end of the command line instructs UNIX to run the job in the background.

jobname &

The response you receive will be something like this:

[1] 5432

This particular response means that you have one job running in the background (and its job number is 1), and its process identification number (PID) is 5432. You will need to know the PID if you want to abort the job. This is known as killing a job. To kill the job in the above example, you would type:

kill 5432

You could also use

kill %1

or, if there's only one job running called "jobname,"

kill %jobname

In the C shell, the job number can be used to control which jobs run in the background or foreground. The job number is used when switching a job that is processing in the foreground to the background, and one that is processing in the background to the foreground. To do the former, first press CTRL-Z to suspend the job. Then type:

bg %jobNumber

To switch the job to the foreground, simply type:

fg %jobNumber

If you have forgotten the job number, type the command jobs to see a list of the jobs that are running in the background at the moment.

Note: The rules imposed by system administrators about where and how to run background jobs vary from network to network and change over time. It is important to stay current with the background job policy of your network.
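Inside a shell script there is no [1] 5432 message; instead, the special parameter $! holds the PID of the most recent background job. A minimal sketch of starting and killing a background job this way (sleep stands in for a real job):

```shell
sleep 60 &     # start a background job; in a script, $! records its PID
pid=$!
echo "background PID: $pid"

kill "$pid"                      # terminate the job by PID
wait "$pid" 2>/dev/null || true  # collect it; wait is non-zero for a killed job
```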

Process Scheduling Priority

The nice command is used to set the processing priority of a command. The priority of a process determines how much attention the system will devote to completing that job. The higher the priority, the more attention a job gets, which implies that it will take less time to complete than the same job run at a lower priority. There are two versions of nice. In the C shell, the syntax is:

nice -priorityNumber command argument

In the Bourne shell, the syntax is:

nice +priorityNumber command argument

The available priority numbers for users range from 1 to 19, with 19 being the lowest priority. In other words, the higher the nice value, the lower the processing priority. (Note: It is important to check the network policy for the required nice value for background jobs on your system; they are usually required to be niced, and your job may be downgraded in priority if it was niced at the wrong value.) Set your command at the required nice value or higher. If you do not include a number argument, the value will default to 4 for the C shell and 10 for the Bourne shell.

For example, if you wanted to run a long non-interactive job, and you didn't have to have the results of this job right away, you should run it in the background and set a high nice value. Using the C shell, you would type:

nice -19 jobname &
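On modern POSIX systems the portable spelling of the increment is nice -n; a minimal sketch (the echo command stands in for a real long-running job):

```shell
# nice -n is the portable modern form of the historical spellings shown above.
nice -n 19 sh -c 'echo "running at lowest user priority"'

# With no operands, nice prints the current nice value.
nice
```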

Remote Login

Sometimes, while you are logged into one workstation, you will find that you would like to be logged in to another workstation, file server, or other UNIX system. The command rlogin allows you to do so, provided that you have an account on the other system. Type:

rlogin newSystem

You may then have to supply your password. You should also get the messages about logging in that are used on newSystem. If your userid is different on newSystem you will have to use the form:


rlogin newSystem -l userid

UNIX / Linux Command Summary

access()
Used to check the accessibility of files.

int access(pathname, access_mode)
char *pathname;
int access_mode;

The access modes are:
04 read
02 write
01 execute (search)
00 checks existence of a file

& operator
Executes a command as a background process.

banner
Prints the specified string in large letters. Each argument may be up to 10 characters long.

break
Used to break out of a loop. It does not exit from the program.

cal
Produces a calendar of the current month as standard output. The month (1-12) and year (1-9999) must be specified in full numeric format.

cal [[month] year]

calendar
Displays the contents of the calendar file.

case operator
The case operator is used to choose among multiple patterns or conditions.

case $string in

pattern1) command-list;;
pattern2) command-list;;


pattern3) command-list;;
esac

cat
The cat (for concatenate) command is used to display the contents of a file. Used without arguments, it takes input from standard input; <Ctrl-d> is used to terminate input.

cat [filename(s)]
cat > [filename]

Data can be appended to a file using >>.

Some of the available options are:

cat [-options] filename(s)
-s  silent about files that cannot be accessed
-v  enables display of non-printing characters (except tabs, newlines, form-feeds)
-t  when used with -v, causes tabs to be printed as ^I's
-e  when used with -v, causes $ to be printed at the end of each line

The -t and -e options are ignored if the -v option is not specified.

cd
Used to change directories.

chgrp
Changes the group that owns a file.

chgrp [group-id] [filename]

chmod
Allows file permissions to be changed for each class of user. File permissions can be changed only by the owner.

chmod [ugo][+/-][rwx] [filename]

chown
Used to change the owner of a file. The command takes a file(s) as source files and the login id of another user as the target.

chown [user-id] [filename]

cmp
The cmp command compares two files (text or binary) byte by byte and displays the first occurrence where the files differ.

cmp [filename1] [filename2]
-l  gives a long listing


comm
The comm command compares two sorted files and displays the lines that are common or unique. The display is separated into 3 columns.

comm filename1 filename2

The first column displays what occurs in the first file but not in the second, the second column displays what occurs in the second file but not in the first, and the third column displays what is common to both files.

continue statement
The rest of the commands in the loop are ignored for the current iteration; execution moves on to the next cycle of the loop.

cp
The cp (copy) command is used to copy a file.

cp [filename1] [filename2]

cpio
(copy input/output) Utility program used to take backups. cpio operates in three modes:
-o  output
-i  input
-p  pass

creat()
This system call creates a new file or prepares to rewrite an existing file. The file pointer is set to the beginning of the file.

#include <sys/types.h>
#include <sys/stat.h>

int creat(path, mode)

char *path;
int mode;

cut
Used to cut out parts of a file. It takes filenames as command line arguments or input from standard input. The command can cut character columns as well as fields in a file. It does not, however, delete the selected parts from the file.

cut [-c|-f] [column/field] filename
cut -d ":" -f1,2,3 filename

where -d indicates the field delimiter, specified within quotes (":" here).
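A small sketch of cutting fields and character columns; the colon-delimited sample data is made up:

```shell
# Sample colon-delimited data, made up for the example.
cd "$(mktemp -d)"
printf 'alice:x:1001\nbob:x:1002\n' > users.txt

cut -d ":" -f1 users.txt   # field 1 of each line: alice, bob
cut -c 1-3 users.txt       # character columns 1-3 of each line
```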

df
Used to find the number of free blocks available for all the mounted file systems.

# /etc/df [filesystem]


diff
The diff command compares text files. It gives an index of all the lines that differ in the two files, along with the line numbers. It also displays what needs to be changed.

diff filename1 filename2

echo
The echo command echoes arguments on the command line.

echo [arguments]

env
Displays the permanent environment variables associated with a user's login id.

exit command
Used to stop the execution of a shell script.

expr command
The expr (expression) command is used for numeric computation. The operators + (add), - (subtract), * (multiply), / (divide), and % (remainder) are allowed. Calculations are performed in order of normal numeric precedence.

find
The find command searches through directories for files that match the specified criteria. It can take full pathnames and relative pathnames on the command line. To display the output on screen, the -print option must be specified.

for operator
The for operator may be used in looping constructs where there is repetitive execution of a section of the shell program.

for var in val1 val2 val3 val4;

do commands; done

fsck
Used to check the file system and repair damaged files. The command takes a device name as an argument.

# /etc/fsck /dev/file-system-to-be-checked

grave operator
Used to store the standard output of a command in an environment variable. The command is enclosed in grave accents (`).

grep
The grep (global regular expression print) command can be used as a filter to search for strings in files. The pattern may be either a fixed character string or a regular expression.

grep "string" filename(s)


HOME
User's home directory.

if operator
The if operator allows conditional execution of commands.

if expression; then commands; fi
if ... then ... else ... fi

$ if condition; then

  commands
elif condition; then

  commands
fi

kill
Used to stop background processes.

ln
Used to link files. A duplicate of a file is created with another name.

LOGNAME
Displays the user's login name.

ls
Lists the files in the current directory.

Some of the available options are:
-l  gives a long listing
-a  displays all files (including hidden files)

lp
Used to print data on the line printer.

lp [options] filename(s)

mesg
The mesg command controls messages received on a terminal.
-n  does not allow messages to be displayed on screen
-y  allows messages to be displayed on screen

mkdir
Used to create directories.


more
The more command is used to display data one screenful at a time.

more [filename]

mv
mv (move) moves a file from one directory to another or simply changes filenames. The command takes filenames and pathnames as source names and a filename or existing directory as the target name.

mv [source-file] [target-file]

news
The news command allows a user to read news items published by the system administrator.

nl
Displays the contents of a file with line numbers.

passwd
Changes the password.

paste
The paste command joins lines from two files and displays the output. It can take a number of filenames as command line arguments.

paste file1 file2

PATH
The directories that the system searches to find commands.

pg
Used to display data one page (screenful) at a time. The command can take a number of filenames as arguments.

pg [option] [filename1] [filename2] ...

pipe
The pipe operator (|) takes the output of one command as input of another command.

ps
Gives information about all the active processes.

PS1
The system prompt.

pwd
(print working directory) Displays the current directory.

rm
The rm (remove) command is used to delete files from a directory. A number of files may be


deleted simultaneously. A file once deleted cannot be retrieved.

rm [filename1] [filename2] ...

shift command
Using shift, $1 becomes the source string and the other arguments are shifted: $2 is shifted to $1, $3 to $2, and so on.

sleep
The sleep command is used to suspend the execution of a shell script for the specified time. This is usually in seconds.

sort
sort is a utility program that can be used to sort text files in numeric or alphabetical order.

sort [filename]

split
Used to split a large file into smaller files.

split -n filename

split can take a second filename on the command line.

su
Used to switch to superuser or any other user.

sync
Used to write data held in memory buffers out to disk.

system()
Used to run a UNIX command from within a C program.

tail
The tail command may be used to view the end of a file.

tail [filename]

tar
Used to save and restore files to tapes or other removable media.

tar [function[modifier]] [filename(s)]

tee
Output that is being redirected to a file can also be viewed on standard output.

test command
Compares strings and numeric values. The test command has two forms: the test command itself, and the bracket form [ ]. For example:

if test ${variable} = value
then
  commands
else
  commands


fi

The test command also uses the special operators [ ]. The operators inside the brackets are interpreted by the shell as test operators, distinct from wildcard characters.

if [ -f ${variable} ]

then
  commands
elif [ -d ${variable} ]

then
  commands

else
  commands

fi

Many different tests are possible: on files, and comparing numbers, character strings, and the values of environment variables.

time
Used to display the execution time of a program or a command. Time is reported in seconds.

time command

tr
The tr command is used to translate characters.

tr [-option] [string1 [string2]]

tty
Displays the terminal pathname.

umask
Used to specify default permissions while creating files.

uniq
The uniq command is used to display the unique lines in a sorted file.

sort filename | uniq

until
The until operator executes the commands within a loop as long as the test condition is false.

wall
Used to send a message to all users logged in.

# /etc/wall message


wait
The command halts the execution of a script until all child processes, executed as background processes, are completed.

wc
The wc command can be used to count the number of lines, words and characters in a file.

wc [filename(s)]

The available options are:

wc -[options] [filename]
-l  counts lines
-w  counts words
-c  counts characters

while operator
The while operator repeatedly performs an operation until the test condition proves false.

$ while condition
> do

>   commands
> done
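A minimal runnable sketch of the while loop form shown above:

```shell
# Count to three; the loop body runs while the test condition is true.
i=1
while [ "$i" -le 3 ]
do
  echo "pass $i"
  i=$((i + 1))
done
```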

who
Displays information about all the users currently logged onto the system: the user name, terminal number, and the date and time that each user logged onto the system. The syntax of the who command is:

who [options]

write
The write command allows inter-user communication. A user can send messages by addressing the other user's terminal or login id.

write user-name [terminal number]

Logging In and Out

Getting the Login Prompt

Before you can start using the system, you must log in to it. The method that you use to log in varies depending on the type of device that you are using. Read the section below that is appropriate for you and then read the section, Entering Your Userid and Password. In order to log in, you MUST have an account on the system you are accessing. Remember, different domains require different accounts. You cannot log in to Owlnet with a RUF account.

TTY Terminal

If you are using a TTY terminal (a TTY is line-at-a-time oriented as opposed to page oriented) and the screen is blank, you only need to press RETURN and a login prompt should appear on the screen.

Workstation

If the display features a box in the center of the screen with text similar to that in the figure below, then you are using a workstation that is configured to run a windowing system called the X Window System. These machines are called X terminals. (For more information on the X Window System, see the Information Technology document, UNIX 2, Introduction to the X Window System.) If the screen is entirely black, then a screen-saving program is running automatically to protect the


monitor from damage. Moving the mouse or pressing the RETURN key should "wake up" the display. (If you see the words "This screen has been locked..." then someone else is using the workstation, but they are temporarily away from their seat. Look for an unoccupied machine.) Move the mouse until the cursor (a black 'X') is on top of the white box.

Entering Your Userid and Password

When you applied for your account, you selected a userid and a password. This combination of information allows you to access your account. Type your userid using lower-case letters, then press the RETURN key. It is very important that you use lower-case letters for your userid. If you make a typing mistake, you can correct it by pressing the DELETE key once for each character you wish to erase. You must make your corrections before you press the RETURN key. If the text you are typing appears in upper-case, see the section, Troubleshooting.

After you have entered your userid, the system will prompt you for your password (by displaying the word "Password:" if it is not already on the screen, or by moving the cursor behind the word "Password:" already on your screen). Enter your password and press the RETURN key. Notice that the system does not show or "echo" your password as you type it. This prevents other people from learning your password by looking at your screen.

If you receive a message similar to "Login failed, please try again," you may have typed your userid or password incorrectly. Try again, making sure to type in your userid and password correctly. If you are still having problems, go to the Consulting Center (713-348-4983, 103 Mudd Labs) and ask for help.

When you have successfully logged on, the system will pause for a moment, and then display a few lines telling you when and from which machine you last logged on, and any messages from the system administrator.

On X terminals, you will get a window containing system information. After reading it, use the left mouse button to click on either the "Help" or "Go Away" button, depending on what you want. Help puts you into a help system; Go Away allows you to begin your work.

Your new account is provided with a set of command procedures which are executed each time you log in.
You can change part of your UNIX environment by changing these setup files (accounts on Information Technology supported systems are set up to produce a default environment). For further information, check out the Sun manual SunOS User’s Guide: Customizing Your Environment, available from the Operations Center, 109 Mudd Lab.

The system will then display the command prompt. The prompt signals that the system is ready for you to enter your next command. The name of the workstation followed by a percent sign (%) forms the command prompt (e.g. chub.owlnet.rice.edu%). Once you finish typing a command, you must always press RETURN to execute it.

Logging Out

Workstations and TTY Terminals

To end a work session, you must explicitly log out of a UNIX session. To do this, type logout at the command prompt. Once you have logged out, the system will either display the login prompt again or begin executing a screen saver program. You should never turn a workstation off. Turning off a terminal does not necessarily log you out. If you are having trouble logging out, see the section, Troubleshooting.

X Terminals

To log out of the X Window system from an X terminal, move the cursor into the console window (it is labeled “console”), type the command exit, and press RETURN. If you try to use the logout command in the console window, you will receive the message, “Not in login shell.”

Page 68: SPSA

Changing Your Password

You can change your password at any time, and we recommend that you change it on a regular basis. At the command prompt, type passwd. You will be prompted to enter your old password and asked twice to enter your new password. Neither your old nor new password will appear on the screen as you type. In order to be accepted, your password must meet the following conditions:

1. It must be at least seven characters long.
2. It must not match anything in your UNIX account information, such as your login name, or an item from your account information data entry.
3. It must not be found in the system’s spelling dictionary unless a character other than the first is capitalized. It must not have three or more consecutively repeated characters, or words in the dictionary contained within it.

For example, changing your password from Kat899 (based on a dictionary word) to B00z00e (a bad password) will look similar to the following example, except that the keystrokes for your old and new password will not be echoed on the screen.

passwd
current password: Kat899
New password (? for help): B00z00e
New password (again): B00z00e
Password changed for userid

These are bad examples and will not work, so choose your OWN password. Here is a good technique to follow for creating your password:

All she wants to do is dance => Aswtdid

On many systems, the password change does not take effect immediately, even though you have finished with the passwd command. It can take upwards of an hour for the system to install the new password, due to the scheduling of the password changing process. Thus you should be prepared to use your old password to log in again shortly after changing it.

If you should ever forget your password, you can go to the Information Desk in 103 Mudd Lab and request that a new password be generated for you. You will need to bring your Rice ID card with you to identify yourself.

UNIX Commands

The UNIX Shell

Once you are logged in, you are ready to start using UNIX. As mentioned earlier, you interact with the system through a command interpreter program called the shell. Most UNIX systems have two different shells, although you will use only one or the other almost all of the time. The shell you will find on Information Technology supported networks is the C shell. It is called the C shell because it has syntax and constructs similar to those in the C programming language. The C shell command prompt often includes the name of the computer that you are using and usually ends with a special character, most often the percent sign (%). Another common shell is the Bourne shell, named for its author. The default prompt for the Bourne shell is the dollar sign ($). (If the prompt is neither one of these, a quick way to check which shell you are using is to type the C shell command alias; if a list appears, then you are using the C shell; if the message “Command not found” appears, then you are using the Bourne shell.) Modified versions of these shells are also available. TC shell (tcsh) is the C shell with file name completion and command line editing (default prompt: >). The GNU Bourne-Again shell (bash) is basically the Bourne shell with the same features added (default prompt: bash$).

In addition to processing your command requests, UNIX shells have their own syntax and control constructs. You can use these shell commands to make your processing more efficient, or to automate repetitive tasks. You can even store a sequence of shell commands in a file, called a shell script, and run it just like an ordinary program. Writing shell scripts is a topic discussed in the class notes for the UNIX III: Scripts Short Course.

About UNIX Commands


UNIX has a wide range of commands that allow you to manipulate not only your files and data, but also your environment. This section explains the general syntax of UNIX commands to get you started. A UNIX command line consists of the name of the UNIX command followed by its arguments (options, filenames and/or other expressions) and ends with a RETURN. In function, UNIX commands are similar to verbs in English. The option flags act like adverbs by modifying the action of the command, and filenames and expressions act like objects of the verb. The general syntax for a UNIX command is:

command [-flag] [options] file/expression

The brackets around the flags and options are a shorthand way to indicate that they are often optional, and only need to be invoked when you want to use that option. Also, flags need not always be specified separately, each with their own preceding dash. Many times, the flags can be listed one after the other after a single dash. Some examples later on will illustrate this concept.

You should follow several rules with UNIX commands:

1. UNIX commands are case-sensitive, but most are lowercase.
2. UNIX commands can only be entered at the shell prompt.
3. UNIX command lines must end with a RETURN.
4. UNIX options often begin with a “-” (minus sign).
5. More than one option can be included with many commands.

Redirecting Input and Output

UNIX maintains a couple of conventions regarding where input to a program or command comes from and where output from that program or command goes. In UNIX, the standard input is normally the keyboard, and the standard output is normally the screen. UNIX is very flexible, and it allows you to change or redirect where the input comes from and where the output goes. First, any command that would normally give results on the screen can be directed instead to send the output to a file with the “>” (output redirection) symbol. Thus,

date > file

directs the system to put the output from the date command, which merely reports the time and date as the system knows it, into the file named file rather than printing it to your screen. One thing to keep in mind about “>” is that each successive redirection to a particular file will overwrite all of the previously existing data in that file. To append to the end of a file, use “>>” instead. For example,

date >> file

Another redirection is “<”, which tells the command to take its input from a file rather than from the keyboard. For example, if you have a program that requires data input from the keyboard, you may find that you have to type the same data a large number of times in the debugging stage of program development.
If you put that data in a file and direct the command to read it from there, you will only have to type the data once, when you make the data file.

program < datafile

If you do this, you will see the same response from program as if you had typed the data in from the keyboard when requested. You can also combine both kinds of redirection, as in

program < datafile > outputfile

The data in the file datafile will then be used as input for program, and all output will be stored in outputfile. If you want to accumulate output from different sources in a single file, the symbol “>>” directs output to be appended to the end of a file rather than replacing the previous (if any) contents, which the single “>” redirection will do.

A final I/O redirection is the pipe symbol, “|”. The “|” tells the computer to take the output created by the command to the left of it and use that as the input for the command on the right. For example, we could type:

date | program

This would use the output of the date command as input to another program.

NOTE: Many, but not all, interactive programs accept input from a file.
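The conventions above can be combined in a short sketch (the scratch file name /tmp/redir_demo is made up for this example):

```shell
# ">" creates or overwrites the file:
date > /tmp/redir_demo
# ">>" appends a second line:
date >> /tmp/redir_demo
# "<" feeds the file to wc on standard input; this prints 2:
wc -l < /tmp/redir_demo
# "|" pipes the output of date straight into wc:
date | wc -c
# clean up the scratch file:
rm /tmp/redir_demo
```

Running it twice is harmless, since the first ">" truncates any leftover file.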


Why shell programming?

Even though there are various graphical interfaces available for Linux, the shell is still a very useful tool. The shell is not just a collection of commands but a really good programming language. You can automate a lot of tasks with it; the shell is very good for system administration tasks; you can very quickly test whether your ideas work, which makes it very useful for simple prototyping; and it is very useful for small utilities that perform some relatively simple tasks where efficiency is less important than ease of configuration, maintenance and portability. So let's see now how it works:

Creating a script

There are a lot of different shells available for Linux, but usually bash (Bourne Again Shell) is used for shell programming, as it is available for free and is easy to use. So all the scripts we will write in this article use bash (but will most of the time also run with its older sister, the Bourne shell). For writing our shell programs we can use any kind of text editor, e.g. nedit, kedit, emacs, vi... as with other programming languages. The program must start with the following line (it must be the first line in the file):

#!/bin/sh

The #! characters tell the system that the first argument that follows on the line is the program to be used to execute this file. In this case /bin/sh is the shell we use. When you have written your script and saved it, you have to make it executable to be able to use it. To make a script executable, type

chmod +x filename

Then you can start your script by typing:

./filename

Comments


Comments in shell programming start with # and go until the end of the line. We strongly recommend that you use comments. If you have comments and you don't use a certain script for some time, you will still know immediately what it is doing and how it works.
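A minimal illustration (the script's job here, printing the working directory, is an arbitrary choice for the example):

```shell
#!/bin/sh
# where -- print the current working directory
# (comments like these make the intent obvious months later)
pwd
```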

Variables

As in other programming languages, you can't live without variables. In shell programming all variables have the datatype string, and you do not need to declare them. To assign a value to a variable you write:

varname=value

To get the value back you just put a dollar sign in front of the variable:

#!/bin/sh
# assign a value:
a="hello world"
# now print the content of "a":
echo "A is:"
echo $a

Type these lines into your text editor and save the file, e.g. as first. Then make the script executable by typing chmod +x first in the shell, and start it by typing ./first. The script will just print:

A is:
hello world

Sometimes it is possible to confuse variable names with the rest of the text:

num=2
echo "this is the $numnd"

This will not print "this is the 2nd" but "this is the " because the shell searches for a variable called numnd, which has no value. To tell the shell that we mean the variable num we have to use curly braces:

num=2
echo "this is the ${num}nd"

This prints what you want: this is the 2nd

There are a number of variables that are always automatically set. We will discuss them further down when we use them the first time.

If you need to handle mathematical expressions, then you need to use programs such as expr (see table below). Besides the normal shell variables that are only valid within the shell program, there are also environment variables. A variable preceded by the keyword export is an environment variable. We will not talk about them here any further, since they are normally only used in login scripts.
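Before moving on, here is a small sketch of both points; the variable name MYVAR is made up for this example:

```shell
# expr prints the result of simple integer arithmetic on stdout,
# and backticks capture that output into a variable:
sum=`expr 2 + 3`
echo "2 + 3 = $sum"
# export turns MYVAR into an environment variable, so a child
# process (here a fresh sh) can see it:
MYVAR="hello"
export MYVAR
sh -c 'echo "the child shell sees: $MYVAR"'
```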

Shell commands and control structures

There are three categories of commands which can be used in shell scripts:

1) Unix commands: Although a shell script can make use of any Unix command, there are a number of commands which are more often used than others. These commands can generally be described as commands for file and text manipulation.

Command syntax Purpose

echo "some text" write some text on your screen

ls list files

wc -l file, wc -w file, wc -c file

count lines in file, count words in file, or count the number of characters

cp sourcefile destfile copy sourcefile to destfile

mv oldname newname rename or move file

rm file delete a file

grep 'pattern' file

search for strings in a file. Example: grep 'searchstring' file.txt

cut -b colnum file

get data out of fixed width columns of text. Example: get character positions 5 to 9: cut -b 5-9 file.txt. Do not confuse this command with "cat", which is something totally different.

cat file.txt write file.txt to stdout (your screen)

file somefile describe what type of file somefile is

read var prompt the user for input and write it into a variable (var)

sort file.txt sort lines in file.txt

uniq

remove duplicate lines; used in combination with sort, since uniq removes only duplicated consecutive lines. Example: sort file.txt | uniq

expr

do math in the shell. Example: add 2 and 3: expr 2 "+" 3

find

search for files. Example: search by name: find . -name filename -print. This command has many different possibilities and options; unfortunately there is too much to explain it all in this article.

tee

write data to stdout (your screen) and to a file. Normally used like this: somecommand | tee outfile. It writes the output of somecommand to the screen and to the file outfile.

basename file

return just the file name of a given name, stripping the directory path. Example: basename /bin/tux returns just tux

dirname file

return just the directory name of a given name, stripping the actual file name. Example: dirname /bin/tux returns just /bin

head file print some lines from the beginning of a file

tail file print some lines from the end of a file

sed

sed is basically a find and replace program. It reads text from standard input (e.g. from a pipe) and writes the result to stdout (normally the screen). The search pattern is a regular expression (see references). This search pattern should not be confused with shell wildcard syntax. To replace the string linuxfocus with LinuxFocus in a text file, use:

cat text.file | sed 's/linuxfocus/LinuxFocus/' > newtext.file

This replaces the first occurrence of the string linuxfocus in each line with LinuxFocus. If there are lines where linuxfocus appears several times and you want to replace all of them, use:

cat text.file | sed 's/linuxfocus/LinuxFocus/g' > newtext.file

awk

Most of the time awk is used to extract fields from a text line. The default field separator is space; to specify a different one, use the option -F.

cat file.txt | awk -F, '{print $1 "," $3 }'

Here we use the comma (,) as field separator and print the first and third ($1 $3) columns. If file.txt has lines like:

Adam Bor, 34, India
Kerry Miller, 22, USA

then this will produce:

Adam Bor, India
Kerry Miller, USA

There is much more you can do with awk but this is a very common use.

2) Concepts: Pipes, redirection and backticks

They are not really commands, but they are very important concepts.

pipes (|) send the output (stdout) of one program to the input (stdin) of another program.

grep "hello" file.txt | wc -l

finds the lines with the string hello in file.txt and then counts the lines. The output of the grep command is used as input for the wc command. You can concatenate as many commands as you like in that way (within reasonable limits).

redirection: writes the output of a command to a file or appends data to a file.

> writes output to a file and overwrites the old file in case it exists
>> appends data to a file (or creates a new one if it doesn't exist already, but it never overwrites anything)


Backtick

The output of a command can be used as command line arguments (not stdin as above; command line arguments are any strings that you specify behind the command, such as file names and options) for another command. You can as well use it to assign the output of a command to a variable. The command

find . -mtime -1 -type f -print

finds all files that have been modified within the last 24 hours (-mtime -2 would be 48 hours). If you want to pack all these files into a tar archive (file.tar), the syntax for tar would be:

tar cvf file.tar infile1 infile2 ...

Instead of typing it all in, you can combine the two commands (find and tar) using backticks. Tar will then pack all the files that find has printed:

#!/bin/sh
# The ticks are backticks (`) not normal quotes ('):
tar -zcvf lastmod.tar.gz `find . -mtime -1 -type f -print`
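Assigning command output to a variable works the same way; this short sketch just echoes what it captured:

```shell
# capture the output of date in a variable:
now=`date`
echo "the current date is: $now"
# pipelines work inside backticks too:
numfiles=`ls /bin | wc -l`
echo "/bin contains $numfiles entries"
```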

3) Control structures

The "if" statement tests if the condition is true (exit status is 0, success). If it is, the "then" part gets executed:

if ....; then
    ....
elif ....; then
    ....
else
    ....
fi

Most of the time a very special command called test is used inside if-statements. It can be used to compare strings or to test if a file exists, is readable, etc. The "test" command is written as square brackets " [ ] ". Note that space is significant here: make sure that you always have space around the brackets. Examples:

[ -f "somefile" ] : test if somefile is a file
[ -x "/bin/ls" ] : test if /bin/ls exists and is executable
[ -n "$var" ] : test if the variable $var contains something
[ "$a" = "$b" ] : test if the variables "$a" and "$b" are equal

Run the command "man test" and you get a long list of all kinds of test operators for comparisons and files. Using this in a shell script is straightforward:

#!/bin/sh
if [ "$SHELL" = "/bin/bash" ]; then
    echo "your login shell is the bash (bourne again shell)"
else
    echo "your login shell is not bash but $SHELL"
fi

The variable $SHELL contains the name of the login shell, and this is what we are testing here by comparing it against the string "/bin/bash".

Shortcut operators

People familiar with C will welcome the following expression:

[ -f "/etc/shadow" ] && echo "This computer uses shadow passwords"


The && can be used as a short if-statement. The right side gets executed if the left is true. You can read this as AND. Thus the example is: "The file /etc/shadow exists AND the command echo is executed". The OR operator (||) is available as well. Here is an example:

#!/bin/sh
mailfolder=/var/spool/mail/james
[ -r "$mailfolder" ] || { echo "Can not read $mailfolder" ; exit 1; }
echo "$mailfolder has mail from:"
grep "^From " $mailfolder

The script first tests if it can read a given mailfolder. If yes, then it prints the "From" lines in the folder. If it cannot read the file $mailfolder, then the OR operator takes effect. In plain English you read this code as "mailfolder readable or exit program". The problem here is that you may have exactly one command behind the OR, but we need two: print an error message, and exit the program. To handle them as one command, we can group them together in an anonymous function using curly braces. Functions in general are explained further down. You can do everything without the ANDs and ORs using just if-statements, but sometimes the shortcuts AND and OR are just more convenient.

The case statement can be used to match (using shell wildcards such as * and ?) a given string against a number of possibilities.

case ... in
...) do something here;;
esac

Let's look at an example. The command file can test what kind of filetype a given file is:

file lf.gz

returns:

lf.gz: gzip compressed data, deflated, original filename, last modified: Mon Aug 27 23:09:18 2001, os: Unix

We use this now to write a script called smartzip that can uncompress bzip2, gzip and zip compressed files automatically:

#!/bin/sh
ftype=`file "$1"`
case "$ftype" in
"$1: Zip archive"*)
    unzip "$1" ;;
"$1: gzip compressed"*)
    gunzip "$1" ;;
"$1: bzip2 compressed"*)
    bunzip2 "$1" ;;
*) error "File $1 can not be uncompressed with smartzip";;
esac

Here you notice that we use a new special variable called $1. This variable contains the first argument given to a program. Say we run smartzip articles.zip; then $1 will contain the string articles.zip.
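A quick way to see $1 and $2 in action without writing a separate script is the builtin "set --", which replaces the positional parameters of the current shell (the file names below are invented for the demonstration):

```shell
# fake two command line arguments:
set -- articles.zip backup.tar
echo "first argument: $1"
echo "second argument: $2"
```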


The select statement is a bash-specific extension and is very good for interactive use. The user can select a choice from a list of different values:

select var in ... ; do
    break
done
.... now $var can be used ....

Here is an example:

#!/bin/sh
echo "What is your favourite OS?"
select var in "Linux" "Gnu Hurd" "Free BSD" "Other"; do
    break
done
echo "You have selected $var"

Here is what the script does:

What is your favourite OS?
1) Linux
2) Gnu Hurd
3) Free BSD
4) Other
#? 1
You have selected Linux

In the shell you have the following loop statements available:

while ...; do
    ....
done

The while-loop will run while the expression that we test for is true. The keyword "break" can be used to leave the loop at any point in time. With the keyword "continue" the loop continues with the next iteration and skips the rest of the loop body.
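A small sketch of both keywords in one loop; it prints i for 1, 2, 4 and 5, skipping 3 and stopping after 5:

```shell
i=0
while true; do
    i=`expr $i + 1`
    if [ "$i" = "3" ]; then
        continue    # skip the echo for 3
    fi
    if [ "$i" -gt 5 ]; then
        break       # leave the loop for good
    fi
    echo "i is $i"
done
```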

The for-loop takes a list of strings (strings separated by space) and assigns them to a variable:

for var in ....; do
    ....
done

The following will e.g. print the letters A to C on the screen:

#!/bin/sh
for var in A B C ; do
    echo "var is $var"
done

A more useful example script, called showrpm, prints a summary of the content of a number of RPM packages:

#!/bin/sh
# list a content summary of a number of RPM packages
# USAGE: showrpm rpmfile1 rpmfile2 ...
# EXAMPLE: showrpm /cdrom/RedHat/RPMS/*.rpm
for rpmpackage in $*; do
    if [ -r "$rpmpackage" ]; then
        echo "=============== $rpmpackage =============="
        rpm -qi -p $rpmpackage
    else
        echo "ERROR: cannot read file $rpmpackage"
    fi
done


Above you can see the next special variable, $*, which contains all the command line arguments. If you run showrpm openssh.rpm w3m.rpm webgrep.rpm, then $* contains the 3 strings openssh.rpm, w3m.rpm and webgrep.rpm.

GNU bash knows until-loops as well, but generally while and for loops are sufficient.
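For completeness, a sketch of an until-loop, which is simply a while-loop with the test negated:

```shell
# runs as long as the condition is false; prints n for 1, 2 and 3:
n=1
until [ "$n" -gt 3 ]; do
    echo "n is $n"
    n=`expr $n + 1`
done
```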

Quoting

Before passing any arguments to a program, the shell tries to expand wildcards and variables. To expand means that the wildcard (e.g. *) is replaced by the appropriate file names, or that a variable is replaced by its value. To change this behaviour you can use quotes. Let's say we have a number of files in the current directory. Two of them are jpg-files, mail.jpg and tux.jpg.

#!/bin/sh
echo *.jpg

This will print "mail.jpg tux.jpg". Quotes (single and double) will prevent this wildcard expansion:

#!/bin/sh
echo "*.jpg"
echo '*.jpg'

This will print "*.jpg" twice. Single quotes are the most strict: they prevent even variable expansion. Double quotes prevent wildcard expansion but allow variable expansion:

#!/bin/sh
echo $SHELL
echo "$SHELL"
echo '$SHELL'

This will print:

/bin/bash
/bin/bash
$SHELL

Finally, there is the possibility to take the special meaning of any single character away by preceding it with a backslash:

echo \*.jpg
echo \$SHELL

This will print:

*.jpg
$SHELL

Here documents

Here documents are a nice way to send several lines of text to a command. They are quite useful for writing a help text in a script without having to put echo in front of each line. A "here document" starts with << followed by some string that must also appear at the end of the here document. Here is an example script, called ren, that renames multiple files and uses a here document for its help text:

#!/bin/sh
# we have less than 3 arguments. Print the help text:
if [ $# -lt 3 ] ; then
cat <<HELP
ren -- renames a number of files using sed regular expressions

USAGE: ren 'regexp' 'replacement' files...

EXAMPLE: rename all *.HTM files to *.html:
  ren 'HTM$' 'html' *.HTM

HELP
exit 0
fi
OLD="$1"
NEW="$2"
# The shift command removes one argument from the list of
# command line arguments.
shift
shift
# $* contains now all the files:
for file in $*; do
    if [ -f "$file" ] ; then
        newfile=`echo "$file" | sed "s/${OLD}/${NEW}/g"`
        if [ -f "$newfile" ]; then
            echo "ERROR: $newfile exists already"
        else
            echo "renaming $file to $newfile ..."
            mv "$file" "$newfile"
        fi
    fi
done

This is the most complex script so far. Let's discuss it a little bit. The first if-statement tests whether we have provided at least 3 command line parameters. (The special variable $# contains the number of arguments.) If not, the help text is sent to the command cat, which in turn sends it to the screen. After printing the help text we exit the program. If there are 3 or more arguments, we assign the first argument to the variable OLD and the second to the variable NEW. Next we shift the command line parameters twice to get the third argument into the first position of $*. With $* we enter the for loop. Each of the arguments in $* is now assigned one by one to the variable $file. Here we first test that the file really exists, and then we construct the new file name by using find and replace with sed. The backticks are used to assign the result to the variable newfile. Now we have all we need: the old file name and the new one. This is then used with the command mv to rename the files.

Functions

As soon as you have a more complex program, you will find that you use the same code in several places, and you will also find it helpful to give it some structure. A function looks like this:

functionname()
{
    # inside the body $1 is the first argument given to the function
    # $2 the second ...
    body
}

You need to "declare" functions at the beginning of the script before you use them.
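A minimal sketch (the function name greet and its arguments are invented for this example):

```shell
#!/bin/sh
# define the function first ...
greet()
{
    # $1 and $2 are the function's own arguments here:
    echo "hello $1, welcome to $2"
}
# ... then call it further down:
greet tux linux
```

This prints "hello tux, welcome to linux".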

Here is a script called xtitlebar which you can use to change the name of a terminal window. If you have several of them open, it is easier to find them. The script sends an escape sequence which is interpreted by the terminal and causes it to change the name in the titlebar. The script uses a function called help. As you can see, the function is defined once and then used twice:

#!/bin/sh


# vim: set sw=4 ts=4 et:

help()
{
cat <<HELP
xtitlebar -- change the name of an xterm, gnome-terminal or kde konsole

USAGE: xtitlebar [-h] "string_for_titlebar"

OPTIONS: -h help text

EXAMPLE: xtitlebar "cvs"

HELP
exit 0
}

# in case of error or if -h is given we call the function help:
[ -z "$1" ] && help
[ "$1" = "-h" ] && help

# send the escape sequence to change the xterm titlebar:
echo -e "\033]0;$1\007"
#

It's a good habit to always have extensive help inside the scripts. This makes it possible for others (and you) to use and understand the script.

Command line arguments

We have seen that $* and $1, $2 ... $9 contain the arguments that the user specified on the command line (the strings written behind the program name). So far we have had only very few or rather simple command line syntaxes (a couple of mandatory arguments and the option -h for help). But soon you will discover that you need some kind of parser for more complex programs where you define your own options. The convention is that all optional parameters are preceded by a minus sign and must come before any other arguments (such as e.g. file names).

There are many possibilities to implement a parser. The following while loop combined with a case statement is a very good solution for a generic parser:

#!/bin/sh
help()
{
cat <<HELP
This is a generic command line parser demo.
USAGE EXAMPLE: cmdparser -l hello -f -- -somefile1 somefile2
HELP
exit 0
}

while [ -n "$1" ]; do
case $1 in
    -h) help;shift 1;; # function help is called
    -f) opt_f=1;shift 1;; # variable opt_f is set
    -l) opt_l=$2;shift 2;; # -l takes an argument -> shift by 2
    --) shift;break;; # end of options
    -*) echo "error: no such option $1. -h for help";exit 1;;
    *) break;;
esac
done

echo "opt_f is $opt_f"
echo "opt_l is $opt_l"
echo "first arg is $1"
echo "2nd arg is $2"

Try it out! You can run it e.g. with:

cmdparser -l hello -f -- -somefile1 somefile2

It produces:

opt_f is 1
opt_l is hello
first arg is -somefile1
2nd arg is somefile2

How does it work? Basically, it loops through all arguments and matches them against the case statement. If it finds a matching one, it sets a variable and shifts the command line by one. The Unix convention is that options (things starting with a minus) must come first. You may indicate that this is the end of the options by writing two minus signs (--). You need it e.g. with grep to search for a string starting with a minus sign. Search for -xx- in file f.txt:

grep -- -xx- f.txt

Our option parser can handle the -- too, as you can see in the listing above.

Examples

A general purpose skeleton

Now we have discussed almost all of the components that you need to write a script. All good scripts should have help, and you can just as well include our generic option parser, even if the script has only one option. Therefore it is a good idea to have a dummy script, called framework.sh, which you can use as a framework for other scripts. If you want to write a new script, you just make a copy:

cp framework.sh myscript

and then insert the actual functionality into "myscript".

Let's now look at two more examples:

A binary to decimal number converter

The script b2d converts a binary number (e.g. 1101) into its decimal equivalent. It is an example that shows that you can do simple mathematics with expr:

#!/bin/sh
# vim: set sw=4 ts=4 et:
help()
{
cat <<HELP
b2d -- convert binary to decimal

USAGE: b2d [-h] binarynum

OPTIONS: -h help text

EXAMPLE: b2d 111010
will return 58
HELP
exit 0
}

error()
{
    # print an error and exit
    echo "$1"
    exit 1
}

lastchar()
{
    # return the last character of a string in $rval
    if [ -z "$1" ]; then
        # empty string
        rval=""
        return
    fi
    # wc puts some space behind the output, this is why we need sed:
    numofchar=`echo -n "$1" | wc -c | sed 's/ //g' `
    # now cut out the last char
    rval=`echo -n "$1" | cut -b $numofchar`
}

chop()
{
    # remove the last character in string and return it in $rval
    if [ -z "$1" ]; then
        # empty string
        rval=""
        return
    fi
    # wc puts some space behind the output, this is why we need sed:
    numofchar=`echo -n "$1" | wc -c | sed 's/ //g' `
    if [ "$numofchar" = "1" ]; then
        # only one char in string
        rval=""
        return
    fi
    numofcharminus1=`expr $numofchar "-" 1`
    # now cut all but the last char (byte positions start at 1):
    rval=`echo -n "$1" | cut -b 1-${numofcharminus1}`
}

while [ -n "$1" ]; do
case $1 in
    -h) help;shift 1;; # function help is called
    --) shift;break;; # end of options
    -*) error "error: no such option $1. -h for help";;
    *) break;;
esac
done


# The main program
sum=0
weight=1
# one arg must be given:
[ -z "$1" ] && help
binnum="$1"
binnumorig="$1"

while [ -n "$binnum" ]; do
    lastchar "$binnum"
    if [ "$rval" = "1" ]; then
        sum=`expr "$weight" "+" "$sum"`
    fi
    # remove the last position in $binnum
    chop "$binnum"
    binnum="$rval"
    weight=`expr "$weight" "*" 2`
done

echo "binary $binnumorig is decimal $sum"
#

The algorithm used in this script takes the decimal weight (1, 2, 4, 8, 16, ...) of each digit, starting from the rightmost digit, and adds it to the sum if the digit is a 1. Thus "10" is: 0 * 1 + 1 * 2 = 2. To get the digits from the string we use the function lastchar. This uses wc -c to count the number of characters in the string and then cut to cut out the last character. The chop function has the same logic but removes the last character; that is, it cuts out everything from the beginning to the character before the last one.
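As a quick sanity check of the weighting idea, the decimal value of binary 1101 can be computed by hand: the set bits carry the weights 1, 4 and 8.

```shell
# 1*1 + 0*2 + 1*4 + 1*8; expr prints 13:
expr 1 + 4 + 8
```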

A file rotation program

Perhaps you are one of those who save all outgoing mail to a file. After a couple of months this file becomes rather big, and it makes access slow if you load it into your mail program. The following script, rotatefile, can help you. It renames the mailfolder, let's call it outmail, to outmail.1; if there was already an outmail.1, then it becomes outmail.2, etc.

#!/bin/sh
# vim: set sw=4 ts=4 et:
ver="0.1"
help()
{
cat <<HELP
rotatefile -- rotate the file name

USAGE: rotatefile [-h] filename

OPTIONS: -h help text

EXAMPLE: rotatefile out
This will e.g. rename out.2 to out.3, out.1 to out.2, out to out.1
and create an empty out-file

The max number is 10

version $ver


HELP
    exit 0
}

error(){
    echo "$1"
    exit 1
}

while [ -n "$1" ]; do
    case $1 in
        -h) help;shift 1;;
        --) break;;
        -*) echo "error: no such option $1. -h for help";exit 1;;
        *) break;;
    esac
done

# input check:
if [ -z "$1" ] ; then
    error "ERROR: you must specify a file, use -h for help"
fi
filen="$1"
# rename any .1 , .2 etc file:
for n in 9 8 7 6 5 4 3 2 1; do
    if [ -f "$filen.$n" ]; then
        p=`expr $n + 1`
        echo "mv $filen.$n $filen.$p"
        mv $filen.$n $filen.$p
    fi
done
# rename the original file:
if [ -f "$filen" ]; then
    echo "mv $filen $filen.1"
    mv $filen $filen.1
fi
echo touch $filen
touch $filen

How does the program work? After checking that the user provided a filename, we go into a for loop counting down from 9 to 1. File 9 is renamed to 10, file 8 to 9, and so on. After the loop we rename the original file to 1 and create an empty file with the name of the original file.
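The rotation can be watched in action in a scratch directory. This sketch inlines the loop from rotatefile and assumes mktemp -d is available; the file names are just examples.

```shell
#!/bin/sh
# Demonstrate the rotation: out.2 -> out.3, out.1 -> out.2,
# out -> out.1, then a fresh empty "out" is created.
dir=`mktemp -d`          # scratch directory (mktemp assumed available)
cd "$dir" || exit 1
touch out out.1 out.2

filen="out"
# the rotation loop, inlined from rotatefile:
for n in 9 8 7 6 5 4 3 2 1; do
    if [ -f "$filen.$n" ]; then
        p=`expr $n + 1`
        mv "$filen.$n" "$filen.$p"
    fi
done
if [ -f "$filen" ]; then
    mv "$filen" "$filen.1"
fi
touch "$filen"

ls    # out out.1 out.2 out.3
```

Note that the highest-numbered files are moved first; moving out.1 before out.2 would overwrite out.2 and lose it.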

Debugging

The simplest debugging help is of course the command echo. You can use it to print specific variables around the place where you suspect the mistake. This is probably what most shell programmers use 80% of the time to track down a mistake. The advantage of a shell script is that it does not require any re-compilation, and inserting an "echo" statement can be done very quickly.

The shell has a real debug mode as well. If there is a mistake in your script "strangescript", then you can debug it like this:

sh -x strangescript


This will execute the script and show all the statements that get executed with the variables and wildcards already expanded.

The shell also has a mode to check for syntax errors without actually executing the program. To use this run:

sh -n your_script

If this returns nothing then your program is free of syntax errors.

We hope you will now start writing your own shell scripts. Have fun!  

Bourne Shell Programming Overview

These are the contents of a shell script called display:

cat display

# This script displays the date, time, username and
# current directory.
echo "Date and time is:"
date
echo
echo "Your username is: `whoami` \n"
echo "Your current directory is: \c"
pwd

The first two lines beginning with a hash (#) are comments and are not interpreted by the shell. Use comments to document your shell script; you will be surprised how easy it is to forget what your own programs do!

The backquotes (`) around the command whoami illustrate the use of command substitution.

The \n is an escape sequence recognized by the echo command that adds an extra newline at the end of the line. The \c tells echo to stay on the same line by suppressing the trailing newline. See the man page for details of other options.
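As an aside, the \n and \c escapes behave differently across echo implementations (BSD-style versus System V-style). The printf command gives the same effects portably; this is a sketch and not part of the original script:

```shell
#!/bin/sh
# printf behaves the same everywhere, unlike echo's \n and \c,
# which vary between echo implementations.
name=`whoami`
printf "Your username is: %s\n\n" "$name"   # the extra \n adds a blank line
printf "Your current directory is: "        # no trailing newline, like \c
pwd
```

The %s placeholder is replaced by the corresponding argument, so the text to print and the data are kept separate.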

The argument to the echo command is quoted to stop the shell from interpreting the \ (backslash) sequences itself before echo sees them; without the quotes, each backslash would have to be escaped with another backslash.

The shell also provides you with a programming environment with features similar to those of a high-level programming language.

* The UNIX operating system provides a flexible set of simple tools to perform a wide variety of system-management, text-processing, and general-purpose tasks. These simple tools can be used in very powerful ways by tying them together programmatically, using "shell scripts" or "shell programs".


The UNIX "shell" itself is a user-interface program that accepts commands from the user and executes them. It can also accept the same commands written as a list in a file, along with various other statements that the shell can interpret to provide input, output, decision-making, looping, variable storage, option specification, and so on. This file is a shell program.

Shell programs are, like any other programming language, useful for some things but not for others. They are excellent for system-management tasks but not for general-purpose programming of any sophistication. Shell programs, though generally simple to write, are also tricky to debug and slow in operation.

There are three versions of the UNIX shell: the original "Bourne shell (sh)", the "C shell (csh)" that was derived from it, and the "Korn shell (ksh)" that is in predominant use. The Bourne shell is in popular use as the freeware "Bourne-again shell" AKA "bash".

This document focuses on the Bourne shell. The C shell is more powerful but has various limitations, and while the Korn shell is clean and more powerful than the other two shells, it is a superset of the Bourne shell: anything that runs on a Bourne shell runs on a Korn shell, though the reverse is not true. Since the Bourne shell's capabilities are probably more than most people require, there's no reason to elaborate much beyond them in an introductory document, and the rest of the discussion will assume use of the Bourne shell unless otherwise stated.

[1] GETTING STARTED

* The first thing to do in understanding shell programs is to understand the elementary system commands that can be used in them. A list of fundamental UNIX system commands follows:

ls # Give a simple listing of files.
cp # Copy files.
mv # Move or rename files.
rm # Remove files.
rm -r # Remove entire directory subtree.
cd # Change directories.
pwd # Print working directory.
cat # Lists a file or files sequentially.
more # Displays a file a screenfull at a time.
pg # Variant on "more".
mkdir # Make a directory.
rmdir # Remove a directory.

The shell executes such commands when they are typed in from the command prompt with their appropriate parameters, which are normally options and file names.

* The shell also allows files to be defined in terms of "wildcard characters" that define a range of files. The "*" wildcard character substitutes for any string of characters, so:

rm *.txt

-- deletes all files that end with ".txt". The "?" wildcard character substitutes for any single character, so:


rm book?.txt

-- deletes "book1.txt", "book2.txt", and so on. More than one wildcard character can be used at a time, for example:

rm *book?.txt

-- deletes "book1.txt", "mybook1.txt", "bigbook2.txt", and so on.

* Another shell capability is "input and output redirection". The shell, like other UNIX utilities, accepts input by default from what is called "standard input", and generates output by default to what is called "standard output". These are normally defined as the keyboard and display, respectively, or what is referred to as the "console" in UNIX terms. However, standard input or output can be "redirected" to a file or another program if needed. Consider the "sort" command. This command sorts a list of words into alphabetic order; typing in:

sort
PORKY
ELMER
FOGHORN
DAFFY
WILE
BUGS
<CTL-D>

-- spits back:

BUGS
DAFFY
ELMER
FOGHORN
PORKY
WILE

Note that the CTL-D key input terminates direct keyboard input. It is also possible to store the same words in a file and then "redirect" the contents of that file to standard input with the "<" operator:

sort < names.txt

This would list the sorted names to the display as before. They can be redirected to a file with the ">" operator:

sort < names.txt > output.txt

They can also be appended to an existing file using the ">>" operator:

sort < names.txt >> output.txt

In these cases, there's no visible output, since the command just executes and ends. However, if that's a problem, it can be fixed by connecting the "tee" command to the output through a "pipe", designated by "|". This allows the standard output of one command to be chained into the standard input of another command. In the case of "tee", it accepts text into its standard input and then dumps it both to a file and to standard output:

sort < names.txt | tee output.txt

So this both displays the names and puts them in the output file. Many commands can be chained together to "filter" information through several processing steps. This ability to combine the effects of


commands is one of the beauties of shell programming. By the way, "sort" has some handy additional options:

sort -u # Eliminate redundant lines in output.
sort -r # Sort in reverse order.
sort -n # Sort numbers.
sort -k 2 # Skip first field in sorting.

* If a command generates an error, it is displayed to what is called "standard error", rather than standard output; standard error also defaults to the console. It will not be redirected by ">". However, the operator "2>" can be used to redirect the error message. For example:

ls xyzzy 2> /dev/null

-- will give an error message if the file "xyzzy" doesn't exist, but the error will be redirected to the file "/dev/null". This is actually a "special file" that exists under UNIX where everything sent to it is simply discarded.

* The shell permits the execution of multiple commands sequentially on one line by chaining them with a ";":

rm *.txt ; ls

A time-consuming program can also be run in a "parallel" fashion by following it with a "&":

sort < bigfile.txt > output.txt &

* These commands and operations are essential elements for creating shell programs. They can be stored in a file and then executed by the shell. To tell the shell that the file contains commands, just mark it as "executable" with the "chmod" command. Each file under UNIX has a set of "permission" bits, listed by an "ls -l" -- the option providing file details -- as:

rwxrwxrwx

The "r" gives "read" permission, the "w" gives "write" permission, and the "x" gives "execute" permission. There are three sets of these permission bits, one for the user, one for other members of a local group of users on a system, and one for everyone who can access the system -- remember that UNIX was designed as a multiuser environment.

The "chmod" command can be used to set these permissions, with the permissions specified as an octal code. For example:

chmod 644 myfile.txt

This sets both read and write permission on the file for the user, but everybody else on the system only gets read permission. The same octal scheme can be used to set execute permission, though it's simpler just to use the chmod "+x" option:

chmod +x mypgm

This done, if the name "mypgm" is entered at the prompt, the shell reads the commands out of "mypgm" and executes them. The execute permission can be removed with the "-x" option.
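The octal codes can be read as sums: each digit is r(4) + w(2) + x(1), for user, group, and other respectively. A quick sketch (it assumes mktemp is available; the file is a throwaway):

```shell
#!/bin/sh
# Each octal digit is the sum of r=4, w=2, x=1.
f=`mktemp`           # scratch file (mktemp assumed available)
chmod 644 "$f"       # 6=rw-, 4=r--, 4=r--
ls -l "$f"           # shows -rw-r--r-- ...
chmod 755 "$f"       # 7=rwx, 5=r-x, 5=r-x
ls -l "$f"           # shows -rwxr-xr-x ...
rm -f "$f"
```

So 644 is "owner can read and write, everyone else can only read", and 755 adds execute for all, which is the usual mode for a shell program.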


For example, suppose we want to be able to inspect the contents of a set of archive files stored in the directory "/users/group/archives". We could create a file named "ckarc" and store the following command string in it:

ls /users/group/archives | pg

This is a very simple shell program. As noted, the shell has control constructs, supports storage variables, and has several options that can be set to allow much more sophisticated programs. The following sections describe these features in a quick outline fashion.

Incidentally, this scheme for creating executable files is for the UNIX environment. Under the Windows environment, the procedure is to end shell program file names in a distinctive extension -- ".sh" is a good choice, though any unique extension will do -- and then configure Windows to run all files with that extension with a UNIX-type shell, usually bash.

[2] SHELL VARIABLES

* The first useful command to know about in building shell programs is "echo", which can be used to produce output from a shell program:

echo "This is a test!"

This sends the string "This is a test!" to standard output. It is recommended to write shell programs that generate some output to inform the user of what they are doing.

The shell allows variables to be defined to store values. It's simple: just declare a variable and assign a value to it:

shvar="This is a test!"

The string is enclosed in double-quotes to ensure that the variable swallows the entire string (more on this later), and there are no spaces around the "=". The value of the shell variable can be obtained by preceding it with a "$":

echo $shvar

This displays "This is a test!". If no value had been stored in that shell variable, the result would have simply been a blank line. Values stored in shell variables can be used as parameters to other programs as well:

ls $lastdir

The value stored in a shell variable can be erased by assigning the "null string" to the variable:

shvar=""

There are some subtleties in using shell variables. For example, suppose a shell program performed the assignment:

allfiles=*

-- and then performed:

echo $allfiles


This would echo a list of all the files in the directory. However, only the string "*" would be stored in "allfiles". The expansion of "*" only occurs when the "echo" command is executed.

Another subtlety is in modifying the values of shell variables. Suppose we have a file name in a shell variable named "myfile" and want to rename that file to the same name with "2" tacked on to the end. We might think to try:

mv $myfile $myfile2

-- but the problem is that the shell will think that "myfile2" is a different shell variable, and this won't work. Fortunately, there is a way around this; the change can be made as follows:

mv $myfile ${myfile}2

A UNIX installation will have some variables installed by default, most importantly $HOME, which gives the location of a particular user's home directory.
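The difference between $myfile2 and ${myfile}2 can be checked directly with echo; a small sketch:

```shell
#!/bin/sh
# Braces delimit the variable name, so "2" is appended to the
# value instead of being read as part of the name.
myfile="report"
echo "$myfile2"      # prints an empty line: the shell looks up "myfile2"
echo "${myfile}2"    # prints: report2
```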

As a final comment on shell variables, if one shell program calls another and the two shell programs have the same variable names, the two sets of variables will be treated as entirely different variables. To call other shell programs from a shell program and have them use the same shell variables as the calling program requires use of the "export" command:

shvar="This is a test!"
export shvar
echo "Calling program two."
shpgm2
echo "Done!"

If "shpgm2" simply contains:

echo $shvar

-- then it will echo "This is a test!".
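The effect of export can be demonstrated with a child shell started by sh -c, which here stands in for the separate shpgm2 file; a sketch:

```shell
#!/bin/sh
# Without export, a child shell does not see the variable;
# after export it does.
shvar="This is a test!"
before=`sh -c 'echo "$shvar"'`   # child shell: variable not visible
export shvar
after=`sh -c 'echo "$shvar"'`    # child shell now sees the value
echo "before export: '$before'"
echo "after export:  '$after'"
```

The single quotes around the child command matter: they stop the parent shell from expanding $shvar itself, so the expansion happens in the child.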

[3] COMMAND SUBSTITUTION

* The next step is to consider shell command substitution. Like any programming language, the shell does exactly what it is told to do, and so it is important to be very specific when telling it to do something. As an example, consider the "fgrep" command, which searches a file for a string. For example, to search a file named "source.txt" for the string "Coyote", enter:

fgrep Coyote source.txt

-- and it would print out the matching lines. However, suppose we wanted to search for "Wile E. Coyote". If we did this as:

fgrep Wile E. Coyote source.txt

-- we'd get an error message that "fgrep" couldn't open "E.". The string has to be enclosed in double-quotes (""):

fgrep "Wile E. Coyote" source.txt

If a string has a special character in it, such as "*" or "?", that must be interpreted as a "literal" and not a wildcard, the shell can get a little confused. To ensure that the wildcards are not interpreted, the


wildcard can either be "escaped" with a backslash ("\*" or "\?") or the string can be enclosed in single quotes, which prevents the shell from interpreting any of the characters within the string. For example, if:

echo "$shvar"

-- is executed from a shell program, it would output the value of the shell variable "$shvar". In contrast, executing:

echo '$shvar'

-- the output is the string "$shvar".

* Having considered "double-quoting" and "single-quoting", let's now consider "back-quoting". This is a little tricky to explain. As a useful tool, consider the "expr" command, which can be used to perform simple math from the command line:

expr 2 + 4

This displays the value "6". There must be spaces between the parameters; in addition, to perform a multiplication the "*" has to be "escaped" so the shell doesn't interpret it:

expr 3 \* 7

Now suppose the string "expr 12 / 3" has been stored in a shell variable named "shcmd"; then executing:

echo $shcmd

-- or:

echo "$shcmd"

-- would simply produce the text "expr 12 / 3". If single-quotes were used:

echo '$shcmd'

-- the result would be the string "$shcmd". However, if back-quotes, the reverse form of a single quote, were used:

echo `$shcmd`

-- the result would be the value "4", since the string inside "shcmd" is executed. This is an extremely powerful technique that can be very confusing to use in practice.
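The three quoting styles can be compared side by side; a small sketch:

```shell
#!/bin/sh
# A command stored in a variable only runs when back-quoted.
shcmd="expr 12 / 3"
echo "$shcmd"      # prints the text: expr 12 / 3
echo '$shcmd'      # prints the literal text: $shcmd
echo `$shcmd`      # runs the command, prints: 4
```

Back-quotes are what make constructions like today=\`date\` work: the command's standard output replaces the back-quoted text.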

[4] COMMAND-LINE ARGUMENTS

* In general, shell programs operate in a "batch" mode, that is, without interaction from the user, and so most of their parameters are obtained on the command line. Each argument on the command line can be seen inside the shell program as a shell variable of the form "$1", "$2", "$3", and so on, with "$1" corresponding to the first argument, "$2" the second, "$3" the third, and so on.

There is also a "special" argument variable, "$0", that gives the name of the shell program itself. Other special variables include "$#", which gives the number of arguments supplied, and "$*", which gives a string with all the arguments supplied.


Since the argument variables only run from "$1" to "$9", what happens if there are more than 9 arguments? No problem: the "shift" command can be used to move the arguments down through the argument list. That is, when "shift" is executed, the second argument becomes "$1", the third argument becomes "$2", and so on; if a "shift" is performed again, the third argument becomes "$1"; and so on. A count can be specified to cause a multiple shift:

shift 3

-- shifts the arguments three times, so that the fourth argument ends up in "$1".
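A loop built on shift and $# handles any number of arguments; a sketch (count_args is a made-up name for illustration):

```shell
#!/bin/sh
# Count arguments one at a time: $# shrinks by one on each shift,
# so the loop works no matter how many arguments are given.
count_args(){
    total=0
    while [ $# -gt 0 ]; do
        total=`expr $total + 1`
        shift
    done
    echo $total
}

count_args a b c d e f g h i j k l    # prints: 12
```

The same pattern, with real work done on "$1" inside the loop, is the standard way to process a long argument list one item at a time.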

[5] DECISION-MAKING & LOOP CONSTRUCTS

* Shell programs can perform conditional tests on their arguments and variables and execute different commands based on the results. For example:

if [ "$1" = "hyena" ]
then
    echo "Sorry, hyenas not allowed."
    exit
elif [ "$1" = "jackal" ]
then
    echo "Jackals not welcome."
    exit
else
    echo "Welcome to Bongo Congo."
fi
echo "Do you have anything to declare?"

-- checks the command line to see if the first argument is "hyena" or "jackal" and bails out, using the "exit" command, if it is. Other arguments allow the rest of the file to be executed. Note how "$1" is enclosed in double quotes, so the test will not generate an error message if it yields a null result.

There are a wide variety of such test conditions:

[ "$shvar" = "fox" ]     String comparison, true if match.
[ "$shvar" != "fox" ]    String comparison, true if no match.
[ "$shvar" = "" ]        True if null variable.
[ "$shvar" != "" ]       True if not null variable.

[ "$nval" -eq 0 ]        Integer test; true if equal to 0.
[ "$nval" -ge 0 ]        Integer test; true if greater than or equal to 0.
[ "$nval" -gt 0 ]        Integer test; true if greater than 0.
[ "$nval" -le 0 ]        Integer test; true if less than or equal to 0.
[ "$nval" -lt 0 ]        Integer test; true if less than 0.
[ "$nval" -ne 0 ]        Integer test; true if not equal to 0.

[ -d tmp ]               True if "tmp" is a directory.
[ -f tmp ]               True if "tmp" is an ordinary file.
[ -r tmp ]               True if "tmp" can be read.
[ -s tmp ]               True if "tmp" is nonzero length.
[ -w tmp ]               True if "tmp" can be written.


[ -x tmp ]               True if "tmp" is executable.

Incidentally, in the example above:

if [ "$1" = "hyena" ]

-- there is a potential pitfall in that a user might enter, say, "-d" as a command-line parameter, which would cause an error when the program was run. There is only so much that can be done to save users from their own clumsiness, and "bullet-proofing" simple example programs tends to make them not so simple any more, but there is a simple, if slightly cluttered, fix for such a potential pitfall. It is left as an exercise for the reader.
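As a hint toward that exercise, one classic idiom is to prefix both sides of the comparison with a harmless character, so the operand can never start with "-". This sketch uses a made-up function name, check:

```shell
#!/bin/sh
# "x$1" can never begin with "-", so test cannot mistake the
# operand for one of its own operators such as -d.
check(){
    if [ "x$1" = "xhyena" ]; then
        echo "match"
    else
        echo "no match"
    fi
}

check hyena    # prints: match
check -d       # prints: no match  (and no error message)
```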

There is also a "case" control construct that checks for equality with a list of items. It can be used with the example at the beginning of this section:

case "$1" in
    "gorilla") echo "Sorry, gorillas not allowed."
               exit;;
    "hyena")   echo "Hyenas not welcome."
               exit;;
    *)         echo "Welcome to Bongo Congo.";;
esac

The string ";;" is used to terminate each "case" clause.

* The fundamental loop construct in the shell is based on the "for" command. For example:

for nvar in 1 2 3 4 5
do
    echo $nvar
done

-- echoes the numbers 1 through 5. The names of all the files in the current directory could be displayed with:

for file in *
do
    echo $file
done

One nice little feature of the shell is that if the "in" parameters are not specified for the "for" command, it just cycles through the command-line arguments.

* There is a "break" command to exit a loop if necessary:

for file
do
    if [ "$file" = punchout ]
    then
        break
    else
        echo $file
    fi
done


There is also a "continue" command that starts the next iteration of the loop immediately. There must be a command in the "then" or "else" clauses, or the result is an error message. If it's not convenient to actually do anything in the "then" clause, a ":" can be used as a "no-op" command:

then
    :
else

* There are two other looping constructs available as well, "while" and "until". For an example of "while":

n=10
while [ "$n" -ne 0 ]
do
    echo $n
    n=`expr $n - 1`
done

-- counts down from 10 to 1. The "until" loop has similar syntax but tests for a false condition:

n=10
until [ "$n" -eq 0 ]
do
    ...

[6] OTHER SHELL FEATURES

* There are other useful features available for writing shell programs. For example, comments can be placed in shell programs by preceding the comments with a "#":

# This is an example shell program.
cat /users/group/grouplog.txt | pg # Read group log file.

It is strongly recommended to comment all shell programs. If they are just one-liners, a simple comment line at the top of the file will do. If they are complicated shell programs, they should have a title, revision number, revision date, and revision history along with descriptive comments. This will prevent confusion if multiple versions of the same shell program are found, or if the program is modified later. Shell programs can be obscure, even by the standards of programming languages, and it is useful to provide a few hints.

* Standard input can be read into a shell program using the "read" command. For example:

echo "What is your name?"
read myname
echo $myname

-- echoes the user's own name. The "read" command will read each item of standard input into a list of shell variables until it runs out of shell variables, and then it will read all the rest of standard input into the last shell variable. As a result, in the example above, the user's name is stored into "myname".
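The "rest goes into the last variable" behavior can be seen by reading a line into two variables. A sketch (the braces keep the read and the echo together in the same side of the pipeline):

```shell
#!/bin/sh
# read fills each variable with one word, and the last variable
# with everything left over on the line.
echo "Wile E. Coyote" | {
    read first rest
    echo "first word:   $first"    # Wile
    echo "rest of line: $rest"     # E. Coyote
}
```

Note that commands in a pipeline run in a subshell, so $first and $rest are not visible after the closing brace; that is why the echo statements sit inside the braces.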

* If a command is too long to fit on one line, the line continuation character "\" can be used to put it on more than one line:


echo "This is a test of \
the line continuation character."

* There is a somewhat cryptic command designated by "." that executes a file of commands within a shell program. For example:

. mycmds

-- will execute the commands stored in the file "mycmds". It's something like an "include" command in other languages.

* For debugging, the execution of a shell program can be traced using the "-x" option with the shell:

sh -x mypgm *

This traces out the steps "mypgm" takes during the course of its operation.

* One last comment on shell programs before proceeding: What happens with a shell program that just performs, say:

cd /users/coyote

-- to change to another directory? Well ... nothing happens. After the shell program runs and exits, the directory remains unchanged. The reason is that the shell creates a new shell, or "subshell", to run the shell program, and when the shell program is finished, the subshell vanishes, along with any changes made in that subshell's environment. It is easier, at least in this simple case, to define a command alias in the UNIX "login" shell instead of struggling with the problem in shell programs.

[7] USEFUL TOOLS

* Before we go on to practical shell programs, let's consider a few more useful tools.

The "paste" utility takes a list of text files and concatenates them on a line-by-line basis. For example:

paste names.txt phone.txt > data.txt

-- takes a file containing names and a file containing corresponding phone numbers and generates a file with each name and number "pasted" together on the same line.

* The "head" and "tail" utilities list the first 10 or last 10 lines in a file respectively. The number of lines to be listed can be specified if needed:

head -n 5 source.txt # List first 5 lines.
tail -n 5 source.txt # List last 5 lines.
tail -n +5 source.txt # List all lines from line 5.

* The "tr" utility translates from one set of characters to another. For example, to translate uppercase characters to lowercase characters:

tr '[A-Z]' '[a-z]' < file1.txt > file2.txt

The reverse conversion can of course be made using:


tr '[a-z]' '[A-Z]' < file1.txt > file2.txt

The "-d" option deletes a character. For example:

tr -d '*'

-- deletes all asterisks from the input stream. Note that "tr" only works on single characters.
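Both behaviors can be tried directly on a pipe; a quick sketch:

```shell
#!/bin/sh
# tr maps characters one-for-one; -d deletes characters instead.
echo "HELLO World" | tr '[A-Z]' '[a-z]'    # prints: hello world
echo "a*b*c" | tr -d '*'                   # prints: abc
```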

* The "uniq" utility removes duplicate consecutive lines from a file. It has the syntax:

uniq source.txt output.txt

A "-c" option provides an additional count of the number of times a line was duplicated, while a "-d" option displays only the duplicated lines in a file.

* The "wc (word count)" utility tallies up the characters, words, and lines of text in a text file. It can be invoked with the following options:

wc -c # Character count only.
wc -w # Word count only.
wc -l # Line count only.

* The "find" utility is extremely useful, if a little hard to figure out. Essentially, it traverses a directory subtree and performs whatever action is specified on every match. For example:

find / -name findtest.txt -print

This searches from the root directory ("/") for "findtest.txt", as designated by the "-name" option, and then prints the full pathname of the file, as designated by the "-print" option. Incidentally, "find" must be told what to do on a match; it will not by default say or do anything, it will just keep right on searching.

There are a wide variety of selection criteria. Simply printing out the names of directories in a search can be done with:

find . -type d -print

Files can also be found based on their username, date of last modification, size, and so on.

[8] REGULAR EXPRESSIONS

* An advanced set of tools can be used to perform searches on text strings in files and, in some cases, manipulate the strings found. These tools are known as "grep", "sed", and "awk" and are based on the concept of a "regular expression", which is a scheme by which specific text patterns can be specified by a set of special or "magic" characters.

The simplest regular expression is just the string being searched for. For example:

grep Taz *.txt

-- finds every example of the string "Taz" in all files ending in ".txt", then displays the name of the file and the line of text containing the string.

But using the magic characters provides much more flexibility. For example:


grep ^Taz *.txt

-- finds the string "Taz" only if it is at the beginning of the line. Similarly:

grep Taz$ *.txt

-- matches it only if it is at the end of the line.

Now suppose we want to match both "Taz" and "taz". This can be done with:

[Tt]az

The square brackets ("[]") can be used to specify a range of characters. For example:

group_[abcdef]

-- matches the strings "group_a", "group_b", and so on up to "group_f". This range specification can be simplified to:

group_[a-f]

Similarly:

set[0123456789]

-- can be simplified to:

set[0-9]

It is also possible to match to all characters except a specific range. For example:

unit_[^xyz]

-- matches "unit_a" or "unit_b", but not "unit_x" or "unit_y" or "unit_z".

Other magic characters provide a wildcard capability. The "." character can substitute for any single character, while the "*" substitutes for zero or more repetitions of the preceding regular expression. For example:

__*$

-- matches any line that is padded with spaces to the right margin (for clarity the space character is represented here by a "_"). If a magic character is to be matched as a real item of text, it has to be "escaped" with a "\":

test\.txt

This matches "test.txt".
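The ^ and $ anchors can be tried on a two-line sample, using printf to generate the input instead of a file; a quick sketch:

```shell
#!/bin/sh
# ^ anchors the match to the start of the line, $ to the end.
printf 'Taz leads\nfollows Taz\n' | grep '^Taz'   # prints: Taz leads
printf 'Taz leads\nfollows Taz\n' | grep 'Taz$'   # prints: follows Taz
```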

[9] GREP, SED, & AWK

* Now that we understand regular expressions, we can consider "grep", "sed", and "awk" in more detail.

The name of "grep" seems distinctly non-user-friendly, but it can be regarded as standing for "general regular expression processor"; as noted it searches a file for matches to a regular expression like "^Taz" or "_*$". It has a few useful options as well. For example:

grep -v <regular_expression> <file_list>


-- lists all lines that don't match the regular expression. Other options include:

grep -n # List line numbers of matches.
grep -i # Ignore case.
grep -l # Only list file names for a match.

If there's no need to go through the bother of using regular expressions in a particular search, there is a variation on "grep" called "fgrep" (meaning "fixed grep" or "fast grep") that searches for matches on strings and runs faster; it was used in an earlier example. It uses the same options as described for "grep" above.

* The name "sed" stands for "stream editor" and it provides, in general, a search-and-replace capability. Its syntax for this task is as follows:

sed 's/<regular_expression>/<replacement_string>/[g]' source.txt

The optional "g" parameter specifies a "global" replacement. That is, if there are multiple matches on the same line, "sed" will replace them all. Without the "g" option, it will only replace the first match on that line. For example, "sed" can be used to replace the string "flack" with "flak" as follows:

sed 's/flack/flak/g' source.txt > output.txt

It can also delete strings:

sed 's/bozo//'

-- or perform substitutions and deletions from a list of such specifications stored in a file:

sed -f sedcmds.txt source.txt > output.txt

Another useful feature stops output on a pattern match:

sed '/^Target/q' source.txt > output.txt

It is also possible to append a file to the output after a pattern match:

sed '/^Target/ r newtext.txt' source.txt > output.txt

The "sed" utility has a wide variety of other options, but a full discussion of its capabilities is beyond the scope of this document.
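The effect of the "g" flag is easy to see on a line containing two matches; a quick sketch:

```shell
#!/bin/sh
# With "g" every match on the line is replaced;
# without it only the first match changes.
echo "flack and more flack" | sed 's/flack/flak/g'   # flak and more flak
echo "flack and more flack" | sed 's/flack/flak/'    # flak and more flack
```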

* Finally, "awk" is a full-blown text processing language that looks something like a mad cross between "grep" and "C". In operation, "awk" takes each line of input and performs text processing on it. It recognizes the current line as "$0", with each word in the line recognized as "$1", "$2", "$3", and so on. This means that:

awk '{ print $0,$0 }' source.txt

-- prints each line with duplicate text. A regular expression can be specified to identify a pattern match. For example, "awk" could tally the lines with the word "Taz" on them with:

awk '/Taz/ { taz++ }; END { print taz }' source.txt

The END clause used in this example allows execution of "awk" statements after the line-scanning has been completed. There is also a BEGIN clause that allows execution of "awk" statements before line-scanning begins. "Awk" can be used to do very simple or very complicated things. Its syntax is much like


that of "C", though it is much less finicky to deal with. Details of "awk" are discussed in a companion document.
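The tally example can be tried on a small generated sample instead of a file; a sketch:

```shell
#!/bin/sh
# Count the lines containing "Taz": the action runs once per
# matching line, and END prints the total afterward.
printf 'Taz here\nnothing\nTaz again\n' | \
    awk '/Taz/ { taz++ }; END { print taz }'    # prints: 2
```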

[10] SHELL PROGRAM EXAMPLES

* The most elementary use of shell programs is to reduce complicated command strings to simpler commands and to provide handy utilities. For example, I can never remember the options for compiling an ANSI C program, so I store them in a script program named "compile":

cc $1.c -Aa -o $1

Similarly, I like to timestamp my documents in a particular format, so I have a shell program named "td" ("timedate") that invokes "date" as follows:

date +"date: %A, %d %B %Y %H%M %Z"

This gives, for example:

date: Friday, 24 November 1995 1340 MST

Another simple example is a shell script to convert file names from uppercase to lowercase:

for file
do
    mv $file `echo $file | tr "[A-Z]" "[a-z]"`
done

In this example, "for" is used to sequence through the file arguments, and "tr" and back-quoting are used to establish the lower-case name for the file.

QUICK REFERENCE

* This final section provides a fast lookup reference for the materials in this document.

* Useful commands:

cat                  # Lists a file or files sequentially.
cd                   # Change directories.
chmod +x             # Set execute permissions.
chmod 666            # Set universal read-write permissions.
cp                   # Copy files.
expr 2 + 2           # Add 2 + 2.
fgrep                # Search for string match.
grep                 # Search for string pattern matches.
grep -v              # Search for no match.
grep -n              # List line numbers of matches.
grep -i              # Ignore case.
grep -l              # Only list file names for a match.
head -5 source.txt   # List first 5 lines.
ls                   # Give a simple listing of files.
mkdir                # Make a directory.
more                 # Displays a file a screenful at a time.
mv                   # Move or rename files.
paste f1 f2          # Paste files by columns.
pg                   # Variant on "more".


pwd                  # Print working directory.
rm                   # Remove files.
rm -r                # Remove entire directory subtree.
rmdir                # Remove a directory.
sed 's/txt/TXT/g'    # Scan and replace text.
sed 's/txt//'        # Scan and delete text.
sed '/txt/q'         # Scan and then quit.
sort                 # Sort input.
sort +1              # Skip first field in sorting.
sort -n              # Sort numbers.
sort -r              # Sort in reverse order.
sort -u              # Eliminate redundant lines in output.
tail -5 source.txt   # List last 5 lines.
tail +5 source.txt   # List all lines after line 5.
tr '[A-Z]' '[a-z]'   # Translate to lowercase.
tr '[a-z]' '[A-Z]'   # Translate to uppercase.
tr -d '_'            # Delete underscores.
uniq                 # Find unique lines.
wc                   # Word count (characters, words, lines).
wc -w                # Word count only.
wc -l                # Line count.

* Elementary shell capabilities:

shvar="Test 1"       # Initialize a shell variable.
echo $shvar          # Display a shell variable.
export shvar         # Allow subshells to use shell variable.
mv $f ${f}2          # Append "2" to file name in shell variable.
$1, $2, $3, ...      # Command-line arguments.
$0                   # Shell-program name.
$#                   # Number of arguments.
$*                   # Complete argument list.
shift 2              # Shift argument variables by 2.
read v               # Read input into variable "v".
. mycmds             # Execute commands in file.

* IF statement:

if [ "$1" = "red" ]
then
    echo "Illegal code."
    exit
elif [ "$1" = "blue" ]
then
    echo "Illegal code."
    exit
else
    echo "Access granted."
fi

[ "$shvar" = "red" ]     String comparison, true if match.
[ "$shvar" != "red" ]    String comparison, true if no match.
[ "$shvar" = "" ]        True if null variable.
[ "$shvar" != "" ]       True if not null variable.

[ "$nval" -eq 0 ]        Integer test; true if equal to 0.
[ "$nval" -ge 0 ]        Integer test; true if greater than or equal to 0.
[ "$nval" -gt 0 ]        Integer test; true if greater than 0.


[ "$nval" -le 0 ]        Integer test; true if less than or equal to 0.
[ "$nval" -lt 0 ]        Integer test; true if less than 0.
[ "$nval" -ne 0 ]        Integer test; true if not equal to 0.

[ -d tmp ]               True if "tmp" is a directory.
[ -f tmp ]               True if "tmp" is an ordinary file.
[ -r tmp ]               True if "tmp" can be read.
[ -s tmp ]               True if "tmp" is nonzero length.
[ -w tmp ]               True if "tmp" can be written.
[ -x tmp ]               True if "tmp" is executable.

* CASE statement:

case "$1" in
"red")
    echo "Illegal code."
    exit;;
"blue")
    echo "Illegal code."
    exit;;
*)
    echo "Access granted.";;
esac

* Loop statements:

for nvar in 1 2 3 4 5
do
    echo $nvar
done

for file    # Cycle through command-line arguments.
do
    echo $file
done

while [ "$n" != "Joe" ]    # Or: until [ "$n" = "Joe" ]
do
    echo "What's your name?"
    read n
    echo $n
done

There are "break" and "continue" commands that exit or skip to the end of loops as the need arises.

Passing arguments to the shell

Shell scripts can act like standard UNIX commands and take arguments from the command line.

Arguments are passed from the command line into a shell program using the positional parameters $1 through to $9. Each parameter corresponds to the position of the argument on the command line.


The positional parameter $0 refers to the command name or name of the executable file containing the shell script.

Only nine command line arguments can be accessed, but you can access more than nine using the shift command.
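When more than nine arguments are supplied, shift discards $1 and renumbers the remaining arguments downward, so a loop such as the following sketch can walk an argument list of any length:

```shell
# Print every argument, however many there are, by
# consuming $1 and shifting the rest down one position.
while [ $# -gt 0 ]
do
    echo "$1"
    shift
done
```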

All the positional parameters can be referred to using the special parameter $*. This is useful when passing filenames as arguments. For example:

cat printps
# This script converts ASCII files to PostScript
# and sends them to the PostScript printer ps1
# It uses a local utility "a2ps"
a2ps $* | lpr -Pps1

printps elm.txt vi.ref msg

This processes the three files given as arguments to the command printps.

Handling shell variables

The shell has several variables which are automatically set whenever you log in.

The values of some of these variables are stored in names which collectively are called your user environment.

Any name defined in the user environment can be accessed from within a shell script. To include the value of a shell variable in the environment, you must export it.

Special shell variables

There are some variables which are set internally by the shell and which are available to the user:

$1 - $9   these variables are the positional parameters.
$0        the name of the command currently being executed.
$#        the number of positional arguments given to this invocation of the shell.
$?        the exit status of the last command executed, given as a decimal string. When a command completes successfully, it returns an exit status of 0 (zero); otherwise it returns a non-zero exit status.
$$        the process number of this shell - useful for including in filenames, to make them unique.
$!        the process id of the last command run in the background.
$-        the current options supplied to this invocation of the shell.
$*        a string containing all the arguments to the shell, starting at $1.


$@        same as $*, except when quoted.

Notes: $* and $@ when unquoted are identical, and expand into the arguments. "$*" is a single word, comprising all the arguments to the shell, joined together with spaces. For example, '1 2' 3 becomes "1 2 3". "$@" is identical to the arguments received by the shell; the resulting list of words completely matches what was given to the shell. For example, '1 2' 3 becomes "1 2" "3".
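The difference only appears when the expansions are quoted. In this short sketch, the helper function simply reports how many arguments it received:

```shell
set -- '1 2' 3          # two arguments; the first contains a space

count() { echo $#; }    # report the number of arguments received

count "$*"              # prints 1: one joined word, "1 2 3"
count "$@"              # prints 2: the original "1 2" and "3"
```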

Reading user input

To read standard input into a shell script use the read command. For example:

echo "Please enter your name:"
read name
echo "Welcome to Edinburgh $name"

This prompts the user for input, assigns this to the variable name and then displays the value of this variable to standard output. If there is more than one word in the input, each word can be assigned to a different variable. Any words left over are assigned to the last named variable. For example:

echo "Please enter your surname\n"
echo "followed by your first name: \c"
read name1 name2
echo "Welcome to Glasgow $name2 $name1"

Conditional statements

Every Unix command returns a value on exit which the shell can interrogate. This value is held in the read-only shell variable $?. A value of 0 (zero) signifies success; anything other than 0 (zero) signifies failure.

The if statement

The if statement uses the exit status of the given command and conditionally executes the statements following. The general syntax is:

if test
then
    commands (if condition is true)
else
    commands (if condition is false)
fi


then, else and fi are shell reserved words and as such are only recognised after a newline or ; (semicolon). Make sure that you end each if construct with a fi statement. if statements may be nested:

if ...
then
    ...
else
    if ...
        ...
    fi
fi

The elif statement can be used as shorthand for an else if statement. For example:

if ...
then
    ...
elif ...
    ...
fi

The && operator

You can use the && operator to execute a command and, if it is successful, execute the next command in the list. For example:

cmd1 && cmd2

cmd1 is executed and its exit status examined. Only if cmd1 succeeds is cmd2 executed. This is a terse notation for:

if cmd1
then
    cmd2
fi

The || operator

You can use the || operator to execute a command and, if it fails, execute the next command in the command list. For example:

cmd1 || cmd2

cmd1 is executed and its exit status examined. If cmd1 fails, then cmd2 is executed. This is a terse notation for:

cmd1
if test $? -ne 0
then
    cmd2
fi
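The two operators can also be chained for a compact test-and-report idiom. In this sketch the directory name is arbitrary; the second message fires only if mkdir fails:

```shell
# mkdir's exit status selects which message is printed.
mkdir /tmp/scratch.$$ && echo "created" || echo "failed"
```

The chain relies on the first echo itself succeeding; for anything more elaborate, a full if statement is clearer.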


Testing for files and variables with the test command

The shell uses a command called test to evaluate conditional expressions. Full details of this command can be found in the test manual page. For example:

if test ! -f $FILE
then
    if test "$WARN" = "yes"
    then
        echo "$FILE does not exist"
    fi
fi

First, we test to see whether the file named by the variable $FILE exists and is a regular file. If it does not, we then test whether the variable $WARN is set to the value yes; if it is, a message that the file does not exist is displayed.
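The nested ifs can be collapsed into a single condition with the && operator described earlier; $FILE and $WARN are the same variables as in the example above:

```shell
# One-line equivalent of the nested if statements:
# warn only when the file is missing AND warnings are on.
if test ! -f "$FILE" && test "$WARN" = "yes"
then
    echo "$FILE does not exist"
fi
```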

The case statement

case is a flow control construct that provides for multi-way branching based on patterns. Program flow is controlled on the basis of the word given. This word is compared with each pattern in order until a match is found, at which point the associated command(s) are executed.

case word in
pattern1)
    command(s) ;;
pattern2)
    command(s) ;;
patternN)
    command(s) ;;
esac

When the commands have been executed, control is passed to the first statement after the esac. Each list of commands must end with a double semi-colon (;;). A command can be associated with more than one pattern. Patterns can be separated from each other by the | symbol. For example:

case word in
pattern1|pattern2)
    command
    ... ;;

Patterns are checked for a match in the order in which they appear, and only the command list for the first pattern that matches is carried out. Because * is the shell wildcard character, it matches anything and so can be placed last to specify a default case.
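A sketch that puts these pieces together: two patterns share one command list, and * serves as the default:

```shell
# Classify the first argument; "yes" and "y" share a branch.
case "$1" in
yes|y)  echo "affirmative" ;;
no|n)   echo "negative" ;;
*)      echo "unknown answer" ;;
esac
```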

The for statement

The for loop notation has the general form:


for var in list-of-words
do
    commands
done

commands is a sequence of one or more commands separated by a newline or ; (semicolon). The reserved words do and done must be preceded by a newline or ; (semicolon). Small loops can be written on a single line. For example:

for var in list; do commands; done
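For instance, the numeric loop shown earlier collapses onto a single line:

```shell
# Same loop as the multi-line form, written on one line.
for nvar in 1 2 3 4 5; do echo $nvar; done
```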

The while and until statements

The while statement has the general form:

while command-list1
do
    command-list2
done

The commands in command-list1 are executed; and if the exit status of the last command in that list is 0 (zero), the commands in command-list2 are executed. The sequence is repeated as long as the exit status of command-list1 is 0 (zero).

The until statement has the general form:

until command-list1
do
    command-list2
done

This is identical in function to the while command except that the loop is executed as long as the exit status of command-list1 is non-zero.

The exit status of a while/until command is the exit status of the last command executed in command-list2. If no such command list is executed, a while/until has an exit status of 0 (zero).
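A minimal counting loop shows the while form in action; expr performs the arithmetic, as elsewhere in this document:

```shell
# Print three numbered passes, then stop.
n=1
while [ "$n" -le 3 ]
do
    echo "pass $n"
    n=`expr $n + 1`
done
```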

The break and continue statements

It is often necessary to handle exception conditions within loops. The statements break and continue are used for this. The break command terminates the execution of the innermost enclosing loop, causing execution to resume after the nearest done statement. To exit from n levels, use the command:

break n


This will cause execution to resume after the done n levels up. The continue command causes execution to resume at the while, until or for statement which begins the loop containing the continue command. You can also specify an argument n to continue, which will cause execution to continue at the nth enclosing loop up.
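A small sketch of break 2 escaping two nested loops at once; only one line is printed before both loops terminate:

```shell
# Stop both loops as soon as the inner value reaches 2.
for i in 1 2 3
do
    for j in 1 2 3
    do
        if [ "$j" -eq 2 ]
        then
            break 2
        fi
        echo "$i $j"
    done
done
```

This prints only "1 1": a plain break would restart the outer loop with i=2, but break 2 leaves both loops.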

Script files

Shell script files do not require any extension, but you may see them with a .sh ending. There are two ways to call a script file:

./scriptfile.sh

sh scriptfile.sh

The first method requires that the script have the file permission of +x (execution) for the user. The second just requires more typing.

The first method also has a requirement that the first line in the file is:

#!/bin/sh

This line tells the system which interpreter to use. Some other common first lines include:

#!/usr/bin/perl
#!/usr/bin/python
#!/usr/bin/php

You need only choose the appropriate one for whichever language you are using. For shell scripting, however, you will always want to choose the sh interpreter.

The test command

test is a simple program (/usr/bin/test) that will evaluate different things as true or false (i.e., is this a file? a directory? does it even exist? and so forth).

In shell scripting you will utilize the test command very often, though at first you would not know it. This is because there is another name, /usr/bin/[, that is also the test program. You will see it used in the examples to follow. Although it is odd to think of a command named after an open bracket, it is designed to save typing in shell scripts.

Note: because [ is an actual program, it is important to be careful when employing spacing in shell scripting. For example, if[-f /etc/make.conf] and if [ -f /etc/make.conf ] are two different things, and the latter is the proper usage.


Conditional statements

This program checks to see if a file exists which you pass to the script on the command line, and outputs a message if so.

file="$1"
if [ -a "$file" ] ; then
    echo "Yes, $file exists"
fi

NOTE: the -a option tells test to check whether the file exists. You can also use elif and else if you want to check for other conditions; elif takes the same parameter layout as if, and else takes no parameters.

Invoke this script with:

sh <scriptname> <file to check for>

Step-by-step breakdown:

1) A variable called "file" is created and set to the first argument you pass into the script, the <file to check for>.

2) The if statement checks to see if the file exists, using the -a flag. Notice the spacing; this is a requirement for conditional statements.

3) The script prints a message if the file exists; if it doesn't, nothing happens.

4) The if statement is ended by fi, a requirement.

Looping

This script will list the contents of a directory and prepend "Directory", "File", or "Symlink" before the proper listing. It employs a for loop which cycles through the ls output of a directory you pass into the script on the command line.

directorytols=$1
for filename in $( ls "$directorytols" )
do
    # Test the full path; ls prints only the bare names.
    # Check for a symlink first, because a symlink to a
    # directory would also pass the -d test.
    if [ -h "$directorytols/$filename" ] ; then
        echo "Symlink: $filename"
    elif [ -d "$directorytols/$filename" ] ; then
        echo "Directory: $filename"
    else
        echo "File: $filename"
    fi
done

The Unix File System

Most Unix machines store their files on magnetic disk drives. A disk drive is a device that can store information by making electrical imprints on a magnetic surface. One or more heads skim close to the spinning magnetic plate, and can detect, or change, the magnetic state of a given spot on the disk. The drives use disk controllers to position the head at the correct place at the correct time to read from, or write to, the magnetic surface of the plate. It is often possible to partition a single disk drive into more than one logical storage area. This section describes how the Unix operating system deals with a raw storage device like a disk drive, and how it manages to make organized use of the space.

How the Unix file system works

Every item in a Unix file system can be defined as belonging to one of four possible types:

Ordinary files

Ordinary files can contain text, data, or program information. An ordinary file cannot contain another file, or directory. An ordinary file can be thought of as a one-dimensional array of bytes.

Directories

In a previous section, we described directories as containers that can hold files, and other directories. A directory is actually implemented as a file that has one line for each item contained within the directory. Each line in a directory file contains only the name of the item, and a numerical reference to the location of the item. The reference is called an i-number, and is an index to a table known as the i-list. The i-list is a complete list of all the storage space available to the file system.

Special files

Special files represent input/output (i/o) devices, like a tty (terminal), a disk drive, or a printer. Because Unix treats such devices as files, a degree of compatibility can be achieved between device i/o, and ordinary file i/o, allowing for the more efficient use of software. Special files can be either character special files, that deal with streams of characters, or block special files, that operate on larger blocks of data. Typical block sizes are 512 bytes, 1024 bytes, and 2048 bytes.

Links

A link is a pointer to another file. Remember that a directory is nothing more than a list of the names and i-numbers of files. A directory entry can be a hard link, in which the i-number points directly to another file. A hard link to a file is indistinguishable from the file itself. When a hard link is made, the i-numbers of two different directory file entries point to the same inode. For that reason, hard links cannot span across file systems. A soft link (or symbolic link) provides an indirect pointer to a file. A soft link is implemented as a directory file entry containing a pathname. Soft links are distinguishable from files, and can span across file systems. Not all versions of Unix support soft links.


The I-List

When we speak of a Unix file system, we are actually referring to an area of physical memory represented by a single i-list. A Unix machine may be connected to several file systems, each with its own i-list. One of those i-lists points to a special storage area, known as the root file system. The root file system contains the files for the operating system itself, and must be available at all times. Other file systems are removable. Removable file systems can be attached, or mounted, to the root file system. Typically, an empty directory is created on the root file system as a mount point, and a removable file system is attached there. When you issue a cd command to access the files and directories of a mounted removable file system, your file operations will be controlled through the i-list of the removable file system.

The purpose of the i-list is to provide the operating system with a map into the memory of some physical storage device. The map is continually being revised, as the files are created and removed, and as they shrink and grow in size. Thus, the mechanism of mapping must be very flexible to accommodate drastic changes in the number and size of files. The i-list is stored in a known location, on the same memory storage device that it maps.

Each entry in an i-list is called an i-node. An i-node is a complex structure that provides the necessary flexibility to track the changing file system. The i-nodes contain the information necessary to get information from the storage device, which typically communicates in fixed-size disk blocks. An i-node contains 10 direct pointers, which point to disk blocks on the storage device. In addition, each i-node also contains one indirect pointer, one double indirect pointer, and one triple indirect pointer. The indirect pointer points to a block of direct pointers. The double indirect pointer points to a block of indirect pointers, and the triple indirect pointer points to a block of double indirect pointers. By structuring the pointers in a geometric fashion, a single i-node can represent a very large file.
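The reach of this pointer scheme can be estimated with a little shell arithmetic. The block size (1024 bytes) and pointer size (4 bytes, hence 256 pointers per pointer block) below are illustrative assumptions, not values fixed by the text:

```shell
# Assumed geometry: 1024-byte blocks, 4-byte pointers,
# so each indirect block holds 256 pointers.
bs=1024
ppb=256
direct=`expr 10 \* $bs`                      # 10 direct blocks
single=`expr $ppb \* $bs`                    # one indirect block
double=`expr $ppb \* $ppb \* $bs`            # double indirect
triple=`expr $ppb \* $ppb \* $ppb \* $bs`    # triple indirect
echo "`expr $direct + $single + $double + $triple` bytes maximum"
```

Under these assumptions a single i-node can address roughly 16 GB, which illustrates why the geometric arrangement of pointers can represent a very large file.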

It now makes a little more sense to view a Unix directory as a list of i-numbers, each i-number referencing a specific i-node on a specific i-list. The operating system traces its way through a file path by following the i-nodes until it reaches the direct pointers that contain the actual location of the file on the storage device.

The file system table

Each file system that is mounted on a Unix machine is accessed through its own block special file. The information on each of the block special files is kept in a system database called the file system table, and is usually located in /etc/fstab. It includes information about the name of the device, the directory name under which it will be mounted, and the read and write privileges for the device. It is possible to mount a file system as "read-only," to prevent users from changing anything.
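The exact format of the file system table varies between Unix versions; the entries below are an illustrative sketch (HP-UX-style device names and invented mount points) of the device, mount-point, type, and option fields the text describes:

```
# device           mount point   type   options   dump  pass
/dev/dsk/c0t1d0    /home         hfs    rw,suid   0     2
/dev/dsk/c0t6d0    /cdrom        cdfs   ro        0     0
```

The "ro" option in the second line mounts that file system read-only, as mentioned above.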

File system quotas

Although not originally part of the Unix filesystem, quotas quickly became a widely used tool. Quotas allow the system administrator to place limits on the amount of space the users can allocate. Quotas usually place restrictions on the amount of space, and the number of files, that a user can take. The limit can be a soft limit, where only a warning is generated, or a hard limit, where no further operations that create files will be allowed.

The command

quota

will let you know if you're over your soft limit. Adding the -v option will provide statistics about your disk usage.

File system related commands

Here are some commands related to file system usage, and other topics discussed in this section:

bdf             On HP-UX systems, reports file system usage statistics
df              On HP-UX systems, reports on free disk blocks, and i-nodes
du              Summarizes disk usage in a specified directory hierarchy
ln              Creates a hard link (default), or a soft link (with -s option)
mount, umount   Attaches, or detaches, a file system (super user only)
mkfs            Constructs a new file system (super user only)
fsck            Evaluates the integrity of a file system (super user only)

A brief tour of the Unix filesystem

The actual locations and names of certain system configuration files will differ under different implementations of Unix. Here are some examples of important files and directories under version 9 of the HP-UX operating system:

/hp-ux          The kernel program
/dev/           Where special files are kept
/bin/           Executable system utilities, like sh, cp, rm
/etc/           System configuration files and databases
/lib/           Operating system and programming libraries
/tmp/           System scratch files (all users can write here)
/lost+found/    Where the file system checker puts detached files
/usr/bin/       Additional user commands
/usr/include/   Standard system header files
/usr/lib/       More programming and system call libraries
/usr/local/     Typically a place where local utilities go
/usr/man        The manual pages are kept here

Other places to look for useful stuff

If you get an account on an unfamiliar Unix system, take a tour of the directories listed above, and familiarize yourself with their contents. Another way to find out what is available is to look at the contents of your PATH environment variable:

echo $PATH

You can use the ls command to list the contents of each directory in your path, and the man command to get help on unfamiliar utilities. A good systems administrator will ensure that manual pages are provided for the utilities installed on the system.
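A sketch that visits each directory on the search path by turning the colons into spaces (it assumes, as is usual, that no PATH entry contains spaces):

```shell
# Name each directory on the search path, one per line.
for dir in `echo "$PATH" | tr ':' ' '`
do
    echo "Looking in $dir"
done
```

From there, ls $dir shows the commands each directory provides.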

What is Linux?

Linux is a true 32-bit operating system that runs on a variety of different platforms, including Intel, Sparc, Alpha, and Power-PC (on some of these platforms, such as Alpha, Linux is actually 64-bit). There are other ports available as well, but I do not have any experience with them.

Linux was first developed back in the early 1990s, by a young Finnish then-university student named Linus Torvalds. Linus had a "state-of-the-art" 386 box at home and decided to write an alternative to the 286-based Minix system (a small unix-like implementation primarily used in operating systems classes), to take advantage of the extra instruction set available on the then-new chip, and began to write a small bare-bones kernel.

Eventually he announced his little project in the USENET group comp.os.minix, asking for interested parties to take a look and perhaps contribute to the project. The results have been phenomenal!

The interesting thing about Linux is, it is completely free! Linus decided to adopt the GNU Copyleft license of the Free Software Foundation, which means that the code is protected by a copyright -- but protected in that it must always be available to others.

Free means free -- you can get it for free, use it for free, and you are even free to sell it for a profit (this isn't as strange as it sounds; several organizations, including Red Hat, have packaged up the standard Linux kernel and a collection of GNU utilities, added their own "flavour" of included applications, and sell them as distributions. Some common and popular distributions are Slackware, Red Hat, SuSE, and Debian)! The great thing is, you have access to the source code, which means you can customize the operating system to your own needs, not those of the "target market" of most commercial vendors.

Linux can and should be considered a full-blown implementation of Unix. However, it cannot be called "Unix"; not because of incompatibilities or lack of functionality, but because the word "Unix" is a registered trademark owned by AT&T, and the use of the word is only allowable by license agreement.

Linux is every bit as supported, as reliable, and as viable as any other operating system solution (well, in my opinion, quite a bit more so!). However, due to its origin, the philosophy behind it, and the lack of a multi-million dollar marketing campaign promoting it, there are a lot of myths about it. People have a lot to learn about this wonderful OS!

A Few good reasons to use Linux

There are no royalty or licensing fees for using Linux, and the source code can be modified to fit your needs. The results can be sold for profit, but original authors retain copyright and you must provide the source to your modifications.

Because it comes with source code to the kernel, it is quite portable. Linux runs on more CPUs and platforms than any other computer operating system. The recent direction of the software and hardware industry is to push consumers to purchase faster computers with more system memory and hard drive storage. Linux systems are not affected by that orientation because of their capacity to run on almost any kind of computer, even aging 486-based computers with limited amounts of RAM.

Linux is a true multi-tasking operating system similar to its brother UNIX. It uses sophisticated, state-of-the-art memory management to control all system processes. That means that if a program crashes you can kill it and continue working with confidence.

Another benefit is that Linux is practically immune to the kinds of viruses found in other operating systems. To date we have found only two viruses that were effective on Linux systems.

Operating Systems/System Administration in UNIX Lecture Notes, PCP Bhatt/IISc, Bangalore, M19/V1/June 04

System Administration in UNIX

In the context of OS service provisioning, system administration plays a pivotal role. This is particularly the case when a system is accessed by multiple users. The primary task of a system administrator is to ensure that the following happens:

a. The top management is assured of efficiency in utilization of the system's resources.
b. The general user community gets the services which they are seeking.


In other words, system administrators ensure that there is very little to complain about thesystem's performance or service availability.In Linux environment with single user PC usage, the user also doubles up as a systemadministrator. Much of what we discuss in Unix context applies to Linux as well.In all Unix flavours there is a notion of a superuser privilege. Most major administrativetasks require that the system administrator operates in the superuser mode with rootprivileges. These tasks include starting up and shutting down a system, opening anaccount for a new user and giving him a proper working set-up. Administration tasks alsoinvolve installation of new software, distributing user disk space, taking regular back-ups,keeping system logs, ensuring secure operations and providing network services and webaccess.We shall begin this module by enlisting the tasks in system administration and offeringexposition on most of these tasks as the chapter develops.Unix Administration TasksMost users are primarily interested in just running a set of basic applications for theirprofessional needs. Often they cannot afford to keep track of new software releases andpatches that get announced. Also, rarely they can install these themselves. In addition,these are non-trivial tasks and can only be done with superuser privileges.Users share resources like disk space, etc. So there has to be some allocation policy of thedisk space. A system administrator needs to implement such a policy. Systemadministration also helps in setting up user's working environments.On the other hand, the management is usually keen to ensure that the resources are usedproperly and efficiently. They seek to monitor the usage and keep an account of systemusage. In fact, the system usage pattern is often analysed to help determine the efficacy ofOperating Systems/System Administration in UNIX Lecture NotesPCP Bhatt/IISc, Bangalore M19/V1/June 04/2usage. 
Clearly, managements' main concerns include performance and utilisation ofresources to ensure that operations of the organisation do not suffer.At this juncture it may be worth our while to list major tasks which are performed bysystem administrators. We should note that most of the tasks require that the systemadministrator operates in superuser mode with root privileges.Administration Tasks ListThis is not an exhaustive list, yet it represents most of the tasks which systemadministrators perform:1. System startup and shutdown: In the Section 19.2, we shall see the basic stepsrequired to start and to stop operations in a Unix operational environment.2. Opening and closing user accounts: In Unix an administrator is both a user and asuper-user. Usually, an administrator has to switch to the super-user mode withroot privileges to open or close user accounts. In Section 19.3, we shall discusssome of the nuances involved in this activity.3. Helping users to set up their working environment: Unix allows any user tocustomize his working environment. This is usually achieved by using .rc files.Many users need help with an initial set-up of their .rc files. Later, a user maymodify his .rc files to suit his requirements. In Section 19.4, we shall see most ofthe useful .rc files and the interpretations for various settings in these files.4. Maintaining user services: Users require services for printing, mail Web accessand chat. We shall deal with mail and chat in Section 19.4 where we discuss .rc

Page 114: SPSA

files and with print services in Section 19.5 where we discuss device managementand services. These services include spooling of print jobs, provisioning of printquota, etc.5. Allocating disk space and re-allocating quotas when the needs grow: Usuallythere would be a default allocation. However, in some cases it may be imperativeto enhance the allocation. We shall deal with the device oriented services andmanagement issues in Section 19.5.6. Installing and maintaining software: This may require installing software patchesfrom time to time. Most OSs are released with some bugs still present. Often withusage these bugs are identified and patches released. Also, one may have somesoftware installed which satisfies a few of the specialized needs of the userOperating Systems/System Administration in UNIX Lecture NotesPCP Bhatt/IISc, Bangalore M19/V1/June 04/3community. As a convention this is installed in the directory /usr/local/bin. Thelocal is an indicator of the local (and therefore a non-standard) nature of software.We shall not discuss the software installation as much of it is learned fromexperienced system administrators by assisting them in the task.7. Installing new devices and upgrading the configuration: As a demand on a systemgrows, additional devices may need to be installed. The system administrator willhave to edit configuration files to identify these devices. Some related issues shallbe covered in section 19.5 later in this chapter.8. Provisioning the mail and internet services: Users connected to any host shall seekMail and internet Web access. In addition, almost every machine shall be aresource within a local area network. So for resource too the machine shall havean IP address. In most cases it would be accessible from other machine as well.We shall show the use .mailrc files in this context later in Section 19.4.9. Ensuring security of the system: The internet makes the task of systemadministration both interesting and challenging. 
The administrators need to keep a check on spoofing and misuse. We have discussed security in some detail in the module on OS and Security.

10. Maintaining system logs and profiling the users: A system administrator is often required to determine the usage of resources. This is achieved by analysing system logs. The system logs also help to profile the users. In fact, user profiling helps in identifying security breaches, as was explained in the module entitled OS and Security.

11. System accounting: This is usually of interest to the management. Also, it helps system administrators to tune an operating system to meet the user requirements. This also involves maintaining and analysing logs of the system operation.

12. Reconfiguring the kernel whenever required: Sometimes when new patches are installed, or a new release of the OS is received, it is imperative to recompile the kernel. Linux users often need to do this as new releases and extensions become available.

Let us begin our discussions with the initiation of the operations and shutdown procedures.
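Before moving on to startup and shutdown, task 10 above (profiling users from system logs) can be illustrated with a small sketch. This is only an illustration, not an administration tool: the sample lines imitate the output of the Unix last command, and a real administrator would pipe last or a log file itself.

```shell
# Illustrative sketch: count logins per user from last-style records.
# The three sample lines below are made up; in practice one would
# run: last | awk '{ count[$1]++ } END { ... }'
printf '%s\n' \
  'bhatt  pts/1  203.197.175.174  Tue Nov  5 00:25' \
  'damu   pts/2  203.197.175.180  Tue Nov  5 01:10' \
  'bhatt  pts/1  203.197.175.174  Tue Nov  5 09:42' |
awk '{ count[$1]++ } END { for (u in count) print u, count[u] }' |
sort
# prints:
# bhatt 2
# damu 1
```

A summary like this is the simplest form of user profiling: an unusual jump in a user's login count is exactly the kind of anomaly the OS and Security module asks us to watch for.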


Starting and Shutting Down

First we shall examine what exactly happens when the system is powered on. Later, we shall examine the shutdown procedure for Unix systems. Unix systems, on being powered on, usually require that a choice be made to operate either in single- or in multiple-user mode. Most systems operate in multi-user mode. However, system administrators use single-user mode when they have some serious reconfiguration or installation task to perform. The family of Unix systems emanating from System V usually operate with a run level. The single-user mode is identified with run level s; otherwise there are levels from 0 to 6. Run level 3 is the most common for the multi-user mode of operation.

On being powered on, Unix usually initiates the following sequence of tasks:

1. Unix performs a sequence of self-tests to determine if there are any hardware problems.
2. The Unix kernel gets loaded from a root device.
3. The kernel runs and initializes itself.
4. The kernel starts the init process. All subsequent processes are spawned from the init process.
5. init checks out the file system using fsck.
6. The init process executes a system boot script.
7. The init process spawns a process to check all the terminals from which the system may be accessed. This is done by checking the terminals defined under /etc/ttytab or a corresponding file. For each terminal a getty process is launched. This reconciles communication characteristics like baud rate and type for each terminal.
8. The getty process initiates a login process to enable a prospective login from a terminal.

During the startup we notice that fsck checks out the integrity of the file system. In case fsck throws up messages of some problems, the system administrator has to work around them to ensure that there is a working configuration made available to the users. It will suffice here to mention that one may monitor disk usage and reconcile the disk integrity. The starting up of systems is a routine activity.
The most important thing to note is that on booting, or following a startup, all the temporary files under the tmp directory are cleaned up. Also, zombies are cleaned up. System administrators resort to booting when there are a number of zombies and often a considerable disk space is blocked in the tmp directory.

We next examine the shutdown. Most Unix systems require invoking the shutdown utility. The shutdown utility offers options to either halt immediately, or shut down after a pre-assigned period. Usually system administrators choose to shut down with a pre-assigned period. Such a shutdown results in sending a message to all the terminals that the system shall be going down after a certain interval of time, say 5 minutes. This cautions all the users and gives them enough time to close their files and terminate their active processes. Yet another shutdown option is to reboot, with obvious implications. The most commonly used shutdown command is as follows:

shutdown -h time [message]

Here the time is the period and the message is optional, but often it is intended to advise users to take precautions to terminate their activity gracefully. This mode also prepares to


turn power off after a proper shutdown. There are other options like k, r, n, etc. The readers are encouraged to find details about these in the Unix man pages. For now, we shall move on to discuss user accounts management and run command files.

Managing User Accounts

When a new person joins an organisation he is usually given an account by the system administrator. This is the login account of the user. Nowadays almost all Unix systems support an admin tool which seeks the following information from the system administrator to open a new account:

1. Username: This serves as the login name for the user.

2. Password: Usually a system administrator gives a simple password. The users are advised to later select a password which they feel comfortable using. A user's password appears in the shadow file in encrypted form. Usually, the /etc/passwd file contains the information required by the login program to authenticate the login name and to initiate the appropriate shell, as shown in the description below:

bhatt:x:1007:1::/export/home/bhatt:/usr/local/bin/bash
damu:x:1001:10::/export/home/damu:/usr/local/bin/bash

Each line above contains information about one user. The first field is the name of the user; the next is a dummy indicator of the password, which is in another file, a shadow file. Password programs use a trap-door algorithm for encryption.

3. Home directory: Every new user has a home directory defined for him. This is the default login directory. Usually it is defined in the run command files.

4. Working set-up: The system administrators prepare .login and .profile files to help users obtain an initial set-up for login. The administrator may prepare .cshrc, .xinitrc, .mailrc, .ircrc files. In Section 19.4 we shall later see how these files may be helpful in customizing a user's working environment. A natural point of curiosity would be: what happens when users log out?
Unix systems receive signals when users log out. Recall, in Section 19.2 we mentioned that a user logs in under a login process initiated by the getty process. Process getty identifies the terminal being used. So when a user logs out, the getty process which was running to communicate with that terminal is first killed. A new getty process is now launched to enable yet another prospective login from that terminal.

The working set-up is completely determined by the startup files. These are basically .rc (run command) files. These files help to customize the user's working environment. For instance, a user's .cshrc file shall have a path variable which defines the access to various Unix built-in shell commands, utilities, libraries, etc. In fact, many other shell environment variables like HOME, SHELL, MAIL, TZ (the time zone) are set up automatically. In addition, the .rc files define the access to network services, or some need-based access to certain licensed software or databases as well. To that extent the .rc files help to customize the user's working environment. We shall discuss the role of run command files later in Section 19.4.

5. Group-id: The user login name is the user-id. Under Unix the access privileges are determined by the group a user belongs to. So a user is assigned a group-id. It is possible to obtain the id information by using the id command as shown below:

[bhatt@iiitbsun OS]$id


uid=1007(bhatt) gid=1(other)
[bhatt@iiitbsun OS]$

6. Disc quota: Usually a certain amount of disk space is allocated by default. In cases where the situation so warrants, a user may seek additional disk space. A user may interrogate the disk space available at any time by using the df command. Its usage is shown below:

df [options] [name] : to know the free disk space.

where name refers to a mounted file system, local or remote. We may specify a directory if we need to know the information about that directory. The following options may help with additional information:

-l : for local file system
-t : reports total no. of allocated blocks and i-nodes on the device.

The Unix command du reports the number of disk blocks occupied by a file. Its usage is shown below:

du [options] [name]... where name is a directory or a file

Above, name by default refers to the current directory. The following options may help with additional information:

-a : produce an output line for each file
-s : report only the total usage for each name that is a directory, i.e. not individual files.
-r : produce messages for files that cannot be read or opened

7. Network services: Usually a user shall get a mail account. We will discuss the role of the .mailrc file in this context in Section 19.4. The user gets access to Web services too.

8. Default terminal settings: Usually vt100 is the default terminal setting. One can attempt alternate terminal settings using tset, stty, tput, tabs with the control sequences defined in terminfo or termcap, with details recorded in the /etc/ttytype or /etc/tty files and in the shell variable TERM. Many of these details are discussed in Section 19.5.1, which specifically deals with terminal settings. The reader is encouraged to look up that section for details.

Once an account has been opened the user may do the following:

1. Change the password for access to one of his liking.
2.
Customize many of the run command files to suit his needs.

Closing a user account: Here again the password file plays a role. Recall, in Section 19.1 we saw that the /etc/passwd file has all the information about the users' home directory, password, shell, user and group-id, etc. When a user's account is to be deleted, all of this information needs to be erased. System administrators log in as root and delete the user entry from the password file to delete the account.

The .rc Files

Usually system administration offers a set of start-up run command files to a new user. These are files that appear as .rc files. These may be .profile, .login, .cshrc, .bashrc, .xinitrc, .mailrc, .ircrc, etc. The choice depends upon the nature of the login shell. Typical allocations may be as follows:


Bourne or Korn shell: .profile
C-Shell: .login, .cshrc
BASH: .bashrc
TCSH: .tcshrc

BASH is referred to as the Bourne-again shell. TCSH is an advanced C-Shell with many shortcuts; for example, pressing a tab may complete a partial string to the extent it can be covered unambiguously. For us it is important to understand what it is that these files facilitate.

Role of .login and .profile files: The basic role of these files is to set up the environment for a user. These may include the following set-ups:

• Set up the terminal characteristics: Usually, the set-up may include terminal type, and character settings for the prompt, erase, etc.
• Set up editors: It may set up a default editor or some specific editor like emacs.
• Set up protection mode: This file may set up umask, which stands for the user mask. umask determines access rights to files.
• Set up environment variables: This file may set up the path variable. The path variable defines the sequence in which directories are searched for locating the commands and utilities of the operating system.
• Set up some customization variables: Usually, these help to limit things like selecting icons for mail, or core dump size up to a maximum value. It may be used for setting up the limit on the scope of the command history, or some other preferences.

A typical .login file may have the following entries:

# A typical .login file
umask 022
setenv PATH /usr/ucb:/usr/bin:/usr/sbin:/usr/local/bin
setenv PRINTER labprinter
setenv EDITOR vi
biff y
set prompt="`hostname`=>"

The meanings of the lines above should be obvious from the explanation we advanced earlier. Next we describe .cshrc files, and the readers should note the commonalities between these definitions of initialisation files.

The .cshrc file: The C-shell makes a few features available over the Bourne shell.
For instance, it is common to define aliases in .cshrc files for very frequently used commands, like gh for ghostview and cl for clear. Below we give some typical entries for a .cshrc file, in addition to the many we saw in the .login file in this section:

if (! $?TERM) setenv TERM unknown
if ("$TERM" == "unknown" || "$TERM" == "network") then
    echo -n 'TERM? [vt100]: '
    set ttype=($<)
    if ("$ttype" == "") set ttype="vt100"
    if ("$ttype" == "pc") set ttype="vt100"
    setenv TERM $ttype
endif


alias cl clear
alias gh ghostview
set history = 50
set nobeep

Note that in the first few lines of the script above, the system identifies the nature of the terminal and sets it to operate as vt100. It is highly recommended that the reader examine and walk through the initialization scripts which the system administration provides. Also, customization of these files entails that, as users, we must look up these files and modify them to suit our needs.

There are two more files of interest. One corresponds to regulating the mail, and the other controls the screen display. These are respectively initialized through .mailrc and .xinitrc. We discussed the latter in the chapter on X Windows. We shall discuss the settings in the .mailrc file in the context of the mail system.

The mail system: .mailrc file: From the viewpoint of the user's host machine, the mail program truly acts as the main anchor for our internet-based communication. The Unix sendmail program, together with the uu class of programs, forms the very basis of the mail under Unix. Essentially, the mail system has the following characteristics:

1. The mail system is a store-and-forward system.
2. Mail is picked up from the mail server periodically. The mail daemon, running as a background process, picks up the mail.
3. Mail is sent by the sendmail program under Unix.
4. The uu class of programs, like uucp or Unix-to-Unix copy, have provided the basis for developing the mail tools. In fact, the file attachment facility is an example of it.

On a Unix system it is possible to invoke the mail program from an auto-login or .cshrc program. Every Unix user has a mailbox entry in the /usr/spool/mail directory. Each person's mailbox is named after his own username.
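The "you have mail" notice printed at login follows directly from this layout: the login machinery only has to test whether the user's spool file is non-empty. The sketch below illustrates the idea; it uses a temporary directory to stand in for /usr/spool/mail, since the real spool belongs to the system.

```shell
# Illustrative sketch of a mail-arrival check. A temporary directory
# stands in for /usr/spool/mail; the mailbox file is named after the user.
spool=$(mktemp -d)
user=bhatt
printf 'From damu Tue Nov  5 01:10\nhello\n' > "$spool/$user"

if [ -s "$spool/$user" ]; then    # -s: file exists and is non-empty
    echo "You have mail."
else
    echo "No mail."
fi
rm -rf "$spool"
# prints: You have mail.
```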
In Table 19.1 we briefly review some very useful mail commands and the wild cards used with these commands. We next give some very useful commands which help users to manage their mails efficiently.

Various command options for mail:

d:r : delete all read messages.
d:usenet : delete all messages with usenet in the body.
p:r : print all read messages.
p:bhatt : print all from user "bhatt".

During the time a user is composing a mail, the mail system tools usually offer a facility to escape to a shell. This can be very useful when large files need to be edited alongside the mail being sent. These use ~ commands with the interpretations shown below:

~! escape to shell
~d include dead.letter
~h edit header field

The mail system provides for a command line interface to facilitate mail operations using


some of the following commands. For instance, every user has a default mail box called mbox. If one wishes to give a different name to the mailbox, he may choose a new name for it. Other facilities allow a mail to be composed with, or without, a subject, or to see the progress of the mail as it gets processed. We show some of these options and their usage with the mail command below.

mail -s greetings [email protected]

-s : option is used to send a mail with a subject.
-v : is the verbose option; it shows the mail's progress.
-f mailbox : option allows the user to name a new mail box.
mail -f newm : where newm may be the new mail box which a user may opt for in place of mbox (the default).

Next we describe some of the options that often appear inside users' .mailrc files. Generally, with these options we may have aliases (nick-names) in place of the full mail address. One may also set or unset some flags, as shown in the example below:

unset askcc
set verbose
set append

Various options for .mailrc file.

In Table 19.2, we offer a brief explanation of the options which may be set initially in .mailrc files. In addition, in using the mail system the following may be the additional facilities which could be utilized:

1. To subscribe to [email protected], the body of the message should contain "subscribe", the group to subscribe to, and the subscriber's e-mail address, as shown in the following example:

subscribe allmusic [email protected]

2. To unsubscribe, use logout allmusic. In addition to the above there are vacation programs which send mails automatically when the receiver is on vacation. Mails may also be encrypted. For instance, one may use pretty good privacy (PGP) for encrypting mails.

Facilitating chat with .ircrc file: System administrators may prepare terminals and offer Inter Relay Chat or IRC facility as well.
IRC enables real-time conversation with one or more persons who may be scattered anywhere globally. IRC is a multi-user system. To use Unix-based IRC versions, one may have to set the terminal emulation to vt100, either from the keyboard or from an auto-login file such as .login in /bin/sh or .cshrc in /bin/csh.

$ set TERM=vt100
$ stty erase "^h"

The most common way to use the IRC system is to make a telnet call to the IRC server. There are many IRC servers. Some servers require specification of a port number, as in irc.ibmpcug.co.uk 9999. When one first accesses the IRC server, many channels are presented. A channel may be taken as a discussion area, and one may choose a channel for an online chat (like switching a


channel on TV). IRCs require setting up an .ircrc file. Below we give some sample entries for a .ircrc file. The .ircrc files may also set internal variables.

/COMMENT .....
/NICK <nn>
/JOIN <ch>

IRC commands begin with a "/" character. In Table 19.3, we give a few of the commands for IRC with their interpretations.

IRCs usually support a range of channels. Listed below are a few of the channel types:

Limbo or Null
Public
Private
Secret
Moderated
Limited
Topic limited
Invite Only
Message disabled

The above channel types are realized by using a mode command. The modes are set or unset as follows, with the options having the interpretations shown in Table 19.4:

/MODE sets (with +) and unsets (with -) the mode of a channel with the following options:
/MODE <channel> +<channel options> <parameters>
/MODE <channel> -<channel options> <parameters>

Sourcing Files

As we have described above, the .rc files help to provide adequate support for a variety of services. Suppose we are logged in to a system and seek a service that requires a change in one of the .rc files. We may edit the corresponding file. However, to effect the changed behavior we must source the file. Basically, we need to execute the source command with the file name as argument, as shown below, where we source the .cshrc file:

source .cshrc
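The same idea can be demonstrated in a POSIX shell, where the dot command plays the role of csh's source: an edit to a run-command file has no effect on the current shell until the file is sourced. The file name and variable below are made up for illustration.

```shell
# Sketch: sourcing a run-command file. The `.` built-in reads the file
# in the *current* shell, so its settings take effect immediately,
# just as `source .cshrc` does in the C-shell.
rcfile=$(mktemp)
echo 'GREETING="hello from rc"' > "$rcfile"   # simulate editing an .rc file

. "$rcfile"        # source it; GREETING now exists in this shell
echo "$GREETING"
rm -f "$rcfile"
# prints: hello from rc
```

Had we run the file as a child process instead (sh "$rcfile"), the variable would be set only in the child and lost on its exit, which is precisely why sourcing is needed.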

Device Management and Services

Technically the system administrator is responsible for every device, for all of its usage and operation. In particular, the administrator looks after its installation, upgrade, configuration, scheduling, and the allocation of quotas to service the user community. We shall, however, restrict ourselves to the following three services:

1. Terminal-based services, discussed in Section 19.5.1
2. Printer services, discussed in Section 19.5.2
3. Disc space and file services, discussed in Section 19.5.3.

We shall begin with the terminal settings and related issues.

The Terminal Settings


In the context of terminal settings the following three things are important:

1. Unix recognizes terminals as special files.
2. Terminals operate on serial lines. Unix has a way to deal with files that are essentially using serial communication lines.
3. The terminals have a variety of settings available. This is so even while the protocols of communication for all of them are similar.

From the point of view of terminal services provisioning and system configuration, system administration must bear the above three factors in mind. Unix maintains all terminal-related information in tty files in the /dev directory. These files are special files which adhere to the protocols of communication with serial lines. This includes those terminals that use modems for communication. Some systems may have a special file for the console, like /dev/console, which can be monitored for messages as explained in the chapter on X Windows. Depending upon the terminal type, a serial line control protocol is used which can interrogate or activate the appropriate pins on the hardware interface plug.

The following brief session shows how a terminal may be identified on a host:

login: bhatt
Password:
Last login: Tue Nov 5 00:25:21 from 203.197.175.174
[bhatt@iiitbsun bhatt]$hostname
iiitbsun
[bhatt@iiitbsun bhatt]$tty
/dev/pts/1
[bhatt@iiitbsun bhatt]$

termcap and terminfo files: The termcap and terminfo files in the directory /etc or in /usr/share/lib/terminfo provide the terminal database, information and programs for use in the Unix environment. The database includes programs that may have been compiled to elicit services from a specific terminal which may be installed.
The programs that control the usage of a specific terminal are identified in the environment variable TERM, as shown in the example below:

[bhatt@localhost DFT02]$ echo $TERM
xterm
[bhatt@localhost DFT02]$

There are specific commands like tic, short for terminal information compilation. Also, there are programs that convert termcap to terminfo whenever required. For detailed discussions on terminal characteristics and how to exploit various features the reader may refer to [2]. We shall, however, elaborate on two specific commands here: the tset and stty commands.

1. tset command: The tset command is used to initialize a terminal. Usually, the command sets up initial settings for characters like erase, kill, etc. Below we show how under C-Shell one may use the tset command:

$ setenv TERM `tset - -Q -m ":?vt100"`

Sometimes one may prepare a temporary file and source it.

2. stty command: We briefly encountered the stty command in Section 19.2. Here we shall elaborate on the stty command in the context of the options and the values which


may be availed by using the stty command. In Table 19.5 we list a few of the options with their corresponding values. There are many other options; Table 19.5 has only a sample of those that are available. Try the command stty -a to see the options for your terminal. Below is shown the setting on my terminal:

[bhatt@localhost DFT02]$ stty -a
speed 38400 baud; rows 24; columns 80; line = 0;
intr = ^C; quit = ^\; erase = ^?; kill = ^U; eof = ^D; eol = M-^?; eol2 = M-^?;
start = ^Q; stop = ^S; susp = ^Z; rprnt = ^R; werase = ^W; lnext = ^V;
flush = ^O; min = 1; time = 0;
-parenb -parodd cs8 hupcl -cstopb cread -clocal -crtscts
-ignbrk -brkint -ignpar -parmrk -inpck -istrip -inlcr -igncr icrnl ixon -ixoff
-iuclc ixany imaxbel
opost -olcuc -ocrnl onlcr -onocr -onlret -ofill -ofdel nl0 cr0 tab0 bs0 vt0 ff0
isig icanon iexten echo echoe echok -echonl -noflsh -xcase -tostop -echoprt
echoctl echoke
[bhatt@localhost DFT02]$

Lastly, we discuss how to attach a new terminal. Basically we need to connect a terminal, and then we set up the entries in termcap and/or in terminfo and the configuration files. Sometimes one may have to look at /etc/inittab or /etc/ttydefs as well. It helps to reboot the system on some occasions to ensure proper initialization following a set-up attempt.

Printer Services

Users obtain print services through a printer daemon. The system arranges to offer print services by spooling print jobs in a spooling directory. It also has a mechanism to service the print requests from the spooling directory. In addition, system administrators need to be familiar with commands which help in monitoring the printer usage. We shall begin with a description of the printcap file.

The printcap file: Unix systems have their print services offered using a spooling system. The spooling system recognizes print devices that are identified in the /etc/printcap file.
The printcap file serves not only as a database, but also as a configuration file. Below we see the printcap file on my machine:

# /etc/printcap
#
# DO NOT EDIT! MANUAL CHANGES WILL BE LOST!
# This file is autogenerated by printconf-backend during lpd init.
#
# Hand edited changes can be put in /etc/printcap.local, and will be included.
iiitb:\
    :sh:\
    :ml=0:\
    :mx=0:\
    :sd=/var/spool/lpd/iiitb:\
    :lp=|/usr/share/printconf/jetdirectprint:\
    :lpd_bounce=true:\

Page 124: SPSA

    :if=/usr/share/printconf/mf_wrapper:

The printcap file is a read-only file, except that it can be edited by the superuser root. The entries in printcap files can be explained using Table 19.6. With the file description and the table we can see that the spooling directory for our printer, with printer name iiitb, is at /var/spool/lpd/iiitb. Also note we have no limit on the file size which can be printed.

The printcap file: printer characteristics.
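Since printcap is a colon-separated capability database, its fields are easy to pull out with standard text tools. The sketch below extracts the spool directory (the sd= capability) from a printcap-style line; it is only an illustration over a sample entry mirroring the one above, not an administration tool.

```shell
# Illustrative sketch: extract the sd= (spool directory) capability
# from a printcap-style entry with sed. The sample line mirrors the
# iiitb entry shown above.
echo ':sd=/var/spool/lpd/iiitb:' |
sed -n 's/.*:sd=\([^:]*\):.*/\1/p'
# prints: /var/spool/lpd/iiitb
```

The same pattern works for other capabilities (mx=, lp=, if=) by changing the capability name in the sed expression.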

Printer spooling directory: As we explained earlier, print requests get spooled first. Subsequently, the printer daemon lpd honours the print requests. To achieve this, one may employ a two-layered design. Viewing it bottom-up: at the bottom layer we maintain a separate spooling directory for each of the printers. So, when we attach a new printer, we must create a new spooling directory for it. At the top level, we have a spooling process which receives each print request and finally spools it for the printer(s). Note that the owner of the spool process is the group daemon.

Printer monitoring commands: The printer commands help to monitor both the health of the services as also the work in progress. In Table 19.7 we elaborate on the commands and their interpretations.

To add a printer one may use the lpadmin tool. Some of the system administration practices are best learned by assisting experienced system administrators; rarely can they be taught through a textbook.

Disk space allocation and management

In this section we shall discuss how a system administrator manages the disk space. We would also like the reader to refer to Section 2.7.1, where we stated that the partitions of the disk get defined at the time of formatting. The partitions may be physical or logical. In the case of a physical partition we have the file system resident within one disk drive. In the case of a logical partition, the file system may extend over several drives. In either of these cases the following issues are at stake:

1. Disk file system: In Chapter 2 we indicated that system files are resident in the root file system. Similarly, the user information is maintained in the home file system created by the administrator. Usually, a physical disk drive may have one or more file systems resident on it. As an example, consider the mapping shown in Figure 19.1 (the names of file systems are shown in bold letters). We notice that there are three physical drives with a mapping of root and

Figure 19.1: Mapping file systems on physical drives.
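On a live system, the actual mapping of file systems to drives and mount points can be inspected with the df command introduced earlier. The sketch below uses the POSIX -P output format; the devices and sizes printed are, of course, machine-dependent.

```shell
# Sketch: `df -P` lists, for each mounted file system, the device it
# lives on, its size, and its mount point -- the live counterpart of
# the mapping sketched in Figure 19.1. Here we show the header line
# and the root file system only; output varies from machine to machine.
df -P | awk 'NR == 1 || $NF == "/"'
```

Comparing this listing against the intended layout is a quick way for an administrator to confirm that partitions were assigned to file systems as planned.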


other file systems. Note that the disk drive with the root file system co-locates the var file system on the same drive. Also, the file system home extends over two drives. This is possible by appropriate assignment of the disk partitions to various file systems. Of course, system programmers follow some method in both partitioning and allocating the partitions. Recall that each file system maintains some data about each of the files within it. System administrators have to reallocate the file systems when new disks become available, or when some disk suffers damage to sectors or tracks which may no longer be available.

2. Mounting and unmounting: The file systems keep the files in a directory structure which is essentially a tree. So a new file system can be created by specifying the point of mount in the directory tree. A typical mount instruction has the following format:

mount a-block-special-file point-of-mount

Corresponding to the mount instruction, there is also an instruction to unmount. In Unix it is umount, with the same format as mount. In Unix, every time we have a new disk added, it is mounted at a suitable point of mount in the directory tree. In that case the mount instruction is used exactly as explained. Of course, the disk is assumed to be formatted.

3. Disk quota: Disk quota can be allocated by reconfiguring the file system table, usually located at /etc/fstab. To extend the allocation quota in a file system we first have to modify the corresponding entry in the /etc/fstab file. The system administration can set hard or soft limits on user quota. If a hard limit has been set, then the user simply cannot exceed the allocated space. However, if a soft limit is set, then the user is cautioned when he approaches the soft limit. Usually, it is expected that the user will resort to purging files no longer in use. Else he may seek additional disk space.
Some systems have quota set at the group level. It may also be possible to set quota for individual users. Both these situations require executing an edit quota instruction with the user name or group name as the argument. The format of the edquota instruction is shown below:

edquota user-name

4. Integrity of file systems: Due to the dynamics of temporary allocations and moving files around, the integrity of a file system may get compromised. The following are some of the ways the integrity is lost:

• Lost files. This may happen because a user has opened the same file from multiple windows and edited them.
• A block may be marked free but may be in use.
• A block may be marked in use but may be free.
• The link counts may not be correct.
• The data in the file system table and the actual files may be different.

The integrity of the file system is checked out by using the fsck instruction. The argument to the command is the file system which we need to check, as shown below:


fsck file-system-to-be-checked

On rebooting the system these checks are mandatory and routinely performed. Consequently, the consistency of the file system is immediately restored on rebooting.

5. Access control: As explained earlier in this chapter, when an account is opened, a user is allocated a group. The group determines the access. It is also possible to offer an initial set-up that will allow access to special (licensed) software like the matlab suite of software.

6. Periodic back-up: Every good administrator follows a regular back-up procedure so that in case of a severe breakdown, at least a stable previous state can be restored.

After-Word

In this module we have listed many tasks which system administrators are required to perform. However, as we remarked earlier, the best lessons in system administration are learned under the tutelage of a very experienced system administrator. There is no

substitute for the "hands-on" learning.