Workbook 8, and 9 Pace Center for Business and Technology 1

Jan 11, 2016


Brett Lucas

Workbook 8, and 9

Pace Center for Business and Technology


String Processing Tools

Key Concepts
• The wc command counts the number of characters, words, and lines in a file. When applied to structured data, the wc command can become a versatile counting tool.
• The cat command has options that allow representation of nonprinting characters such as NEWLINE.
• The head and tail commands have options that allow you to print only a certain number of lines or a certain number of bytes (one byte usually correlates to one character) from a file.


Revisiting cat, head, and tail

Revisiting cat

We have been using the cat command to simply display the contents of files. Usually, the cat command generates a faithful copy of its input, without performing any edits or conversions. When called with one of the following command line switches, however, the cat command will indicate the presence of tabs, line feeds, and other control sequences, using the following conventions. Using the -A command line switch, the whitespace structure of the file becomes evident, as tabs are replaced with ^I, and line feeds are decorated with $. For example: cat -A /etc/hosts
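The effect of -A is easiest to see on a small file whose contents we control. The following is a minimal sketch, assuming GNU cat; the file sample.txt and its contents are hypothetical and created only for this illustration.

```shell
# Hypothetical sample file: a TAB-separated line and a line with a trailing space.
tmp=$(mktemp -d)
printf 'alpha\tbeta\ngamma \n' > "$tmp/sample.txt"

# Plain cat copies the bytes unchanged; -A (GNU cat) makes whitespace visible:
# each TAB is shown as ^I and each line feed is marked with $.
shown=$(cat -A "$tmp/sample.txt")
printf '%s\n' "$shown"

rm -rf "$tmp"
```

The trailing space on the second line, invisible in a normal listing, shows up as a gap before the $.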


For example, the following file contains a list of four musicians.

Linux (and Unix) text files generally adhere to a convention that the last character of the file must be a line feed for the last line of text. Following the cat of the file musicians.mac, which does not contain any conventional Linux line feed characters, the bash prompt is not displayed in its usual location.

Revisiting head and tail
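The line-count and byte-count options of head and tail named in the key concepts can be sketched as follows. The file numbers.txt is a hypothetical sample; -n selects a number of lines and -c a number of bytes.

```shell
# Hypothetical five-line sample file.
tmp=$(mktemp -d)
printf 'one\ntwo\nthree\nfour\nfive\n' > "$tmp/numbers.txt"

first_lines=$(head -n 2 "$tmp/numbers.txt")   # the first two lines
last_lines=$(tail -n 2 "$tmp/numbers.txt")    # the last two lines
first_bytes=$(head -c 3 "$tmp/numbers.txt")   # the first three bytes
last_bytes=$(tail -c 5 "$tmp/numbers.txt")    # the last five bytes, including the final line feed

printf '%s\n' "$first_lines" "$last_lines" "$first_bytes" "$last_bytes"

rm -rf "$tmp"
```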


The wc (Word Count) Command

When used without any command line switches, wc will report on the number of characters, lines, and words. Command line switches can be combined to return any combination of character count, line count or word count.
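A short sketch of the individual wc switches, using a hypothetical two-line file. Note that -c counts bytes, which equals the character count for plain ASCII text; -m counts characters.

```shell
# Hypothetical sample file with two lines and six words.
tmp=$(mktemp -d)
printf 'the quick brown fox\njumps over\n' > "$tmp/sample.txt"

# Without switches: line, word, and byte counts, followed by the file name.
wc "$tmp/sample.txt"

# Reading from standard input suppresses the file name, leaving only the number.
lines=$(wc -l < "$tmp/sample.txt")
words=$(wc -w < "$tmp/sample.txt")
bytes=$(wc -c < "$tmp/sample.txt")
echo "lines=$lines words=$words bytes=$bytes"

rm -rf "$tmp"
```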


How To Recognize A Real Character

Text files are composed using an alphabet of characters. Some characters are visible, such as numbers and letters. Some characters are used for horizontal distance, such as spaces and TAB characters. Some characters are used for vertical movement, such as carriage returns and line feeds. A line in a text file is a series of any character other than a NEWLINE (line feed) character and then a NEWLINE character. Additional lines in the file immediately follow the first line.

While a computer represents characters as numbers, the exact value used for each symbol varies depending on which alphabet has been chosen. The most common alphabet for English speakers is ASCII; the Latin-1 (ISO 8859-1) encoding extends ASCII with accented characters used in Western European languages. Different human languages are represented by different computer encoding rules, so the exact numeric value for a given character depends on the human language being recorded.


So, What Is A Word?

A word is a group of printing characters, such as letters and digits, surrounded by white space, such as space characters or horizontal TAB characters. Notice that our definition of a word does not include any notion of “meaning”. Only the form of the word is important, not its semantics. As far as Linux is concerned, a line such as:


Chapter 2. Finding Text: grep

Key Concepts
• grep is a command that prints lines that match a specified text string or pattern.
• grep is commonly used as a filter to reduce output to only desired items.
• grep -r will recursively grep files underneath a given directory.
• grep -v prints lines that do NOT match a specified text string or pattern.
• Many other command line switches allow users to specify grep's output format.


Searching Text File Contents using grep

In an earlier Lesson, we saw how the wc program can be used to count the characters, words and lines in text files. In this Lesson we introduce the grep program, a handy tool for searching text file contents for specific words or character sequences. The name grep comes from the ed editor command g/re/p, which globally searches for a regular expression and prints the matching lines. What, you may well ask, is a regular expression and why on earth should I want to search for one? We will provide a more formal definition of regular expressions in a later Lesson, but for now it is enough to know that a regular expression is simply a way of describing a pattern, or template, to match some sequence of characters. A simple regular expression would be “Hello”, which matches exactly five characters: “H”, “e”, two consecutive “l” characters, and a final “o”. More powerful search patterns are possible and we shall examine them in the next section. The figure below gives the general form of the grep command line:


The following table summarizes some of grep's more commonly used command line switches. Consult the grep(1) man page (or invoke grep --help) for more.


Show All Occurrences of a String in a File

Under Linux, there are often several ways of accomplishing the same task. For example, to see if a file contains the word “even”, you could just visually scan the file:

Reading the file, we see that the file does indeed contain the letters “even”. This method breaks down on a large file, because we could easily miss one word among several thousand, or even several hundred thousand, words. We can use the grep tool to search through the file for us automatically:

Here we searched for a word using its exact spelling. Instead of just a literal string, the pattern argument can also be a general template for matching more complicated character sequences; we shall explore that in a later Lesson.
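A literal search of this kind can be sketched as follows. The file rhyme.txt and its three lines are a hypothetical stand-in, not the file used in the original slides.

```shell
# Hypothetical nursery-rhyme file.
tmp=$(mktemp -d)
cat > "$tmp/rhyme.txt" <<'EOF'
one two buckle my shoe
seven eight lay them straight
eleven twelve dig and delve
EOF

# Print every line that contains the letters "even" anywhere on the line.
matches=$(grep even "$tmp/rhyme.txt")
printf '%s\n' "$matches"

rm -rf "$tmp"
```

Both “seven” and “eleven” contain the letters “even”, so two lines are printed.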


Searching in Several Files at Once

An easy way to search several files is just to name them on the grep command line:

Perhaps we are more interested in just discovering which file mentions the word “nine” than actually seeing the line itself. Adding the -l switch to the grep line does just that:
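Both forms can be tried on two small files. This is a sketch only; first.txt and second.txt are hypothetical names created for the illustration.

```shell
# Work in a scratch directory so the file names in the output stay short.
tmp=$(mktemp -d)
old_dir=$PWD
cd "$tmp" || exit 1
printf 'seven eight\nnine ten\n' > first.txt
printf 'one two\n' > second.txt

# With several files, grep prefixes each matching line with its file name.
with_lines=$(grep nine first.txt second.txt)

# -l lists only the names of the files that contain a match.
names_only=$(grep -l nine first.txt second.txt)

printf '%s\n' "$with_lines" "$names_only"

cd "$old_dir" || exit 1
rm -rf "$tmp"
```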


Searching Directories Recursively

Grep can also search all the files in a whole directory tree with a single command. This can be handy when working with a large number of files. The easiest way to understand this is to see it in action. In the directory /etc/sysconfig are text files that contain much of the configuration information about a Linux system. The Linux name for the first Ethernet network device on a system is “eth0”, so you can find which file contains the configuration for eth0 by letting the grep -r command do the searching for you:


Every file in /etc/sysconfig that mentions eth0 is shown in the results. We can further limit the files listed to only those referring to an actual device by filtering the grep -r output through a grep DEVICE:

This shows a common use of grep as a filter to simplify the outputs of other commands. If only the names of the files are of interest, the output can be simplified with the -l command line switch.
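Because the contents of /etc/sysconfig differ from system to system, the sketch below builds a small hypothetical directory tree named conf and runs the same three searches against it. The output of grep -r is sorted here only to make the order predictable.

```shell
# Hypothetical configuration tree standing in for /etc/sysconfig.
tmp=$(mktemp -d)
old_dir=$PWD
cd "$tmp" || exit 1
mkdir -p conf/network-scripts
printf 'NETWORKING=yes\n' > conf/network
printf 'DEVICE=eth0\nONBOOT=yes\n' > conf/network-scripts/ifcfg-eth0
printf 'see eth0 settings\n' > conf/notes

# Every line under conf/ that mentions eth0, prefixed with its file name.
all_hits=$(grep -r eth0 conf | sort)

# Use a second grep as a filter to keep only the DEVICE lines.
device_hits=$(grep -r eth0 conf | grep DEVICE)

# -l reduces the output to file names only.
file_names=$(grep -rl eth0 conf | sort)

printf '%s\n' "$all_hits" "$device_hits" "$file_names"

cd "$old_dir" || exit 1
rm -rf "$tmp"
```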


Inverting grep

By default, grep shows only the lines matching the search pattern. Usually, this is what you want, but sometimes you are interested in the lines that do not match the pattern. In these instances, the -v command line switch inverts grep's operation.
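A minimal sketch of the inverted search, using a hypothetical three-line rhyme file:

```shell
# Hypothetical nursery-rhyme file.
tmp=$(mktemp -d)
printf 'one two buckle my shoe\nseven eight lay them straight\neleven twelve dig and delve\n' > "$tmp/rhyme.txt"

# -v prints the lines that do NOT contain the letters "even".
non_matches=$(grep -v even "$tmp/rhyme.txt")
printf '%s\n' "$non_matches"

rm -rf "$tmp"
```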


Getting Line Numbers

Often you may be searching a large file that has many occurrences of the pattern. Grep will list each line containing one or more matches, but how is one to locate those lines in the original file? Using the grep -n command will also list the line number of each matching line. The file /usr/share/dict/words contains a list of common dictionary words. Identify which line contains the word “dictionary”:

You might also want to combine the -n switch with the -r switch when searching all the files below a directory:
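Since the dictionary file is not installed on every system, the following sketch uses a hypothetical four-word list stored in words/list.txt to show -n alone and combined with -r.

```shell
# Hypothetical word list in a scratch directory.
tmp=$(mktemp -d)
old_dir=$PWD
cd "$tmp" || exit 1
mkdir words
printf 'apple\nbanana\ndictionary\nzebra\n' > words/list.txt

# -n prefixes each matching line with its line number.
numbered=$(grep -n dictionary words/list.txt)

# With -r as well, each match shows the file name, then the line number.
recursive=$(grep -rn dictionary words)

printf '%s\n' "$numbered" "$recursive"

cd "$old_dir" || exit 1
rm -rf "$tmp"
```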


Limiting Matching to Whole Words

Remember the file containing our nursery rhyme earlier?

Suppose we wanted to retrieve all lines containing the word “at”. If we try the command:

Do you see what happened? We matched the “at” string, whether it was an isolated word or part of a larger word. The grep command provides the -w switch to imply that the specified pattern should only match entire words.

The -w switch considers a sequence of letters, numbers, and underscore characters, surrounded by anything else, to be a word.
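The difference between the two searches can be sketched with a hypothetical three-line file:

```shell
# Hypothetical sample: "at" appears inside words on lines 1 and 3, alone on line 2.
tmp=$(mktemp -d)
printf 'the cat sat on the mat\nmeet me at noon\nwhat a day\n' > "$tmp/sample.txt"

# Without -w, "at" matches inside cat, sat, mat, and what as well.
substring_count=$(grep -c at "$tmp/sample.txt")

# With -w, only the free-standing word "at" matches.
whole_word=$(grep -w at "$tmp/sample.txt")

echo "lines matching as a substring: $substring_count"
printf '%s\n' "$whole_word"

rm -rf "$tmp"
```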


Ignoring Case

The string “Bob” has a meaning quite different from the string “bob”.

However, sometimes we want to find either one, regardless of whether the word is capitalized or not. The grep -i command solves just this problem.
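A brief sketch of a case-sensitive search next to a case-insensitive one, using a hypothetical file:

```shell
# Hypothetical sample file mixing "Bob" and "bob".
tmp=$(mktemp -d)
printf 'Bob builds houses\nbob for apples\nalice paints\n' > "$tmp/names.txt"

# By default the search is case-sensitive.
exact=$(grep Bob "$tmp/names.txt")

# -i ignores the distinction between uppercase and lowercase.
any_case=$(grep -i bob "$tmp/names.txt")

printf '%s\n' "$exact" "$any_case"

rm -rf "$tmp"
```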


Examples

Finding Simple Character Strings

Verify that your computer has the system account “lp”, used for the line printer tools. Hint: the file /etc/passwd contains one line for each user account on the system.


Chapter 3. Introduction to Regular Expressions

Key Concepts
• Regular expressions are a standard Unix syntax for specifying text patterns.
• Regular expressions are understood by many commands, including grep, sed, vi, and many scripting languages.
• Within regular expressions, . and [] are used to match characters.
• Within regular expressions, +, *, and ? specify a number of consecutive occurrences.
• Within regular expressions, ^ and $ specify the beginning and end of a line.
• Within regular expressions, (, ), and | specify alternative groups.
• The regex(7) man page provides complete details.


Introducing Regular Expressions

In the previous chapter you saw grep used to match either a whole word or part of a word. This by itself is very powerful, especially in conjunction with arguments like -i and -v, but it is not appropriate for all search scenarios. Here are some examples of searches that the grep usage you've learned so far would not be able to do: First, suppose you had a file that looked like this:


What if you wanted to pull out just the names of the people in people_and_pets.txt? A command like grep -w Name: would match the 'Name:' line for each person, but also the 'Name:' line for each person's pet. How could we match only the 'Name:' lines for people? Well, notice that the lines for pets' names are all indented, meaning that those lines begin with whitespace characters instead of text. Thus, we could achieve our goal if we had a way to say "Show me all lines that begin with 'Name:'".

Another example: Suppose you and a friend both witnessed a hit-and-run car accident. You both got a look at the fleeing car's license plate and yet each of you recalls a slightly different number. You read the license number as "4I35VBB" but your friend read it as "413SV88". It seems that what you read as an 'I' in the second character, your friend read as a '1'. Similar differences appear in your interpretations of other parts of the license like '5' vs 'S' and 'BB' vs '88'. The police, having taken both of your statements, now need to narrow down the suspects by querying their database of license plates for plates that might match what you saw.


One solution might be to do separate queries for "4I35VBB" and "413SV88" but doing so assumes that one of you is exactly right. What if the perpetrator's license number was actually "4135VB8"? In other words, what if you were right about some of the characters in question but your friend was right about others? It would be more effective if the police could query for a pattern that effectively said: "Show me all license numbers that begin with a '4', followed by an 'I' or a '1', followed by a '3', followed by a '5' or an 'S', followed by a 'V', followed by two characters that are each either a 'B' or an '8'".

Query scenarios like these can be solved using regular expressions. While computer scientists sometimes use the term "regular expression" (or "regex" for short) to describe any method of describing complex patterns, in Linux and many programming languages the term refers to a very specific set of special characters used for solving problems like the above. Regular expressions are supported by a large number of tools including grep, vi, find and sed.


To introduce the usage of regular expressions, let's look at some solutions to the two problems introduced earlier. Don't worry if these seem a bit complicated; the remainder of the unit will start from scratch and cover regular expressions in great detail. A regex that could solve the first problem, where we wanted to say "Show me all lines that begin with 'Name:'" might look like this:

...that's it! Regular expressions are all about the use of special characters, called metacharacters, to represent advanced query parameters. The caret ("^"), as shown here, means "Lines that begin with...". Note, by the way, that the regular expression was put in single-quotes. This is a good habit to get into early on as it prevents bash from interpreting special characters that were meant for grep.
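A sketch of such an anchored search follows. The contents of people_and_pets.txt shown here are hypothetical, chosen only to follow the layout described above, with pet names indented.

```shell
# Hypothetical people_and_pets.txt: pet entries are indented under each person.
tmp=$(mktemp -d)
cat > "$tmp/people_and_pets.txt" <<'EOF'
Name: Alice
Pet:
    Name: Rex
Name: Bob
Pet:
    Name: Tom
EOF

# Unanchored, the pattern matches the people and the pets alike.
all_names=$(grep -c 'Name:' "$tmp/people_and_pets.txt")

# Anchored with ^, only lines that BEGIN with "Name:" match.
people=$(grep '^Name:' "$tmp/people_and_pets.txt")

echo "lines containing Name: $all_names"
printf '%s\n' "$people"

rm -rf "$tmp"
```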


Ok, so what about the second problem? That one involved a much more complicated query: "Show me all license numbers that begin with a '4', followed by an 'I' or a '1', followed by a '3', followed by a '5' or an 'S', followed by a 'V', followed by two characters that are each either a 'B' or an '8'". This could be represented by a regular expression that looks like this:

Wow, that's pretty short considering how long it took to write out what we were looking for! There are only two types of regex metacharacters used here: square brackets ('[]') and curly braces ('{}'). When two or more characters are shown within square brackets it means "any one of these". So '[B8]' near the end of the expression means "'B' or '8'". When a number is shown within curly braces it means "this many of the preceding item". Thus, '[B8]{2}' means "two characters that are each either a 'B' or an '8'". Pretty powerful stuff! Now that you've gotten a taste of what regular expressions are and how they can be used, let's start from scratch and cover them in depth.
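The license plate query can be sketched as follows. The pattern is one possible way to write the description above, not necessarily the exact expression from the original slide, and the list of plates is hypothetical. The -E switch selects extended regular expression syntax, which is needed for the curly braces.

```shell
# Hypothetical plate database, one plate per line.
tmp=$(mktemp -d)
printf '4I35VBB\n413SV88\n4135VB8\n4235VB8\n4135VBX\n' > "$tmp/plates.txt"

# 4, then I or 1, then 3, then 5 or S, then V, then two characters that are each B or 8.
suspects=$(grep -E '4[I1]3[5S]V[B8]{2}' "$tmp/plates.txt")
printf '%s\n' "$suspects"

rm -rf "$tmp"
```

The first three plates match, including "4135VB8", which neither witness reported exactly.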


Regular Expressions, Extended Regular Expressions, and the grep Command

As the Unix implementation of regular expression syntax has evolved, new metacharacters have been introduced. In order to preserve backward compatibility, commands usually choose to implement either basic regular expressions or extended regular expressions. In order to not become bogged down with the differences, this Lesson will introduce the extended syntax, summarizing differences at the end of the discussion. One of the most common uses for regular expressions is specifying search patterns for the grep command. As was mentioned in the previous Lesson, there are three versions of the grep command. Reiterating, the three differ in how they interpret regular expressions.


fgrep
The fgrep command is designed to be a "fast" grep. The fgrep command does not support regular expressions, but instead interprets every character in the specified search pattern literally.

grep
The grep command interprets each pattern using the original, basic regular expression syntax.

egrep
The egrep command interprets each pattern using extended regular expression syntax.

Because we are not yet making a distinction between the basic and extended regular expression syntax, the egrep command should be used whenever the search pattern contains regular expressions.
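The three interpretations can be compared on one small file. This sketch uses grep -F and grep -E, the switch forms that correspond to fgrep and egrep; recent GNU grep releases print a warning that the fgrep and egrep command names are obsolescent. The sample file is hypothetical.

```shell
# Hypothetical sample file with four short strings.
tmp=$(mktemp -d)
printf 'a.c\nabc\na+c\naac\n' > "$tmp/sample.txt"

# Fixed strings (fgrep): the period is just a period.
fixed=$(grep -F 'a.c' "$tmp/sample.txt")

# Basic syntax (grep): the period is a wildcard, but + is an ordinary character.
basic_dot=$(grep 'a.c' "$tmp/sample.txt")
basic_plus=$(grep 'a+c' "$tmp/sample.txt")

# Extended syntax (egrep): + means "one or more of the preceding character".
extended_plus=$(grep -E 'a+c' "$tmp/sample.txt")

printf 'fixed: %s\nbasic plus: %s\nextended plus: %s\n' "$fixed" "$basic_plus" "$extended_plus"

rm -rf "$tmp"
```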


Anatomy of a Regular Expression

In our discussion of the grep program family, we were introduced to the idea of using a pattern to identify the file content of interest. Our examples were carefully constructed so that the pattern contained exactly the text for which we were searching. We were careful to use only literal characters in our regular expressions; a literal character matches only itself. So when we used “hello” as the regular expression, we were using a five-character regular expression composed only of literal characters. While this let us concentrate on learning how to operate the grep program, it didn't allow us to get a full appreciation of the power of regular expressions. Before we see regular expressions in use, we shall first see how they are constructed.


A regular expression is a sequence of:

Literal Characters
Literal characters match only themselves. Examples of literals are letters, digits and most special characters (see below for the exceptions).

Wildcards
Wildcard characters match any character. Within a regular expression, a period (“.”) matches any character, be it a space, a letter, a digit, punctuation, anything.

Modifiers
A modifier alters the meaning of the immediately preceding pattern character. For example, the expression “ab*c” matches the strings “ac”, “abc”, “abbc”, “abbbc”, and so on, because the asterisk (“*”) is a modifier that means “any number of (including zero)”. Thus, our pattern means to match any sequence of characters consisting of one “a”, a (possibly empty) series of “b” characters, and a final “c” character.

Anchors
Anchors establish the context for the pattern, such as "the beginning of a line", or "the end of a word". For example, the expression “cat” would match any occurrence of the three letters, while “^cat” would only match lines that begin “cat”.


Taking Literals Literally

Literals are straightforward because each literal character in a regular expression matches one, and only one, copy of itself in the searched text. Uppercase characters are distinct from lowercase characters, so that “A” does not match “a”.

Wildcards

The "dot" wildcard

The character “.” is used as a placeholder, to match one of any character. In the following example, the pattern matches any occurrence of the literal characters “x” and “s”, separated by exactly two other characters.
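A sketch of the dot wildcard, using a hypothetical four-word list in place of the dictionary file:

```shell
# Hypothetical word list.
tmp=$(mktemp -d)
printf 'axles\nexits\nboxes\ntaxis\n' > "$tmp/words.txt"

# An "x", then exactly two characters of any kind, then an "s".
matches=$(grep 'x..s' "$tmp/words.txt")
printf '%s\n' "$matches"

rm -rf "$tmp"
```

The words “boxes” and “taxis” are not printed because only one character separates the “x” from the “s”.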


Bracket Expressions: Ranges of Literal Characters

Normally a literal character in a regex pattern matches exactly one occurrence of itself in the searched text. Suppose we want to search for the string “hello” regardless of how it is capitalized: we want to match “Hello” and “HeLLo” as well. How might we do that? A regex feature called a bracket expression solves this problem neatly. A bracket expression is a range of literals enclosed in square brackets (“[” and “]”). For example, the regex pattern “[Hh]” is a character range that matches exactly one character: either an uppercase “H” or a lowercase “h” letter. Notice that it doesn't matter how large the set of characters within the range is, the set matches exactly one character, if it matches any at all. A bracket expression that matches the set of lowercase vowels could be written “[aeiou]” and would match exactly one vowel.

In the following example, bracket expressions are used to find words from the file /usr/share/dict/words. In the first case, the first five words that contain three consecutive (lowercase) vowels are printed. In the second case, the first five words that contain lowercase letters in the pattern of vowel-consonant-vowel-consonant-vowel-consonant are printed.


If the first character of a bracket expression is a “^”, the interpretation is inverted, and the bracket expression will match any single occurrence of a character not included in the range. For example, the expression “[^aeiou]” would match any character that is not a vowel. The following example first lists words which contain three consecutive vowels, and secondly lists words which contain three consecutive consonant-vowel pairs.
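Both searches can be sketched against a hypothetical four-word list, which keeps the output short and predictable compared with the full dictionary file:

```shell
# Hypothetical word list.
tmp=$(mktemp -d)
printf 'beautiful\nqueen\nrhythm\nbanana\n' > "$tmp/words.txt"

# Three consecutive lowercase vowels.
three_vowels=$(grep '[aeiou][aeiou][aeiou]' "$tmp/words.txt")

# Three consecutive pairs of "not a vowel" followed by "a vowel".
cv_pairs=$(grep '[^aeiou][aeiou][^aeiou][aeiou][^aeiou][aeiou]' "$tmp/words.txt")

printf '%s\n' "$three_vowels" "$cv_pairs"

rm -rf "$tmp"
```

Note that “[^aeiou]” matches any character that is not a lowercase vowel, which includes digits, spaces, and punctuation as well as consonants.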


Range Expressions vs. Character Classes: Old School and New School

Another way to express a character range is by giving the start- and end-letters of the sequence this way: “[a-d]” would match any character from the set a, b, c or d. A typical usage of this form would be “[0-9]” to represent any single digit, or “[A-Z]” to represent all capital letters.


As an alternative to such quandaries, modern regular expressions make use of character classes. Character classes match any single character, using language specific conventions to decide if a given character is uppercase or lowercase, or if it should be considered part of the alphabet or punctuation. The following table lists some supported character classes, and the ASCII equivalent range expression, where appropriate.


Character classes avoid problems you may run into when using regular expressions on systems that use different character encoding schemes where letters are ordered differently. For example, suppose you were to run the command:

On a Red Hat Enterprise Linux system, this would match every word in the file, not just those that contain capital letters as one might assume. This is because in the Unicode (UTF-8) locales that RHEL uses by default, characters are alphabetized case-insensitively, so that [A-Z] is equivalent to [AaBbCc...etc].


On older systems, though, a different character encoding scheme is used where alphabetization is done case-sensitively. On such systems [A-Z] would be equivalent to [ABC...etc]. Character classes avoid this pitfall. You can run:

on any system regardless of the encoding scheme being used and it will only match lines that contain capital letters. For more details about the predefined range expressions, consult the grep manual page. For more information on character encoding schemes under Linux, refer back to chapter 8.3. To learn about how character encoding schemes are used to support other languages in Red Hat Enterprise Linux, begin with the locale manual page.
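A sketch of a character class search follows, using the POSIX class [[:upper:]] on a hypothetical three-line file. Note the doubled brackets: the class name [:upper:] sits inside a bracket expression.

```shell
# Hypothetical sample file.
tmp=$(mktemp -d)
printf 'hello\nWorld\nfoo Bar\n' > "$tmp/sample.txt"

# Match lines containing at least one uppercase letter, whatever the locale.
with_capitals=$(grep '[[:upper:]]' "$tmp/sample.txt")
printf '%s\n' "$with_capitals"

rm -rf "$tmp"
```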


Common Modifier Characters

We saw a common usage of a regex modifier in our earlier example “ab*c” to match an a and c character with some number of b letters in between. The “*” character changed the interpretation of the literal b character from matching exactly one letter to matching any number of b's. Here is a list of some common modifier characters:

b?
The question mark (“?”) means “either one or none”: the literal character is considered to be optional in the searched text. For example, the regex pattern “ab?c” matches the strings “ac”, and “abc”, but not “abbc”.

b*
The asterisk (“*”) modifier means “any number of (including zero)” of the preceding literal character. The regex pattern “ab*c” matches the strings “ac”, “abc”, “abbc”, and so on.


b+
The plus (“+”) modifier means “one or more”, so the regex pattern “b+” matches a non-empty sequence of b's. The regex pattern “ab+c” matches the strings “abc” and “abbc”, but does not match “ac”.

b{m,n}
The brace modifier is used to specify a range of between m and n occurrences of the preceding character. The regex pattern “ab{2,4}c” would match “abbc”, “abbbc”, and “abbbbc”, but not “abc” or “abbbbbc”.

b{n}
With only one integer, the brace modifier is used to specify exactly n occurrences for the preceding character.
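The modifiers above can be compared side by side on one hypothetical file. The -E switch selects extended regular expression syntax, as egrep does.

```shell
# Hypothetical sample: an "a" and a "c" separated by zero, one, two, or five "b" characters.
tmp=$(mktemp -d)
printf 'ac\nabc\nabbc\nabbbbbc\n' > "$tmp/sample.txt"

optional=$(grep -E 'ab?c' "$tmp/sample.txt")        # zero or one b
any_number=$(grep -E 'ab*c' "$tmp/sample.txt")      # zero or more b's
one_or_more=$(grep -E 'ab+c' "$tmp/sample.txt")     # at least one b
two_to_four=$(grep -E 'ab{2,4}c' "$tmp/sample.txt") # between two and four b's
exactly_two=$(grep -E 'ab{2}c' "$tmp/sample.txt")   # exactly two b's

printf '%s\n' "$optional" "$any_number" "$one_or_more" "$two_to_four" "$exactly_two"

rm -rf "$tmp"
```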


In the following example, egrep prints lines from /usr/share/dict/words that contain patterns which start with a (capital or lowercase) “a”, might or might not next have a (lowercase) “b”, but then definitely follow with a (lowercase) “a”.

The following example prints lines which contain patterns which start “al”, then use the “.*” expression to specify 0 or more occurrences of any character, followed by the pattern “bra”.


Notice we found variations on the words algebra and calibrate. For the former, the .* expression matched “ge”, while for the latter, it matched the letter “i”. The expression “.*”, which is interpreted as "0 or more of any character", shows up often in regex patterns, acting as the "stretchable glue" between two patterns of significance. As a subtlety, we should note that the modifier characters are greedy: they always match the longest possible input string. For example, given the regex pattern:


Anchored Searches

Four additional search modifier characters are available:

^foo
A caret (“^”) matches the beginning of a line. Our example “^foo” matches the string “foo” only when it is at the beginning of a line.

foo$
A dollar sign (“$”) matches the end of a line. Our example “foo$” matches the string “foo” only at the end of a line, immediately before the newline character.

\<foo\>
By themselves, the less than sign (“<”) and the greater than sign (“>”) are literals. Using the backslash character to escape them transforms them into meaning “first of a word” and “end of a word”, respectively. Thus the pattern “\<cat\>” matches the word “cat” but not the word “catalog”.

You will frequently see both ^ and $ used together. The regex pattern “^foo$” matches a whole line that contains only “foo” and would not match that line if it contained any spaces. The \< and \> are also usually used as pairs.
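The four anchors can be sketched on a hypothetical file. The word anchors \< and \> are extensions supported by GNU grep, which is assumed here.

```shell
# Hypothetical sample file.
tmp=$(mktemp -d)
printf 'cat\ncatalog\nthe cat sat\nconcat\n' > "$tmp/sample.txt"

line_start=$(grep '^cat' "$tmp/sample.txt")     # lines that begin with cat
line_end=$(grep 'cat$' "$tmp/sample.txt")       # lines that end with cat
whole_word=$(grep '\<cat\>' "$tmp/sample.txt")  # cat as a complete word
whole_line=$(grep '^cat$' "$tmp/sample.txt")    # lines containing only cat

printf '%s\n' "$line_start" "$line_end" "$whole_word" "$whole_line"

rm -rf "$tmp"
```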


In the following example, the first search lists all lines that contain the letters “ion” anywhere on the line. The second search only lists lines which end in “ion”.


Coming to Terms with Regex Grouping

The same way that you can use parentheses to group terms within a mathematical expression, you also use parentheses to collect regular expression pattern specifiers into groups. This lets the modifier characters “?”, “*” and “+” apply to groups of regex specifiers instead of only the immediately preceding specifier. Suppose we need a regular expression to match either “foo” or “foobar”. We could write the regex as “foo(bar)?” and get the desired results. This lets the “?” modifier apply to the whole string “bar” instead of only the preceding “r” character. Grouping regex specifiers using parentheses becomes even more flexible when the pipe symbol (“|”) is used to separate alternative patterns. Using alternatives, we could rewrite our previous example as “(foo|foobar)”. Writing this as “foo|foobar” is simpler and works just as well, because just like mathematics, regex specifiers have precedence. While you are learning, always enclose your groups in parentheses.
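A sketch of grouping and alternatives, using a hypothetical file. The ^ and $ anchors are added here, as an assumption beyond the text above, so that only whole lines match; they also show why precedence matters once other specifiers surround the alternatives.

```shell
# Hypothetical sample file.
tmp=$(mktemp -d)
printf 'foo\nfoobar\nfoobaz\nfob\n' > "$tmp/sample.txt"

# The ? applies to the whole group (bar), not only to the letter r.
optional_group=$(grep -E '^foo(bar)?$' "$tmp/sample.txt")

# The same set of lines written as two alternatives inside a group.
alternatives=$(grep -E '^(foo|foobar)$' "$tmp/sample.txt")

# Without parentheses the | splits the entire pattern: "^foo" OR "foobar$".
without_parens=$(grep -E '^foo|foobar$' "$tmp/sample.txt")

printf '%s\n' "$optional_group" "$alternatives" "$without_parens"

rm -rf "$tmp"
```

The last search also prints “foobaz”, because that line begins with “foo”.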


In the following example, the first search prints all lines from the file /usr/share/dict/words which contain four consecutive vowels (compare the syntax to that used when first introducing range expressions, above). The second search finds words that contain a double “o” or a double “e”, followed (somewhere) by a double “e”.


Escaping Meta-Characters

Sometimes you need to match a character that would ordinarily be interpreted as a regular expression wildcard or modifier character. To temporarily disable the special meaning of these characters, simply escape them using the backslash (“\”) character. For example, the regex pattern “cat.” would match the letters “cat” followed by any character: “cats” or “catchup”. To match only the letters “cat.” at the end of a sentence, use the regex pattern “cat\.” to disable interpreting the period as a wildcard character. Note one distracting exception to this rule. When the backslash character precedes a “<” or “>” character, it enables the special interpretation (anchoring the beginning or ending of a word) instead of disabling the special interpretation. Shudder. It even gets worse - see the footnote at the bottom of the following table.
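The effect of escaping the period can be sketched with a hypothetical three-line file:

```shell
# Hypothetical sample file.
tmp=$(mktemp -d)
printf 'I like my cat.\ncats and dogs\nconcat\n' > "$tmp/sample.txt"

# Unescaped, the period matches any character that follows "cat".
wildcard=$(grep 'cat.' "$tmp/sample.txt")

# Escaped, the period matches only a literal period.
literal=$(grep 'cat\.' "$tmp/sample.txt")

printf '%s\n' "$wildcard" "$literal"

rm -rf "$tmp"
```

The line “concat” matches neither pattern, because no character follows “cat” on that line.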


Summary of Linux Regular Expression Syntax

The following table summarizes regular expression syntax, and identifies which components are found in basic regular expression syntax, and which are found only in the extended regular expression syntax.



Regular Expressions are NOT File Globbing

When first encountering regular expressions, students understandably confuse regular expressions with pathname expansion (file globbing). Both are used to match patterns in text. Both share similar metacharacters (“*”, “?”, “[...]”, etc.). However, they are distinctly different. The following table compares and contrasts regular expressions and file globbing.

Page 50: Workbook 8, and 9 Pace Center for Business and Technology 1.

Regular Expressions are NOT File Globbing In the following example, the first argument is a regular expression, specifying text which starts with an “l” and ends with “.conf”, while the second argument is a file glob which specifies all files in the /etc directory whose filename starts with “l” and ends with “.conf”.

Take a close look at the second line of output. Why was it matched by the specified regular expression? Why does the line containing the text “krb5.conf” match the expression? The “l” is found way back in the word “default”! In a similar vein, when specifying regular expressions on the bash command line, care must be taken to quote or escape the regex meta-characters, lest they be expanded away by the bash shell with unexpected results. In all of the examples found in this discussion, the first argument to the egrep command is protected with single quotes for just this reason.
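The same contrast can be sketched without touching /etc at all. The strings and the pattern 'l.*\.conf' below are assumptions chosen for illustration (the workbook's own command line is not reproduced here). The glob is tested with a case statement, which uses the same pattern rules as pathname expansion.

```shell
# Hypothetical strings standing in for file names and lines of text.
lines=$(printf '%s\n' 'ld.so.conf' 'default realm set in krb5.conf' 'hosts')

# Regular expression: may match anywhere in the line, so the "l" in
# "default" followed later by ".conf" is enough.
regex_hits=$(printf '%s\n' "$lines" | grep -E -c 'l.*\.conf')

# File glob: the pattern must account for the whole string.
glob_hits=0
while IFS= read -r name; do
    case $name in
        l*.conf) glob_hits=$((glob_hits + 1)) ;;
    esac
done <<EOF
$lines
EOF

echo "regex matched $regex_hits, glob matched $glob_hits"
```

The regular expression accepts the krb5.conf line while the glob rejects it, because a glob is anchored at both ends and a regular expression is not.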

Page 51: Workbook 8, and 9 Pace Center for Business and Technology 1.

Where to Find More Information About Regular Expressions

We have barely scratched the surface of the usefulness of regular expressions. The explanation we have provided will be adequate for your daily needs, but even so, regular expressions offer much more power, making even complicated text searches simple to perform. For more online information about regular expressions, you should check: The regex(7) manual page. The grep(1) manual page.

Page 52: Workbook 8, and 9 Pace Center for Business and Technology 1.

Examples Regular Expression Modifiers

Page 53: Workbook 8, and 9 Pace Center for Business and Technology 1.

Workbook 9 Managing Processes

Pace Center for Business and Technology

Page 54: Workbook 8, and 9 Pace Center for Business and Technology 1.

Chapter 1. An Introduction to Processes Key Concepts •A process is an instance of a running executable, identified by a process id (pid). •Because Linux implements virtual memory, every process possesses its own distinct memory context. •A process has a uid and a collection of gids as credentials. •A process has a filesystem context, including a cwd, a umask, a root directory, and a collection of open files. •A process has a scheduling context, including a niceness value. •A process has a collection of environment variables. •The ps command can be used to examine all currently running processes. •The top command can be used to monitor all running processes.

Page 55: Workbook 8, and 9 Pace Center for Business and Technology 1.

Processes are How Things Get Done Almost anything that happens in a Linux system happens as a process. If you are viewing this text in a web browser, that browser is running as a process. If you are typing at a bash shell's command line, that shell is running as a process. If you are using the chmod command to change a file's permissions, the chmod command operates as a separate process. Processes are how things get done, and the primary responsibility of the Linux kernel is to provide a place for processes to do their stuff without stepping on each other's toes. A process is an instance of an executing program. In other operating systems, programs are often large, elaborate, graphical applications that take a noticeably long time to start up. In the Linux (and Unix) world, these types of programs exist as well, but so does a whole class of programs which usually have no counterpart in other operating systems. These programs are designed to be quick to start, specialized in function, and play well with others. On a Linux system, processes running these programs are constantly popping into and out of existence.

Page 56: Workbook 8, and 9 Pace Center for Business and Technology 1.

Processes are How Things Get Done For example, consider the user maxwell performing the following command line.

In the split second that the command line took to execute, no fewer than four processes (ps, grep, bash, and date) were started, did their thing, and exited.

Page 57: Workbook 8, and 9 Pace Center for Business and Technology 1.

What is a Process?

By this point, you could well be tired of hearing the answer: a process is an instance of a running program. Here, however, we provide a more detailed list of the components that constitute a process. Execution Context Every process exists (at least to some extent) within the physical memory of the machine. Because Linux (and Unix) is designed to be a multiuser environment, the memory allocated to a process is protected, and no other process can access it. In its memory, a process loads a copy of its executable instructions, and stores any other dynamic information it is managing. A process also carries parameters associated with how often it gets the opportunity to access the CPU, such as its execution state and its niceness value (more on these soon).

Page 58: Workbook 8, and 9 Pace Center for Business and Technology 1.

What is a Process? I/O Context Every process interacts to some extent with the filesystem in order to read or write information that exists before or will exist after the lifespan of the process. Elements of a process's input/output context include the following. Open File Descriptors Almost every process is reading information from or writing information to external sources, usually both. In Linux, open file descriptors act as sources or sinks of information. Processes read information from or write information to file descriptors, which may be connected to regular files, device nodes, network sockets, or even each other as pipes (allowing interprocess communication). Memory Mapped Files Memory mapped files are files whose contents have been mapped directly into the process's memory. Rather than reading or writing to a file descriptor, the process just accesses the appropriate memory address. Memory maps are most often used to load a process's executable code, but may also be used for other types of non-sequential access to data.

Page 59: Workbook 8, and 9 Pace Center for Business and Technology 1.

What is a Process? Filesystem Context We have encountered several pieces of information related to the filesystem that processes maintain, such as the process's current working directory (for translating relative file references) and the process's umask (for setting permissions on newly created files). Environment Variables Every process maintains its own list of name-value pairs, referred to as environment variables, or collectively as the process's environment. Processes generally inherit their environment on startup, and may refer to it for information such as the user's preferred language or favorite editor. Heritage Information Every process is identified by a PID, or process id, which it is assigned when it is created. In a later Lesson, we will discover that every process has a clearly defined parent and possibly well defined children. A process's own identity, the identity of its children, and to some extent the identity of its siblings are maintained by the process.
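Environment inheritance can be observed directly from a shell. The following is a small sketch; the variable names FAVORITE_EDITOR and SCRATCH_NOTE are hypothetical and chosen only for this illustration.

```shell
# FAVORITE_EDITOR and SCRATCH_NOTE are hypothetical names used only here.
export FAVORITE_EDITOR=vim      # exported: part of the process's environment
SCRATCH_NOTE=private            # not exported: stays inside this shell only

# A child process (a new sh) starts with a copy of the parent's environment.
child_editor=$(sh -c 'echo "${FAVORITE_EDITOR:-unset}"')
child_note=$(sh -c 'echo "${SCRATCH_NOTE:-unset}"')

echo "child sees FAVORITE_EDITOR=$child_editor, SCRATCH_NOTE=$child_note"
```

Only the exported variable reaches the child, which is why settings meant for other programs are normally exported.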

Page 60: Workbook 8, and 9 Pace Center for Business and Technology 1.

What is a Process? Credentials Every process runs under the context of a given user (or, more exactly, a given user id), and under the context of a collection of group ids (generally, all of the groups that the user belongs to). These credentials limit what resources a process can access, such as which files it can open or with which other processes it is allowed to communicate. Resource Statistics and Limits Every process also records statistics to track the extent to which system resources have been utilized, such as its memory size, its number of open files, its amount of CPU time, and others. The amount of many of these resources that a process is allowed to use can also be limited, a concept called resource limits.
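Many of the components listed above can be inspected for the shell itself, since the shell is just another process. A brief sketch using standard commands and shell builtins:

```shell
pid=$$                  # process ID of this shell
uid=$(id -u)            # numeric user ID the shell runs as
gids=$(id -G)           # all group IDs in its credentials
cwd=$(pwd)              # current working directory
mask=$(umask)           # file creation mask
max_files=$(ulimit -n)  # resource limit: maximum number of open file descriptors

echo "pid=$pid uid=$uid gids=$gids"
echo "cwd=$cwd umask=$mask open-file limit=$max_files"
```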

Page 61: Workbook 8, and 9 Pace Center for Business and Technology 1.

Viewing Processes with the ps Command We have already encountered the ps command many times. Now, we will attempt to familiarize ourselves with a broader selection of the many command line switches associated with it. A quick ps --help will display a summary of over 50 different switches for customizing the ps command's behavior. To complicate matters, different versions of Unix have developed their own versions of the ps command, which do not use the same command line switch conventions. The Linux version of the ps command tries to be as accommodating as possible to people from different Unix backgrounds, and often there are multiple switches for any given option, some of which start with a conventional leading hyphen (“-”), and some of which do not.

Page 62: Workbook 8, and 9 Pace Center for Business and Technology 1.

Viewing Processes with the ps Command Process Selection By default, the ps command lists all processes started from a user's terminal. While reasonable when users connected to Unix boxes using serial line terminals, this behavior seems a bit minimalist when every terminal window within an X graphical environment is treated as a separate terminal. The following command line switches can be used to expand (or reduce) the processes which the ps command lists.

Page 63: Workbook 8, and 9 Pace Center for Business and Technology 1.

Output Selection As implied by the initial paragraphs of this Lesson, there are many parameters associated with processes, too many to display in a standard terminal width of 80 columns. The following table lists common command line switches used to select what aspects of a process are listed.

Page 64: Workbook 8, and 9 Pace Center for Business and Technology 1.

Output Selection Additionally, the following switches can be used to modify how the selected information is displayed.

Page 65: Workbook 8, and 9 Pace Center for Business and Technology 1.

Oddities of the ps Command The ps command, probably more so than any other command in Linux, has oddities associated with its command line switches. In practice, users tend to experiment until they find combinations that work for them, and then stick to them. For example, the author prefers ps aux for a general purpose listing of all processes, while many people prefer ps -ef. The above tables should provide a reasonable "working set" for the novice. The command line switches tend to fall into two categories, those with the traditional leading hyphen ("Unix98" style options), and those without ("BSD" style options). Often, a given functionality will be represented by one of each. When grouping multiple single letter switches, only switches of the same style can be grouped. For example, ps axf is the same as ps a x f, not ps a x -f.
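The two styles mentioned above can be compared side by side. This sketch assumes the Linux (procps) version of ps; other implementations may reject some of these combinations.

```shell
# BSD style (no hyphen): all processes, user-oriented columns.
bsd_listing=$(ps aux)

# Unix98 style (leading hyphen): all processes, full-format columns.
unix_listing=$(ps -ef)

# Mixed: -e selects every process, u asks for the user-oriented format.
mixed_listing=$(ps -e u)

# Show just the header line of each to compare the column layouts.
printf '%s\n' "$bsd_listing"   | head -n 1
printf '%s\n' "$unix_listing"  | head -n 1
printf '%s\n' "$mixed_listing" | head -n 1
```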

Page 66: Workbook 8, and 9 Pace Center for Business and Technology 1.

Monitoring Processes with the top Command The ps command displays statistics for specified processes at the instant that the command is run, providing a snapshot of an instant in time. In contrast, the top command is useful for monitoring the general state of affairs of processes on the machine. The top command is intended to be run from within a terminal. It will replace the command line with a table of currently running processes, which updates every few seconds. The following demonstrates a user's screen after running the top command.

Page 67: Workbook 8, and 9 Pace Center for Business and Technology 1.

Monitoring Processes with the top Command While the command is running, the keyboard is "live". In other words, the top command will respond to single key presses without waiting for a return key. The following table lists some of the more commonly used keys.

Page 68: Workbook 8, and 9 Pace Center for Business and Technology 1.

Monitoring Processes with the top Command The last two commands, which either kill or renice a process, use concepts that we will cover in more detail in a later Lesson. Although most often run without command line configuration, top does support the following command line switches.
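One combination that is handy in scripts is sketched below. It assumes the procps version of top, where -b selects batch mode and -n sets the number of updates; check top(1) on your system before relying on it.

```shell
# -b: batch mode (plain text output, no keyboard interaction)
# -n 1: produce a single update, then exit
snapshot=$(top -b -n 1 2>/dev/null)

# The first lines are the summary area; the process table follows.
printf '%s\n' "$snapshot" | head -n 12
```

Because batch mode writes ordinary text, the output can be piped to head, grep, or a file like that of any other command.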

Page 69: Workbook 8, and 9 Pace Center for Business and Technology 1.

Monitoring Processes with the gnome-system-monitor Application

If running an X server, the GNOME desktop environment provides an application similar in function to top, with the benefits (and drawbacks) of a graphical application. The application can be started from the command line as gnome-system-monitor, or by selecting the System : Administration : System Monitor menu item.

Page 70: Workbook 8, and 9 Pace Center for Business and Technology 1.

Monitoring Processes with the gnome-system-monitor Application

Like the top command, the System Monitor displays a list of processes running on the local machine, refreshing the list every few seconds. In its default configuration, the System Monitor provides a much simpler interface: it lists only the processes owned by the user who started the application, and reduces the number of columns to just the process's command, owner, Process ID, and simple measures of the process's Memory and CPU utilization. Processes may be sorted by any one of these fields by simply clicking on the column's title.

Page 71: Workbook 8, and 9 Pace Center for Business and Technology 1.

Monitoring Processes with the gnome-system-monitor Application

When right-clicking on a process, a pop-up menu allows the user to perform many of the actions that top allowed, such as renicing or killing a process, though again with a simpler (and not as flexible) interface.

Page 72: Workbook 8, and 9 Pace Center for Business and Technology 1.

Monitoring Processes with the gnome-system-monitor Application

The System Monitor may be configured by opening the Edit : Preferences menu selection. Within the Preferences dialog, the user may set the update interval (in seconds), and configure many more fields to be displayed.

Page 73: Workbook 8, and 9 Pace Center for Business and Technology 1.

Locating processes with the pgrep Command.

Often, users are trying to locate information about processes identified by the command they are running, or the user who is running them. One technique is to list all processes, and use the grep command to reduce the information. In the following, maxwell first looks for all instances of the sshd daemon, and then for all processes owned by the user maxwell.

While maxwell can find the information he needs, there are some unpleasant issues.
1. The approach is not exacting. Notice that, in the second search, a su process showed up, not because it was owned by maxwell, but because the word maxwell was one of its arguments.
2. Similarly, the grep command itself usually shows up in the output.
3. The compound command can be awkward to type.

Page 74: Workbook 8, and 9 Pace Center for Business and Technology 1.

Locating processes with the pgrep Command.

In order to address these issues, the pgrep command was created. Named pgrep for obvious reasons, the command allows users to quickly list processes by command name, user, terminal, or group.
pgrep [SWITCHES] [PATTERN]
Its optional argument, if supplied, is interpreted as an extended regular expression pattern to be matched against command names. The following command line switches may also be used to qualify the search.

Page 75: Workbook 8, and 9 Pace Center for Business and Technology 1.

Locating processes with the pgrep Command.

In addition, the following command line switches can be used to qualify the output formatting of the command.

For a complete list of switches, consult the pgrep(1) man page. As a quick example, maxwell will repeat his two previous process listings, using the pgrep command.
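A self-contained sketch of pgrep in action follows. It does not reproduce maxwell's transcript; instead it starts a throwaway sleep process so that there is something predictable to find. The switches used (-u, -x, -l) are documented in pgrep(1).

```shell
# Start a throwaway background process to search for.
sleep 30 &
sleep_pid=$!
me=$(id -u)

# -l adds the command name to each PID in the output.
pgrep -u "$me" -l sleep

# -u limits the search to one user's processes; -x requires the command
# name to match the pattern exactly.
found=$(pgrep -u "$me" -x sleep)

# Clean up the throwaway process.
kill "$sleep_pid"
wait "$sleep_pid" 2>/dev/null
```

Unlike the ps and grep pipeline, the search is matched against command names only, and pgrep never lists itself.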

Page 76: Workbook 8, and 9 Pace Center for Business and Technology 1.

Examples Chapter 1. An Introduction to Processes

Viewing All Processes with the "User Oriented" Format In the following transcript, maxwell uses the ps -e u command to list all processes (-e) with the "user oriented" format (u).

The "user oriented" view displays the user who is running the process, the process id, and a rough estimate of the amount of CPU and memory the process is consuming, as well as the state of the process. (Process states will be discussed in the next Lesson).

Page 77: Workbook 8, and 9 Pace Center for Business and Technology 1.

Questions Chapter 1. An Introduction to Processes

1, 2, and 3

Page 78: Workbook 8, and 9 Pace Center for Business and Technology 1.

Chapter 2. Process States

Key Concepts •In Linux, the first process, /sbin/init, is started by the kernel on bootup. All other processes are the result of a parent process duplicating itself, or forking. •A process begins executing a new command through a process called execing. •Often, new commands are run by a process (often a shell) first forking, and then execing. This mechanism is referred to as the fork and exec mechanism. •Processes can always be found in one of five well defined states: runnable, voluntarily sleeping, involuntarily sleeping, stopped, or zombie. •Process ancestry can be viewed with the pstree command. •When a process dies, it is the responsibility of the process's parent to collect its return code and resource usage information. •When a parent dies before its children, the orphaned children are inherited by the first process (usually /sbin/init).

Page 79: Workbook 8, and 9 Pace Center for Business and Technology 1.

A Process's Life Cycle

How Processes are Started In Linux (and Unix), unlike many other operating systems, process creation and command execution are two separate concepts. Though usually a new process is created so that it can run a specified command (such as the bash shell creating a process to run the chmod command), processes can be created without running a new command, and new commands can be executed without creating a new process. Creating a New Process (Forking) New processes are created through a technique called forking. When a process forks, it creates a duplicate of itself. Immediately after a fork, the newly created process (the child) is an almost exact duplicate of the original process (the parent). The child inherits an identical copy of the original process's memory, any open files of the parent, and identical copies of any parameters of the parent, such as the current working directory or umask. About the only differences between the parent and the child are the child's heritage information (the child has a different process ID and a different parent process ID, for starters), and (for the programmers in the audience) the return value of the fork() system call.

Page 80: Workbook 8, and 9 Pace Center for Business and Technology 1.

A Process's Life Cycle

As a quick aside for any programmers in the audience, a fork is usually implemented using a structure similar to the following.

When a process wants to create a new process, it calls the fork() system call (with no arguments). Though only one process enters the fork() call, two processes return from it. For the newly created process (the child), the return value is 0. For the original process (the parent), the return value is the process ID of the child. By branching on this value, the child may now go off to do whatever it was started to do (which often involves exec()ing, see next), and the parent can go on to do its own thing.
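The effect of a fork can also be observed from the shell, without writing any C. The following is only a shell-level analogy, not the workbook's listing: the construct ( ... ) & makes the shell fork a child, and the special variable $! gives the parent the child's process ID, much as the return value of fork() does.

```shell
counter=1

# "( ... ) &" makes the shell fork: the child starts as a copy of the
# parent, including its variables.
( counter=2; exit 7 ) &
child_pid=$!          # in the parent, $! plays the role of fork()'s return value

wait "$child_pid"     # the parent collects the child's exit status
child_status=$?

# The child changed only its own copy of counter.
echo "child $child_pid exited with $child_status; parent counter is still $counter"
```

The parent's counter is untouched because the child received a copy of the parent's memory, not a shared view of it.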

Page 81: Workbook 8, and 9 Pace Center for Business and Technology 1.

A Process's Life Cycle

Executing a New Command (Exec-ing) New commands are run through a technique called execing (short for executing). When execing a new command, the current process wipes and releases most of its resources, and loads a new set of instructions from the command specified in the filesystem. Execution starts with the entry point of the new program. After execing, the new command is still the same process. It has the same process ID, and many of the same parameters (such as its resource utilization, umask, current working directory, and others). It merely forgets its former command, and adopts the new one. Again for any programmers, execs are performed through one of several variants of the execve() system call, such as the execl() library call.

The process enters the execl(...) call, specifying the new command to run. If all goes well, the execl(...) call never returns. Instead, execution picks up at the entry point (i.e., main()) of the new program. If for some reason execl(...) does return, it must be an error (such as not being able to locate the command's executable in the filesystem).
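The shell's exec builtin behaves the same way, which makes it possible to sketch the idea from the command line. The experiment below runs inside a separate sh so that the calling shell is not replaced. It is an illustration of exec semantics, not the workbook's C listing.

```shell
# Run the experiment in a separate sh so the calling shell is left alone.
output=$(sh -c '
    echo "$$"               # PID before the exec
    exec sh -c "echo \$\$"  # replace this shell with a new sh; it prints its PID
    echo "never printed"    # unreachable if the exec succeeds
')

pid_before=$(printf '%s\n' "$output" | sed -n 1p)
pid_after=$(printf '%s\n' "$output" | sed -n 2p)
echo "PID before exec: $pid_before, after exec: $pid_after"
```

Both lines report the same process ID, and the final echo never runs, because the exec replaced the program without creating a new process.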

Page 82: Workbook 8, and 9 Pace Center for Business and Technology 1.

A Process's Life Cycle

Combining the Two: Fork and Exec Some programs may fork without execing. Examples include networking daemons, which fork a new child to handle a specific client connection, while the parent goes back to listening for new clients. Other programs might exec without forking. Examples include the login command, which becomes the user's login shell after successfully confirming a user's password. Most often, however, and for shells in particular, forking and execing go hand in hand. When running a command, the bash shell first forks a new bash shell. The child then execs the appropriate command, while the parent waits for the child to die, and then issues another prompt.
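The sequence the shell follows can be spelled out by hand. In this sketch, run_command is a hypothetical helper written for illustration (it is not part of bash); it forks, execs in the child, and waits in the parent.

```shell
# run_command is a hypothetical helper, not something bash provides.
run_command() {
    ( exec "$@" ) &     # fork a child, which immediately execs the command
    wait "$!"           # the parent waits for the child to die and returns its status
}

run_command sh -c 'exit 3'
status=$?
echo "command exited with status $status"
```

This is roughly what happens behind every ordinary command line, minus the prompt.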

Page 83: Workbook 8, and 9 Pace Center for Business and Technology 1.

The Lineage of Processes (and the pstree Command)

Upon booting the system, one of the responsibilities of the Linux kernel is to start the first process (usually /sbin/init). All other processes are started because an already existing process forked. Because every process except the first is created by forking, there exists a well defined lineage of parent-child relationships among the processes. The first process started by the kernel starts off the family tree, which can be examined with the pstree command.
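The same lineage can be traced one generation at a time by asking ps for each process's parent. This is a rough sketch that assumes a ps supporting the -o ppid= and -p switches (as the Linux version does); pstree presents the same information as a tree.

```shell
start_pid=$$
pid=$start_pid
lineage=$pid

# Follow parent process IDs upward until reaching the first process.
while [ "$pid" -gt 1 ]; do
    pid=$(ps -o ppid= -p "$pid" 2>/dev/null | tr -d ' ')
    [ -n "$pid" ] || break          # stop if ps is unavailable or the process is gone
    lineage="$lineage <- $pid"
done

echo "$lineage"
```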

Page 84: Workbook 8, and 9 Pace Center for Business and Technology 1.

How a Process Dies

When a process dies, it either dies normally by electing to exit, or abnormally as the result of receiving a signal. Here we discuss a normally exiting process, postponing a discussion of signals until a later Lesson. We have mentioned previously that processes leave behind a status code (also called return value) when they die, in the form of an integer. (Recall the bash shell, which uses the $? variable to store the return value of the previously run command.) When a process exits, all of its resources are freed, except the return code (and some resource utilization accounting information). It is the responsibility of the process's parent to collect this information, and free up the last remaining resources of the dead child. For example, when the bash shell forks and execs the chmod command, it is the parent bash shell's responsibility to collect the return value from the exited chmod command. Orphans If it is a parent's responsibility to clean up after its children, what happens if the parent dies before the child does? The child becomes an orphan. One of the special responsibilities of the first process started by the kernel is to "adopt" any orphans. (Notice that in the output of the pstree command, the first process has a disproportionately large number of children. Most of these were adopted as the orphans of other processes).

Page 85: Workbook 8, and 9 Pace Center for Business and Technology 1.

How a Process Dies

Zombies In between the time when a process exits, freeing most of its resources, and the time when its parent collects its return value, freeing the rest of its resources, the child process is in a special state referred to as a zombie. Every process passes through a transient zombie state. Usually, users need to be looking at just the right time (with the ps command, for example) to witness a zombie. They show up in the list of processes, but take up no memory, no CPU time, nor any other system resources. They are just the shadow of a former process, waiting for their parent to come and finish them off. Negligent Parents and Long Lived Zombies Occasionally, parent processes can be negligent. They start child processes, but then never go back to clean up after them. When this happens (usually because of a programmer's error), the child can exit, enter the zombie state, and stay there. This is usually the case when users witness zombie processes using, for example, the ps command. Getting rid of zombies is perhaps the most misunderstood basic Linux (and Unix) concept. Many people will say that there is no way to get rid of them, except by rebooting the machine. Using the clues discussed in this section, can you figure out how to get rid of long lived zombies? You get rid of zombies by getting rid of the negligent parent. When the parent dies (or is killed), the now orphaned zombie gets adopted by the first process, which is almost always /sbin/init. /sbin/init is a very diligent parent, who always cleans up after its children (including adopted orphans).
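A short-lived zombie can be produced on purpose. The sketch below is one possible way to do it, assuming the Linux (procps) ps with its --ppid switch: a small shell forks a child and then execs sleep, a program that never collects its children, so it acts as a negligent parent for a few seconds.

```shell
# The inner shell forks "sleep 0", then execs "sleep 3", which never
# collects the child it inherited: a negligent parent.
sh -c 'sleep 0 & exec sleep 3' &
parent_pid=$!
sleep 1                                   # give the child time to exit

# List the state of the negligent parent's children.
zombie_state=$(ps -o stat= --ppid "$parent_pid" 2>/dev/null | tr -d ' ')
echo "child state while the parent lives: $zombie_state"

kill "$parent_pid"                        # get rid of the negligent parent
wait "$parent_pid" 2>/dev/null
```

Once the parent is killed, the zombie is adopted by the first process and reaped, so it disappears from later process listings.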

Page 86: Workbook 8, and 9 Pace Center for Business and Technology 1.

The 5 Process States

The previous section discussed how processes are started, and how they die. While processes are alive they are always in one of five process states, which affect how and when they are allowed to have access to the CPU. The following lists each of the five states, along with the conventional letter that is used by the ps, top, and other commands to identify a process's current state. Runnable (R) Processes in the Runnable state are processes that, if given the opportunity to access the CPU, would take it. More formally, this is known as the Running state, but because only one process may be executing on a given CPU at any given time, only one of these processes will actually be "running" on that CPU at any given instant. Because runnable processes are switched in and out of the CPU so quickly, however, the Linux system gives the appearance that all of the processes are running simultaneously.

Page 87: Workbook 8, and 9 Pace Center for Business and Technology 1.

The 5 Process States

Voluntary (Interruptible) Sleep (S) As the name implies, a process which is in a voluntary sleep elected to be there. Usually, this is a process that has nothing to do until something interesting happens. A classic example is a networking daemon, such as the httpd process that implements a web server. In between requests from a client (web browser), the server has nothing to do, and elects to go to sleep. Another example would be the top command, which lists processes every five seconds. While it is waiting for five seconds to pass, it drops itself into a voluntary sleep. When something that the process is interested in happens (such as a web client making a request, or a five second timer expiring), the sleeping process is kicked back into the Runnable state. Involuntary (Non-interruptible) Sleep (D) Occasionally, two processes try to access the same system resource at the same time. For example, one process attempts to read from a block on a disk while that block is being written to because of another process. In these situations, the kernel forces the process into an involuntary sleep. The process did not elect to sleep; it would prefer to be runnable so it can get things done. When the resource is freed, the kernel will put the process back into the runnable state. Although processes are constantly dropping into and out of involuntary sleeps, they usually do not stay there long. As a result, users do not usually witness processes in an involuntary sleep except on busy systems.

Page 88: Workbook 8, and 9 Pace Center for Business and Technology 1.

The 5 Process States

Stopped (Suspended) Processes (T) Occasionally, users decide to suspend processes. Suspended processes will not perform any actions until they are restarted by the user. In the bash shell, the CTRL+Z key sequence can be used to suspend a process. In programming, debuggers often suspend the programs they are debugging when certain events happen (such as when a breakpoint is reached). Zombie Processes (Z) As mentioned above, every dying process goes through a transient zombie state. Occasionally, however, some get stuck there. Zombie processes have finished executing, and have freed all of their memory and almost all of their resources. Because they are not consuming any resources, they are little more than an annoyance that can show up in process listings.

Page 89: Workbook 8, and 9 Pace Center for Business and Technology 1.

Viewing Process States

When viewing the output of commands such as ps and top, process states are usually listed under the heading STAT. The process is identified by one of the following letters. •Runnable - R •Sleeping - S •Stopped - T •Uninterruptible sleep - D •Zombie - Z
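The state letters can be watched changing on a throwaway process. This sketch assumes a ps that accepts -o stat= and -p (as the Linux version does). It suspends the process with SIGSTOP, which has an effect similar to pressing CTRL+Z, although CTRL+Z itself sends a different signal.

```shell
sleep 30 &
pid=$!
sleep 1                                   # let the child start up and settle

# A process waiting on a timer is expected to show a state starting with S.
state_sleeping=$(ps -o stat= -p "$pid" 2>/dev/null | tr -d ' ')

kill -STOP "$pid"                         # suspend the process
sleep 1
# A suspended process is expected to show a state starting with T.
state_stopped=$(ps -o stat= -p "$pid" 2>/dev/null | tr -d ' ')

kill -CONT "$pid"                         # resume it
kill "$pid"                               # then terminate it
wait "$pid" 2>/dev/null

echo "before: $state_sleeping, after SIGSTOP: $state_stopped"
```

The STAT column may carry extra characters after the state letter; only the first letter identifies the state.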

Page 90: Workbook 8, and 9 Pace Center for Business and Technology 1.

Examples Chapter 2. Process States

Identifying Process States

Page 91: Workbook 8, and 9 Pace Center for Business and Technology 1.

Questions Chapter 2. Process States

1, 2, and 4

Page 92: Workbook 8, and 9 Pace Center for Business and Technology 1.

Chapter 4. Sending Signals

Key Concepts •Signals are a low level form of inter-process communication, which arise from a variety of sources, including the kernel, the terminal, and other processes. •Signals are distinguished by signal numbers, which have conventional symbolic names and uses. The symbolic names for signal numbers can be listed with the kill -l command. •The kill command sends signals to other processes. •Upon receiving a signal, a process may either ignore it, react in a kernel specified default manner, or implement a custom signal handler. •Conventionally, signal number 15 (SIGTERM) is used to request the termination of a process. •Signal number 9 (SIGKILL) terminates a process, and cannot be overridden. •The pkill and killall commands can be used to deliver signals to processes specified by command name, or the user who owns them. •Other utilities, such as top and the GNOME System Monitor can be used to deliver signals as well.

Page 93: Workbook 8, and 9 Pace Center for Business and Technology 1.

Signals

Linux (and Unix) uses signals to notify processes of abnormal events, and as a primitive mechanism of interprocess communication. Signals are sometimes referred to as software interrupts, in that they can interrupt the normal flow of execution of a process. The kernel uses signals to notify processes of abnormal behavior, such as if the process tries to divide a number by zero, or tries to access memory that does not belong to it. Processes can also send signals to other processes. For example, a bash shell could send a signal to an xclock process. The receiving process knows very little about the origins of the signal. It doesn't know if the signal originated from the kernel, or from another process; all it knows is that it received a signal.

Page 94: Workbook 8, and 9 Pace Center for Business and Technology 1.

Signals

There are, however, different flavors of signals. The different flavors have symbolic names, but are also identified by integers. The various integers, and the symbolic name they are mapped to, can be listed using the kill -l command, or examined in the signal(7) man page.

Linux, like most versions of Unix, implements 31 "normal" signals, numbered 1 through 31. In Linux, signals numbered 32 through 64 (which are not standard among the various versions of Unix) are "real time" signals, and beyond the scope of this text.
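As a quick sketch, the kill builtin can translate individual signal numbers as well as print the whole list. When given a number, kill -l prints the symbolic name without the "SIG" prefix.

```shell
# Translate signal numbers to their symbolic names (printed without "SIG").
name_15=$(kill -l 15)
name_9=$(kill -l 9)
name_2=$(kill -l 2)
echo "15=$name_15 9=$name_9 2=$name_2"

# With no argument, kill -l lists every signal the shell knows about.
kill -l
```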

Page 95: Workbook 8, and 9 Pace Center for Business and Technology 1.

Why Are Signals Sent?

There are a variety of reasons why signals might be sent to a process, as illustrated by the following examples. Hardware Exceptions The process asked the hardware to perform some erroneous operation. For example, the kernel will send a process a SIGFPE (signal number 8) if it performs a divide by 0. Software Conditions Processes may need to be notified of some abnormal software condition. For example, whenever a process dies, the kernel sends a SIGCHLD (signal number 17) to the process's parent. As another example, X graphical applications receive a SIGWINCH (signal number 28) whenever their window is resized, so that they can respond to the new geometry. Terminal Interrupts Various terminal control key sequences send signals to the bash shell's foreground process. For example, CTRL+C sends a SIGINT (signal number 2), while CTRL+Z sends a SIGTSTP (signal number 20). Other Processes Processes may elect to send any signals to any other process which is owned by the same user. The kill command is designed to do just this.


Sending Signals: the kill Command

The kill command is used to deliver custom signals to other processes. It expects to be called with a numeric or symbolic command line switch, which specifies which signal to send, and a process ID, which specifies which process should receive it. As an example, the following commands deliver a SIGCHLD (signal number 17) to the xclock process, process ID number 8060.

When using the symbolic name to specify a signal, the “SIG” prefix (which all signals share) can either be included or omitted.
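The command listing for this slide is not reproduced in the transcript. A minimal sketch of the three equivalent forms might look like the following. It assumes a background sleep as a stand-in for the xclock process (so the process ID is whatever the shell reports rather than 8060), and it relies on SIGCHLD being signal number 17, which holds on x86 Linux but not on every platform.

```shell
# A background sleep stands in for the xclock process (hypothetical stand-in).
sleep 30 &
pid=$!

# Three equivalent ways to send SIGCHLD: by number, by short name, by full name.
kill -17 "$pid"
kill -CHLD "$pid"
kill -SIGCHLD "$pid"

# The default response to SIGCHLD is to ignore it, so the process keeps running.
# Signal 0 sends nothing; it only checks that the process exists.
if kill -0 "$pid" 2>/dev/null; then
    still_running=yes
else
    still_running=no
fi
echo "still running: $still_running"

# Clean up with the default signal (SIGTERM), then reap the process.
kill "$pid"
wait "$pid" 2>/dev/null || true
```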


Receiving Signals

When a process receives a signal, it may take one of the following three actions.

Implement a Kernel Default Signal Handler
For each type of signal, there is a default response which is implemented by the kernel. Each signal is mapped to one of the following behaviors.
•Terminate: The receiving process is killed.
•Ignore: The receiving process ignores the signal.
•Core: The receiving process terminates, but first dumps an image of its memory into a file named core in the process's current working directory. The core file can be used by developers to help debug the program. This response is affectionately referred to as "puking" by many in the Unix community.
•Stop: Stop (suspend) the process.
The signal(7) man page documents which behavior is mapped to which signal.

Choose to Ignore the Signal
Programmers may elect for their application to simply ignore specified signals.

Choose to Implement a Custom Signal Handler
Programmers may elect to implement their own behavior when a specified signal is received. The response of the program is completely determined by the programmer.

Unless a program's documentation says otherwise, you can usually assume that a process will respond with the kernel implemented default behavior. Any other response should be documented.
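In a shell script, the last two choices can be sketched with the bash trap builtin. The example below is only an illustration, not part of the original workbook: it starts a child bash that installs a custom handler for SIGTERM, ignores SIGINT, and then sends both signals to itself.

```shell
# Run a child shell that handles SIGTERM itself and ignores SIGINT.
output=$(bash -c '
    trap "echo caught SIGTERM" TERM   # custom handler for SIGTERM
    trap "" INT                       # ignore SIGINT entirely
    kill -TERM $$                     # $$ is the child shell itself
    kill -INT $$
    echo "still running"
')
echo "$output"
```

The child prints "caught SIGTERM" followed by "still running": neither signal terminated it, because neither was left to the kernel default behavior.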


Using Signals to Terminate Processes

Of the standard signals used in Linux (and Unix), standard users in practice (explicitly) make use of only a few.

Usually, standard users use signals to terminate a process (thus the name of the kill command). By convention, if programmers want to implement custom behavior when shutting down (such as flushing important memory buffers to disk, etc.), they implement a custom signal handler for signal number 15 (SIGTERM) to perform the action. Signal number 9 (SIGKILL) is handled specially by the kernel, and cannot be overridden by a custom signal handler or ignored. It is reserved as a last resort, kernel level technique for killing a process.


As an example, einstein will start a cat command that would in principle run forever. He then tracks down the process ID of the command, and terminates it with a SIGTERM.

SIGTERM (signal number 15) is the default signal for the kill command, so einstein could have used kill 8375 to the same effect. In the following, einstein repeats the sequence, this time sending a SIGKILL.
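The terminal sessions for this example are not reproduced in the transcript. A rough sketch of the same sequence follows. As assumptions, it uses the cat /dev/zero > /dev/null command from the lab exercise as the long-running process, and it takes the process ID from the shell's $! variable instead of searching a process listing as einstein does.

```shell
# Start a cat that runs until something stops it, and remember its process ID.
cat /dev/zero > /dev/null &
pid=$!

# Polite request: SIGTERM, which is also what a plain "kill PID" sends.
kill -TERM "$pid"
term_status=0
wait "$pid" 2>/dev/null || term_status=$?
echo "exit status after SIGTERM: $term_status"

# Repeat the sequence, this time with the last resort SIGKILL.
cat /dev/zero > /dev/null &
pid=$!
kill -KILL "$pid"
kill_status=0
wait "$pid" 2>/dev/null || kill_status=$?
echo "exit status after SIGKILL: $kill_status"
```

For a process terminated by a signal, bash reports an exit status of 128 plus the signal number, so the two statuses are 143 (128 + 15) and 137 (128 + 9).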


Alternatives to the kill Command

Using signals to control processes is such a common occurrence that alternatives to the kill command abound. The following sections mention a few.

The pkill Command

In each of the previous examples, einstein needs to determine the process ID of a process before sending a signal to it with the kill command. The pkill command can be used to send signals to processes selected by more general means. The pkill command expects the following syntax.

pkill [-signal] [SWITCHES] [PATTERN]

The first token optionally specifies the signal number to send (by default, signal number 15). PATTERN is an extended regular expression that will be matched against command names. The following table lists commonly used command line switches. Processes that meet all of the specified criteria will be sent the specified signal.
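The table of switches is not reproduced in this transcript. As a hedged illustration of the syntax, the sketch below uses two switches documented in the pkill man page: -u, which restricts the match to one user's processes, and -f, which matches PATTERN against the full command line rather than only the command name. The sleep commands and their unusual duration are hypothetical, chosen so that the pattern cannot match anything else.

```shell
# Two background sleeps with an unusual duration, so the pattern below
# cannot match any other process on the system.
sleep 4321 &
pid1=$!
sleep 4321 &
pid2=$!
sleep 1   # give both jobs a moment to start

# Send SIGTERM to every process of the current user whose full command
# line is exactly "sleep 4321".
pkill_status=0
pkill -TERM -u "$(id -un)" -f '^sleep 4321$' || pkill_status=$?

# Safety net for this sketch: leave nothing behind if pkill matched nothing.
kill "$pid1" "$pid2" 2>/dev/null || true
wait
echo "pkill exit status: $pkill_status"
```

An exit status of 0 from pkill means that at least one process matched the criteria and was signaled.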


Conveniently, the pkill command omits itself and the shell which started it when killing all processes owned by a particular user or terminal. Consider the following example.

Notice that, although the bash shell qualifies as a process owned by the user maxwell, it survived the slaughter.


The killall Command

Similar to pkill, the killall command delivers signals to processes specified by command name. The killall command supports the following command line switches.


The System Monitor

The System Monitor GNOME application, introduced in a previous Lesson, can also be used to deliver signals to processes. By right clicking on a process, a pop-up menu allows the user to select End Process, which has the effect of delivering a SIGTERM to the process. What do you think the Kill Process menu selection does? The Kill Process menu selection delivers a SIGKILL signal to the process.


The top Command

The top command can also be used to deliver signals to processes. Using the k key, the following dialog occurs above the list of processes, allowing the user to specify which process ID should receive the signal, and which signal to deliver.


Online Exercises
Chapter 4. Sending Signals

Lab Exercise
Objective: Effectively terminate running processes.
Estimated Time: 10 mins.

Specification
Create a short shell script called ~/bin/kill_all_cats, and make it executable. When executed, the script should kill all currently running cat processes. In a terminal, start a cat process using the following command line. Leave the process running while grading your exercise (but don't be surprised if it's not running when you're done).

[student@station student]$ cat /dev/zero > /dev/null

Deliverables
A shell script called ~/bin/kill_all_cats, which when executed, delivers a SIGTERM signal to all currently running instances of the cat command.
An executing cat process.

Hint
~/bin/kill_all_cats = killall cat


Chapter 7. Scheduling Periodic Tasks: cron

Key Concepts
•The cron facility is used to schedule regularly recurring tasks.
•The crontab command provides a front end to editing crontab files.
•The crontab file uses 5 fields to specify timing information.
•stdout from cron jobs is mailed to the user.


Performing Periodic Tasks

Often, people find that they are (or ought to be) performing tasks on a regular basis. In system administration, such tasks might include removing old, unused files from the /tmp directory, or checking to make sure a file that's collecting log messages hasn't grown too large. Other users might find their own tasks, such as checking for large files that they aren't using anymore, or checking a website to see if anything new has been posted.

The cron service allows users to configure commands to be run on a regular basis, such as every 10 minutes, once every Thursday, or twice a month. Users specify what commands should be run at what times by using the crontab command to configure their "cron table". The tasks are managed by a traditional Linux (and Unix) daemon, the crond daemon.


The cron Service

The crond daemon performs periodic tasks on behalf of the system or individual users. Usually, the daemon is started as the system boots, so most users can take it for granted. By listing all processes and searching for crond, you can confirm that the crond daemon is running.

If the crond daemon is not running, your system administrator would need to start the crond service as root.


crontab Syntax

Users specify which jobs to run, and when to run them, by configuring a file known as the "cron table", more often abbreviated "crontab". An example crontab file is listed below.

A crontab file is a line based configuration file, with each line performing one of three functions:

Comments
All lines whose first (non-space) character is a # are considered comments, and are ignored.

Environment variables
All lines that have the form name = value are used to define environment variables.

Cron commands
Any other (non blank) line is considered a cron command, which is made up of six fields described below.


Cron command lines consist of six whitespace separated fields. The first 5 fields are used to specify when to run the command, and the remaining sixth field (composed of everything after the fifth field) specifies the command to run. The first five fields specify the following information:
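The table describing the five timing fields is not reproduced in this transcript. The layout below follows the field order and value ranges given in the crontab(5) man page. The uptime command in the sixth field is only a placeholder.

```text
# minute  hour  day-of-month  month  day-of-week  command
# 0-59    0-23  1-31          1-12   0-7
5         *     *             *      *            uptime
```

In the day-of-week field, both 0 and 7 stand for Sunday. The sample line runs its command at five minutes past every hour.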


Each of the first five fields must be filled with a token using the following syntax:
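The table of token forms is not reproduced in this transcript. As a sketch based on the crontab(5) man page, the commonly used forms are an asterisk (every value), a single number, a comma separated list, a range, and a step such as */10. The echo commands below are placeholders; the schedules mirror the examples mentioned earlier in this chapter.

```text
# A step: every 10 minutes.
*/10 * * * * echo "every ten minutes"

# A single value in the day-of-week field: 08:00 once every Thursday.
0 8 * * 4 echo "every Thursday"

# A list in the day-of-month field: 08:00 on the 1st and 15th, twice a month.
0 8 1,15 * * echo "twice a month"

# A range in the hour field: on the hour, from 09:00 through 17:00.
0 9-17 * * * echo "working hours"
```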


Using the crontab Command

Users seldom manage their crontab file directly (or even know where it is stored), but instead use the crontab command to edit, list, or remove it.

crontab {[-e] | [-l] | [-r]}
crontab FILE

Edit, list, or remove the current crontab file, or replace the current crontab file with FILE.


In the following sequence of commands, hogan will use the crontab command to manage his crontab configuration. He first lists his current crontab configuration to the screen, then he lists the current file again, storing the output into the file mycopy.


Next, hogan removes his current crontab configuration. When he next tries to list the configuration, he is informed that no current configuration exists.

In order to restore his cron configuration, hogan uses the crontab command once again, this time specifying the mycopy file as an argument. Upon listing his configuration again, he finds that his current configuration was read from the mycopy file.


A little annoyingly, the banner has been duplicated in the process. Can you figure out why? The original banner was stored in mycopy. When mycopy was resubmitted, cron treated the original banner as a user comment, and prepended a new banner.


Editing crontab Files in Place

Often, users edit their crontab files in place, using crontab -e. The crontab command will open the current crontab configuration in the user's default editor. When the user has finished editing the file and exits the editor, the modified contents of the file are installed as the new crontab configuration.

The default editor is /bin/vi; however, crontab, like many other commands, examines the EDITOR environment variable. If the variable has been set, it will be used to specify which editor to open. For example, if hogan prefers to use the nano editor, he can first set the EDITOR environment variable to /usr/bin/nano (or simply nano), and then run crontab -e.


If hogan wanted to use nano as his editor, he could use one of the following approaches:

or, even better, hogan could add the line "export EDITOR=nano" to his .bash_profile file, and the environment variable would be set automatically every time he logged in.

In summary, there are two ways someone could go about creating or modifying their crontab configuration.
•Create a text file containing their desired configuration, and then install it with crontab FILENAME.
•Edit their configuration in place with crontab -e.
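The difference between setting EDITOR for a single command and exporting it for the whole session can be sketched as follows. Because crontab -e is interactive, this example substitutes a harmless sh -c command that merely prints the variable it inherits. That stand-in, and the variable names, are assumptions made for illustration.

```shell
# Approach 1: set EDITOR for a single command only.
# "sh -c ..." is a stand-in for "crontab -e" in this sketch.
one_off=$(EDITOR=nano sh -c 'echo "$EDITOR"')

# Approach 2: export EDITOR so that every later command in the session sees it.
export EDITOR=nano
exported=$(sh -c 'echo "$EDITOR"')

echo "one-off: $one_off, exported: $exported"
```

In both cases the child command sees EDITOR=nano. With the first approach the setting lasts for that one command only.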


Where does the output go?

How does the user receive output from commands run by cron? The crond daemon will mail stdout and stderr from any commands run to the local user. Suppose ventura had set up the following cron job:


Once an hour, at five minutes past the hour, he could expect to receive new mail that looks like the following:

The mail message contains the output of the command in the body, and all defined environment variables in the message headers. Optionally, ventura could have set the special MAILTO environment variable to a destination email address, and mail would be sent to that address instead:
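The crontab listings for ventura's job are not reproduced in this transcript. A hypothetical configuration consistent with the description might look like the following. The schedule of five minutes past every hour comes from the text, while the who command and the mail address are placeholders.

```text
# Send the output of all jobs in this crontab to this address.
MAILTO=ventura@example.com

# Run at five minutes past every hour.
5 * * * * who
```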


Environment Variables and cron

When configuring cron jobs, users should be aware of a subtle detail. When the crond daemon starts the user's command, it does not run the command from the user's login shell. Instead, it hands the command line to /bin/sh (or to the shell named by a SHELL variable in the crontab), which is started non-interactively and does not read the login startup files. This has an important implication: Any environment variables or aliases that are configured by the shell at startup, such as any defined in /etc/profile or ~/.bash_profile, will not be available when cron executes the command. If a user wants an environment variable to be defined, they need to explicitly define the variable in their crontab configuration.
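The effect can be approximated at the command line. The sketch below is an illustration only: env -i starts a command with a completely empty environment, whereas cron supplies a few variables of its own (such as HOME and SHELL). GREETING is a hypothetical variable of the kind a user might export in ~/.bash_profile.

```shell
# A variable exported in the login session (normally done in ~/.bash_profile).
export GREETING=hello

# A command started from this session inherits the variable.
from_session=$(sh -c 'echo "${GREETING:-unset}"')

# env -i starts the command with an empty environment, which roughly
# approximates the sparse environment of a cron job.
from_bare_env=$(env -i /bin/sh -c 'echo "${GREETING:-unset}"')

echo "session: $from_session, bare environment: $from_bare_env"
```

A job that needs GREETING under cron would define it on its own line in the crontab, in the name = value form described earlier.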
