Regular expressions Used by several different UNIX commands, including ed, sed, awk, grep A period ‘.’ matches any single characters.X. matches any X.

Jan 01, 2016

Marylou Horn

Regular expressions

Used by several different UNIX commands, including ed, sed, awk and grep

A period ‘.’ matches any single character: .X. matches any X that has one character on each side

A caret ‘^’ matches the beginning of the line: ^Bridgeport matches the characters Bridgeport only if they occur at the beginning of the line

Regular expressions (continued)

A dollar sign ‘$’ matches the end of the line

Bridgeport$ matches the characters Bridgeport only if they are the very last characters on the line

.$ matches any single character at the end of the line

To match a special character such as ‘.’ literally, precede it with a backslash ‘\’ to remove its special meaning

\.$ matches any line that ends with a period
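To see the period and the two anchors at work, a minimal sketch can pipe a few lines through grep, which prints only the lines a pattern matches. The helper function `sample` and its lines are made up for illustration; any POSIX shell and grep should behave the same way.

```shell
# Hypothetical helper: prints four made-up sample lines
sample() {
    printf '%s\n' 'Bridgeport is a city' \
                  'We drove to Bridgeport' \
                  'axb and cxd' \
                  'The end.'
}

sample | grep '.x.'           # any character, an x, any character
sample | grep '^Bridgeport'   # Bridgeport at the start of the line
sample | grep 'Bridgeport$'   # Bridgeport at the end of the line
sample | grep -c '.$'         # lines ending in some character: all 4
sample | grep '\.$'           # lines ending in a literal period
```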

Regular expressions (continued)

^$ matches any line that contains no characters

[…] matches any one of the characters enclosed in the brackets

[tT]he matches a lower or upper case t followed immediately by the characters he

[A-Z] matches an upper case letter

[A-Za-z] matches an upper or lower case letter

[^A-Z] matches any character except an upper case letter

[^A-Za-z] matches any non-alphabetic character
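A minimal sketch of bracket expressions, assuming the C locale so that ranges such as A-Z mean plain ASCII letters. The helper `words` and its lines are hypothetical; sed is used for the last pattern so that every non-alphabetic character becomes visible.

```shell
LC_ALL=C            # make A-Z and a-z mean plain ASCII letters
export LC_ALL

# Hypothetical helper: prints four made-up sample lines
words() {
    printf '%s\n' 'The cat' 'the dog' 'THE END' '42 apples'
}

words | grep '[tT]he'              # The cat, the dog
words | grep '^[A-Z]'              # The cat, THE END
words | sed 's/[^A-Za-z]/_/g'      # each non-letter becomes _
```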

Regular expressions (continued)

An asterisk ‘*’ matches zero or more occurrences of the character (or bracketed expression) that precedes it

X* matches zero, one, two, three, … capital X’s

XX* matches one or more capital X’s

.* matches zero or more occurrences of any character

e.*e matches all the characters from the first e on the line to the last one

[A-Za-z][A-Za-z]* matches any alphabetic character followed by zero or more alphabetic characters
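A short sketch with sed makes the extent of each asterisk match visible, because the matched text is rewritten. The input strings are invented; `&` in a sed replacement stands for the whole matched text.

```shell
LC_ALL=C
export LC_ALL

# XX* : one or more X's, so every run of X's shrinks to a single X
printf 'aXXXb aXb ab\n' | sed 's/XX*/X/g'                        # aXb aXb ab
# e.*e : from the first e to the last e on the line (the match is greedy)
printf 'here we see trees\n' | sed 's/e.*e/[&]/'                 # h[ere we see tree]s
# [A-Za-z][A-Za-z]* : one or more letters, anchored at the start of the line
printf 'hello123 world\n' | sed 's/^[A-Za-z][A-Za-z]*/<word>/'   # <word>123 world
```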

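Regular expressions (continued)

[-0-9] matches a single dash or digit character (ORDER IS IMPORTANT: the dash is literal only when it comes first or last inside the brackets)

[0-9-] is the same as [-0-9]

[^-0-9] matches any character except digits and the dash

[]a-z] matches a right bracket or a lower case letter (ORDER IS IMPORTANT: the ] must come first, right after [ or [^)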
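The ordering rules can be checked by replacing every matched character with `#`. This is a sketch on one invented input line, `a-1]B`, run in the C locale.

```shell
LC_ALL=C
export LC_ALL

line='a-1]B'   # made-up input: letter, dash, digit, right bracket, upper case letter

printf '%s\n' "$line" | sed 's/[-0-9]/#/g'    # a##]B  dash first: literal dash
printf '%s\n' "$line" | sed 's/[0-9-]/#/g'    # a##]B  dash last: same set
printf '%s\n' "$line" | sed 's/[^-0-9]/#/g'   # #-1##  everything except dash and digits
printf '%s\n' "$line" | sed 's/[]a-z]/#/g'    # #-1#B  right bracket or lower case letter
```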

Regular expressions (continued)

\{min,max\} matches a precise number of occurrences of the preceding regular expression

min specifies the minimum number of occurrences of the preceding regular expression to be matched, and max specifies the maximum

w\{1,10\} matches from 1 to 10 consecutive w’s

[a-zA-Z]\{7\} matches exactly seven alphabetic characters
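A minimal sketch of interval expressions. Note that an unanchored `[a-zA-Z]\{7\}` matches any line that merely contains seven letters in a row; anchoring with ^ and $ restricts it to lines of exactly seven letters. The helper `sample` and its words are hypothetical.

```shell
LC_ALL=C
export LC_ALL

# Hypothetical helper: prints made-up words of different lengths
sample() {
    printf '%s\n' 'Unix' 'Chapter' 'Example' 'abcdefgh'
}

sample | grep '^[a-zA-Z]\{7\}$'    # Chapter, Example
sample | grep -c '[a-zA-Z]\{7\}'   # 3: abcdefgh also contains seven letters in a row
printf 'a w ww wwww\n' | sed 's/w\{1,10\}/[&]/g'   # a [w] [ww] [wwww]
```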

Regular expressions (continued)

X\{5,\} matches at least five consecutive X’s

\(...\) is used to save matched characters

^\(.\) matches the first character on the line and stores it in register 1

There are nine registers, numbered 1 to 9

To retrieve what is stored in register n, \n is used

Example: ^\(.\)\1 matches the first two characters on a line if they are both the same character
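A quick sketch of an open-ended interval and a single back-reference with grep, on made-up lines from a hypothetical `sample` helper.

```shell
# Hypothetical helper: prints made-up sample lines
sample() {
    printf '%s\n' 'aardvark' 'apple' 'XXXXXX' 'XXXX'
}

sample | grep 'X\{5,\}'     # XXXXXX: at least five X's in a row
sample | grep '^\(.\)\1'    # aardvark, XXXXXX, XXXX: first two characters equal
```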

Regular expressions (continued)

^\(.\).*\1$ matches all lines in which the first character on the line is the same as the last; the .* matches all the characters in between

^\(...\)\(...\): the first three characters on the line are stored in register 1 and the next three characters in register 2
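A sketch of both patterns: grep for the first-equals-last test, and sed to show that the two registers can be reused in a replacement (here swapping them, an illustration that goes slightly beyond the slide). The helper `sample` is hypothetical.

```shell
# Hypothetical helper: prints made-up sample lines
sample() {
    printf '%s\n' 'level' 'radar' 'sed' 'abcdef'
}

sample | grep '^\(.\).*\1$'              # level, radar: first character equals last
sample | sed 's/^\(...\)\(...\)/\2\1/'   # only abcdef has six characters: defabc
```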

cut

$ who

bgeorge pts/16 Oct 5 15:01 (216.87.102.204)

abakshi pts/13 Oct 6 19:48 (216.87.102.220)

tphilip pts/11 Oct 2 14:10 (AC8C6085.ipt.aol.com)

$ who | cut -c1-8,18-

bgeorge Oct 5 15:01 (216.87.102.204)

abakshi Oct 6 19:48 (216.87.102.220)

tphilip Oct 2 14:10 (AC8C6085.ipt.aol.com)

$

cut is used to extract various fields of data from a data file or from the output of a command

Format: cut -cchars file

chars specifies which characters to extract from each line of file
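A minimal sketch of character lists, using the lists this section gives as examples (-c5, -c1,3,4, -c10-15, -c5-). The input line is made up so that character n is the nth letter of the alphabet.

```shell
line='abcdefghijklmnop'   # made-up line: character n is the nth letter

printf '%s\n' "$line" | cut -c5       # e
printf '%s\n' "$line" | cut -c1,3,4   # acd
printf '%s\n' "$line" | cut -c10-15   # jklmno
printf '%s\n' "$line" | cut -c5-      # efghijklmnop
```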

cut (continued)

Examples: -c5 (character 5), -c1,3,4 (characters 1, 3 and 4), -c10-15 (characters 10 through 15), -c5- (character 5 through the end of the line)

The -d and -f options are used with cut when you have data that is delimited by a particular character

Format: cut -ddchar -ffields file

dchar: the character that delimits the fields (default: tab character)

fields: the fields to be extracted from file
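A sketch of -d and -f on two lines copied from the sample /etc/passwd listing in this section, plus one phonebook-style line, assuming (as the phonebook example implies) that its fields are tab-separated. The helper `passwd_sample` is hypothetical.

```shell
# Hypothetical helper: two lines copied from the sample /etc/passwd listing
passwd_sample() {
    printf '%s\n' 'root:x:0:1:Super-User:/:/sbin/sh' \
                  'oracle:*:101:67:DBA Account:/export/home/oracle:/bin/csh'
}

passwd_sample | cut -d: -f1            # root, oracle
passwd_sample | cut -d: -f1,6          # root:/, oracle:/export/home/oracle
printf 'Edward\t336-145\n' | cut -f2   # 336-145: tab is the default delimiter
```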

cut (continued)

$ cat /etc/passwd

root:x:0:1:Super-User:/:/sbin/sh

daemon:x:1:1::/:

bin:x:2:2::/usr/bin:

sys:x:3:3::/:

adm:x:4:4:Admin:/var/adm:

lp:x:71:8:Line Printer Admin:/usr/spool/lp:

uucp:x:5:5:uucp Admin:/usr/lib/uucp:

listen:x:37:4:Network Admin:/usr/net/nls:

nobody:x:60001:60001:Nobody:/:

noaccess:x:60002:60002:No Access User:/:

oracle:*:101:67:DBA Account:/export/home/oracle:/bin/csh

webuser:*:102:102:Web User:/export/home/webuser:/bin/csh

abuzneid:x:103:100:Abdelshakour Abuzneid:/home/abuzneid:/sbin/csh

$

cut (continued)

$ cut -d: -f1 /etc/passwd

root

daemon

bin

sys

adm

lp

uucp

nuucp

listen

nobody

oracle

webuser

abuzneid

$

cut (continued)

$ cat phonebook

Edward 336-145

Alice 334-121

Sony 332-336

Robert 326-056

$ cut -f1 phonebook

Edward

Alice

Sony

Robert

$

paste

paste merges corresponding lines from one or more files into single lines

Format: paste files

The tab character is the default delimiter

paste (continued)

Example:

$ cat students

Sue

Vara

Elvis

Luis

Eliza

$ cat sid

578426

452869

354896

455468

335123

$ paste students sid

Sue 578426

Vara 452869

Elvis 354896

Luis 455468

Eliza 335123

$

paste (continued)

The -s option tells paste to paste together lines from the same file instead of from alternate files

To change the delimiter, the -d option is used
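A minimal sketch of paste with and without -d and -s, using shortened copies of the students and sid files. It assumes `mktemp -d` is available (true on Linux and macOS, though not required by POSIX); the scratch directory is removed when the script exits.

```shell
dir=$(mktemp -d)                 # scratch directory (mktemp is assumed to exist)
trap 'rm -rf "$dir"' EXIT
printf '%s\n' Sue Vara Elvis > "$dir/students"
printf '%s\n' 578426 452869 354896 > "$dir/sid"

paste "$dir/students" "$dir/sid"          # Sue<TAB>578426 and so on
paste -d '+' "$dir/students" "$dir/sid"   # Sue+578426 and so on
paste -s "$dir/students"                  # Sue<TAB>Vara<TAB>Elvis
paste -d ' ' -s "$dir/students"           # Sue Vara Elvis
```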

paste (continued)

Examples:

$ paste -d '+' students sid

Sue+578426

Vara+452869

Elvis+354896

Luis+455468

Eliza+335123

$ paste -s students

Sue Vara Elvis Luis Eliza

$ ls | paste -d ' ' -s -

addr args list mail memo name nsmail phonebook programs roster sid students test tp twice user

$

sed

sed (stream editor) is a program used for editing data

Unlike ed, sed cannot be used interactively

Format: sed command file

command: applied to each line of the specified file

file: if no file is specified, standard input is assumed

sed writes its output to standard output

The command s/Unix/UNIX/ is applied to every line in the file; on each line it replaces the first Unix with UNIX
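A minimal sketch contrasting the plain substitution with the g (global) flag. The helper `text` and its lines are hypothetical; lines without a match are printed unchanged.

```shell
# Hypothetical helper: two made-up lines
text() {
    printf '%s\n' 'Unix is Unix' 'no match here'
}

text | sed 's/Unix/UNIX/'    # UNIX is Unix, then the unchanged second line
text | sed 's/Unix/UNIX/g'   # UNIX is UNIX, then the unchanged second line
```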

sed (continued)

sed makes no changes to the original input file

The command 's/Unix/UNIX/g' is applied to every line in the file; it replaces every Unix with UNIX. “g” means global

With the -n option, only the selected lines are printed

Example: sed -n '1,2p' file prints the first two lines

Example: sed -n '/UNIX/p' file prints any line containing UNIX
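A short sketch of -n with the p command, on lines from a hypothetical `text` helper. The last command drops -n to show why it matters: sed then prints every line, and p prints the selected line a second time.

```shell
# Hypothetical helper: three made-up lines
text() {
    printf '%s\n' 'line one' 'UNIX line two' 'line three'
}

text | sed -n '1,2p'      # line one, UNIX line two
text | sed -n '/UNIX/p'   # UNIX line two
text | sed '/UNIX/p'      # without -n the selected line appears twice
```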

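sed (continued)

Example: sed '1,2d' file deletes lines 1 and 2 (all other lines are printed)

Example: sed -n l text prints all lines from text, showing nonprinting characters as \nn (their octal value) and tab characters as “>”; the exact notation varies between versions of sed (GNU sed, for example, shows a tab as \t)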
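A sketch of the d and l commands on made-up input. The exact way l renders a tab differs between implementations, so only the end-of-line `$` marker, which POSIX requires, is relied on here.

```shell
printf '%s\n' first second third | sed '1,2d'   # third
printf 'a\tb\n' | sed -n l                       # GNU sed, for example, shows a\tb$
printf 'ab\n' | sed -n l                         # ab$
```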

tr

The tr filter is used to translate characters from standard input

Format: tr from-chars to-chars

The result is written to standard output

Example: tr e x < file translates every “e” in file to “x” and prints the result on standard output

The octal representation of a character can be given to tr in the format \nnn

Example: tr : '\11' translates every : to a tab
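A minimal sketch of both translations on made-up input; piping the second result through cut (whose default delimiter is the tab) confirms that the colons really became tabs.

```shell
printf 'seen\n' | tr e x                   # sxxn
printf 'a:b:c\n' | tr : '\11'              # a, b and c separated by tabs
printf 'a:b:c\n' | tr : '\11' | cut -f2    # b
```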

tr (continued)

Character          Octal value
Bell               7
Backspace          10
Tab                11
New line           12
Linefeed           12
Form feed          14
Carriage return    15
Escape             33

tr (continued)

Example: tr '[a-z]' '[A-Z]' < file translates all lower case letters in file to their upper case equivalents. The character ranges [a-z] and [A-Z] are enclosed in quotes to keep the shell from replacing them with the names of files in the current directory that consist of a single letter from a through z or A through Z
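A sketch of case conversion in the C locale. The bracketed form from this section still works in current tr implementations (the brackets simply map to themselves); the plain ranges and the POSIX character classes shown after it are common alternatives.

```shell
LC_ALL=C
export LC_ALL

printf 'Hello, World\n' | tr '[a-z]' '[A-Z]'           # HELLO, WORLD
printf 'Hello, World\n' | tr 'a-z' 'A-Z'               # HELLO, WORLD
printf 'Hello, World\n' | tr '[:lower:]' '[:upper:]'   # HELLO, WORLD
```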

To “squeeze” out multiple consecutive occurrences of a character, the -s option is used

tr (continued)

Example: tr -s ' ' ' ' < file squeezes multiple spaces into one space

The -d option is used to delete single characters from a stream of input

Format: tr -d from-chars

Example: tr -d ' ' < file deletes all spaces from the input stream
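A minimal sketch of squeezing and deleting on made-up input. When tr is only squeezing, a single character set is enough, so the second command gives the same result as the first.

```shell
printf 'a   b    c\n' | tr -s ' ' ' '   # a b c
printf 'a   b    c\n' | tr -s ' '       # a b c
printf 'a b c\n' | tr -d ' '            # abc
```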

grep

Searches one or more files for a particular character pattern

Format: grep pattern files

Example: grep path .cshrc prints every line in the .cshrc file that contains the pattern ‘path’

Example: grep bin .cshrc .login .profile prints every line from any of the three files .cshrc, .login and .profile that contains the pattern “bin”
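A sketch with made-up stand-ins for .cshrc and .profile, created in a scratch directory (assuming `mktemp -d` is available). It shows that when grep searches several files it prefixes each matching line with the name of the file it came from.

```shell
dir=$(mktemp -d)                 # scratch directory (mktemp is assumed to exist)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

# Made-up contents standing in for real startup files
printf '%s\n' 'set path = (/bin /usr/bin)' 'alias ll ls -l' > .cshrc
printf '%s\n' 'PATH=/usr/local/bin:$PATH' > .profile

grep path .cshrc              # set path = (/bin /usr/bin)
grep bin .cshrc .profile      # each match is prefixed with its file name
```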

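grep (continued)

Example: grep * smarts does not work as intended, because the shell replaces * with the names of all files in the current directory; grep then takes the first file name as the pattern and searches the remaining files

Example: grep '*' smarts

Here the quotes keep the shell from expanding the *, so grep receives two arguments: the pattern * and the file name smarts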
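A sketch of the quoting problem in a scratch directory holding two made-up files (assuming `mktemp -d` is available). Prefixing the unquoted command with echo shows the command line the shell actually builds. A leading * in a basic regular expression is treated as a literal asterisk, so the quoted version finds the line containing one.

```shell
dir=$(mktemp -d)                 # scratch directory (mktemp is assumed to exist)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

printf '%s\n' 'rate * 2' 'no star here' > smarts   # made-up file contents
printf 'x\n' > afile                               # a second, unrelated file

grep '*' smarts      # quoted: grep gets the pattern *, prints "rate * 2"
echo grep * smarts   # what the shell really runs: grep afile smarts smarts
```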

sort

By default, sort takes each line of the specified input file and sorts it into ascending order

$ cat students

Sue

Vara

Elvis

Luis

Eliza

$ sort students

Eliza

Elvis

Luis

Sue

Vara

$

sort (continued)

The -u option tells sort to eliminate duplicate lines from the output
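A minimal sketch of plain sort versus sort -u on a made-up list containing a duplicate name, in the C locale so the order is plain byte order.

```shell
LC_ALL=C
export LC_ALL

printf '%s\n' Sue Vara Ash Eliza Ash | sort      # Ash Ash Eliza Sue Vara
printf '%s\n' Sue Vara Ash Eliza Ash | sort -u   # Ash Eliza Sue Vara
```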

sort (continued)

$ echo Ash >> students

$ echo Ash >> students

$ cat students

Sue

Vara

Elvis

Luis

Eliza

Ash

Ash

$ sort students

Ash

Ash

Eliza

Elvis

Luis

Sue

Vara

$

sort (continued)

The -r option reverses the order of the sort

The -o option writes the sorted output to a file instead of standard output: sort students > sorted_students works the same as sort students -o sorted_students

The -o option also lets you sort a file and save the output in the same file. Example:

sort students -o students (correct)

sort students > students (incorrect: the shell empties students before sort gets to read it)
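A sketch of -r and of the -o pitfall, on a scratch copy of a short students file (assuming `mktemp -d` is available). It puts -o before the file name, which POSIX requires; GNU sort also accepts the `sort students -o students` order shown above. POSIX allows the -o file to be one of the inputs, whereas the redirection version leaves an empty file.

```shell
LC_ALL=C
export LC_ALL

dir=$(mktemp -d)                 # scratch directory (mktemp is assumed to exist)
trap 'rm -rf "$dir"' EXIT
printf '%s\n' Sue Vara Elvis > "$dir/students"

sort -r "$dir/students"                   # Vara Sue Elvis
sort -o "$dir/students" "$dir/students"   # safe in-place sort
cat "$dir/students"                       # Elvis Sue Vara

cp "$dir/students" "$dir/broken"
sort "$dir/broken" > "$dir/broken"        # the shell empties broken before sort starts
wc -c < "$dir/broken"                     # prints 0
```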

sort (continued)

The -n option tells sort to treat the first field on each line as a number and to sort the data arithmetically

sort (continued)

$ cat data

-10 11

15 2

-9 -3

2 13

20 22

3 1

$ sort data

-10 11

-9 -3

15 2

2 13

20 22

3 1

$

sort (continued)

$ sort -n data

-10 11

-9 -3

2 13

3 1

15 2

20 22

$ sort +1n data

-9 -3

3 1

15 2

-10 11

2 13

20 22

$

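sort (continued)

To sort by the second field, +1n should be used instead of -n: +1 says to skip the first field

+5n would mean to skip the first five fields on each line and then sort the data numerically

The +pos notation is obsolete: POSIX sort and current GNU sort expect the -k option instead, so sort +1n data is written sort -k2n data, and +5n becomes -k6n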
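A sketch that reproduces the two numeric sorts of the sample data with the -k spelling. The helper `data` simply prints the data file from this section.

```shell
LC_ALL=C
export LC_ALL

# Hypothetical helper: the sample data from this section
data() {
    printf '%s\n' '-10 11' '15 2' '-9 -3' '2 13' '20 22' '3 1'
}

data | sort -n      # arithmetic order of the first field
data | sort -k2n    # modern spelling of "sort +1n": arithmetic order of field 2
```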

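sort (continued)

Example: -t: makes the colon the field delimiter, so +2n skips the first two fields and sorts numerically on the third one (the user ID)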

$ sort -t: +2n /etc/passwd

root:x:0:1:Super-User:/:/sbin/sh

daemon:x:1:1::/:

bin:x:2:2::/usr/bin:

sys:x:3:3::/:

adm:x:4:4:Admin:/var/adm:

uucp:x:5:5:uucp Admin:/usr/lib/uucp:

nuucp:x:9:9:uucp Admin:/var/spool/uucppublic:/usr/lib/uucp/uucico

listen:x:37:4:Network Admin:/usr/net/nls:

lp:x:71:8:Line Printer Admin:/usr/spool/lp:

oracle:*:101:67:DBA Account:/export/home/oracle:/bin/csh

webuser:*:102:102:Web User:/export/home/webuser:/bin/csh

nobody:x:60001:60001:Nobody:/:

$
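A sketch of the same idea with the -k syntax on three lines copied from the sample /etc/passwd listing. `-k3,3` limits the key to the third field only, and the second command shows why the n matters: compared as text, 101 sorts before 71. The helper `accounts` is hypothetical.

```shell
LC_ALL=C
export LC_ALL

# Hypothetical helper: three lines copied from the sample /etc/passwd listing
accounts() {
    printf '%s\n' 'lp:x:71:8:Line Printer Admin:/usr/spool/lp:' \
                  'root:x:0:1:Super-User:/:/sbin/sh' \
                  'oracle:*:101:67:DBA Account:/export/home/oracle:/bin/csh'
}

accounts | sort -t: -k3,3n   # root (0), lp (71), oracle (101)
accounts | sort -t: -k3,3    # text order: root (0), oracle (101), lp (71)
```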

uniq

Used to find duplicate lines in a file

Format: uniq in_file out_file

uniq copies in_file to out_file, removing any duplicate lines in the process

uniq considers lines to be duplicates only if they occur consecutively and match exactly

uniq (continued)

$ cat students

Sue

Vara

Elvis

Luis

Eliza

Ash

Ash

$ uniq students

Sue

Vara

Elvis

Luis

Eliza

Ash

$

The -d option is used to list duplicate lines

Example:

$ uniq -d students

Ash

$
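Because uniq only compares adjacent lines, duplicates that are not next to each other survive. A minimal sketch on a made-up list from a hypothetical `names` helper shows the difference, and shows the common idiom of sorting first so that all duplicates become adjacent.

```shell
LC_ALL=C
export LC_ALL

# Made-up names: Ash appears twice in a row, Sue twice but not adjacent
names() {
    printf '%s\n' Sue Vara Ash Ash Eliza Sue
}

names | uniq              # Sue Vara Ash Eliza Sue: only the adjacent Ash pair collapses
names | uniq -d           # Ash
names | sort | uniq       # Ash Eliza Sue Vara
names | sort | uniq -d    # Ash Sue
```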
